linux-kernel.vger.kernel.org archive mirror
* [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR
@ 2016-11-19 13:47 Thomas Gleixner
  2016-11-19 13:47 ` [patch 1/8] x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() Thomas Gleixner
                   ` (7 more replies)
  0 siblings, 8 replies; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-19 13:47 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful
in two respects:

1) It allows detection of BIOS wreckage, where SMM code tries to 'hide' the
   cycles it spends by storing the TSC value at SMM entry and restoring it
   at SMM exit. On affected machines the TSCs slowly drift out of sync up
   to the point where the clocksource watchdog (if available) detects it.

   The TSC_ADJUST MSR allows the modification to be detected before that
   point and the TSC to be restored. This is also important for SoCs which
   have no watchdog clocksource, where TSC wreckage therefore cannot be
   detected and acted upon otherwise.

2) All threads in a package are required to have the same TSC_ADJUST
   value. Broken BIOSes violate that requirement and as a result the TSC
   synchronization check fails.

   The TSC_ADJUST MSR allows the deviation to be detected when a CPU comes
   online. If a deviation is detected, the MSR is set to the value of an
   already online CPU in the same package. This also reduces the number of
   sync tests, because with that in place the test is only required for the
   first CPU in a package.

   In principle all CPUs in a system should have the same TSC_ADJUST value
   even across packages, but with physical CPU hotplug this assumption does
   not hold because the TSC starts counting at power on. Physical hotplug
   therefore has to do some trickery to bring the TSC into sync with the
   already running packages, which requires using a TSC_ADJUST value
   different from that of CPUs which were powered up earlier.

   A final enhancement is the opportunity to compensate for unsynchronized
   TSCs across nodes at boot time and make the TSC usable that way. It
   won't help for TSCs which drift apart due to frequency skew between
   packages, but that gets detected by the clocksource watchdog later.

This patch series implements all of the above.
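
For illustration only (not part of the series): a minimal user-space model
of the per-package store-and-check idea from point 2). rdmsr()/wrmsr() are
stand-in stubs, and first_in_package/ref_adjusted are hypothetical
placeholders for the per-package bookkeeping the patches below add.

/* Illustrative user-space model only; not kernel code. */
#include <stdio.h>
#include <stdbool.h>
#include <stdint.h>

static int64_t msr_tsc_adjust = -500;	/* pretend the BIOS left a bogus value */

static int64_t rdmsr(void)      { return msr_tsc_adjust; }
static void    wrmsr(int64_t v) { msr_tsc_adjust = v; }

int main(void)
{
	bool first_in_package = false;	/* this CPU has an online sibling */
	int64_t ref_adjusted = 0;	/* value stored by the first CPU in the package */
	int64_t bootval = rdmsr();

	if (first_in_package) {
		/* First CPU in the package defines the reference value */
		ref_adjusted = bootval;
	} else if (bootval != ref_adjusted) {
		printf("TSC ADJUST differs: ref %lld cur %lld -- forcing sync\n",
		       (long long)ref_adjusted, (long long)bootval);
		wrmsr(ref_adjusted);	/* force package-wide consistency */
	}
	printf("TSC_ADJUST is now %lld\n", (long long)rdmsr());
	return 0;
}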

Thanks,

	tglx

---
 include/asm/tsc.h |    8 +
 kernel/Makefile   |    2 
 kernel/process.c  |    1 
 kernel/tsc.c      |   16 ++-
 kernel/tsc_sync.c |  237 +++++++++++++++++++++++++++++++++++++++++++++++++++---
 5 files changed, 244 insertions(+), 20 deletions(-)

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch 1/8] x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
  2016-11-19 13:47 [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR Thomas Gleixner
@ 2016-11-19 13:47 ` Thomas Gleixner
  2016-11-29 16:46   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
  2016-11-29 18:28   ` tip-bot for Thomas Gleixner
  2016-11-19 13:47 ` [patch 2/8] x86/tsc: Detect random warps Thomas Gleixner
                   ` (6 subsequent siblings)
  7 siblings, 2 replies; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-19 13:47 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

[-- Attachment #1: x86-tsc--Use-the-X86_FEATURE_TSC_ADJUST-feature-bit.patch --]
[-- Type: text/plain, Size: 1283 bytes --]

The ART detection code uses rdmsrl_safe() to detect the availability of the
TSC_ADJUST MSR.

That's pointless because we have a feature bit for this. Use it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc.c |   14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1043,18 +1043,20 @@ static void detect_art(void)
 	if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
 		return;
 
-	cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
-	      &art_to_tsc_numerator, unused, unused+1);
-
-	/* Don't enable ART in a VM, non-stop TSC required */
+	/* Don't enable ART in a VM, non-stop TSC and TSC_ADJUST required */
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
 	    !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
-	    art_to_tsc_denominator < ART_MIN_DENOMINATOR)
+	    !boot_cpu_has(X86_FEATURE_TSC_ADJUST))
 		return;
 
-	if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))
+	cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
+	      &art_to_tsc_numerator, unused, unused+1);
+
+	if (art_to_tsc_denominator < ART_MIN_DENOMINATOR)
 		return;
 
+	rdmsrl(MSR_IA32_TSC_ADJUST, art_to_tsc_offset);
+
 	/* Make this sticky over multiple CPU init calls */
 	setup_force_cpu_cap(X86_FEATURE_ART);
 }

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch 2/8] x86/tsc: Detect random warps
  2016-11-19 13:47 [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR Thomas Gleixner
  2016-11-19 13:47 ` [patch 1/8] x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() Thomas Gleixner
@ 2016-11-19 13:47 ` Thomas Gleixner
  2016-11-29 16:46   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
  2016-11-29 18:28   ` tip-bot for Thomas Gleixner
  2016-11-19 13:47 ` [patch 3/8] x86/tsc: Store and check TSC ADJUST MSR Thomas Gleixner
                   ` (5 subsequent siblings)
  7 siblings, 2 replies; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-19 13:47 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

[-- Attachment #1: x86-tsc--Detect-random-warps.patch --]
[-- Type: text/plain, Size: 1900 bytes --]

If time warps can be observed then they should only ever be observed on one
CPU. If they are observed on both CPUs then the system is completely hosed.

Add a check for this condition and notify if it happens.
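
For illustration only: a tiny single-threaded model of the cur_warps/nr_warps
bookkeeping used in the patch below. Each observer remembers the global warp
count after its own last warp; if the global count moved on in the meantime,
the other CPU must have recorded a warp as well, i.e. the warps are random.
record_warp() is a made-up helper for this sketch.

/* Illustrative model of the bookkeeping only; no locking, single thread. */
#include <stdio.h>

static int nr_warps, random_warps;
static int cur_warps[2];	/* per-"CPU" snapshot of nr_warps */

static void record_warp(int cpu)
{
	/* Did someone else bump nr_warps since our last warp? */
	if (cur_warps[cpu] != nr_warps)
		random_warps++;
	nr_warps++;
	cur_warps[cpu] = nr_warps;
}

int main(void)
{
	record_warp(0);		/* warps seen only on CPU0: not random */
	record_warp(0);
	record_warp(1);		/* now CPU1 sees one too: random */
	printf("nr_warps=%d random_warps=%d\n", nr_warps, random_warps);
	return 0;
}

Running it prints nr_warps=3 random_warps=1: the first two warps stay on one
observer, the third comes from the other one.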

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_sync.c |   13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -37,6 +37,7 @@ static arch_spinlock_t sync_lock = __ARC
 static cycles_t last_tsc;
 static cycles_t max_warp;
 static int nr_warps;
+static int random_warps;
 
 /*
  * TSC-warp measurement loop running on both CPUs.  This is not called
@@ -45,7 +46,7 @@ static int nr_warps;
 static void check_tsc_warp(unsigned int timeout)
 {
 	cycles_t start, now, prev, end;
-	int i;
+	int i, cur_warps = 0;
 
 	start = rdtsc_ordered();
 	/*
@@ -85,7 +86,14 @@ static void check_tsc_warp(unsigned int
 		if (unlikely(prev > now)) {
 			arch_spin_lock(&sync_lock);
 			max_warp = max(max_warp, prev - now);
+			/*
+			 * Check whether this bounces back and forth. Only
+			 * one CPU should observe time going backwards.
+			 */
+			if (cur_warps != nr_warps)
+				random_warps++;
 			nr_warps++;
+			cur_warps = nr_warps;
 			arch_spin_unlock(&sync_lock);
 		}
 	}
@@ -160,6 +168,8 @@ void check_tsc_sync_source(int cpu)
 			smp_processor_id(), cpu);
 		pr_warning("Measured %Ld cycles TSC warp between CPUs, "
 			   "turning off TSC clock.\n", max_warp);
+		if (random_warps)
+			pr_warning("TSC warped randomly between CPUs\n");
 		mark_tsc_unstable("check_tsc_sync_source failed");
 	} else {
 		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
@@ -170,6 +180,7 @@ void check_tsc_sync_source(int cpu)
 	 * Reset it - just in case we boot another CPU later:
 	 */
 	atomic_set(&start_count, 0);
+	random_warps = 0;
 	nr_warps = 0;
 	max_warp = 0;
 	last_tsc = 0;

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch 3/8] x86/tsc: Store and check TSC ADJUST MSR
  2016-11-19 13:47 [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR Thomas Gleixner
  2016-11-19 13:47 ` [patch 1/8] x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() Thomas Gleixner
  2016-11-19 13:47 ` [patch 2/8] x86/tsc: Detect random warps Thomas Gleixner
@ 2016-11-19 13:47 ` Thomas Gleixner
  2016-11-29 16:47   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
                     ` (2 more replies)
  2016-11-19 13:47 ` [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle Thomas Gleixner
                   ` (4 subsequent siblings)
  7 siblings, 3 replies; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-19 13:47 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

[-- Attachment #1: x86-tsc--Store-and-check-TSC-ADJUST-MSR.patch --]
[-- Type: text/plain, Size: 6726 bytes --]

The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful
in two respects:

1) It allows detection of BIOS wreckage, where SMM code tries to 'hide' the
   cycles it spends by storing the TSC value at SMM entry and restoring it
   at SMM exit. On affected machines the TSCs slowly drift out of sync up
   to the point where the clocksource watchdog (if available) detects it.

   The TSC_ADJUST MSR allows the modification to be detected before that
   point and the TSC to be restored. This is also important for SoCs which
   have no watchdog clocksource, where TSC wreckage therefore cannot be
   detected and acted upon otherwise.

2) All threads in a package are required to have the same TSC_ADJUST
   value. Broken BIOSes violate that requirement and as a result the TSC
   synchronization check fails.

   The TSC_ADJUST MSR allows the deviation to be detected when a CPU comes
   online. If a deviation is detected, the MSR is set to the value of an
   already online CPU in the same package. This also reduces the number of
   sync tests, because with that in place the test is only required for the
   first CPU in a package.

   In principle all CPUs in a system should have the same TSC_ADJUST value
   even across packages, but with physical CPU hotplug this assumption does
   not hold because the TSC starts counting at power on. Physical hotplug
   therefore has to do some trickery to bring the TSC into sync with the
   already running packages, which requires using a TSC_ADJUST value
   different from that of CPUs which were powered up earlier.

   A final enhancement is the opportunity to compensate for unsynchronized
   TSCs across nodes at boot time and make the TSC usable that way. It
   won't help for TSCs which drift apart due to frequency skew between
   packages, but that gets detected by the clocksource watchdog later.

The first step toward this is to store the TSC_ADJUST value of a starting
CPU and compare it with the value of an already online CPU in the same
package. If they differ, emit a warning and adjust it to the reference
value. The !SMP version just stores the boot value for later verification.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tsc.h |    6 +++
 arch/x86/kernel/Makefile   |    2 -
 arch/x86/kernel/tsc.c      |    2 +
 arch/x86/kernel/tsc_sync.c |   88 +++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 97 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -48,6 +48,12 @@ extern int tsc_clocksource_reliable;
 extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 
+#ifdef CONFIG_X86_TSC
+extern void tsc_store_and_check_tsc_adjust(void);
+#else
+static inline void tsc_store_and_check_tsc_adjust(void) { }
+#endif
+
 extern int notsc_setup(char *);
 extern void tsc_save_sched_clock_state(void);
 extern void tsc_restore_sched_clock_state(void);
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -75,7 +75,7 @@ apm-y				:= apm_32.o
 obj-$(CONFIG_APM)		+= apm.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= smpboot.o
-obj-$(CONFIG_SMP)		+= tsc_sync.o
+obj-$(CONFIG_X86_TSC)		+= tsc_sync.o
 obj-$(CONFIG_SMP)		+= setup_percpu.o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
 obj-y				+= apic/
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1365,6 +1365,8 @@ void __init tsc_init(void)
 
 	if (unsynchronized_tsc())
 		mark_tsc_unstable("TSCs unsynchronized");
+	else
+		tsc_store_and_check_tsc_adjust();
 
 	check_system_tsc_reliable();
 
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -14,12 +14,95 @@
  * ( The serial nature of the boot logic and the CPU hotplug lock
  *   protects against more than 2 CPUs entering this code. )
  */
+#include <linux/topology.h>
 #include <linux/spinlock.h>
 #include <linux/kernel.h>
 #include <linux/smp.h>
 #include <linux/nmi.h>
 #include <asm/tsc.h>
 
+struct tsc_adjust {
+	s64	bootval;
+	s64	adjusted;
+};
+
+static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+
+#ifndef CONFIG_SMP
+void __init tsc_store_and_check_tsc_adjust(void)
+{
+	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
+	s64 bootval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
+	cur->bootval = bootval;
+	cur->adjusted = bootval;
+	pr_info("TSC ADJUST: Boot CPU%u: %lld\n",cpu,  bootval);
+}
+
+#else /* !CONFIG_SMP */
+
+/*
+ * Store and check the TSC ADJUST MSR if available
+ */
+void tsc_store_and_check_tsc_adjust(void)
+{
+	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
+	unsigned int refcpu, cpu = smp_processor_id();
+	s64 bootval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
+	cur->bootval = bootval;
+
+	/*
+	 * Check whether this CPU is the first in a package to come up. In
+	 * this case do not check the boot value against another package
+	 * because the package might have been physically hotplugged, where
+	 * TSC_ADJUST is expected to be different.
+	 */
+	refcpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+	if (refcpu >= nr_cpu_ids) {
+		/*
+		 * First online CPU in a package stores the boot value in
+		 * the adjustment value. This value might change later via
+		 * the sync mechanism. If that fails we still can yell
+		 * about boot values not being consistent.
+		 */
+		cur->adjusted = bootval;
+		pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
+		return;
+	}
+
+	ref = per_cpu_ptr(&tsc_adjust, refcpu);
+	/*
+	 * Compare the boot value and complain if it differs in the
+	 * package.
+	 */
+	if (bootval != ref->bootval) {
+		pr_warn("TSC ADJUST differs: Reference CPU%u: %lld CPU%u: %lld\n",
+			refcpu, ref->bootval, cpu, bootval);
+	}
+	/*
+	 * The TSC_ADJUST values in a package must be the same. If the boot
+	 * value on this newly upcoming CPU differs from the adjustment
+	 * value of the already online CPU in this package, set it to that
+	 * adjusted value.
+	 */
+	if (bootval != ref->adjusted) {
+		pr_warn("TSC ADJUST synchronize: Reference CPU%u: %lld CPU%u: %lld\n",
+			refcpu, ref->adjusted, cpu, bootval);
+		cur->adjusted = ref->adjusted;
+		wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
+	}
+}
+
 /*
  * Entry/exit counters that make sure that both CPUs
  * run the measurement code at once:
@@ -202,6 +285,9 @@ void check_tsc_sync_target(void)
 	if (unsynchronized_tsc() || tsc_clocksource_reliable)
 		return;
 
+	/* Store and check the TSC ADJUST MSR */
+	tsc_store_and_check_tsc_adjust();
+
 	/*
 	 * Register this CPU's participation and wait for the
 	 * source CPU to start the measurement:
@@ -223,3 +309,5 @@ void check_tsc_sync_target(void)
 	while (atomic_read(&stop_count) != cpus)
 		cpu_relax();
 }
+
+#endif /* CONFIG_SMP */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle
  2016-11-19 13:47 [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR Thomas Gleixner
                   ` (2 preceding siblings ...)
  2016-11-19 13:47 ` [patch 3/8] x86/tsc: Store and check TSC ADJUST MSR Thomas Gleixner
@ 2016-11-19 13:47 ` Thomas Gleixner
  2016-11-20 13:10   ` Peter Zijlstra
                     ` (3 more replies)
  2016-11-19 13:47 ` [patch 5/8] x86/tsc: Sync test only for the first cpu in a package Thomas Gleixner
                   ` (3 subsequent siblings)
  7 siblings, 4 replies; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-19 13:47 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

[-- Attachment #1: x86-tsc--Verify-TSC_ADJUST-from-idle.patch --]
[-- Type: text/plain, Size: 2445 bytes --]

When entering idle, it's a good opportunity to verify that the TSC_ADJUST
MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
detected, emit a warning and restore it to the previous value.

This is especially important for machines which mark the TSC reliable
because there is no watchdog clocksource available (SoCs).

This is not sufficient for HPC (NOHZ_FULL) situations where a CPU never
goes idle, but adding a timer to do the check periodically is not an option
either. On a machine which has this issue, the check triggers right during
boot, so there is a decent chance that the sysadmin will notice.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tsc.h |    2 ++
 arch/x86/kernel/process.c  |    1 +
 arch/x86/kernel/tsc_sync.c |   20 +++++++++++++++++++-
 3 files changed, 22 insertions(+), 1 deletion(-)

--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -50,8 +50,10 @@ extern void check_tsc_sync_target(void);
 
 #ifdef CONFIG_X86_TSC
 extern void tsc_store_and_check_tsc_adjust(void);
+extern void tsc_verify_tsc_adjust(void);
 #else
 static inline void tsc_store_and_check_tsc_adjust(void) { }
+static inline void tsc_verify_tsc_adjust(void) { }
 #endif
 
 extern int notsc_setup(char *);
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -277,6 +277,7 @@ void exit_idle(void)
 
 void arch_cpu_idle_enter(void)
 {
+	tsc_verify_tsc_adjust();
 	local_touch_nmi();
 	enter_idle();
 }
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -28,6 +28,24 @@ struct tsc_adjust {
 
 static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
 
+void tsc_verify_tsc_adjust(void)
+{
+	struct tsc_adjust *adj = this_cpu_ptr(&tsc_adjust);
+	s64 curval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, curval);
+	if (adj->adjusted == curval)
+		return;
+
+	pr_warn(FW_BUG "TSC ADJUST differs: CPU%u %lld --> %lld. Restoring\n",
+		smp_processor_id(), adj->adjusted, curval);
+
+	wrmsrl(MSR_IA32_TSC_ADJUST, adj->adjusted);
+}
+
 #ifndef CONFIG_SMP
 void __init tsc_store_and_check_tsc_adjust(void)
 {
@@ -40,7 +58,7 @@ void __init tsc_store_and_check_tsc_adju
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
 	cur->adjusted = bootval;
-	pr_info("TSC ADJUST: Boot CPU%u: %lld\n",cpu,  bootval);
+	pr_info("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
 }
 
 #else /* !CONFIG_SMP */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch 5/8] x86/tsc: Sync test only for the first cpu in a package
  2016-11-19 13:47 [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR Thomas Gleixner
                   ` (3 preceding siblings ...)
  2016-11-19 13:47 ` [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle Thomas Gleixner
@ 2016-11-19 13:47 ` Thomas Gleixner
  2016-11-29 16:48   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
  2016-11-29 18:30   ` tip-bot for Thomas Gleixner
  2016-11-19 13:47 ` [patch 6/8] x86/tsc: Move sync cleanup to a safe place Thomas Gleixner
                   ` (2 subsequent siblings)
  7 siblings, 2 replies; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-19 13:47 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

[-- Attachment #1: x86-tsc--Sync-test-only-for-the-first-cpu-in-a-package.patch --]
[-- Type: text/plain, Size: 3722 bytes --]

If the TSC_ADJUST MSR is available, all CPUs in a package are forced to the
same value. So the TSCs cannot be out of sync if the first CPU in the
package was in sync.

That allows the sync test to be skipped for all CPUs except the first CPU
which comes up in a package.
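
For illustration only: a simplified, single-threaded model of the resulting
skip handshake. The real code below uses the skip_test atomic between the
starting (target) CPU and the control (source) CPU; the helpers here are
hypothetical stand-ins for this sketch.

/* Illustrative model only; the kernel uses atomics across two CPUs. */
#include <stdbool.h>
#include <stdio.h>

static int skip_test;	/* set by the target, consumed by the source */

static void target_side(bool has_tsc_adjust, bool first_in_package)
{
	/* A forced TSC_ADJUST value makes the warp test pointless */
	if (has_tsc_adjust && !first_in_package) {
		skip_test++;
		return;
	}
	printf("target: running warp test\n");
}

static void source_side(void)
{
	if (skip_test > 0) {
		skip_test = 0;
		printf("source: target skipped the test\n");
		return;
	}
	printf("source: running warp test\n");
}

int main(void)
{
	target_side(true, false);	/* sibling CPU with TSC_ADJUST available */
	source_side();
	return 0;
}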

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/include/asm/tsc.h |    4 ++--
 arch/x86/kernel/tsc_sync.c |   37 ++++++++++++++++++++++++++++---------
 2 files changed, 30 insertions(+), 11 deletions(-)

--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -49,10 +49,10 @@ extern void check_tsc_sync_source(int cp
 extern void check_tsc_sync_target(void);
 
 #ifdef CONFIG_X86_TSC
-extern void tsc_store_and_check_tsc_adjust(void);
+extern bool tsc_store_and_check_tsc_adjust(void);
 extern void tsc_verify_tsc_adjust(void);
 #else
-static inline void tsc_store_and_check_tsc_adjust(void) { }
+static inline bool tsc_store_and_check_tsc_adjust(void) { }
 static inline void tsc_verify_tsc_adjust(void) { }
 #endif
 
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -47,18 +47,19 @@ void tsc_verify_tsc_adjust(void)
 }
 
 #ifndef CONFIG_SMP
-void __init tsc_store_and_check_tsc_adjust(void)
+bool __init tsc_store_and_check_tsc_adjust(void)
 {
 	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
 	s64 bootval;
 
 	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
-		return;
+		return false;
 
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
 	cur->adjusted = bootval;
 	pr_info("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
+	return false;
 }
 
 #else /* !CONFIG_SMP */
@@ -66,14 +67,14 @@ void __init tsc_store_and_check_tsc_adju
 /*
  * Store and check the TSC ADJUST MSR if available
  */
-void tsc_store_and_check_tsc_adjust(void)
+bool tsc_store_and_check_tsc_adjust(void)
 {
 	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
 	unsigned int refcpu, cpu = smp_processor_id();
 	s64 bootval;
 
 	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
-		return;
+		return false;
 
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
@@ -95,7 +96,7 @@ void tsc_store_and_check_tsc_adjust(void
 		 */
 		cur->adjusted = bootval;
 		pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
-		return;
+		return false;
 	}
 
 	ref = per_cpu_ptr(&tsc_adjust, refcpu);
@@ -119,6 +120,11 @@ void tsc_store_and_check_tsc_adjust(void
 		cur->adjusted = ref->adjusted;
 		wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
 	}
+	/*
+	 * We have the TSCs forced to be in sync on this package. Skip sync
+	 * test:
+	 */
+	return true;
 }
 
 /*
@@ -127,6 +133,7 @@ void tsc_store_and_check_tsc_adjust(void
  */
 static atomic_t start_count;
 static atomic_t stop_count;
+static atomic_t skip_test;
 
 /*
  * We use a raw spinlock in this exceptional case, because
@@ -250,10 +257,16 @@ void check_tsc_sync_source(int cpu)
 	atomic_set(&stop_count, 0);
 
 	/*
-	 * Wait for the target to arrive:
+	 * Wait for the target to start or to skip the test:
 	 */
-	while (atomic_read(&start_count) != cpus-1)
+	while (atomic_read(&start_count) != cpus - 1) {
+		if (atomic_read(&skip_test) > 0) {
+			atomic_set(&skip_test, 0);
+			return;
+		}
 		cpu_relax();
+	}
+
 	/*
 	 * Trigger the target to continue into the measurement too:
 	 */
@@ -303,8 +316,14 @@ void check_tsc_sync_target(void)
 	if (unsynchronized_tsc() || tsc_clocksource_reliable)
 		return;
 
-	/* Store and check the TSC ADJUST MSR */
-	tsc_store_and_check_tsc_adjust();
+	/*
+	 * Store, verify and sanitize the TSC adjust register. If
+	 * successful skip the test.
+	 */
+	if (tsc_store_and_check_tsc_adjust()) {
+		atomic_inc(&skip_test);
+		return;
+	}
 
 	/*
 	 * Register this CPU's participation and wait for the

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch 6/8] x86/tsc: Move sync cleanup to a safe place
  2016-11-19 13:47 [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR Thomas Gleixner
                   ` (4 preceding siblings ...)
  2016-11-19 13:47 ` [patch 5/8] x86/tsc: Sync test only for the first cpu in a package Thomas Gleixner
@ 2016-11-19 13:47 ` Thomas Gleixner
  2016-11-29 16:48   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
  2016-11-29 18:30   ` tip-bot for Thomas Gleixner
  2016-11-19 13:47 ` [patch 7/8] x86/tsc: Prepare warp test for TSC adjustment Thomas Gleixner
  2016-11-19 13:47 ` [patch 8/8] x86/tsc: Try to adjust TSC if sync test fails Thomas Gleixner
  7 siblings, 2 replies; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-19 13:47 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

[-- Attachment #1: x86-tsc--Move-sync-cleanup-to-a-safe-place.patch --]
[-- Type: text/plain, Size: 849 bytes --]

Cleaning up the stop marker on the control CPU is wrong when we want to add
retry support. Move the cleanup to the starting CPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_sync.c |   10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -252,11 +252,6 @@ void check_tsc_sync_source(int cpu)
 	}
 
 	/*
-	 * Reset it - in case this is a second bootup:
-	 */
-	atomic_set(&stop_count, 0);
-
-	/*
 	 * Wait for the target to start or to skip the test:
 	 */
 	while (atomic_read(&start_count) != cpus - 1) {
@@ -345,6 +340,11 @@ void check_tsc_sync_target(void)
 	 */
 	while (atomic_read(&stop_count) != cpus)
 		cpu_relax();
+
+	/*
+	 * Reset it for the next sync test:
+	 */
+	atomic_set(&stop_count, 0);
 }
 
 #endif /* CONFIG_SMP */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch 7/8] x86/tsc: Prepare warp test for TSC adjustment
  2016-11-19 13:47 [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR Thomas Gleixner
                   ` (5 preceding siblings ...)
  2016-11-19 13:47 ` [patch 6/8] x86/tsc: Move sync cleanup to a safe place Thomas Gleixner
@ 2016-11-19 13:47 ` Thomas Gleixner
  2016-11-29 16:49   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
  2016-11-29 18:31   ` tip-bot for Thomas Gleixner
  2016-11-19 13:47 ` [patch 8/8] x86/tsc: Try to adjust TSC if sync test fails Thomas Gleixner
  7 siblings, 2 replies; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-19 13:47 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

[-- Attachment #1: x86-tsc--Prepare-warp-test-for-TSC-adjustment.patch --]
[-- Type: text/plain, Size: 1323 bytes --]

To allow TSC compensation across nodes it's necessary to know in which
direction the TSC warp was observed. Return the maximum warp value observed
on the calling CPU so the caller can determine the direction later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_sync.c |    6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -151,9 +151,9 @@ static int random_warps;
  * TSC-warp measurement loop running on both CPUs.  This is not called
  * if there is no TSC.
  */
-static void check_tsc_warp(unsigned int timeout)
+static cycles_t check_tsc_warp(unsigned int timeout)
 {
-	cycles_t start, now, prev, end;
+	cycles_t start, now, prev, end, cur_max_warp = 0;
 	int i, cur_warps = 0;
 
 	start = rdtsc_ordered();
@@ -194,6 +194,7 @@ static void check_tsc_warp(unsigned int
 		if (unlikely(prev > now)) {
 			arch_spin_lock(&sync_lock);
 			max_warp = max(max_warp, prev - now);
+			cur_max_warp = max_warp;
 			/*
 			 * Check whether this bounces back and forth. Only
 			 * one CPU should observe time going backwards.
@@ -208,6 +209,7 @@ static void check_tsc_warp(unsigned int
 	WARN(!(now-start),
 		"Warning: zero tsc calibration delta: %Ld [max: %Ld]\n",
 			now-start, end-start);
+	return cur_max_warp;
 }
 
 /*

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [patch 8/8] x86/tsc: Try to adjust TSC if sync test fails
  2016-11-19 13:47 [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR Thomas Gleixner
                   ` (6 preceding siblings ...)
  2016-11-19 13:47 ` [patch 7/8] x86/tsc: Prepare warp test for TSC adjustment Thomas Gleixner
@ 2016-11-19 13:47 ` Thomas Gleixner
  2016-11-29 16:49   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
  2016-11-29 18:31   ` tip-bot for Thomas Gleixner
  7 siblings, 2 replies; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-19 13:47 UTC (permalink / raw)
  To: LKML; +Cc: Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

[-- Attachment #1: x86-tsc--Try-to-adjust-TSC-if-sync-test-fails.patch --]
[-- Type: text/plain, Size: 5433 bytes --]

When the first CPU of a package comes online, it is necessary to test whether
its TSC is in sync with a CPU on some other package. When a deviation is
observed (time going backwards between the two CPUs) the TSC is marked
unstable, which is a problem on large machines as they have to fall back to
the HPET clocksource, which is insanely slow.

Compensating the TSC by adding the offset to it and writing it back was
attempted before, but that approach was dropped because it did not turn out
to be stable, especially not on older systems.

Modern systems have become more stable in that regard and the TSC_ADJUST
MSR allows us to compensate for the time deviation in a sane way. If it is
available, allow up to three synchronization runs; if a time warp is
detected, the starting CPU can compensate for it via the TSC_ADJUST MSR and
retry. If the third run still shows a deviation, or if random time warps
are detected, the test fails terminally.
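
For illustration only: a rough user-space model of the retry and compensation
loop in the patch below. The numbers (a 350 cycle offset, three runs) are
made up and the measurement is faked; it merely shows how the observed warp,
or its negation when the other CPU saw the warp, feeds back into the
adjustment value.

/* Illustrative model only; not the kernel implementation. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
	int64_t adjusted = 0;		/* current TSC_ADJUST of the new CPU */
	int64_t true_offset = -350;	/* unknown offset vs. the other package */
	int runs = 3;			/* MSR available: up to three attempts */

	while (runs--) {
		/* Fake measurement: the CPU whose TSC lags observes the warp */
		int64_t skew = true_offset + adjusted;
		int64_t cur_max_warp = skew < 0 ? -skew : 0;
		int64_t gbl_max_warp = skew < 0 ? -skew : skew;

		if (!gbl_max_warp) {
			printf("in sync after adjust=%lld\n", (long long)adjusted);
			break;
		}
		/* A warp of 0 here means the other CPU saw it: we are ahead */
		if (!cur_max_warp)
			cur_max_warp = -gbl_max_warp;
		adjusted += cur_max_warp;
		printf("observed warp %lld, retry with adjust=%lld\n",
		       (long long)cur_max_warp, (long long)adjusted);
	}
	return 0;
}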

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
---
 arch/x86/kernel/tsc_sync.c |   83 ++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 5 deletions(-)

--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -134,6 +134,7 @@ bool tsc_store_and_check_tsc_adjust(void
 static atomic_t start_count;
 static atomic_t stop_count;
 static atomic_t skip_test;
+static atomic_t test_runs;
 
 /*
  * We use a raw spinlock in this exceptional case, because
@@ -254,6 +255,16 @@ void check_tsc_sync_source(int cpu)
 	}
 
 	/*
+	 * Set the maximum number of test runs to
+	 *  1 if the CPU does not provide the TSC_ADJUST MSR
+	 *  3 if the MSR is available, so the target can try to adjust
+	 */
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		atomic_set(&test_runs, 1);
+	else
+		atomic_set(&test_runs, 3);
+retry:
+	/*
 	 * Wait for the target to start or to skip the test:
 	 */
 	while (atomic_read(&start_count) != cpus - 1) {
@@ -274,7 +285,21 @@ void check_tsc_sync_source(int cpu)
 	while (atomic_read(&stop_count) != cpus-1)
 		cpu_relax();
 
-	if (nr_warps) {
+	/*
+	 * If the test was successful set the number of runs to zero and
+	 * stop. If not, decrement the number of runs an check if we can
+	 * retry. In case of random warps no retry is attempted.
+	 */
+	if (!nr_warps) {
+		atomic_set(&test_runs, 0);
+
+		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
+			smp_processor_id(), cpu);
+
+	} else if (atomic_dec_and_test(&test_runs) || random_warps) {
+		/* Force it to 0 if random warps brought us here */
+		atomic_set(&test_runs, 0);
+
 		pr_warning("TSC synchronization [CPU#%d -> CPU#%d]:\n",
 			smp_processor_id(), cpu);
 		pr_warning("Measured %Ld cycles TSC warp between CPUs, "
@@ -282,9 +307,6 @@ void check_tsc_sync_source(int cpu)
 		if (random_warps)
 			pr_warning("TSC warped randomly between CPUs\n");
 		mark_tsc_unstable("check_tsc_sync_source failed");
-	} else {
-		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
-			smp_processor_id(), cpu);
 	}
 
 	/*
@@ -300,6 +322,12 @@ void check_tsc_sync_source(int cpu)
 	 * Let the target continue with the bootup:
 	 */
 	atomic_inc(&stop_count);
+
+	/*
+	 * Retry, if there is a chance to do so.
+	 */
+	if (atomic_read(&test_runs) > 0)
+		goto retry;
 }
 
 /*
@@ -307,6 +335,9 @@ void check_tsc_sync_source(int cpu)
  */
 void check_tsc_sync_target(void)
 {
+	struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
+	unsigned int cpu = smp_processor_id();
+	cycles_t cur_max_warp, gbl_max_warp;
 	int cpus = 2;
 
 	/* Also aborts if there is no TSC. */
@@ -322,6 +353,7 @@ void check_tsc_sync_target(void)
 		return;
 	}
 
+retry:
 	/*
 	 * Register this CPU's participation and wait for the
 	 * source CPU to start the measurement:
@@ -330,7 +362,12 @@ void check_tsc_sync_target(void)
 	while (atomic_read(&start_count) != cpus)
 		cpu_relax();
 
-	check_tsc_warp(loop_timeout(smp_processor_id()));
+	cur_max_warp = check_tsc_warp(loop_timeout(cpu));
+
+	/*
+	 * Store the maximum observed warp value for a potential retry:
+	 */
+	gbl_max_warp = max_warp;
 
 	/*
 	 * Ok, we are done:
@@ -347,6 +384,42 @@ void check_tsc_sync_target(void)
 	 * Reset it for the next sync test:
 	 */
 	atomic_set(&stop_count, 0);
+
+	/*
+	 * Check the number of remaining test runs. If not zero, the test
+	 * failed and a retry with adjusted TSC is possible. If zero the
+	 * test was either successful or failed terminally.
+	 */
+	if (!atomic_read(&test_runs))
+		return;
+
+	/*
+	 * If the warp value of this CPU is 0, then the other CPU
+	 * observed time going backwards so this TSC was ahead and
+	 * needs to move backwards.
+	 */
+	if (!cur_max_warp)
+		cur_max_warp = -gbl_max_warp;
+
+	/*
+	 * Add the result to the previous adjustment value.
+	 *
+	 * The adjustement value is slightly off by the overhead of the
+	 * sync mechanism (observed values are ~200 TSC cycles), but this
+	 * really depends on CPU, node distance and frequency. So
+	 * compensating for this is hard to get right. Experiments show
+	 * that the warp is not longer detectable when the observed warp
+	 * value is used. In the worst case the adjustment needs to go
+	 * through a 3rd run for fine tuning.
+	 */
+	cur->adjusted += cur_max_warp;
+
+	pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n",
+		cpu, cur_max_warp, cur->adjusted);
+
+	wrmsrl(MSR_IA32_TSC_ADJUST, cur->adjusted);
+	goto retry;
+
 }
 
 #endif /* CONFIG_SMP */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle
  2016-11-19 13:47 ` [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle Thomas Gleixner
@ 2016-11-20 13:10   ` Peter Zijlstra
  2016-11-21  8:16     ` Thomas Gleixner
  2016-11-21 22:57   ` Andi Kleen
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2016-11-20 13:10 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, x86, Borislav Petkov, Yinghai Lu

On Sat, Nov 19, 2016 at 01:47:37PM -0000, Thomas Gleixner wrote:
> When entering idle, it's a good oportunity to verify that the TSC_ADJUST
> MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
> detected, emit a warning and restore it to the previous value.

> +++ b/arch/x86/kernel/process.c
> @@ -277,6 +277,7 @@ void exit_idle(void)
>  
>  void arch_cpu_idle_enter(void)
>  {
> +	tsc_verify_tsc_adjust();
>  	local_touch_nmi();
>  	enter_idle();
>  }

Doing a RDMSR on the idle path isn't going to be popular. That path is
already way too slow.

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle
  2016-11-20 13:10   ` Peter Zijlstra
@ 2016-11-21  8:16     ` Thomas Gleixner
  2016-11-21 11:06       ` Peter Zijlstra
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-21  8:16 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, x86, Borislav Petkov, Yinghai Lu

On Sun, 20 Nov 2016, Peter Zijlstra wrote:
> On Sat, Nov 19, 2016 at 01:47:37PM -0000, Thomas Gleixner wrote:
> > When entering idle, it's a good oportunity to verify that the TSC_ADJUST
> > MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
> > detected, emit a warning and restore it to the previous value.
> 
> > +++ b/arch/x86/kernel/process.c
> > @@ -277,6 +277,7 @@ void exit_idle(void)
> >  
> >  void arch_cpu_idle_enter(void)
> >  {
> > +	tsc_verify_tsc_adjust();
> >  	local_touch_nmi();
> >  	enter_idle();
> >  }
> 
> Doing a RDMSR on the idle path isn't going to be popular. That path is
> already way too slow.

Of course we can rate limit that MSR read with jiffies, but do you have any
better suggestion aside from doing it timer based?

Thanks,

	tglx
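
For reference, a jiffies-style rate limit boils down to the comparison
modeled below (user-space sketch only; HZ and the jiffies counter are
stand-ins). The revamped patch later in this thread does exactly this with
time_before().

/* User-space model of a jiffies-style rate limit (HZ ticks = 1 second). */
#include <stdio.h>
#include <stdbool.h>

#define HZ 1000
static unsigned long jiffies;		/* stand-in for the kernel tick counter */
static unsigned long lastcheck;

static bool check_due(void)
{
	/* time_before(jiffies, lastcheck + HZ): not yet a second since last check */
	if ((long)(jiffies - (lastcheck + HZ)) < 0)
		return false;
	lastcheck = jiffies;
	return true;
}

int main(void)
{
	for (jiffies = 0; jiffies <= 2500; jiffies += 500)
		printf("jiffies=%lu due=%d\n", jiffies, check_due());
	return 0;
}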

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle
  2016-11-21  8:16     ` Thomas Gleixner
@ 2016-11-21 11:06       ` Peter Zijlstra
  2016-11-29  9:17         ` Thomas Gleixner
  0 siblings, 1 reply; 36+ messages in thread
From: Peter Zijlstra @ 2016-11-21 11:06 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: LKML, Ingo Molnar, x86, Borislav Petkov, Yinghai Lu

On Mon, Nov 21, 2016 at 09:16:44AM +0100, Thomas Gleixner wrote:
> On Sun, 20 Nov 2016, Peter Zijlstra wrote:
> > On Sat, Nov 19, 2016 at 01:47:37PM -0000, Thomas Gleixner wrote:
> > > When entering idle, it's a good oportunity to verify that the TSC_ADJUST
> > > MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
> > > detected, emit a warning and restore it to the previous value.
> > 
> > > +++ b/arch/x86/kernel/process.c
> > > @@ -277,6 +277,7 @@ void exit_idle(void)
> > >  
> > >  void arch_cpu_idle_enter(void)
> > >  {
> > > +	tsc_verify_tsc_adjust();
> > >  	local_touch_nmi();
> > >  	enter_idle();
> > >  }
> > 
> > Doing a RDMSR on the idle path isn't going to be popular. That path is
> > already way too slow.
> 
> Of course we can ratelimit that MSR read with jiffies, but do you have any
> better suggestion aside of doing it timer based?

Not really :/ 

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle
  2016-11-19 13:47 ` [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle Thomas Gleixner
  2016-11-20 13:10   ` Peter Zijlstra
@ 2016-11-21 22:57   ` Andi Kleen
  2016-11-29 16:47   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
  2016-11-29 18:29   ` tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 36+ messages in thread
From: Andi Kleen @ 2016-11-21 22:57 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, x86, Borislav Petkov, Yinghai Lu

Thomas Gleixner <tglx@linutronix.de> writes:

> When entering idle, it's a good oportunity to verify that the TSC_ADJUST
> MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
> detected, emit a warning and restore it to the previous value.

Idle entry is a time critical code path too, because idle periods are
often very short. An RDMSR takes at least a microsecond. You'll likely make
workloads that do a lot of short sleeps noticeably slower.

If anything, rate limit it aggressively.

BTW I don't think this situation of SMM tampering with the TSC is common
anyway, so it may simply not be worth checking, or perhaps only worth
checking with a debug boot option.

-Andi

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle
  2016-11-21 11:06       ` Peter Zijlstra
@ 2016-11-29  9:17         ` Thomas Gleixner
  2016-11-29 14:25           ` Ingo Molnar
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Gleixner @ 2016-11-29  9:17 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: LKML, Ingo Molnar, x86, Borislav Petkov, Yinghai Lu

On Mon, 21 Nov 2016, Peter Zijlstra wrote:
> On Mon, Nov 21, 2016 at 09:16:44AM +0100, Thomas Gleixner wrote:
> > On Sun, 20 Nov 2016, Peter Zijlstra wrote:
> > > On Sat, Nov 19, 2016 at 01:47:37PM -0000, Thomas Gleixner wrote:
> > > > When entering idle, it's a good oportunity to verify that the TSC_ADJUST
> > > > MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
> > > > detected, emit a warning and restore it to the previous value.
> > > 
> > > > +++ b/arch/x86/kernel/process.c
> > > > @@ -277,6 +277,7 @@ void exit_idle(void)
> > > >  
> > > >  void arch_cpu_idle_enter(void)
> > > >  {
> > > > +	tsc_verify_tsc_adjust();
> > > >  	local_touch_nmi();
> > > >  	enter_idle();
> > > >  }
> > > 
> > > Doing a RDMSR on the idle path isn't going to be popular. That path is
> > > already way too slow.
> > 
> > Of course we can ratelimit that MSR read with jiffies, but do you have any
> > better suggestion aside of doing it timer based?
> 
> Not really :/ 

Revamped patch below.

Thanks,

	tglx

8<-----------------------

Subject: x86/tsc: Verify TSC_ADJUST from idle
From: Thomas Gleixner <tglx@linutronix.de>
Date: Sat, 19 Nov 2016 13:47:37 -0000

When entering idle, it's a good opportunity to verify that the TSC_ADJUST
MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
detected, emit a warning and restore it to the previous value.

This is especially important for machines which mark the TSC reliable
because there is no watchdog clocksource available (SoCs).

This is not sufficient for HPC (NOHZ_FULL) situations where a CPU never
goes idle, but adding a timer to do the check periodically is not an option
either. On a machine which has this issue, the check triggers right
during boot, so there is a decent chance that the sysadmin will notice.

Rate limit the check to once per second and warn only once per CPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.732180441@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/tsc.h |    2 ++
 arch/x86/kernel/process.c  |    1 +
 arch/x86/kernel/tsc_sync.c |   37 ++++++++++++++++++++++++++++++++++---
 3 files changed, 37 insertions(+), 3 deletions(-)

--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -50,8 +50,10 @@ extern void check_tsc_sync_target(void);
 
 #ifdef CONFIG_X86_TSC
 extern void tsc_store_and_check_tsc_adjust(void);
+extern void tsc_verify_tsc_adjust(void);
 #else
 static inline void tsc_store_and_check_tsc_adjust(void) { }
+static inline void tsc_verify_tsc_adjust(void) { }
 #endif
 
 extern int notsc_setup(char *);
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -277,6 +277,7 @@ void exit_idle(void)
 
 void arch_cpu_idle_enter(void)
 {
+	tsc_verify_tsc_adjust();
 	local_touch_nmi();
 	enter_idle();
 }
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -22,12 +22,42 @@
 #include <asm/tsc.h>
 
 struct tsc_adjust {
-	s64	bootval;
-	s64	adjusted;
+	s64		bootval;
+	s64		adjusted;
+	unsigned long	lastcheck;
+	bool		warned;
 };
 
 static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
 
+void tsc_verify_tsc_adjust(void)
+{
+	struct tsc_adjust *adj = this_cpu_ptr(&tsc_adjust);
+	s64 curval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	/* Rate limit the MSR check */
+	if (time_before(jiffies, adj->lastcheck + HZ))
+		return;
+
+	adj->lastcheck = jiffies;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, curval);
+	if (adj->adjusted == curval)
+		return;
+
+	/* Restore the original value */
+	wrmsrl(MSR_IA32_TSC_ADJUST, adj->adjusted);
+
+	if (!adj->warned) {
+		pr_warn(FW_BUG "TSC ADJUST differs: CPU%u %lld --> %lld. Restoring\n",
+			smp_processor_id(), adj->adjusted, curval);
+		adj->warned = true;
+	}
+}
+
 #ifndef CONFIG_SMP
 void __init tsc_store_and_check_tsc_adjust(void)
 {
@@ -40,7 +70,8 @@ void __init tsc_store_and_check_tsc_adju
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
 	cur->adjusted = bootval;
-	pr_info("TSC ADJUST: Boot CPU%u: %lld\n",cpu,  bootval);
+	cur->lastcheck = jiffies;
+	pr_info("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
 }
 
 #else /* !CONFIG_SMP */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle
  2016-11-29  9:17         ` Thomas Gleixner
@ 2016-11-29 14:25           ` Ingo Molnar
  0 siblings, 0 replies; 36+ messages in thread
From: Ingo Molnar @ 2016-11-29 14:25 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Peter Zijlstra, LKML, x86, Borislav Petkov, Yinghai Lu


* Thomas Gleixner <tglx@linutronix.de> wrote:

> On Mon, 21 Nov 2016, Peter Zijlstra wrote:
> > On Mon, Nov 21, 2016 at 09:16:44AM +0100, Thomas Gleixner wrote:
> > > On Sun, 20 Nov 2016, Peter Zijlstra wrote:
> > > > On Sat, Nov 19, 2016 at 01:47:37PM -0000, Thomas Gleixner wrote:
> > > > > When entering idle, it's a good oportunity to verify that the TSC_ADJUST
> > > > > MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
> > > > > detected, emit a warning and restore it to the previous value.
> > > > 
> > > > > +++ b/arch/x86/kernel/process.c
> > > > > @@ -277,6 +277,7 @@ void exit_idle(void)
> > > > >  
> > > > >  void arch_cpu_idle_enter(void)
> > > > >  {
> > > > > +	tsc_verify_tsc_adjust();
> > > > >  	local_touch_nmi();
> > > > >  	enter_idle();
> > > > >  }
> > > > 
> > > > Doing a RDMSR on the idle path isn't going to be popular. That path is
> > > > already way too slow.
> > > 
> > > Of course we can ratelimit that MSR read with jiffies, but do you have any
> > > better suggestion aside of doing it timer based?
> > 
> > Not really :/ 
> 
> Revamped patch below.

The whole series:

  Reviewed-by: Ingo Molnar <mingo@kernel.org>

	Ingo

^ permalink raw reply	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
  2016-11-19 13:47 ` [patch 1/8] x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() Thomas Gleixner
@ 2016-11-29 16:46   ` tip-bot for Thomas Gleixner
  2016-11-29 18:28   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 16:46 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, linux-kernel, bp, yinghai, peterz, hpa, tglx

Commit-ID:  67693acebe8c90ec93703b610d9622134249646a
Gitweb:     http://git.kernel.org/tip/67693acebe8c90ec93703b610d9622134249646a
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:33 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 17:29:46 +0100

x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()

The ART detection code uses rdmsrl_safe() to detect the availability of the
TSC_ADJUST MSR.

That's pointless because we have a feature bit for this. Use it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.483561692@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 0ff1ec6..2b27c5a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1057,18 +1057,20 @@ static void detect_art(void)
 	if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
 		return;
 
-	cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
-	      &art_to_tsc_numerator, unused, unused+1);
-
-	/* Don't enable ART in a VM, non-stop TSC required */
+	/* Don't enable ART in a VM, non-stop TSC and TSC_ADJUST required */
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
 	    !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
-	    art_to_tsc_denominator < ART_MIN_DENOMINATOR)
+	    !boot_cpu_has(X86_FEATURE_TSC_ADJUST))
 		return;
 
-	if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))
+	cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
+	      &art_to_tsc_numerator, unused, unused+1);
+
+	if (art_to_tsc_denominator < ART_MIN_DENOMINATOR)
 		return;
 
+	rdmsrl(MSR_IA32_TSC_ADJUST, art_to_tsc_offset);
+
 	/* Make this sticky over multiple CPU init calls */
 	setup_force_cpu_cap(X86_FEATURE_ART);
 }

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Detect random warps
  2016-11-19 13:47 ` [patch 2/8] x86/tsc: Detect random warps Thomas Gleixner
@ 2016-11-29 16:46   ` tip-bot for Thomas Gleixner
  2016-11-29 18:28   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 16:46 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: hpa, bp, mingo, linux-kernel, peterz, tglx, yinghai

Commit-ID:  1115b98b49f321c037b27b81f04ebf3147452ad6
Gitweb:     http://git.kernel.org/tip/1115b98b49f321c037b27b81f04ebf3147452ad6
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:35 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 17:29:46 +0100

x86/tsc: Detect random warps

If time warps can be observed then they should only ever be observed on one
CPU. If they are observed on both CPUs then the system is completely hosed.

Add a check for this condition and notify if it happens.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.574838461@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc_sync.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 78083bf..40f8edd 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -37,6 +37,7 @@ static arch_spinlock_t sync_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 static cycles_t last_tsc;
 static cycles_t max_warp;
 static int nr_warps;
+static int random_warps;
 
 /*
  * TSC-warp measurement loop running on both CPUs.  This is not called
@@ -45,7 +46,7 @@ static int nr_warps;
 static void check_tsc_warp(unsigned int timeout)
 {
 	cycles_t start, now, prev, end;
-	int i;
+	int i, cur_warps = 0;
 
 	start = rdtsc_ordered();
 	/*
@@ -85,7 +86,14 @@ static void check_tsc_warp(unsigned int timeout)
 		if (unlikely(prev > now)) {
 			arch_spin_lock(&sync_lock);
 			max_warp = max(max_warp, prev - now);
+			/*
+			 * Check whether this bounces back and forth. Only
+			 * one CPU should observe time going backwards.
+			 */
+			if (cur_warps != nr_warps)
+				random_warps++;
 			nr_warps++;
+			cur_warps = nr_warps;
 			arch_spin_unlock(&sync_lock);
 		}
 	}
@@ -160,6 +168,8 @@ void check_tsc_sync_source(int cpu)
 			smp_processor_id(), cpu);
 		pr_warning("Measured %Ld cycles TSC warp between CPUs, "
 			   "turning off TSC clock.\n", max_warp);
+		if (random_warps)
+			pr_warning("TSC warped randomly between CPUs\n");
 		mark_tsc_unstable("check_tsc_sync_source failed");
 	} else {
 		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
@@ -170,6 +180,7 @@ void check_tsc_sync_source(int cpu)
 	 * Reset it - just in case we boot another CPU later:
 	 */
 	atomic_set(&start_count, 0);
+	random_warps = 0;
 	nr_warps = 0;
 	max_warp = 0;
 	last_tsc = 0;

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Store and check TSC ADJUST MSR
  2016-11-19 13:47 ` [patch 3/8] x86/tsc: Store and check TSC ADJUST MSR Thomas Gleixner
@ 2016-11-29 16:47   ` tip-bot for Thomas Gleixner
  2016-11-29 18:29   ` tip-bot for Thomas Gleixner
  2017-01-27 13:36   ` [3/8] " Henning Schild
  2 siblings, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 16:47 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, yinghai, hpa, bp, tglx, mingo, peterz

Commit-ID:  9ad337981a09fb5e53884f6614eb277e1f782460
Gitweb:     http://git.kernel.org/tip/9ad337981a09fb5e53884f6614eb277e1f782460
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:36 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 17:29:47 +0100

x86/tsc: Store and check TSC ADJUST MSR

The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful
in two respects:

1) It allows detection of BIOS wreckage, where SMM code tries to 'hide' the
   cycles it spends by storing the TSC value at SMM entry and restoring it
   at SMM exit. On affected machines the TSCs slowly drift out of sync up
   to the point where the clocksource watchdog (if available) detects it.

   The TSC_ADJUST MSR allows the modification to be detected before that
   point and the TSC to be restored. This is also important for SoCs which
   have no watchdog clocksource, where TSC wreckage therefore cannot be
   detected and acted upon otherwise.

2) All threads in a package are required to have the same TSC_ADJUST
   value. Broken BIOSes violate that requirement and as a result the TSC
   synchronization check fails.

   The TSC_ADJUST MSR allows the deviation to be detected when a CPU comes
   online. If a deviation is detected, the MSR is set to the value of an
   already online CPU in the same package. This also reduces the number of
   sync tests, because with that in place the test is only required for the
   first CPU in a package.

   In principle all CPUs in a system should have the same TSC_ADJUST value
   even across packages, but with physical CPU hotplug this assumption does
   not hold because the TSC starts counting at power on. Physical hotplug
   therefore has to do some trickery to bring the TSC into sync with the
   already running packages, which requires using a TSC_ADJUST value
   different from that of CPUs which were powered up earlier.

   A final enhancement is the opportunity to compensate for unsynchronized
   TSCs across nodes at boot time and make the TSC usable that way. It
   won't help for TSCs which drift apart due to frequency skew between
   packages, but that gets detected by the clocksource watchdog later.

The first step toward this is to store the TSC_ADJUST value of a starting
CPU and compare it with the value of an already online CPU in the same
package. If they differ, emit a warning and adjust it to the reference
value. The !SMP version just stores the boot value for later verification.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.655323776@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/tsc.h |  6 ++++
 arch/x86/kernel/Makefile   |  2 +-
 arch/x86/kernel/tsc.c      |  2 ++
 arch/x86/kernel/tsc_sync.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 33b6365..1ec0595 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -48,6 +48,12 @@ extern int tsc_clocksource_reliable;
 extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 
+#ifdef CONFIG_X86_TSC
+extern void tsc_store_and_check_tsc_adjust(void);
+#else
+static inline void tsc_store_and_check_tsc_adjust(void) { }
+#endif
+
 extern int notsc_setup(char *);
 extern void tsc_save_sched_clock_state(void);
 extern void tsc_restore_sched_clock_state(void);
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 79076d7..c0ac317 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -75,7 +75,7 @@ apm-y				:= apm_32.o
 obj-$(CONFIG_APM)		+= apm.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= smpboot.o
-obj-$(CONFIG_SMP)		+= tsc_sync.o
+obj-$(CONFIG_X86_TSC)		+= tsc_sync.o
 obj-$(CONFIG_SMP)		+= setup_percpu.o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
 obj-y				+= apic/
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2b27c5a..2bb8de4 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1379,6 +1379,8 @@ void __init tsc_init(void)
 
 	if (unsynchronized_tsc())
 		mark_tsc_unstable("TSCs unsynchronized");
+	else
+		tsc_store_and_check_tsc_adjust();
 
 	check_system_tsc_reliable();
 
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 40f8edd..e4b2c04 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -14,12 +14,95 @@
  * ( The serial nature of the boot logic and the CPU hotplug lock
  *   protects against more than 2 CPUs entering this code. )
  */
+#include <linux/topology.h>
 #include <linux/spinlock.h>
 #include <linux/kernel.h>
 #include <linux/smp.h>
 #include <linux/nmi.h>
 #include <asm/tsc.h>
 
+struct tsc_adjust {
+	s64	bootval;
+	s64	adjusted;
+};
+
+static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+
+#ifndef CONFIG_SMP
+void __init tsc_store_and_check_tsc_adjust(void)
+{
+	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
+	s64 bootval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
+	cur->bootval = bootval;
+	cur->adjusted = bootval;
+	pr_info("TSC ADJUST: Boot CPU%u: %lld\n",cpu, bootval);
+}
+
+#else /* !CONFIG_SMP */
+
+/*
+ * Store and check the TSC ADJUST MSR if available
+ */
+void tsc_store_and_check_tsc_adjust(void)
+{
+	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
+	unsigned int refcpu, cpu = smp_processor_id();
+	s64 bootval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
+	cur->bootval = bootval;
+
+	/*
+	 * Check whether this CPU is the first in a package to come up. In
+	 * this case do not check the boot value against another package
+	 * because the package might have been physically hotplugged, where
+	 * TSC_ADJUST is expected to be different.
+	 */
+	refcpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+	if (refcpu >= nr_cpu_ids) {
+		/*
+		 * First online CPU in a package stores the boot value in
+		 * the adjustment value. This value might change later via
+		 * the sync mechanism. If that fails we still can yell
+		 * about boot values not being consistent.
+		 */
+		cur->adjusted = bootval;
+		pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
+		return;
+	}
+
+	ref = per_cpu_ptr(&tsc_adjust, refcpu);
+	/*
+	 * Compare the boot value and complain if it differs in the
+	 * package.
+	 */
+	if (bootval != ref->bootval) {
+		pr_warn("TSC ADJUST differs: Reference CPU%u: %lld CPU%u: %lld\n",
+			refcpu, ref->bootval, cpu, bootval);
+	}
+	/*
+	 * The TSC_ADJUST values in a package must be the same. If the boot
+	 * value on this newly upcoming CPU differs from the adjustment
+	 * value of the already online CPU in this package, set it to that
+	 * adjusted value.
+	 */
+	if (bootval != ref->adjusted) {
+		pr_warn("TSC ADJUST synchronize: Reference CPU%u: %lld CPU%u: %lld\n",
+			refcpu, ref->adjusted, cpu, bootval);
+		cur->adjusted = ref->adjusted;
+		wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
+	}
+}
+
 /*
  * Entry/exit counters that make sure that both CPUs
  * run the measurement code at once:
@@ -202,6 +285,9 @@ void check_tsc_sync_target(void)
 	if (unsynchronized_tsc() || tsc_clocksource_reliable)
 		return;
 
+	/* Store and check the TSC ADJUST MSR */
+	tsc_store_and_check_tsc_adjust();
+
 	/*
 	 * Register this CPU's participation and wait for the
 	 * source CPU to start the measurement:
@@ -223,3 +309,5 @@ void check_tsc_sync_target(void)
 	while (atomic_read(&stop_count) != cpus)
 		cpu_relax();
 }
+
+#endif /* CONFIG_SMP */

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Verify TSC_ADJUST from idle
  2016-11-19 13:47 ` [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle Thomas Gleixner
  2016-11-20 13:10   ` Peter Zijlstra
  2016-11-21 22:57   ` Andi Kleen
@ 2016-11-29 16:47   ` tip-bot for Thomas Gleixner
  2016-11-29 18:29   ` tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 16:47 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, hpa, peterz, linux-kernel, mingo, yinghai, bp

Commit-ID:  cea7d48122b134e64d9617cc0a0ad5d53a407ea9
Gitweb:     http://git.kernel.org/tip/cea7d48122b134e64d9617cc0a0ad5d53a407ea9
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:37 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 17:29:47 +0100

x86/tsc: Verify TSC_ADJUST from idle

When entering idle, it's a good opportunity to verify that the TSC_ADJUST
MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
detected, emit a warning and restore it to the previous value.

This is especially important for machines which mark the TSC reliable
because there is no watchdog clocksource available (SoCs).

This is not sufficient for HPC (NOHZ_FULL) situations where a CPU never
goes idle, but adding a timer to do the check periodically is not an option
either. On a machine which has this issue, the check triggers right
during boot, so there is a decent chance that the sysadmin will notice.

Rate limit the check to once per second and warn only once per CPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.732180441@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/tsc.h |  2 ++
 arch/x86/kernel/process.c  |  1 +
 arch/x86/kernel/tsc_sync.c | 37 +++++++++++++++++++++++++++++++++++--
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 1ec0595..b896e9e 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -50,8 +50,10 @@ extern void check_tsc_sync_target(void);
 
 #ifdef CONFIG_X86_TSC
 extern void tsc_store_and_check_tsc_adjust(void);
+extern void tsc_verify_tsc_adjust(void);
 #else
 static inline void tsc_store_and_check_tsc_adjust(void) { }
+static inline void tsc_verify_tsc_adjust(void) { }
 #endif
 
 extern int notsc_setup(char *);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0888a87..4fe5dc8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -277,6 +277,7 @@ void exit_idle(void)
 
 void arch_cpu_idle_enter(void)
 {
+	tsc_verify_tsc_adjust();
 	local_touch_nmi();
 	enter_idle();
 }
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index e4b2c04..f9c291e 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -22,12 +22,42 @@
 #include <asm/tsc.h>
 
 struct tsc_adjust {
-	s64	bootval;
-	s64	adjusted;
+	s64		bootval;
+	s64		adjusted;
+	unsigned long	nextcheck;
+	bool		warned;
 };
 
 static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
 
+void tsc_verify_tsc_adjust(void)
+{
+	struct tsc_adjust *adj = this_cpu_ptr(&tsc_adjust);
+	s64 curval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	/* Rate limit the MSR check */
+	if (time_before(jiffies, adj->nextcheck))
+		return;
+
+	adj->nextcheck = jiffies + HZ;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, curval);
+	if (adj->adjusted == curval)
+		return;
+
+	/* Restore the original value */
+	wrmsrl(MSR_IA32_TSC_ADJUST, adj->adjusted);
+
+	if (!adj->warned) {
+		pr_warn(FW_BUG "TSC ADJUST differs: CPU%u %lld --> %lld. Restoring\n",
+			smp_processor_id(), adj->adjusted, curval);
+		adj->warned = true;
+	}
+}
+
 #ifndef CONFIG_SMP
 void __init tsc_store_and_check_tsc_adjust(void)
 {
@@ -40,6 +70,7 @@ void __init tsc_store_and_check_tsc_adjust(void)
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
 	cur->adjusted = bootval;
+	cur->nextcheck = jiffies + HZ;
 	pr_info("TSC ADJUST: Boot CPU%u: %lld\n",cpu, bootval);
 }
 
@@ -59,6 +90,8 @@ void tsc_store_and_check_tsc_adjust(void)
 
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
+	cur->nextcheck = jiffies + HZ;
+	cur->warned = false;
 
 	/*
 	 * Check whether this CPU is the first in a package to come up. In

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Sync test only for the first cpu in a package
  2016-11-19 13:47 ` [patch 5/8] x86/tsc: Sync test only for the first cpu in a package Thomas Gleixner
@ 2016-11-29 16:48   ` tip-bot for Thomas Gleixner
  2016-11-29 18:30   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 16:48 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, peterz, yinghai, mingo, linux-kernel, bp, hpa

Commit-ID:  ba75fb64693127c7b6e8a822c68d3480cbf56d6d
Gitweb:     http://git.kernel.org/tip/ba75fb64693127c7b6e8a822c68d3480cbf56d6d
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:39 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 17:29:47 +0100

x86/tsc: Sync test only for the first cpu in a package

If the TSC_ADJUST MSR is available all CPUs in a package are forced to the
same value. So TSCs cannot be out of sync when the first CPU in the package
was in sync.

That allows to skip the sync test for all CPUs except the first starting
CPU in a package.
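
For illustration, the "first CPU in a package" test this relies on boils down
to the topology check already introduced in the store/check patch. A minimal
standalone sketch (the helper name is made up; the idiom is the one used in
tsc_store_and_check_tsc_adjust()):

  #include <linux/cpumask.h>
  #include <linux/topology.h>

  /*
   * A CPU is the first of its package to come online when no other CPU
   * sharing its package mask is online yet. Only such a CPU needs the
   * expensive warp test; later siblings inherit the sanitized TSC_ADJUST
   * value instead.
   */
  static bool tsc_is_first_cpu_in_package(unsigned int cpu)
  {
  	return cpumask_any_but(topology_core_cpumask(cpu), cpu) >= nr_cpu_ids;
  }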

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.809901363@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/tsc.h |  4 ++--
 arch/x86/kernel/tsc_sync.c | 37 ++++++++++++++++++++++++++++---------
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index b896e9e..04721d5 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -49,10 +49,10 @@ extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 
 #ifdef CONFIG_X86_TSC
-extern void tsc_store_and_check_tsc_adjust(void);
+extern bool tsc_store_and_check_tsc_adjust(void);
 extern void tsc_verify_tsc_adjust(void);
 #else
-static inline void tsc_store_and_check_tsc_adjust(void) { }
+static inline bool tsc_store_and_check_tsc_adjust(void) { return false; }
 static inline void tsc_verify_tsc_adjust(void) { }
 #endif
 
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index f9c291e..1770f60 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -59,19 +59,20 @@ void tsc_verify_tsc_adjust(void)
 }
 
 #ifndef CONFIG_SMP
-void __init tsc_store_and_check_tsc_adjust(void)
+bool __init tsc_store_and_check_tsc_adjust(void)
 {
 	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
 	s64 bootval;
 
 	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
-		return;
+		return false;
 
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
 	cur->adjusted = bootval;
 	cur->nextcheck = jiffies + HZ;
 	pr_info("TSC ADJUST: Boot CPU%u: %lld\n",cpu, bootval);
+	return false;
 }
 
 #else /* !CONFIG_SMP */
@@ -79,14 +80,14 @@ void __init tsc_store_and_check_tsc_adjust(void)
 /*
  * Store and check the TSC ADJUST MSR if available
  */
-void tsc_store_and_check_tsc_adjust(void)
+bool tsc_store_and_check_tsc_adjust(void)
 {
 	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
 	unsigned int refcpu, cpu = smp_processor_id();
 	s64 bootval;
 
 	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
-		return;
+		return false;
 
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
@@ -110,7 +111,7 @@ void tsc_store_and_check_tsc_adjust(void)
 		 */
 		cur->adjusted = bootval;
 		pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
-		return;
+		return false;
 	}
 
 	ref = per_cpu_ptr(&tsc_adjust, refcpu);
@@ -134,6 +135,11 @@ void tsc_store_and_check_tsc_adjust(void)
 		cur->adjusted = ref->adjusted;
 		wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
 	}
+	/*
+	 * We have the TSCs forced to be in sync on this package. Skip sync
+	 * test:
+	 */
+	return true;
 }
 
 /*
@@ -142,6 +148,7 @@ void tsc_store_and_check_tsc_adjust(void)
  */
 static atomic_t start_count;
 static atomic_t stop_count;
+static atomic_t skip_test;
 
 /*
  * We use a raw spinlock in this exceptional case, because
@@ -265,10 +272,16 @@ void check_tsc_sync_source(int cpu)
 	atomic_set(&stop_count, 0);
 
 	/*
-	 * Wait for the target to arrive:
+	 * Wait for the target to start or to skip the test:
 	 */
-	while (atomic_read(&start_count) != cpus-1)
+	while (atomic_read(&start_count) != cpus - 1) {
+		if (atomic_read(&skip_test) > 0) {
+			atomic_set(&skip_test, 0);
+			return;
+		}
 		cpu_relax();
+	}
+
 	/*
 	 * Trigger the target to continue into the measurement too:
 	 */
@@ -318,8 +331,14 @@ void check_tsc_sync_target(void)
 	if (unsynchronized_tsc() || tsc_clocksource_reliable)
 		return;
 
-	/* Store and check the TSC ADJUST MSR */
-	tsc_store_and_check_tsc_adjust();
+	/*
+	 * Store, verify and sanitize the TSC adjust register. If
+	 * successful skip the test.
+	 */
+	if (tsc_store_and_check_tsc_adjust()) {
+		atomic_inc(&skip_test);
+		return;
+	}
 
 	/*
 	 * Register this CPU's participation and wait for the

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Move sync cleanup to a safe place
  2016-11-19 13:47 ` [patch 6/8] x86/tsc: Move sync cleanup to a safe place Thomas Gleixner
@ 2016-11-29 16:48   ` tip-bot for Thomas Gleixner
  2016-11-29 18:30   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 16:48 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: linux-kernel, peterz, mingo, tglx, hpa, yinghai, bp

Commit-ID:  ce68ad5a095f151426cab18cc8fad9c968421da3
Gitweb:     http://git.kernel.org/tip/ce68ad5a095f151426cab18cc8fad9c968421da3
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:40 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 17:29:48 +0100

x86/tsc: Move sync cleanup to a safe place

Cleaning up the stop marker on the control CPU is wrong when we want to add
retry support. Move the cleanup to the starting CPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.892095627@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc_sync.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 1770f60..8642223 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -267,11 +267,6 @@ void check_tsc_sync_source(int cpu)
 	}
 
 	/*
-	 * Reset it - in case this is a second bootup:
-	 */
-	atomic_set(&stop_count, 0);
-
-	/*
 	 * Wait for the target to start or to skip the test:
 	 */
 	while (atomic_read(&start_count) != cpus - 1) {
@@ -360,6 +355,11 @@ void check_tsc_sync_target(void)
 	 */
 	while (atomic_read(&stop_count) != cpus)
 		cpu_relax();
+
+	/*
+	 * Reset it for the next sync test:
+	 */
+	atomic_set(&stop_count, 0);
 }
 
 #endif /* CONFIG_SMP */

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Prepare warp test for TSC adjustment
  2016-11-19 13:47 ` [patch 7/8] x86/tsc: Prepare warp test for TSC adjustment Thomas Gleixner
@ 2016-11-29 16:49   ` tip-bot for Thomas Gleixner
  2016-11-29 18:31   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 16:49 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: bp, hpa, mingo, linux-kernel, yinghai, peterz, tglx

Commit-ID:  1e849cff359ccd8b6f3096205e3759e70f8d30ea
Gitweb:     http://git.kernel.org/tip/1e849cff359ccd8b6f3096205e3759e70f8d30ea
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:41 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 17:29:48 +0100

x86/tsc: Prepare warp test for TSC adjustment

To allow TSC compensation across nodes it's necessary to know in which
direction the TSC warp was observed. Return the maximum observed value on
the calling CPU so the caller can determine the direction later.
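
As a sketch of how a caller can turn that return value into a signed
compensation (helper name invented for illustration; the sign convention is
the one the later adjust patch relies on):

  #include <linux/types.h>
  #include <asm/tsc.h>

  /*
   * If this CPU observed no warp itself, the other CPU did, which means
   * this TSC runs ahead and has to be moved backwards by the globally
   * recorded maximum. Otherwise the locally observed warp is the delta.
   */
  static s64 warp_to_delta(cycles_t local_max_warp, cycles_t global_max_warp)
  {
  	if (!local_max_warp)
  		return -(s64)global_max_warp;
  	return (s64)local_max_warp;
  }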

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.970859287@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc_sync.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 8642223..8a4ef7e 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -166,9 +166,9 @@ static int random_warps;
  * TSC-warp measurement loop running on both CPUs.  This is not called
  * if there is no TSC.
  */
-static void check_tsc_warp(unsigned int timeout)
+static cycles_t check_tsc_warp(unsigned int timeout)
 {
-	cycles_t start, now, prev, end;
+	cycles_t start, now, prev, end, cur_max_warp = 0;
 	int i, cur_warps = 0;
 
 	start = rdtsc_ordered();
@@ -209,6 +209,7 @@ static void check_tsc_warp(unsigned int timeout)
 		if (unlikely(prev > now)) {
 			arch_spin_lock(&sync_lock);
 			max_warp = max(max_warp, prev - now);
+			cur_max_warp = max_warp;
 			/*
 			 * Check whether this bounces back and forth. Only
 			 * one CPU should observe time going backwards.
@@ -223,6 +224,7 @@ static void check_tsc_warp(unsigned int timeout)
 	WARN(!(now-start),
 		"Warning: zero tsc calibration delta: %Ld [max: %Ld]\n",
 			now-start, end-start);
+	return cur_max_warp;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Try to adjust TSC if sync test fails
  2016-11-19 13:47 ` [patch 8/8] x86/tsc: Try to adjust TSC if sync test fails Thomas Gleixner
@ 2016-11-29 16:49   ` tip-bot for Thomas Gleixner
  2016-11-29 18:31   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 16:49 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, bp, linux-kernel, hpa, mingo, peterz, yinghai

Commit-ID:  950e7481b09cd3f426a9322b0f35139224660c0d
Gitweb:     http://git.kernel.org/tip/950e7481b09cd3f426a9322b0f35139224660c0d
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:43 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 17:29:49 +0100

x86/tsc: Try to adjust TSC if sync test fails

If the first CPU of a package comes online, it is necessary to test whether
the TSC is in sync with a CPU on some other package. When a deviation is
observed (time going backwards between the two CPUs) the TSC is marked
unstable, which is a problem on large machines as they have to fall back to
the HPET clocksource, which is insanely slow.

Some time ago it was attempted to compensate the TSC by adding the offset to
the TSC and writing it back, but this was never merged because it did not
turn out to be stable, especially not on older systems.

Modern systems have become more stable in that regard and the TSC_ADJUST
MSR allows us to compensate for the time deviation in a sane way. If it's
available allow up to three synchronization runs and if a time warp is
detected the starting CPU can compensate the time warp via the TSC_ADJUST
MSR and retry. If the third run still shows a deviation or when random time
warps are detected the test terminally fails.
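
A worked example with made-up numbers: if the starting CPU observes no warp
itself while the control CPU measured a 600 cycle warp, the starting CPU is
the one running ahead. It adds -600 to its previous TSC_ADJUST value, writes
the result into the MSR and re-runs the test. The arithmetic as a compilable
toy (plain C, not kernel code):

  #include <stdio.h>

  int main(void)
  {
  	long long adjusted = 0;   /* previous TSC_ADJUST value (made up) */
  	long long cur_warp = 0;   /* warp observed on the starting CPU   */
  	long long gbl_warp = 600; /* warp observed on the control CPU    */

  	if (!cur_warp)                /* only the other CPU saw the warp, */
  		cur_warp = -gbl_warp; /* so this TSC is ahead             */
  	adjusted += cur_warp;

  	/* The kernel would now write MSR_IA32_TSC_ADJUST = adjusted and
  	 * go through the warp test again (three runs at most). */
  	printf("new TSC_ADJUST value: %lld\n", adjusted); /* -600 */
  	return 0;
  }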

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134018.048237517@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc_sync.c | 83 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 8a4ef7e..cf70b30 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -149,6 +149,7 @@ bool tsc_store_and_check_tsc_adjust(void)
 static atomic_t start_count;
 static atomic_t stop_count;
 static atomic_t skip_test;
+static atomic_t test_runs;
 
 /*
  * We use a raw spinlock in this exceptional case, because
@@ -269,6 +270,16 @@ void check_tsc_sync_source(int cpu)
 	}
 
 	/*
+	 * Set the maximum number of test runs to
+	 *  1 if the CPU does not provide the TSC_ADJUST MSR
+	 *  3 if the MSR is available, so the target can try to adjust
+	 */
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		atomic_set(&test_runs, 1);
+	else
+		atomic_set(&test_runs, 3);
+retry:
+	/*
 	 * Wait for the target to start or to skip the test:
 	 */
 	while (atomic_read(&start_count) != cpus - 1) {
@@ -289,7 +300,21 @@ void check_tsc_sync_source(int cpu)
 	while (atomic_read(&stop_count) != cpus-1)
 		cpu_relax();
 
-	if (nr_warps) {
+	/*
+	 * If the test was successful set the number of runs to zero and
+	 * stop. If not, decrement the number of runs and check if we can
+	 * retry. In case of random warps no retry is attempted.
+	 */
+	if (!nr_warps) {
+		atomic_set(&test_runs, 0);
+
+		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
+			smp_processor_id(), cpu);
+
+	} else if (atomic_dec_and_test(&test_runs) || random_warps) {
+		/* Force it to 0 if random warps brought us here */
+		atomic_set(&test_runs, 0);
+
 		pr_warning("TSC synchronization [CPU#%d -> CPU#%d]:\n",
 			smp_processor_id(), cpu);
 		pr_warning("Measured %Ld cycles TSC warp between CPUs, "
@@ -297,9 +322,6 @@ void check_tsc_sync_source(int cpu)
 		if (random_warps)
 			pr_warning("TSC warped randomly between CPUs\n");
 		mark_tsc_unstable("check_tsc_sync_source failed");
-	} else {
-		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
-			smp_processor_id(), cpu);
 	}
 
 	/*
@@ -315,6 +337,12 @@ void check_tsc_sync_source(int cpu)
 	 * Let the target continue with the bootup:
 	 */
 	atomic_inc(&stop_count);
+
+	/*
+	 * Retry, if there is a chance to do so.
+	 */
+	if (atomic_read(&test_runs) > 0)
+		goto retry;
 }
 
 /*
@@ -322,6 +350,9 @@ void check_tsc_sync_source(int cpu)
  */
 void check_tsc_sync_target(void)
 {
+	struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
+	unsigned int cpu = smp_processor_id();
+	cycles_t cur_max_warp, gbl_max_warp;
 	int cpus = 2;
 
 	/* Also aborts if there is no TSC. */
@@ -337,6 +368,7 @@ void check_tsc_sync_target(void)
 		return;
 	}
 
+retry:
 	/*
 	 * Register this CPU's participation and wait for the
 	 * source CPU to start the measurement:
@@ -345,7 +377,12 @@ void check_tsc_sync_target(void)
 	while (atomic_read(&start_count) != cpus)
 		cpu_relax();
 
-	check_tsc_warp(loop_timeout(smp_processor_id()));
+	cur_max_warp = check_tsc_warp(loop_timeout(cpu));
+
+	/*
+	 * Store the maximum observed warp value for a potential retry:
+	 */
+	gbl_max_warp = max_warp;
 
 	/*
 	 * Ok, we are done:
@@ -362,6 +399,42 @@ void check_tsc_sync_target(void)
 	 * Reset it for the next sync test:
 	 */
 	atomic_set(&stop_count, 0);
+
+	/*
+	 * Check the number of remaining test runs. If not zero, the test
+	 * failed and a retry with adjusted TSC is possible. If zero the
+	 * test was either successful or failed terminally.
+	 */
+	if (!atomic_read(&test_runs))
+		return;
+
+	/*
+	 * If the warp value of this CPU is 0, then the other CPU
+	 * observed time going backwards so this TSC was ahead and
+	 * needs to move backwards.
+	 */
+	if (!cur_max_warp)
+		cur_max_warp = -gbl_max_warp;
+
+	/*
+	 * Add the result to the previous adjustment value.
+	 *
+	 * The adjustment value is slightly off by the overhead of the
+	 * sync mechanism (observed values are ~200 TSC cycles), but this
+	 * really depends on CPU, node distance and frequency. So
+	 * compensating for this is hard to get right. Experiments show
+	 * that the warp is no longer detectable when the observed warp
+	 * value is used. In the worst case the adjustment needs to go
+	 * through a 3rd run for fine tuning.
+	 */
+	cur->adjusted += cur_max_warp;
+
+	pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n",
+		cpu, cur_max_warp, cur->adjusted);
+
+	wrmsrl(MSR_IA32_TSC_ADJUST, cur->adjusted);
+	goto retry;
+
 }
 
 #endif /* CONFIG_SMP */

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
  2016-11-19 13:47 ` [patch 1/8] x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() Thomas Gleixner
  2016-11-29 16:46   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
@ 2016-11-29 18:28   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 18:28 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: yinghai, linux-kernel, tglx, peterz, hpa, mingo, bp

Commit-ID:  7b3d2f6e08ed5eb6bcf6912938f7a542405f8e8e
Gitweb:     http://git.kernel.org/tip/7b3d2f6e08ed5eb6bcf6912938f7a542405f8e8e
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:33 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 19:23:15 +0100

x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()

The ART detection uses rdmsrl_safe() to detect the availability of the
TSC_ADJUST MSR.

That's pointless because we have a feature bit for this. Use it.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.483561692@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc.c | 14 ++++++++------
 1 file changed, 8 insertions(+), 6 deletions(-)

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 0ff1ec6..2b27c5a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1057,18 +1057,20 @@ static void detect_art(void)
 	if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
 		return;
 
-	cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
-	      &art_to_tsc_numerator, unused, unused+1);
-
-	/* Don't enable ART in a VM, non-stop TSC required */
+	/* Don't enable ART in a VM, non-stop TSC and TSC_ADJUST required */
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
 	    !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
-	    art_to_tsc_denominator < ART_MIN_DENOMINATOR)
+	    !boot_cpu_has(X86_FEATURE_TSC_ADJUST))
 		return;
 
-	if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))
+	cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
+	      &art_to_tsc_numerator, unused, unused+1);
+
+	if (art_to_tsc_denominator < ART_MIN_DENOMINATOR)
 		return;
 
+	rdmsrl(MSR_IA32_TSC_ADJUST, art_to_tsc_offset);
+
 	/* Make this sticky over multiple CPU init calls */
 	setup_force_cpu_cap(X86_FEATURE_ART);
 }

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Detect random warps
  2016-11-19 13:47 ` [patch 2/8] x86/tsc: Detect random warps Thomas Gleixner
  2016-11-29 16:46   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
@ 2016-11-29 18:28   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 18:28 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: yinghai, bp, linux-kernel, hpa, tglx, peterz, mingo

Commit-ID:  bec8520dca0d27c1ddac703f9d0a78275ca2603e
Gitweb:     http://git.kernel.org/tip/bec8520dca0d27c1ddac703f9d0a78275ca2603e
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:35 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 19:23:15 +0100

x86/tsc: Detect random warps

If time warps can be observed then they should only ever be observed on one
CPU. If they are observed on both CPUs then the system is completely hosed.

Add a check for this condition and notify if it happens.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.574838461@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc_sync.c | 13 ++++++++++++-
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 78083bf..40f8edd 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -37,6 +37,7 @@ static arch_spinlock_t sync_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 static cycles_t last_tsc;
 static cycles_t max_warp;
 static int nr_warps;
+static int random_warps;
 
 /*
  * TSC-warp measurement loop running on both CPUs.  This is not called
@@ -45,7 +46,7 @@ static int nr_warps;
 static void check_tsc_warp(unsigned int timeout)
 {
 	cycles_t start, now, prev, end;
-	int i;
+	int i, cur_warps = 0;
 
 	start = rdtsc_ordered();
 	/*
@@ -85,7 +86,14 @@ static void check_tsc_warp(unsigned int timeout)
 		if (unlikely(prev > now)) {
 			arch_spin_lock(&sync_lock);
 			max_warp = max(max_warp, prev - now);
+			/*
+			 * Check whether this bounces back and forth. Only
+			 * one CPU should observe time going backwards.
+			 */
+			if (cur_warps != nr_warps)
+				random_warps++;
 			nr_warps++;
+			cur_warps = nr_warps;
 			arch_spin_unlock(&sync_lock);
 		}
 	}
@@ -160,6 +168,8 @@ void check_tsc_sync_source(int cpu)
 			smp_processor_id(), cpu);
 		pr_warning("Measured %Ld cycles TSC warp between CPUs, "
 			   "turning off TSC clock.\n", max_warp);
+		if (random_warps)
+			pr_warning("TSC warped randomly between CPUs\n");
 		mark_tsc_unstable("check_tsc_sync_source failed");
 	} else {
 		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
@@ -170,6 +180,7 @@ void check_tsc_sync_source(int cpu)
 	 * Reset it - just in case we boot another CPU later:
 	 */
 	atomic_set(&start_count, 0);
+	random_warps = 0;
 	nr_warps = 0;
 	max_warp = 0;
 	last_tsc = 0;

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Store and check TSC ADJUST MSR
  2016-11-19 13:47 ` [patch 3/8] x86/tsc: Store and check TSC ADJUST MSR Thomas Gleixner
  2016-11-29 16:47   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
@ 2016-11-29 18:29   ` tip-bot for Thomas Gleixner
  2017-01-27 13:36   ` [3/8] " Henning Schild
  2 siblings, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 18:29 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: hpa, linux-kernel, yinghai, tglx, mingo, bp, peterz

Commit-ID:  8b223bc7abe0e30e8d297a24ee6c6c07ef8d0bb9
Gitweb:     http://git.kernel.org/tip/8b223bc7abe0e30e8d297a24ee6c6c07ef8d0bb9
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:36 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 19:23:16 +0100

x86/tsc: Store and check TSC ADJUST MSR

The TSC_ADJUST MSR shows whether the TSC has been modified. This is helpful
in two aspects:

1) It allows to detect BIOS wreckage, where SMM code tries to 'hide' the
   cycles spent by storing the TSC value at SMM entry and restoring it at
   SMM exit. On affected machines the TSCs run slowly out of sync up to the
   point where the clocksource watchdog (if available) detects it.

   The TSC_ADJUST MSR allows to detect the TSC modification before that and
   eventually restore it. This is also important for SoCs which have no
   watchdog clocksource and therefore TSC wreckage cannot be detected and
   acted upon.

2) All threads in a package are required to have the same TSC_ADJUST
   value. Broken BIOSes break that and as a result the TSC synchronization
   check fails.

   The TSC_ADJUST MSR allows to detect the deviation when a CPU comes
   online. If detected set it to the value of an already online CPU in the
   same package. This also allows to reduce the number of sync tests
   because with that in place the test is only required for the first CPU
   in a package.

   In principle all CPUs in a system should have the same TSC_ADJUST value
   even across packages, but with physical CPU hotplug this assumption is
   not true because the TSC starts with power on, so physical hotplug has
   to do some trickery to bring the TSC into sync with already running
   packages, which requires using a TSC_ADJUST value different from CPUs
   which got powered earlier.

   A final enhancement is the opportunity to compensate for unsynced TSCs
   across nodes at boot time and make the TSC usable that way. It won't
   help for TSCs which run apart due to frequency skew between packages,
   but this gets detected by the clocksource watchdog later.

The first step toward this is to store the TSC_ADJUST value of a starting
CPU and compare it with the value of an already online CPU in the same
package. If they differ, emit a warning and adjust it to the reference
value. The !SMP version just stores the boot value for later verification.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.655323776@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/tsc.h |  6 ++++
 arch/x86/kernel/Makefile   |  2 +-
 arch/x86/kernel/tsc.c      |  2 ++
 arch/x86/kernel/tsc_sync.c | 88 ++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 97 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 33b6365..1ec0595 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -48,6 +48,12 @@ extern int tsc_clocksource_reliable;
 extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 
+#ifdef CONFIG_X86_TSC
+extern void tsc_store_and_check_tsc_adjust(void);
+#else
+static inline void tsc_store_and_check_tsc_adjust(void) { }
+#endif
+
 extern int notsc_setup(char *);
 extern void tsc_save_sched_clock_state(void);
 extern void tsc_restore_sched_clock_state(void);
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 79076d7..c0ac317 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -75,7 +75,7 @@ apm-y				:= apm_32.o
 obj-$(CONFIG_APM)		+= apm.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= smpboot.o
-obj-$(CONFIG_SMP)		+= tsc_sync.o
+obj-$(CONFIG_X86_TSC)		+= tsc_sync.o
 obj-$(CONFIG_SMP)		+= setup_percpu.o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
 obj-y				+= apic/
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 2b27c5a..2bb8de4 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1379,6 +1379,8 @@ void __init tsc_init(void)
 
 	if (unsynchronized_tsc())
 		mark_tsc_unstable("TSCs unsynchronized");
+	else
+		tsc_store_and_check_tsc_adjust();
 
 	check_system_tsc_reliable();
 
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 40f8edd..bd2bd5e 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -14,12 +14,95 @@
  * ( The serial nature of the boot logic and the CPU hotplug lock
  *   protects against more than 2 CPUs entering this code. )
  */
+#include <linux/topology.h>
 #include <linux/spinlock.h>
 #include <linux/kernel.h>
 #include <linux/smp.h>
 #include <linux/nmi.h>
 #include <asm/tsc.h>
 
+struct tsc_adjust {
+	s64	bootval;
+	s64	adjusted;
+};
+
+static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+
+#ifndef CONFIG_SMP
+void __init tsc_store_and_check_tsc_adjust(void)
+{
+	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
+	s64 bootval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
+	cur->bootval = bootval;
+	cur->adjusted = bootval;
+	pr_info("TSC ADJUST: Boot CPU0: %lld\n", bootval);
+}
+
+#else /* !CONFIG_SMP */
+
+/*
+ * Store and check the TSC ADJUST MSR if available
+ */
+void tsc_store_and_check_tsc_adjust(void)
+{
+	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
+	unsigned int refcpu, cpu = smp_processor_id();
+	s64 bootval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
+	cur->bootval = bootval;
+
+	/*
+	 * Check whether this CPU is the first in a package to come up. In
+	 * this case do not check the boot value against another package
+	 * because the package might have been physically hotplugged, where
+	 * TSC_ADJUST is expected to be different.
+	 */
+	refcpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
+
+	if (refcpu >= nr_cpu_ids) {
+		/*
+		 * First online CPU in a package stores the boot value in
+		 * the adjustment value. This value might change later via
+		 * the sync mechanism. If that fails we still can yell
+		 * about boot values not being consistent.
+		 */
+		cur->adjusted = bootval;
+		pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
+		return;
+	}
+
+	ref = per_cpu_ptr(&tsc_adjust, refcpu);
+	/*
+	 * Compare the boot value and complain if it differs in the
+	 * package.
+	 */
+	if (bootval != ref->bootval) {
+		pr_warn("TSC ADJUST differs: Reference CPU%u: %lld CPU%u: %lld\n",
+			refcpu, ref->bootval, cpu, bootval);
+	}
+	/*
+	 * The TSC_ADJUST values in a package must be the same. If the boot
+	 * value on this newly upcoming CPU differs from the adjustment
+	 * value of the already online CPU in this package, set it to that
+	 * adjusted value.
+	 */
+	if (bootval != ref->adjusted) {
+		pr_warn("TSC ADJUST synchronize: Reference CPU%u: %lld CPU%u: %lld\n",
+			refcpu, ref->adjusted, cpu, bootval);
+		cur->adjusted = ref->adjusted;
+		wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
+	}
+}
+
 /*
  * Entry/exit counters that make sure that both CPUs
  * run the measurement code at once:
@@ -202,6 +285,9 @@ void check_tsc_sync_target(void)
 	if (unsynchronized_tsc() || tsc_clocksource_reliable)
 		return;
 
+	/* Store and check the TSC ADJUST MSR */
+	tsc_store_and_check_tsc_adjust();
+
 	/*
 	 * Register this CPU's participation and wait for the
 	 * source CPU to start the measurement:
@@ -223,3 +309,5 @@ void check_tsc_sync_target(void)
 	while (atomic_read(&stop_count) != cpus)
 		cpu_relax();
 }
+
+#endif /* CONFIG_SMP */

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Verify TSC_ADJUST from idle
  2016-11-19 13:47 ` [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle Thomas Gleixner
                     ` (2 preceding siblings ...)
  2016-11-29 16:47   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
@ 2016-11-29 18:29   ` tip-bot for Thomas Gleixner
  3 siblings, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 18:29 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: mingo, peterz, linux-kernel, bp, tglx, yinghai, hpa

Commit-ID:  1d0095feea591bbd94f35d8a98aed746319783e1
Gitweb:     http://git.kernel.org/tip/1d0095feea591bbd94f35d8a98aed746319783e1
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:37 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 19:23:16 +0100

x86/tsc: Verify TSC_ADJUST from idle

When entering idle, it's a good opportunity to verify that the TSC_ADJUST
MSR has not been tampered with (BIOS hiding SMM cycles). If tampering is
detected, emit a warning and restore it to the previous value.

This is especially important for machines which mark the TSC reliable
because there is no watchdog clocksource available (SoCs).

This is not sufficient for HPC (NOHZ_FULL) situations where a CPU never
goes idle, but adding a timer to do the check periodically is not an option
either. On a machine which has this issue, the check triggers right
during boot, so there is a decent chance that the sysadmin will notice.

Rate limit the check to once per second and warn only once per CPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.732180441@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/tsc.h |  2 ++
 arch/x86/kernel/process.c  |  1 +
 arch/x86/kernel/tsc_sync.c | 37 +++++++++++++++++++++++++++++++++++--
 3 files changed, 38 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 1ec0595..b896e9e 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -50,8 +50,10 @@ extern void check_tsc_sync_target(void);
 
 #ifdef CONFIG_X86_TSC
 extern void tsc_store_and_check_tsc_adjust(void);
+extern void tsc_verify_tsc_adjust(void);
 #else
 static inline void tsc_store_and_check_tsc_adjust(void) { }
+static inline void tsc_verify_tsc_adjust(void) { }
 #endif
 
 extern int notsc_setup(char *);
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0888a87..4fe5dc8 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -277,6 +277,7 @@ void exit_idle(void)
 
 void arch_cpu_idle_enter(void)
 {
+	tsc_verify_tsc_adjust();
 	local_touch_nmi();
 	enter_idle();
 }
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index bd2bd5e..f713e84 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -22,12 +22,42 @@
 #include <asm/tsc.h>
 
 struct tsc_adjust {
-	s64	bootval;
-	s64	adjusted;
+	s64		bootval;
+	s64		adjusted;
+	unsigned long	nextcheck;
+	bool		warned;
 };
 
 static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
 
+void tsc_verify_tsc_adjust(void)
+{
+	struct tsc_adjust *adj = this_cpu_ptr(&tsc_adjust);
+	s64 curval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	/* Rate limit the MSR check */
+	if (time_before(jiffies, adj->nextcheck))
+		return;
+
+	adj->nextcheck = jiffies + HZ;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, curval);
+	if (adj->adjusted == curval)
+		return;
+
+	/* Restore the original value */
+	wrmsrl(MSR_IA32_TSC_ADJUST, adj->adjusted);
+
+	if (!adj->warned) {
+		pr_warn(FW_BUG "TSC ADJUST differs: CPU%u %lld --> %lld. Restoring\n",
+			smp_processor_id(), adj->adjusted, curval);
+		adj->warned = true;
+	}
+}
+
 #ifndef CONFIG_SMP
 void __init tsc_store_and_check_tsc_adjust(void)
 {
@@ -40,6 +70,7 @@ void __init tsc_store_and_check_tsc_adjust(void)
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
 	cur->adjusted = bootval;
+	cur->nextcheck = jiffies + HZ;
 	pr_info("TSC ADJUST: Boot CPU0: %lld\n", bootval);
 }
 
@@ -59,6 +90,8 @@ void tsc_store_and_check_tsc_adjust(void)
 
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
+	cur->nextcheck = jiffies + HZ;
+	cur->warned = false;
 
 	/*
 	 * Check whether this CPU is the first in a package to come up. In

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Sync test only for the first cpu in a package
  2016-11-19 13:47 ` [patch 5/8] x86/tsc: Sync test only for the first cpu in a package Thomas Gleixner
  2016-11-29 16:48   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
@ 2016-11-29 18:30   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 18:30 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: peterz, tglx, hpa, yinghai, mingo, bp, linux-kernel

Commit-ID:  a36f5136814b6a87601220927cb9ad9ecc731e92
Gitweb:     http://git.kernel.org/tip/a36f5136814b6a87601220927cb9ad9ecc731e92
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:39 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 19:23:17 +0100

x86/tsc: Sync test only for the first cpu in a package

If the TSC_ADJUST MSR is available all CPUs in a package are forced to the
same value. So TSCs cannot be out of sync when the first CPU in the package
was in sync.

That allows to skip the sync test for all CPUs except the first starting
CPU in a package.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.809901363@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/include/asm/tsc.h |  4 ++--
 arch/x86/kernel/tsc_sync.c | 37 ++++++++++++++++++++++++++++---------
 2 files changed, 30 insertions(+), 11 deletions(-)

diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index b896e9e..04721d5 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -49,10 +49,10 @@ extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
 
 #ifdef CONFIG_X86_TSC
-extern void tsc_store_and_check_tsc_adjust(void);
+extern bool tsc_store_and_check_tsc_adjust(void);
 extern void tsc_verify_tsc_adjust(void);
 #else
-static inline void tsc_store_and_check_tsc_adjust(void) { }
+static inline bool tsc_store_and_check_tsc_adjust(void) { return false; }
 static inline void tsc_verify_tsc_adjust(void) { }
 #endif
 
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index f713e84..fdb3b7b 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -59,19 +59,20 @@ void tsc_verify_tsc_adjust(void)
 }
 
 #ifndef CONFIG_SMP
-void __init tsc_store_and_check_tsc_adjust(void)
+bool __init tsc_store_and_check_tsc_adjust(void)
 {
 	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
 	s64 bootval;
 
 	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
-		return;
+		return false;
 
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
 	cur->adjusted = bootval;
 	cur->nextcheck = jiffies + HZ;
 	pr_info("TSC ADJUST: Boot CPU0: %lld\n", bootval);
+	return false;
 }
 
 #else /* !CONFIG_SMP */
@@ -79,14 +80,14 @@ void __init tsc_store_and_check_tsc_adjust(void)
 /*
  * Store and check the TSC ADJUST MSR if available
  */
-void tsc_store_and_check_tsc_adjust(void)
+bool tsc_store_and_check_tsc_adjust(void)
 {
 	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
 	unsigned int refcpu, cpu = smp_processor_id();
 	s64 bootval;
 
 	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
-		return;
+		return false;
 
 	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
 	cur->bootval = bootval;
@@ -110,7 +111,7 @@ void tsc_store_and_check_tsc_adjust(void)
 		 */
 		cur->adjusted = bootval;
 		pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu,  bootval);
-		return;
+		return false;
 	}
 
 	ref = per_cpu_ptr(&tsc_adjust, refcpu);
@@ -134,6 +135,11 @@ void tsc_store_and_check_tsc_adjust(void)
 		cur->adjusted = ref->adjusted;
 		wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
 	}
+	/*
+	 * We have the TSCs forced to be in sync on this package. Skip sync
+	 * test:
+	 */
+	return true;
 }
 
 /*
@@ -142,6 +148,7 @@ void tsc_store_and_check_tsc_adjust(void)
  */
 static atomic_t start_count;
 static atomic_t stop_count;
+static atomic_t skip_test;
 
 /*
  * We use a raw spinlock in this exceptional case, because
@@ -265,10 +272,16 @@ void check_tsc_sync_source(int cpu)
 	atomic_set(&stop_count, 0);
 
 	/*
-	 * Wait for the target to arrive:
+	 * Wait for the target to start or to skip the test:
 	 */
-	while (atomic_read(&start_count) != cpus-1)
+	while (atomic_read(&start_count) != cpus - 1) {
+		if (atomic_read(&skip_test) > 0) {
+			atomic_set(&skip_test, 0);
+			return;
+		}
 		cpu_relax();
+	}
+
 	/*
 	 * Trigger the target to continue into the measurement too:
 	 */
@@ -318,8 +331,14 @@ void check_tsc_sync_target(void)
 	if (unsynchronized_tsc() || tsc_clocksource_reliable)
 		return;
 
-	/* Store and check the TSC ADJUST MSR */
-	tsc_store_and_check_tsc_adjust();
+	/*
+	 * Store, verify and sanitize the TSC adjust register. If
+	 * successful skip the test.
+	 */
+	if (tsc_store_and_check_tsc_adjust()) {
+		atomic_inc(&skip_test);
+		return;
+	}
 
 	/*
 	 * Register this CPU's participation and wait for the

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Move sync cleanup to a safe place
  2016-11-19 13:47 ` [patch 6/8] x86/tsc: Move sync cleanup to a safe place Thomas Gleixner
  2016-11-29 16:48   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
@ 2016-11-29 18:30   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 18:30 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: yinghai, bp, linux-kernel, mingo, tglx, hpa, peterz

Commit-ID:  4c5e3c63752162262c42424147f319b8571a20af
Gitweb:     http://git.kernel.org/tip/4c5e3c63752162262c42424147f319b8571a20af
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:40 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 19:23:18 +0100

x86/tsc: Move sync cleanup to a safe place

Cleaning up the stop marker on the control CPU is wrong when we want to add
retry support. Move the cleanup to the starting CPU.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.892095627@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc_sync.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index fdb3b7b..8f394eeb 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -267,11 +267,6 @@ void check_tsc_sync_source(int cpu)
 	}
 
 	/*
-	 * Reset it - in case this is a second bootup:
-	 */
-	atomic_set(&stop_count, 0);
-
-	/*
 	 * Wait for the target to start or to skip the test:
 	 */
 	while (atomic_read(&start_count) != cpus - 1) {
@@ -360,6 +355,11 @@ void check_tsc_sync_target(void)
 	 */
 	while (atomic_read(&stop_count) != cpus)
 		cpu_relax();
+
+	/*
+	 * Reset it for the next sync test:
+	 */
+	atomic_set(&stop_count, 0);
 }
 
 #endif /* CONFIG_SMP */

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Prepare warp test for TSC adjustment
  2016-11-19 13:47 ` [patch 7/8] x86/tsc: Prepare warp test for TSC adjustment Thomas Gleixner
  2016-11-29 16:49   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
@ 2016-11-29 18:31   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 18:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: tglx, hpa, yinghai, mingo, peterz, bp, linux-kernel

Commit-ID:  76d3b85158509cafec5be7675a97ef80118e288e
Gitweb:     http://git.kernel.org/tip/76d3b85158509cafec5be7675a97ef80118e288e
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:41 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 19:23:18 +0100

x86/tsc: Prepare warp test for TSC adjustment

To allow TSC compensation across nodes it's necessary to know in which
direction the TSC warp was observed. Return the maximum observed value on
the calling CPU so the caller can determine the direction later.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134017.970859287@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc_sync.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 8f394eeb..9ad074c 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -166,9 +166,9 @@ static int random_warps;
  * TSC-warp measurement loop running on both CPUs.  This is not called
  * if there is no TSC.
  */
-static void check_tsc_warp(unsigned int timeout)
+static cycles_t check_tsc_warp(unsigned int timeout)
 {
-	cycles_t start, now, prev, end;
+	cycles_t start, now, prev, end, cur_max_warp = 0;
 	int i, cur_warps = 0;
 
 	start = rdtsc_ordered();
@@ -209,6 +209,7 @@ static void check_tsc_warp(unsigned int timeout)
 		if (unlikely(prev > now)) {
 			arch_spin_lock(&sync_lock);
 			max_warp = max(max_warp, prev - now);
+			cur_max_warp = max_warp;
 			/*
 			 * Check whether this bounces back and forth. Only
 			 * one CPU should observe time going backwards.
@@ -223,6 +224,7 @@ static void check_tsc_warp(unsigned int timeout)
 	WARN(!(now-start),
 		"Warning: zero tsc calibration delta: %Ld [max: %Ld]\n",
 			now-start, end-start);
+	return cur_max_warp;
 }
 
 /*

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* [tip:x86/timers] x86/tsc: Try to adjust TSC if sync test fails
  2016-11-19 13:47 ` [patch 8/8] x86/tsc: Try to adjust TSC if sync test fails Thomas Gleixner
  2016-11-29 16:49   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
@ 2016-11-29 18:31   ` tip-bot for Thomas Gleixner
  1 sibling, 0 replies; 36+ messages in thread
From: tip-bot for Thomas Gleixner @ 2016-11-29 18:31 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: yinghai, linux-kernel, mingo, bp, peterz, tglx, hpa

Commit-ID:  cc4db26899dcd0e6ff0448c77abd8eb61b1a1333
Gitweb:     http://git.kernel.org/tip/cc4db26899dcd0e6ff0448c77abd8eb61b1a1333
Author:     Thomas Gleixner <tglx@linutronix.de>
AuthorDate: Sat, 19 Nov 2016 13:47:43 +0000
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Tue, 29 Nov 2016 19:23:18 +0100

x86/tsc: Try to adjust TSC if sync test fails

If the first CPU of a package comes online, it is necessary to test whether
the TSC is in sync with a CPU on some other package. When a deviation is
observed (time going backwards between the two CPUs) the TSC is marked
unstable, which is a problem on large machines as they have to fall back to
the HPET clocksource, which is insanely slow.

Some time ago it was attempted to compensate the TSC by adding the offset to
the TSC and writing it back, but this was never merged because it did not
turn out to be stable, especially not on older systems.

Modern systems have become more stable in that regard and the TSC_ADJUST
MSR allows us to compensate for the time deviation in a sane way. If it's
available allow up to three synchronization runs and if a time warp is
detected the starting CPU can compensate the time warp via the TSC_ADJUST
MSR and retry. If the third run still shows a deviation or when random time
warps are detected the test terminally fails.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yinghai Lu <yinghai@kernel.org>
Cc: Borislav Petkov <bp@alien8.de>
Link: http://lkml.kernel.org/r/20161119134018.048237517@linutronix.de
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

---
 arch/x86/kernel/tsc_sync.c | 83 +++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 78 insertions(+), 5 deletions(-)

diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 9ad074c..d7f48a6 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -149,6 +149,7 @@ bool tsc_store_and_check_tsc_adjust(void)
 static atomic_t start_count;
 static atomic_t stop_count;
 static atomic_t skip_test;
+static atomic_t test_runs;
 
 /*
  * We use a raw spinlock in this exceptional case, because
@@ -269,6 +270,16 @@ void check_tsc_sync_source(int cpu)
 	}
 
 	/*
+	 * Set the maximum number of test runs to
+	 *  1 if the CPU does not provide the TSC_ADJUST MSR
+	 *  3 if the MSR is available, so the target can try to adjust
+	 */
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		atomic_set(&test_runs, 1);
+	else
+		atomic_set(&test_runs, 3);
+retry:
+	/*
 	 * Wait for the target to start or to skip the test:
 	 */
 	while (atomic_read(&start_count) != cpus - 1) {
@@ -289,7 +300,21 @@ void check_tsc_sync_source(int cpu)
 	while (atomic_read(&stop_count) != cpus-1)
 		cpu_relax();
 
-	if (nr_warps) {
+	/*
+	 * If the test was successful set the number of runs to zero and
+	 * stop. If not, decrement the number of runs and check if we can
+	 * retry. In case of random warps no retry is attempted.
+	 */
+	if (!nr_warps) {
+		atomic_set(&test_runs, 0);
+
+		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
+			smp_processor_id(), cpu);
+
+	} else if (atomic_dec_and_test(&test_runs) || random_warps) {
+		/* Force it to 0 if random warps brought us here */
+		atomic_set(&test_runs, 0);
+
 		pr_warning("TSC synchronization [CPU#%d -> CPU#%d]:\n",
 			smp_processor_id(), cpu);
 		pr_warning("Measured %Ld cycles TSC warp between CPUs, "
@@ -297,9 +322,6 @@ void check_tsc_sync_source(int cpu)
 		if (random_warps)
 			pr_warning("TSC warped randomly between CPUs\n");
 		mark_tsc_unstable("check_tsc_sync_source failed");
-	} else {
-		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
-			smp_processor_id(), cpu);
 	}
 
 	/*
@@ -315,6 +337,12 @@ void check_tsc_sync_source(int cpu)
 	 * Let the target continue with the bootup:
 	 */
 	atomic_inc(&stop_count);
+
+	/*
+	 * Retry, if there is a chance to do so.
+	 */
+	if (atomic_read(&test_runs) > 0)
+		goto retry;
 }
 
 /*
@@ -322,6 +350,9 @@ void check_tsc_sync_source(int cpu)
  */
 void check_tsc_sync_target(void)
 {
+	struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
+	unsigned int cpu = smp_processor_id();
+	cycles_t cur_max_warp, gbl_max_warp;
 	int cpus = 2;
 
 	/* Also aborts if there is no TSC. */
@@ -337,6 +368,7 @@ void check_tsc_sync_target(void)
 		return;
 	}
 
+retry:
 	/*
 	 * Register this CPU's participation and wait for the
 	 * source CPU to start the measurement:
@@ -345,7 +377,12 @@ void check_tsc_sync_target(void)
 	while (atomic_read(&start_count) != cpus)
 		cpu_relax();
 
-	check_tsc_warp(loop_timeout(smp_processor_id()));
+	cur_max_warp = check_tsc_warp(loop_timeout(cpu));
+
+	/*
+	 * Store the maximum observed warp value for a potential retry:
+	 */
+	gbl_max_warp = max_warp;
 
 	/*
 	 * Ok, we are done:
@@ -362,6 +399,42 @@ void check_tsc_sync_target(void)
 	 * Reset it for the next sync test:
 	 */
 	atomic_set(&stop_count, 0);
+
+	/*
+	 * Check the number of remaining test runs. If not zero, the test
+	 * failed and a retry with adjusted TSC is possible. If zero the
+	 * test was either successful or failed terminally.
+	 */
+	if (!atomic_read(&test_runs))
+		return;
+
+	/*
+	 * If the warp value of this CPU is 0, then the other CPU
+	 * observed time going backwards so this TSC was ahead and
+	 * needs to move backwards.
+	 */
+	if (!cur_max_warp)
+		cur_max_warp = -gbl_max_warp;
+
+	/*
+	 * Add the result to the previous adjustment value.
+	 *
+	 * The adjustment value is slightly off by the overhead of the
+	 * sync mechanism (observed values are ~200 TSC cycles), but this
+	 * really depends on CPU, node distance and frequency. So
+	 * compensating for this is hard to get right. Experiments show
+	 * that the warp is no longer detectable when the observed warp
+	 * value is used. In the worst case the adjustment needs to go
+	 * through a 3rd run for fine tuning.
+	 */
+	cur->adjusted += cur_max_warp;
+
+	pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n",
+		cpu, cur_max_warp, cur->adjusted);
+
+	wrmsrl(MSR_IA32_TSC_ADJUST, cur->adjusted);
+	goto retry;
+
 }
 
 #endif /* CONFIG_SMP */

^ permalink raw reply related	[flat|nested] 36+ messages in thread

* Re: [3/8] x86/tsc: Store and check TSC ADJUST MSR
  2016-11-19 13:47 ` [patch 3/8] x86/tsc: Store and check TSC ADJUST MSR Thomas Gleixner
  2016-11-29 16:47   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
  2016-11-29 18:29   ` tip-bot for Thomas Gleixner
@ 2017-01-27 13:36   ` Henning Schild
  2017-01-30 10:20     ` Thomas Gleixner
  2 siblings, 1 reply; 36+ messages in thread
From: Henning Schild @ 2017-01-27 13:36 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov, Yinghai Lu

Thomas,

did you by any chance look into TSC synchronization by adjusting the
absolute value (MSR_IA32_TSC) as well? As far as i have seen, Linux did
that a long time ago and eventually it was stopped because it caused
more harm than good.
https://github.com/torvalds/linux/commit/95492e4646e5de8b43d9a7908d6177fb737b61f0

Before your series the whole TSC code had not been touched in a long time,
with a few attempts to introduce synchronization over the years, e.g.
https://lkml.org/lkml/2015/11/9/639
which turned out to be the BIOS messing with the ADJUST MSR.

The ADJUST MSR offers an easy way to synchronize, yet taking care of
all the special cases still resulted in an 8-patch series. Syncing
without it, using the absolute value, is likely much harder, but that
series might be a good foundation.

The big question is whether we can rely on all future CPUs to
support that MSR. Do "new MSRs" disappear again at some point? If we
cannot rely on the ADJUST MSR, now might be a good time to revisit the
idea of syncing the absolute values.
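
Availability is at least enumerated via CPUID leaf 7 (EBX bit 1), which the
kernel exposes as X86_FEATURE_TSC_ADJUST, so a guard along the lines of the
following minimal sketch would fail gracefully if the bit ever went away and
fall back to the plain warp test:

	s64 bootval;

	/* sketch: rely on TSC_ADJUST only when it is enumerated */
	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
		return;				/* warp test only, no compensation */

	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);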

I remember having read somewhere that this series might get backported
to longterm kernels; what is the status on that?

regards,
Henning

On Sat, 19 Nov 2016 13:47:36 +0000
Thomas Gleixner <tglx@linutronix.de> wrote:

> The TSC_ADJUST MSR shows whether the TSC has been modified. This is
> helpful in two aspects: 
> 
> 1) It allows to detect BIOS wreckage, where SMM code tries to 'hide'
> the cycles spent by storing the TSC value at SMM entry and restoring
> it at SMM exit. On affected machines the TSCs run slowly out of sync
> up to the point where the clocksource watchdog (if available) detects
> it.
> 
>    The TSC_ADJUST MSR allows to detect the TSC modification before
> that and eventually restore it. This is also important for SoCs which
> have no watchdog clocksource and therefore TSC wreckage cannot be
> detected and acted upon.
> 
> 2) All threads in a package are required to have the same TSC_ADJUST
>    value. Broken BIOSes break that and as a result the TSC
> synchronization check fails.
> 
>    The TSC_ADJUST MSR allows to detect the deviation when a CPU comes
>    online. If detected set it to the value of an already online CPU
> in the same package. This also allows to reduce the number of sync
> tests because with that in place the test is only required for the
> first CPU in a package.
> 
>    In principle all CPUs in a system should have the same TSC_ADJUST
> value even across packages, but with physical CPU hotplug this
> assumption is not true because the TSC starts with power on, so
> physical hotplug has to do some trickery to bring the TSC into sync
> with already running packages, which requires using a TSC_ADJUST
> value different from CPUs which got powered earlier.
> 
>    A final enhancement is the opportunity to compensate for unsynced
> TSCs across nodes at boot time and make the TSC usable that way. It
> won't help for TSCs which run apart due to frequency skew between
> packages, but this gets detected by the clocksource watchdog later.
> 
> The first step toward this is to store the TSC_ADJUST value of a
> starting CPU and compare it with the value of an already online CPU
> in the same package. If they differ, emit a warning and adjust it to
> the reference value. The !SMP version just stores the boot value for
> later verification.
> 
> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
> ---
>  arch/x86/include/asm/tsc.h |    6 +++
>  arch/x86/kernel/Makefile   |    2 -
>  arch/x86/kernel/tsc.c      |    2 +
>  arch/x86/kernel/tsc_sync.c |   88 +++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 97 insertions(+), 1 deletion(-)
> 
> --- a/arch/x86/include/asm/tsc.h
> +++ b/arch/x86/include/asm/tsc.h
> @@ -48,6 +48,12 @@ extern int tsc_clocksource_reliable;
>  extern void check_tsc_sync_source(int cpu);
>  extern void check_tsc_sync_target(void);
>  
> +#ifdef CONFIG_X86_TSC
> +extern void tsc_store_and_check_tsc_adjust(void);
> +#else
> +static inline void tsc_store_and_check_tsc_adjust(void) { }
> +#endif
> +
>  extern int notsc_setup(char *);
>  extern void tsc_save_sched_clock_state(void);
>  extern void tsc_restore_sched_clock_state(void);
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -75,7 +75,7 @@ apm-y				:= apm_32.o
>  obj-$(CONFIG_APM)		+= apm.o
>  obj-$(CONFIG_SMP)		+= smp.o
>  obj-$(CONFIG_SMP)		+= smpboot.o
> -obj-$(CONFIG_SMP)		+= tsc_sync.o
> +obj-$(CONFIG_X86_TSC)		+= tsc_sync.o
>  obj-$(CONFIG_SMP)		+= setup_percpu.o
>  obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
>  obj-y				+= apic/
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -1365,6 +1365,8 @@ void __init tsc_init(void)
>  
>  	if (unsynchronized_tsc())
>  		mark_tsc_unstable("TSCs unsynchronized");
> +	else
> +		tsc_store_and_check_tsc_adjust();
>  
>  	check_system_tsc_reliable();
>  
> --- a/arch/x86/kernel/tsc_sync.c
> +++ b/arch/x86/kernel/tsc_sync.c
> @@ -14,12 +14,95 @@
>   * ( The serial nature of the boot logic and the CPU hotplug lock
>   *   protects against more than 2 CPUs entering this code. )
>   */
> +#include <linux/topology.h>
>  #include <linux/spinlock.h>
>  #include <linux/kernel.h>
>  #include <linux/smp.h>
>  #include <linux/nmi.h>
>  #include <asm/tsc.h>
>  
> +struct tsc_adjust {
> +	s64	bootval;
> +	s64	adjusted;
> +};
> +
> +static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
> +
> +#ifndef CONFIG_SMP
> +void __init tsc_store_and_check_tsc_adjust(void)
> +{
> +	struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
> +	s64 bootval;
> +
> +	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> +		return;
> +
> +	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
> +	cur->bootval = bootval;
> +	cur->adjusted = bootval;
> +	pr_info("TSC ADJUST: Boot CPU%u: %lld\n", smp_processor_id(), bootval);
> +}
> +
> +#else /* !CONFIG_SMP */
> +
> +/*
> + * Store and check the TSC ADJUST MSR if available
> + */
> +void tsc_store_and_check_tsc_adjust(void)
> +{
> +	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
> +	unsigned int refcpu, cpu = smp_processor_id();
> +	s64 bootval;
> +
> +	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> +		return;
> +
> +	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
> +	cur->bootval = bootval;
> +
> +	/*
> +	 * Check whether this CPU is the first in a package to come up. In
> +	 * this case do not check the boot value against another package
> +	 * because the package might have been physically hotplugged, where
> +	 * TSC_ADJUST is expected to be different.
> +	 */
> +	refcpu = cpumask_any_but(topology_core_cpumask(cpu), cpu);
> +
> +	if (refcpu >= nr_cpu_ids) {
> +		/*
> +		 * First online CPU in a package stores the boot value in
> +		 * the adjustment value. This value might change later via
> +		 * the sync mechanism. If that fails we still can yell
> +		 * about boot values not being consistent.
> +		 */
> +		cur->adjusted = bootval;
> +		pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu, bootval);
> +		return;
> +	}
> +
> +	ref = per_cpu_ptr(&tsc_adjust, refcpu);
> +	/*
> +	 * Compare the boot value and complain if it differs in the
> +	 * package.
> +	 */
> +	if (bootval != ref->bootval) {
> +		pr_warn("TSC ADJUST differs: Reference CPU%u: %lld
> CPU%u: %lld\n",
> +			refcpu, ref->bootval, cpu, bootval);
> +	}
> +	/*
> +	 * The TSC_ADJUST values in a package must be the same. If
> the boot
> +	 * value on this newly upcoming CPU differs from the
> adjustment
> +	 * value of the already online CPU in this package, set it
> to that
> +	 * adjusted value.
> +	 */
> +	if (bootval != ref->adjusted) {
> +		pr_warn("TSC ADJUST synchronize: Reference CPU%u:
> %lld CPU%u: %lld\n",
> +			refcpu, ref->adjusted, cpu, bootval);
> +		cur->adjusted = ref->adjusted;
> +		wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
> +	}
> +}
> +
>  /*
>   * Entry/exit counters that make sure that both CPUs
>   * run the measurement code at once:
> @@ -202,6 +285,9 @@ void check_tsc_sync_target(void)
>  	if (unsynchronized_tsc() || tsc_clocksource_reliable)
>  		return;
>  
> +	/* Store and check the TSC ADJUST MSR */
> +	tsc_store_and_check_tsc_adjust();
> +
>  	/*
>  	 * Register this CPU's participation and wait for the
>  	 * source CPU to start the measurement:
> @@ -223,3 +309,5 @@ void check_tsc_sync_target(void)
>  	while (atomic_read(&stop_count) != cpus)
>  		cpu_relax();
>  }
> +
> +#endif /* CONFIG_SMP */

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [3/8] x86/tsc: Store and check TSC ADJUST MSR
  2017-01-27 13:36   ` [3/8] " Henning Schild
@ 2017-01-30 10:20     ` Thomas Gleixner
  2017-01-30 12:13       ` Henning Schild
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Gleixner @ 2017-01-30 10:20 UTC (permalink / raw)
  To: Henning Schild
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov, Yinghai Lu

Henning,

On Fri, 27 Jan 2017, Henning Schild wrote:
> 
> did you by any chance look into TSC synchronization by adjusting the
> absolute value (MSR_IA32_TSC) as well? As far as i have seen Linux did
> that a long time ago and eventually it was stopped because it caused more
> harm than good.

I was involved in both developing the TSC sync patches and ripping them out
again.

The problem with writing the TSC directly is that you really _CANNOT_ reliably
handle run-time differences and SMI/NMI induced deltas. With the adjust MSR
it's a halfway sane thing to do, except for the brokenness of that botch
job vs. the TSC deadline timer.
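
Roughly, as a sketch (assuming the documented semantics that TSC_ADJUST is a
signed offset added to the internal counter; measured_warp() and
read_reference_tsc() are just placeholders here):

	s64 adj, delta = measured_warp();

	/* Direct write: the counter keeps running between reading the
	 * reference and the wrmsrl(), and any SMI/NMI in between adds an
	 * unknown amount, so the value is stale by the time it lands. */
	wrmsrl(MSR_IA32_TSC, read_reference_tsc() + delta);

	/* TSC_ADJUST: applying the measured delta to the offset is exact,
	 * no matter when the write lands. */
	rdmsrl(MSR_IA32_TSC_ADJUST, adj);
	wrmsrl(MSR_IA32_TSC_ADJUST, adj + delta);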

> The ADJUST MSR offers an easy way to synchronize, still taking care of
> all the special cases resulted in an 8-patch series. Synching without
> that using the absolute value is likely much harder, but that series
> might be a good foundation.

I'm not even thinking about bringing the pure TSC based sync back.

> The big question is whether we can rely on all future CPUs to 
> support that MSR. Do "new MSRs" disappear again at some point? If we
> can not rely on the ADJUST MSR, now might be a good time to revisit the
> idea of synching the absolute values.

There is nothing you can ever be sure about, but I doubt that the ADJUST
MSR is going to vanish.

> I remember having read somewhere that this series might get backported
> to longterm kernels, what is the status on that?

No idea.

Thanks,

	tglx

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [3/8] x86/tsc: Store and check TSC ADJUST MSR
  2017-01-30 10:20     ` Thomas Gleixner
@ 2017-01-30 12:13       ` Henning Schild
  2017-01-30 13:04         ` Thomas Gleixner
  0 siblings, 1 reply; 36+ messages in thread
From: Henning Schild @ 2017-01-30 12:13 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov, Yinghai Lu

On Mon, 30 Jan 2017 11:20:25 +0100
Thomas Gleixner <tglx@linutronix.de> wrote:

> Henning,
> 
> On Fri, 27 Jan 2017, Henning Schild wrote:
> > 
> > did you by any chance look into TSC synchronization by adjusting the
> > absolute value (MSR_IA32_TSC) as well? As far as i have seen Linux
> > did that a long time ago and eventually it was stopped because it
> > caused more harm than good.  
> 
> I was involved in both developing the TSC sync patches and ripping
> them out again.
> 
> The problem with writing TSC directly is that you really _CANNOT_
> reliably handle run-time differences and SMI/NMI induced deltas. With
> the adjust MRS it's a halfways sane thing to do, except for the
> brokeness of that botch job vs. the TSC deadline timer.
> 
> > The ADJUST MSR offers an easy way to synchronize, still taking care
> > of all the special cases resulted in an 8-patch series. Synching
> > without that using the absolute value is likely much harder, but
> > that series might be a good foundation.  
> 
> I'm not even thinking about bringing the pure TSC based sync back.
> 
> > The big question is whether we can rely on all future CPUs to 
> > support that MSR. Do "new MSRs" disappear again at some point? If we
> > can not rely on the ADJUST MSR, now might be a good time to revisit
> > the idea of synching the absolute values.  
> 
> There is nothing you can ever be sure about, but I doubt that the
> ADJUST MSR is going to vanish.

That sounds very much like i expected. But assuming the MSR has come to
stay, the problem should be solved for recent kernels and Intel CPUs.

The AMD manual from 12/16 does not mention that MSR. I do not have
access to an AMD machine. But i can only assume that bigger machines
also suffer from async TSCs and basically all fall back to HPET.

> > I remember having read somewhere that this series might get
> > backported to longterm kernels, what is the status on that?  
> 
> No idea.
> 
> Thanks,

Thanks a lot!
Henning

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [3/8] x86/tsc: Store and check TSC ADJUST MSR
  2017-01-30 12:13       ` Henning Schild
@ 2017-01-30 13:04         ` Thomas Gleixner
  2017-01-30 15:58           ` Borislav Petkov
  0 siblings, 1 reply; 36+ messages in thread
From: Thomas Gleixner @ 2017-01-30 13:04 UTC (permalink / raw)
  To: Henning Schild
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Borislav Petkov, Yinghai Lu

On Mon, 30 Jan 2017, Henning Schild wrote:
> On Mon, 30 Jan 2017 11:20:25 +0100
> Thomas Gleixner <tglx@linutronix.de> wrote:
> > There is nothing you can ever be sure about, but I doubt that the
> > ADJUST MSR is going to vanish.
> 
> That sounds very much like i expected. But assuming the MSR has come to
> stay, the problem should be solved for recent kernels and Intel-CPUs.
> 
> The AMD-Manual from 12/16 does not mention that MSR. I do not have
> access to an AMD machine. But i can only assume that bigger machines
> also suffer from async TSCs and basically all fall back to HPET.

Borislav?

^ permalink raw reply	[flat|nested] 36+ messages in thread

* Re: [3/8] x86/tsc: Store and check TSC ADJUST MSR
  2017-01-30 13:04         ` Thomas Gleixner
@ 2017-01-30 15:58           ` Borislav Petkov
  0 siblings, 0 replies; 36+ messages in thread
From: Borislav Petkov @ 2017-01-30 15:58 UTC (permalink / raw)
  To: Thomas Gleixner, Henning Schild
  Cc: LKML, Ingo Molnar, Peter Zijlstra, Yinghai Lu

On Mon, Jan 30, 2017 at 02:04:46PM +0100, Thomas Gleixner wrote:
> > The AMD-Manual from 12/16 does not mention that MSR. I do not have
> > access to an AMD machine. But i can only assume that bigger machines
> > also suffer from async TSCs and basically all fall back to HPET.
> 
> Borislav?

So far, all AMD machines starting from Barcelona (F10h) onwards should
have a stable TSC. If you've seen some discrepancies, make sure to
let us know. The bigger machines I have access to are all ok wrt
synchronized TSCs but I won't be surprised if someone proves me wrong.

Of course, if dumb firmware decides to fiddle with the TSC, then dumb
firmware should be fixed.

Wrt the TSC adjust MSR, it is probably best to take this up with AMD
engineering.

HTH.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.

^ permalink raw reply	[flat|nested] 36+ messages in thread

end of thread, other threads:[~2017-01-30 15:59 UTC | newest]

Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-11-19 13:47 [patch 0/8] x86/tsc: Utilize TSC_ADJUST MSR Thomas Gleixner
2016-11-19 13:47 ` [patch 1/8] x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() Thomas Gleixner
2016-11-29 16:46   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
2016-11-29 18:28   ` tip-bot for Thomas Gleixner
2016-11-19 13:47 ` [patch 2/8] x86/tsc: Detect random warps Thomas Gleixner
2016-11-29 16:46   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
2016-11-29 18:28   ` tip-bot for Thomas Gleixner
2016-11-19 13:47 ` [patch 3/8] x86/tsc: Store and check TSC ADJUST MSR Thomas Gleixner
2016-11-29 16:47   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
2016-11-29 18:29   ` tip-bot for Thomas Gleixner
2017-01-27 13:36   ` [3/8] " Henning Schild
2017-01-30 10:20     ` Thomas Gleixner
2017-01-30 12:13       ` Henning Schild
2017-01-30 13:04         ` Thomas Gleixner
2017-01-30 15:58           ` Borislav Petkov
2016-11-19 13:47 ` [patch 4/8] x86/tsc: Verify TSC_ADJUST from idle Thomas Gleixner
2016-11-20 13:10   ` Peter Zijlstra
2016-11-21  8:16     ` Thomas Gleixner
2016-11-21 11:06       ` Peter Zijlstra
2016-11-29  9:17         ` Thomas Gleixner
2016-11-29 14:25           ` Ingo Molnar
2016-11-21 22:57   ` Andi Kleen
2016-11-29 16:47   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
2016-11-29 18:29   ` tip-bot for Thomas Gleixner
2016-11-19 13:47 ` [patch 5/8] x86/tsc: Sync test only for the first cpu in a package Thomas Gleixner
2016-11-29 16:48   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
2016-11-29 18:30   ` tip-bot for Thomas Gleixner
2016-11-19 13:47 ` [patch 6/8] x86/tsc: Move sync cleanup to a safe place Thomas Gleixner
2016-11-29 16:48   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
2016-11-29 18:30   ` tip-bot for Thomas Gleixner
2016-11-19 13:47 ` [patch 7/8] x86/tsc: Prepare warp test for TSC adjustment Thomas Gleixner
2016-11-29 16:49   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
2016-11-29 18:31   ` tip-bot for Thomas Gleixner
2016-11-19 13:47 ` [patch 8/8] x86/tsc: Try to adjust TSC if sync test fails Thomas Gleixner
2016-11-29 16:49   ` [tip:x86/timers] " tip-bot for Thomas Gleixner
2016-11-29 18:31   ` tip-bot for Thomas Gleixner
