* [GIT pull] x86/timers for 4.10
@ 2016-12-18 20:06 Thomas Gleixner
  2017-02-07  1:31 ` Olof Johansson
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2016-12-18 20:06 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: LKML, Andrew Morton, Ingo Molnar, H. Peter Anvin

Linus,

please pull the latest x86-timers-for-linus git tree from:

   git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-timers-for-linus

This is the last functional update from the tip tree for 4.10. It got
delayed due to a newly reported and analyzed variant of a BIOS bug and the
resulting wreckage:

 - Separation of the TSC being marked reliable from the fact that the
   platform provides the TSC frequency via CPUID/MSRs, and making use of
   that for GOLDMONT.

 - TSC adjust MSR validation and sanitizing:

   The TSC adjust MSR contains the offset to the hardware counter. The sum
   of the adjust MSR and the counter is the TSC value which is read via
   RDTSC.

   On at least two machines from different vendors the BIOS sets the TSC
   adjust MSR to negative values. This happens on cold and warm boot. While
   on cold boot the offset is a few milliseconds, on warm boot it basically
   compensates for the power-on time of the system. The BIOSes are not even
   using the adjust MSR to set all CPUs in the package to the same
   offset. The offsets are different, which renders the TSC unusable.

   What's worse is that the TSC deadline timer has a HW feature^Wbug. It
   malfunctions when the TSC adjust value is negative or greater than or
   equal to 0x80000000, resulting in silent boot failures, hard lockups or
   non-firing timers. This looks like some hardware-internal 32/64-bit
   issue with a sign extension problem. Intel has been silent so far on
   the issue.

   The update contains sanity checks and keeps the adjust register within
   working limits and in sync on the package.

   As it looks like this disease is spreading via BIOS crapware, we need to
   address this urgently as the boot failures are hard to debug for users.


Thanks,

	tglx

------------------>
Bin Gao (4):
      x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag
      x86/tsc: Mark TSC frequency determined by CPUID as known
      x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable
      x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs

Thomas Gleixner (15):
      x86/tsc: Finalize the split of the TSC_RELIABLE flag
      x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
      x86/tsc: Detect random warps
      x86/tsc: Store and check TSC ADJUST MSR
      x86/tsc: Verify TSC_ADJUST from idle
      x86/tsc: Sync test only for the first cpu in a package
      x86/tsc: Move sync cleanup to a safe place
      x86/tsc: Prepare warp test for TSC adjustment
      x86/tsc: Try to adjust TSC if sync test fails
      x86/tsc: Fix broken CONFIG_X86_TSC=n build
      x86/tsc: Validate cpumask pointer before accessing it
      x86/tsc: Validate TSC_ADJUST after resume
      x86/tsc: Force TSC_ADJUST register to value >= zero
      x86/tsc: Annotate printouts as firmware bug
      x86/tsc: Limit the adjust value further


 arch/x86/include/asm/cpufeatures.h  |   1 +
 arch/x86/include/asm/tsc.h          |   9 ++
 arch/x86/kernel/Makefile            |   2 +-
 arch/x86/kernel/process.c           |   1 +
 arch/x86/kernel/tsc.c               |  42 ++++--
 arch/x86/kernel/tsc_msr.c           |  19 +++
 arch/x86/kernel/tsc_sync.c          | 290 ++++++++++++++++++++++++++++++++++--
 arch/x86/platform/intel-mid/mfld.c  |   9 +-
 arch/x86/platform/intel-mid/mrfld.c |   8 +-
 arch/x86/power/cpu.c                |   1 +
 10 files changed, 355 insertions(+), 27 deletions(-)

diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
index a39629206864..7f6a5f88d5ae 100644
--- a/arch/x86/include/asm/cpufeatures.h
+++ b/arch/x86/include/asm/cpufeatures.h
@@ -106,6 +106,7 @@
 #define X86_FEATURE_APERFMPERF	( 3*32+28) /* APERFMPERF */
 #define X86_FEATURE_EAGER_FPU	( 3*32+29) /* "eagerfpu" Non lazy FPU restore */
 #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
+#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */
 
 /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
 #define X86_FEATURE_XMM3	( 4*32+ 0) /* "pni" SSE-3 */
diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
index 33b6365c22fe..abb1fdcc545a 100644
--- a/arch/x86/include/asm/tsc.h
+++ b/arch/x86/include/asm/tsc.h
@@ -45,8 +45,17 @@ extern int tsc_clocksource_reliable;
  * Boot-time check whether the TSCs are synchronized across
  * all CPUs/cores:
  */
+#ifdef CONFIG_X86_TSC
+extern bool tsc_store_and_check_tsc_adjust(bool bootcpu);
+extern void tsc_verify_tsc_adjust(bool resume);
 extern void check_tsc_sync_source(int cpu);
 extern void check_tsc_sync_target(void);
+#else
+static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; }
+static inline void tsc_verify_tsc_adjust(bool resume) { }
+static inline void check_tsc_sync_source(int cpu) { }
+static inline void check_tsc_sync_target(void) { }
+#endif
 
 extern int notsc_setup(char *);
 extern void tsc_save_sched_clock_state(void);
diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
index 79076d75bdbf..c0ac317dd372 100644
--- a/arch/x86/kernel/Makefile
+++ b/arch/x86/kernel/Makefile
@@ -75,7 +75,7 @@ apm-y				:= apm_32.o
 obj-$(CONFIG_APM)		+= apm.o
 obj-$(CONFIG_SMP)		+= smp.o
 obj-$(CONFIG_SMP)		+= smpboot.o
-obj-$(CONFIG_SMP)		+= tsc_sync.o
+obj-$(CONFIG_X86_TSC)		+= tsc_sync.o
 obj-$(CONFIG_SMP)		+= setup_percpu.o
 obj-$(CONFIG_X86_MPPARSE)	+= mpparse.o
 obj-y				+= apic/
diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
index 0888a879120f..a67e0f0cdaab 100644
--- a/arch/x86/kernel/process.c
+++ b/arch/x86/kernel/process.c
@@ -277,6 +277,7 @@ void exit_idle(void)
 
 void arch_cpu_idle_enter(void)
 {
+	tsc_verify_tsc_adjust(false);
 	local_touch_nmi();
 	enter_idle();
 }
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 46b2f41f8b05..0aed75a1e31b 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -702,6 +702,20 @@ unsigned long native_calibrate_tsc(void)
 		}
 	}
 
+	/*
+	 * TSC frequency determined by CPUID is a "hardware reported"
+	 * frequency and is the most accurate one so far we have. This
+	 * is considered a known frequency.
+	 */
+	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+	/*
+	 * For Atom SoCs TSC is the only reliable clocksource.
+	 * Mark TSC reliable so no watchdog on it.
+	 */
+	if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
+		setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
+
 	return crystal_khz * ebx_numerator / eax_denominator;
 }
 
@@ -1043,18 +1057,20 @@ static void detect_art(void)
 	if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
 		return;
 
-	cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
-	      &art_to_tsc_numerator, unused, unused+1);
-
-	/* Don't enable ART in a VM, non-stop TSC required */
+	/* Don't enable ART in a VM, non-stop TSC and TSC_ADJUST required */
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
 	    !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
-	    art_to_tsc_denominator < ART_MIN_DENOMINATOR)
+	    !boot_cpu_has(X86_FEATURE_TSC_ADJUST))
 		return;
 
-	if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))
+	cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
+	      &art_to_tsc_numerator, unused, unused+1);
+
+	if (art_to_tsc_denominator < ART_MIN_DENOMINATOR)
 		return;
 
+	rdmsrl(MSR_IA32_TSC_ADJUST, art_to_tsc_offset);
+
 	/* Make this sticky over multiple CPU init calls */
 	setup_force_cpu_cap(X86_FEATURE_ART);
 }
@@ -1064,6 +1080,11 @@ static void detect_art(void)
 
 static struct clocksource clocksource_tsc;
 
+static void tsc_resume(struct clocksource *cs)
+{
+	tsc_verify_tsc_adjust(true);
+}
+
 /*
  * We used to compare the TSC to the cycle_last value in the clocksource
  * structure to avoid a nasty time-warp. This can be observed in a
@@ -1096,6 +1117,7 @@ static struct clocksource clocksource_tsc = {
 	.flags                  = CLOCK_SOURCE_IS_CONTINUOUS |
 				  CLOCK_SOURCE_MUST_VERIFY,
 	.archdata               = { .vclock_mode = VCLOCK_TSC },
+	.resume			= tsc_resume,
 };
 
 void mark_tsc_unstable(char *reason)
@@ -1283,10 +1305,10 @@ static int __init init_tsc_clocksource(void)
 		clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
 
 	/*
-	 * Trust the results of the earlier calibration on systems
-	 * exporting a reliable TSC.
+	 * When TSC frequency is known (retrieved via MSR or CPUID), we skip
+	 * the refined calibration and directly register it as a clocksource.
 	 */
-	if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) {
+	if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
 		clocksource_register_khz(&clocksource_tsc, tsc_khz);
 		return 0;
 	}
@@ -1363,6 +1385,8 @@ void __init tsc_init(void)
 
 	if (unsynchronized_tsc())
 		mark_tsc_unstable("TSCs unsynchronized");
+	else
+		tsc_store_and_check_tsc_adjust(true);
 
 	check_system_tsc_reliable();
 
diff --git a/arch/x86/kernel/tsc_msr.c b/arch/x86/kernel/tsc_msr.c
index 0fe720d64fef..19afdbd7d0a7 100644
--- a/arch/x86/kernel/tsc_msr.c
+++ b/arch/x86/kernel/tsc_msr.c
@@ -100,5 +100,24 @@ unsigned long cpu_khz_from_msr(void)
 #ifdef CONFIG_X86_LOCAL_APIC
 	lapic_timer_frequency = (freq * 1000) / HZ;
 #endif
+
+	/*
+	 * TSC frequency determined by MSR is always considered "known"
+	 * because it is reported by HW.
+	 * Another fact is that on MSR capable platforms, PIT/HPET is
+	 * generally not available so calibration won't work at all.
+	 */
+	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+
+	/*
+	 * Unfortunately there is no way for hardware to tell whether the
+	 * TSC is reliable.  We were told by silicon design team that TSC
+	 * on Atom SoCs are always "reliable". TSC is also the only
+	 * reliable clocksource on these SoCs (HPET is either not present
+	 * or not functional) so mark TSC reliable which removes the
+	 * requirement for a watchdog clocksource.
+	 */
+	setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
+
 	return res;
 }
diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
index 78083bf23ed1..d0db011051a5 100644
--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -14,18 +14,166 @@
  * ( The serial nature of the boot logic and the CPU hotplug lock
  *   protects against more than 2 CPUs entering this code. )
  */
+#include <linux/topology.h>
 #include <linux/spinlock.h>
 #include <linux/kernel.h>
 #include <linux/smp.h>
 #include <linux/nmi.h>
 #include <asm/tsc.h>
 
+struct tsc_adjust {
+	s64		bootval;
+	s64		adjusted;
+	unsigned long	nextcheck;
+	bool		warned;
+};
+
+static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
+
+void tsc_verify_tsc_adjust(bool resume)
+{
+	struct tsc_adjust *adj = this_cpu_ptr(&tsc_adjust);
+	s64 curval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return;
+
+	/* Rate limit the MSR check */
+	if (!resume && time_before(jiffies, adj->nextcheck))
+		return;
+
+	adj->nextcheck = jiffies + HZ;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, curval);
+	if (adj->adjusted == curval)
+		return;
+
+	/* Restore the original value */
+	wrmsrl(MSR_IA32_TSC_ADJUST, adj->adjusted);
+
+	if (!adj->warned || resume) {
+		pr_warn(FW_BUG "TSC ADJUST differs: CPU%u %lld --> %lld. Restoring\n",
+			smp_processor_id(), adj->adjusted, curval);
+		adj->warned = true;
+	}
+}
+
+static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval,
+				   unsigned int cpu, bool bootcpu)
+{
+	/*
+	 * First online CPU in a package stores the boot value in the
+	 * adjustment value. This value might change later via the sync
+	 * mechanism. If that fails we still can yell about boot values not
+	 * being consistent.
+	 *
+	 * On the boot cpu we just force set the ADJUST value to 0 if it's
+	 * non zero. We don't do that on non boot cpus because physical
+	 * hotplug should have set the ADJUST register to a value > 0 so
+	 * the TSC is in sync with the already running cpus.
+	 *
+	 * But we always force positive ADJUST values. Otherwise the TSC
+	 * deadline timer creates an interrupt storm. We also have to
+	 * prevent values > 0x7FFFFFFF as those wreckage the timer as well.
+	 */
+	if ((bootcpu && bootval != 0) || (!bootcpu && bootval < 0) ||
+	    (bootval > 0x7FFFFFFF)) {
+		pr_warn(FW_BUG "TSC ADJUST: CPU%u: %lld force to 0\n", cpu,
+			bootval);
+		wrmsrl(MSR_IA32_TSC_ADJUST, 0);
+		bootval = 0;
+	}
+	cur->adjusted = bootval;
+}
+
+#ifndef CONFIG_SMP
+bool __init tsc_store_and_check_tsc_adjust(bool bootcpu)
+{
+	struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
+	s64 bootval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return false;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
+	cur->bootval = bootval;
+	cur->nextcheck = jiffies + HZ;
+	tsc_sanitize_first_cpu(cur, bootval, smp_processor_id(), bootcpu);
+	return false;
+}
+
+#else /* !CONFIG_SMP */
+
+/*
+ * Store and check the TSC ADJUST MSR if available
+ */
+bool tsc_store_and_check_tsc_adjust(bool bootcpu)
+{
+	struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
+	unsigned int refcpu, cpu = smp_processor_id();
+	struct cpumask *mask;
+	s64 bootval;
+
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		return false;
+
+	rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
+	cur->bootval = bootval;
+	cur->nextcheck = jiffies + HZ;
+	cur->warned = false;
+
+	/*
+	 * Check whether this CPU is the first in a package to come up. In
+	 * this case do not check the boot value against another package
+	 * because the new package might have been physically hotplugged,
+	 * where TSC_ADJUST is expected to be different. When called on the
+	 * boot CPU topology_core_cpumask() might not be available yet.
+	 */
+	mask = topology_core_cpumask(cpu);
+	refcpu = mask ? cpumask_any_but(mask, cpu) : nr_cpu_ids;
+
+	if (refcpu >= nr_cpu_ids) {
+		tsc_sanitize_first_cpu(cur, bootval, smp_processor_id(),
+				       bootcpu);
+		return false;
+	}
+
+	ref = per_cpu_ptr(&tsc_adjust, refcpu);
+	/*
+	 * Compare the boot value and complain if it differs in the
+	 * package.
+	 */
+	if (bootval != ref->bootval) {
+		pr_warn(FW_BUG "TSC ADJUST differs: Reference CPU%u: %lld CPU%u: %lld\n",
+			refcpu, ref->bootval, cpu, bootval);
+	}
+	/*
+	 * The TSC_ADJUST values in a package must be the same. If the boot
+	 * value on this newly upcoming CPU differs from the adjustment
+	 * value of the already online CPU in this package, set it to that
+	 * adjusted value.
+	 */
+	if (bootval != ref->adjusted) {
+		pr_warn("TSC ADJUST synchronize: Reference CPU%u: %lld CPU%u: %lld\n",
+			refcpu, ref->adjusted, cpu, bootval);
+		cur->adjusted = ref->adjusted;
+		wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
+	}
+	/*
+	 * We have the TSCs forced to be in sync on this package. Skip sync
+	 * test:
+	 */
+	return true;
+}
+
 /*
  * Entry/exit counters that make sure that both CPUs
  * run the measurement code at once:
  */
 static atomic_t start_count;
 static atomic_t stop_count;
+static atomic_t skip_test;
+static atomic_t test_runs;
 
 /*
  * We use a raw spinlock in this exceptional case, because
@@ -37,15 +185,16 @@ static arch_spinlock_t sync_lock = __ARCH_SPIN_LOCK_UNLOCKED;
 static cycles_t last_tsc;
 static cycles_t max_warp;
 static int nr_warps;
+static int random_warps;
 
 /*
  * TSC-warp measurement loop running on both CPUs.  This is not called
  * if there is no TSC.
  */
-static void check_tsc_warp(unsigned int timeout)
+static cycles_t check_tsc_warp(unsigned int timeout)
 {
-	cycles_t start, now, prev, end;
-	int i;
+	cycles_t start, now, prev, end, cur_max_warp = 0;
+	int i, cur_warps = 0;
 
 	start = rdtsc_ordered();
 	/*
@@ -85,13 +234,22 @@ static void check_tsc_warp(unsigned int timeout)
 		if (unlikely(prev > now)) {
 			arch_spin_lock(&sync_lock);
 			max_warp = max(max_warp, prev - now);
+			cur_max_warp = max_warp;
+			/*
+			 * Check whether this bounces back and forth. Only
+			 * one CPU should observe time going backwards.
+			 */
+			if (cur_warps != nr_warps)
+				random_warps++;
 			nr_warps++;
+			cur_warps = nr_warps;
 			arch_spin_unlock(&sync_lock);
 		}
 	}
 	WARN(!(now-start),
 		"Warning: zero tsc calibration delta: %Ld [max: %Ld]\n",
 			now-start, end-start);
+	return cur_max_warp;
 }
 
 /*
@@ -136,15 +294,26 @@ void check_tsc_sync_source(int cpu)
 	}
 
 	/*
-	 * Reset it - in case this is a second bootup:
+	 * Set the maximum number of test runs to
+	 *  1 if the CPU does not provide the TSC_ADJUST MSR
+	 *  3 if the MSR is available, so the target can try to adjust
 	 */
-	atomic_set(&stop_count, 0);
-
+	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
+		atomic_set(&test_runs, 1);
+	else
+		atomic_set(&test_runs, 3);
+retry:
 	/*
-	 * Wait for the target to arrive:
+	 * Wait for the target to start or to skip the test:
 	 */
-	while (atomic_read(&start_count) != cpus-1)
+	while (atomic_read(&start_count) != cpus - 1) {
+		if (atomic_read(&skip_test) > 0) {
+			atomic_set(&skip_test, 0);
+			return;
+		}
 		cpu_relax();
+	}
+
 	/*
 	 * Trigger the target to continue into the measurement too:
 	 */
@@ -155,21 +324,35 @@ void check_tsc_sync_source(int cpu)
 	while (atomic_read(&stop_count) != cpus-1)
 		cpu_relax();
 
-	if (nr_warps) {
+	/*
+	 * If the test was successful set the number of runs to zero and
+	 * stop. If not, decrement the number of runs and check if we can
+	 * retry. In case of random warps no retry is attempted.
+	 */
+	if (!nr_warps) {
+		atomic_set(&test_runs, 0);
+
+		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
+			smp_processor_id(), cpu);
+
+	} else if (atomic_dec_and_test(&test_runs) || random_warps) {
+		/* Force it to 0 if random warps brought us here */
+		atomic_set(&test_runs, 0);
+
 		pr_warning("TSC synchronization [CPU#%d -> CPU#%d]:\n",
 			smp_processor_id(), cpu);
 		pr_warning("Measured %Ld cycles TSC warp between CPUs, "
 			   "turning off TSC clock.\n", max_warp);
+		if (random_warps)
+			pr_warning("TSC warped randomly between CPUs\n");
 		mark_tsc_unstable("check_tsc_sync_source failed");
-	} else {
-		pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
-			smp_processor_id(), cpu);
 	}
 
 	/*
 	 * Reset it - just in case we boot another CPU later:
 	 */
 	atomic_set(&start_count, 0);
+	random_warps = 0;
 	nr_warps = 0;
 	max_warp = 0;
 	last_tsc = 0;
@@ -178,6 +361,12 @@ void check_tsc_sync_source(int cpu)
 	 * Let the target continue with the bootup:
 	 */
 	atomic_inc(&stop_count);
+
+	/*
+	 * Retry, if there is a chance to do so.
+	 */
+	if (atomic_read(&test_runs) > 0)
+		goto retry;
 }
 
 /*
@@ -185,6 +374,9 @@ void check_tsc_sync_source(int cpu)
  */
 void check_tsc_sync_target(void)
 {
+	struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
+	unsigned int cpu = smp_processor_id();
+	cycles_t cur_max_warp, gbl_max_warp;
 	int cpus = 2;
 
 	/* Also aborts if there is no TSC. */
@@ -192,6 +384,16 @@ void check_tsc_sync_target(void)
 		return;
 
 	/*
+	 * Store, verify and sanitize the TSC adjust register. If
+	 * successful skip the test.
+	 */
+	if (tsc_store_and_check_tsc_adjust(false)) {
+		atomic_inc(&skip_test);
+		return;
+	}
+
+retry:
+	/*
 	 * Register this CPU's participation and wait for the
 	 * source CPU to start the measurement:
 	 */
@@ -199,7 +401,12 @@ void check_tsc_sync_target(void)
 	while (atomic_read(&start_count) != cpus)
 		cpu_relax();
 
-	check_tsc_warp(loop_timeout(smp_processor_id()));
+	cur_max_warp = check_tsc_warp(loop_timeout(cpu));
+
+	/*
+	 * Store the maximum observed warp value for a potential retry:
+	 */
+	gbl_max_warp = max_warp;
 
 	/*
 	 * Ok, we are done:
@@ -211,4 +418,61 @@ void check_tsc_sync_target(void)
 	 */
 	while (atomic_read(&stop_count) != cpus)
 		cpu_relax();
+
+	/*
+	 * Reset it for the next sync test:
+	 */
+	atomic_set(&stop_count, 0);
+
+	/*
+	 * Check the number of remaining test runs. If not zero, the test
+	 * failed and a retry with adjusted TSC is possible. If zero the
+	 * test was either successful or failed terminally.
+	 */
+	if (!atomic_read(&test_runs))
+		return;
+
+	/*
+	 * If the warp value of this CPU is 0, then the other CPU
+	 * observed time going backwards so this TSC was ahead and
+	 * needs to move backwards.
+	 */
+	if (!cur_max_warp)
+		cur_max_warp = -gbl_max_warp;
+
+	/*
+	 * Add the result to the previous adjustment value.
+	 *
+	 * The adjustment value is slightly off by the overhead of the
+	 * sync mechanism (observed values are ~200 TSC cycles), but this
+	 * really depends on CPU, node distance and frequency. So
+	 * compensating for this is hard to get right. Experiments show
+	 * that the warp is no longer detectable when the observed warp
+	 * value is used. In the worst case the adjustment needs to go
+	 * through a 3rd run for fine tuning.
+	 */
+	cur->adjusted += cur_max_warp;
+
+	/*
+	 * TSC deadline timer stops working or creates an interrupt storm
+	 * with adjust values < 0 and > 0x7FFFFFFF.
+	 *
+	 * To allow adjust values > 0x7FFFFFFF we need to disable the
+	 * deadline timer and use the local APIC timer, but that requires
+	 * more intrusive changes and we do not have any useful information
+	 * from Intel about the underlying HW wreckage yet.
+	 */
+	if (cur->adjusted < 0)
+		cur->adjusted = 0;
+	if (cur->adjusted > 0x7FFFFFFF)
+		cur->adjusted = 0x7FFFFFFF;
+
+	pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n",
+		cpu, cur_max_warp, cur->adjusted);
+
+	wrmsrl(MSR_IA32_TSC_ADJUST, cur->adjusted);
+	goto retry;
+
 }
+
+#endif /* CONFIG_SMP */
diff --git a/arch/x86/platform/intel-mid/mfld.c b/arch/x86/platform/intel-mid/mfld.c
index 1eb47b6298c2..e793fe509971 100644
--- a/arch/x86/platform/intel-mid/mfld.c
+++ b/arch/x86/platform/intel-mid/mfld.c
@@ -49,8 +49,13 @@ static unsigned long __init mfld_calibrate_tsc(void)
 	fast_calibrate = ratio * fsb;
 	pr_debug("read penwell tsc %lu khz\n", fast_calibrate);
 	lapic_timer_frequency = fsb * 1000 / HZ;
-	/* mark tsc clocksource as reliable */
-	set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE);
+
+	/*
+	 * TSC on Intel Atom SoCs is reliable and of known frequency.
+	 * See tsc_msr.c for details.
+	 */
+	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+	setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
 
 	return fast_calibrate;
 }
diff --git a/arch/x86/platform/intel-mid/mrfld.c b/arch/x86/platform/intel-mid/mrfld.c
index 59253db41bbc..e0607c77a1bd 100644
--- a/arch/x86/platform/intel-mid/mrfld.c
+++ b/arch/x86/platform/intel-mid/mrfld.c
@@ -78,8 +78,12 @@ static unsigned long __init tangier_calibrate_tsc(void)
 	pr_debug("Setting lapic_timer_frequency = %d\n",
 			lapic_timer_frequency);
 
-	/* mark tsc clocksource as reliable */
-	set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE);
+	/*
+	 * TSC on Intel Atom SoCs is reliable and of known frequency.
+	 * See tsc_msr.c for details.
+	 */
+	setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
+	setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
 
 	return fast_calibrate;
 }
diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
index 53cace2ec0e2..66ade16c7693 100644
--- a/arch/x86/power/cpu.c
+++ b/arch/x86/power/cpu.c
@@ -252,6 +252,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
 	fix_processor_context();
 
 	do_fpu_end();
+	tsc_verify_tsc_adjust(true);
 	x86_platform.restore_sched_clock_state();
 	mtrr_bp_restore();
 	perf_restore_debug_store();


* Re: [GIT pull] x86/timers for 4.10
  2016-12-18 20:06 [GIT pull] x86/timers for 4.10 Thomas Gleixner
@ 2017-02-07  1:31 ` Olof Johansson
  2017-02-08 11:44   ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Olof Johansson @ 2017-02-07  1:31 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, LKML, Andrew Morton, Ingo Molnar, H. Peter Anvin

Hi Thomas,

I just now updated my build box from 4.9-rc to 4.10-rc, and picked up
these changes. My machine went from doing:

[    0.000000] tsc: Fast TSC calibration using PIT
[    0.060669] TSC deadline timer enabled
[    0.142701] TSC synchronization [CPU#0 -> CPU#1]:
[    0.142704] Measured 3127756 cycles TSC warp between CPUs, turning
off TSC clock.
[    0.142708] tsc: Marking TSC unstable due to check_tsc_sync_source failed

To:

[    0.000000] clocksource: hpet: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 133484882848 ns
[    0.000000] hpet clockevent registered
[    0.000000] tsc: Fast TSC calibration using PIT
[    0.000000] tsc: Detected 2793.624 MHz processor
[    0.000000] [Firmware Bug]: TSC ADJUST: CPU0: -6495898515190607 force to 0
[2325258.699535] Calibrating delay loop (skipped), value calculated
using timer frequency.. 5587.24 BogoMIPS (lpj=11174496)
[2325258.699537] pid_max: default: 32768 minimum: 301
[... SMP bringup and for each CPU:]
[    0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
-6495898515190607 CPU1: -6495898517158354
[    0.177104] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -6495898517158354
[2325258.877496]   #2
[    0.257232] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
-6495898515190607 CPU2: -6495898516849701
[    0.257234] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -6495898516849701
[2325258.957514]   #3

(Once SMP bringup is done, system settles down at the 232525x printk timestamps)

...

So, a couple of obvious notes:

1) Timestamp jumps around during SMP bringup.

2) Timestamp jumps forward a lot. That timestamp is ~26 days, which is
likely the last cold boot of the system, similar to the original
reports.

I do find it somewhat annoying when printk timestamps aren't 0-based
at boot, but can cope with it. Not sure if it's intended behavior
though? And the jumping around definitely seems to not be.


If someone cares, hardware is a Dell T7810 with 2x E5-2663 v3, BIOS
date 03/09/2016.


-Olof

On Sun, Dec 18, 2016 at 12:06 PM, Thomas Gleixner <tglx@linutronix.de> wrote:
> Linus,
>
> please pull the latest x86-timers-for-linus git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-timers-for-linus
>
> This is the last functional update from the tip tree for 4.10. It got
> delayed due to a newly reported and anlyzed variant of BIOS bug and the
> resulting wreckage:
>
>  - Seperation of TSC being marked realiable and the fact that the platform
>    provides the TSC frequency via CPUID/MSRs and making use for it for
>    GOLDMONT.
>
>  - TSC adjust MSR validation and sanitizing:
>
>    The TSC adjust MSR contains the offset to the hardware counter. The sum
>    of the adjust MSR and the counter is the TSC value which is read via
>    RDTSC.
>
>    On at least two machines from different vendors the BIOS sets the TSC
>    adjust MSR to negative values. This happens on cold and warm boot. While
>    on cold boot the offset is a few milliseconds, on warm boot it basically
>    compensates the power on time of the system. The BIOSes are not even
>    using the adjust MSR to set all CPUs in the package to the same
>    offset. The offsets are different which renders the TSC unusable,
>
>    What's worse is that the TSC deadline timer has a HW feature^Wbug. It
>    malfunctions when the TSC adjust value is negative or greater equal
>    0x80000000 resulting in silent boot failures, hard lockups or non firing
>    timers. This looks like some hardware internal 32/64bit issue with a
>    sign extension problem. Intel has been silent so far on the issue.
>
>    The update contains sanity checks and keeps the adjust register within
>    working limits and in sync on the package.
>
>    As it looks like this disease is spreading via BIOS crapware, we need to
>    address this urgently as the boot failures are hard to debug for users.
>
>
> Thanks,
>
>         tglx
>
> ------------------>
> Bin Gao (4):
>       x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag
>       x86/tsc: Mark TSC frequency determined by CPUID as known
>       x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable
>       x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs
>
> Thomas Gleixner (15):
>       x86/tsc: Finalize the split of the TSC_RELIABLE flag
>       x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art()
>       x86/tsc: Detect random warps
>       x86/tsc: Store and check TSC ADJUST MSR
>       x86/tsc: Verify TSC_ADJUST from idle
>       x86/tsc: Sync test only for the first cpu in a package
>       x86/tsc: Move sync cleanup to a safe place
>       x86/tsc: Prepare warp test for TSC adjustment
>       x86/tsc: Try to adjust TSC if sync test fails
>       x86/tsc: Fix broken CONFIG_X86_TSC=n build
>       x86/tsc: Validate cpumask pointer before accessing it
>       x86/tsc: Validate TSC_ADJUST after resume
>       x86/tsc: Force TSC_ADJUST register to value >= zero
>       x86/tsc: Annotate printouts as firmware bug
>       x86/tsc: Limit the adjust value further
>
>
>  arch/x86/include/asm/cpufeatures.h  |   1 +
>  arch/x86/include/asm/tsc.h          |   9 ++
>  arch/x86/kernel/Makefile            |   2 +-
>  arch/x86/kernel/process.c           |   1 +
>  arch/x86/kernel/tsc.c               |  42 ++++--
>  arch/x86/kernel/tsc_msr.c           |  19 +++
>  arch/x86/kernel/tsc_sync.c          | 290 ++++++++++++++++++++++++++++++++++--
>  arch/x86/platform/intel-mid/mfld.c  |   9 +-
>  arch/x86/platform/intel-mid/mrfld.c |   8 +-
>  arch/x86/power/cpu.c                |   1 +
>  10 files changed, 355 insertions(+), 27 deletions(-)
>
> diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h
> index a39629206864..7f6a5f88d5ae 100644
> --- a/arch/x86/include/asm/cpufeatures.h
> +++ b/arch/x86/include/asm/cpufeatures.h
> @@ -106,6 +106,7 @@
>  #define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */
>  #define X86_FEATURE_EAGER_FPU  ( 3*32+29) /* "eagerfpu" Non lazy FPU restore */
>  #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */
> +#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */
>
>  /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */
>  #define X86_FEATURE_XMM3       ( 4*32+ 0) /* "pni" SSE-3 */
> diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h
> index 33b6365c22fe..abb1fdcc545a 100644
> --- a/arch/x86/include/asm/tsc.h
> +++ b/arch/x86/include/asm/tsc.h
> @@ -45,8 +45,17 @@ extern int tsc_clocksource_reliable;
>   * Boot-time check whether the TSCs are synchronized across
>   * all CPUs/cores:
>   */
> +#ifdef CONFIG_X86_TSC
> +extern bool tsc_store_and_check_tsc_adjust(bool bootcpu);
> +extern void tsc_verify_tsc_adjust(bool resume);
>  extern void check_tsc_sync_source(int cpu);
>  extern void check_tsc_sync_target(void);
> +#else
> +static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; }
> +static inline void tsc_verify_tsc_adjust(bool resume) { }
> +static inline void check_tsc_sync_source(int cpu) { }
> +static inline void check_tsc_sync_target(void) { }
> +#endif
>
>  extern int notsc_setup(char *);
>  extern void tsc_save_sched_clock_state(void);
> diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile
> index 79076d75bdbf..c0ac317dd372 100644
> --- a/arch/x86/kernel/Makefile
> +++ b/arch/x86/kernel/Makefile
> @@ -75,7 +75,7 @@ apm-y                         := apm_32.o
>  obj-$(CONFIG_APM)              += apm.o
>  obj-$(CONFIG_SMP)              += smp.o
>  obj-$(CONFIG_SMP)              += smpboot.o
> -obj-$(CONFIG_SMP)              += tsc_sync.o
> +obj-$(CONFIG_X86_TSC)          += tsc_sync.o
>  obj-$(CONFIG_SMP)              += setup_percpu.o
>  obj-$(CONFIG_X86_MPPARSE)      += mpparse.o
>  obj-y                          += apic/
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> index 0888a879120f..a67e0f0cdaab 100644
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
> @@ -277,6 +277,7 @@ void exit_idle(void)
>
>  void arch_cpu_idle_enter(void)
>  {
> +       tsc_verify_tsc_adjust(false);
>         local_touch_nmi();
>         enter_idle();
>  }
> diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
> index 46b2f41f8b05..0aed75a1e31b 100644
> --- a/arch/x86/kernel/tsc.c
> +++ b/arch/x86/kernel/tsc.c
> @@ -702,6 +702,20 @@ unsigned long native_calibrate_tsc(void)
>                 }
>         }
>
> +       /*
> +        * TSC frequency determined by CPUID is a "hardware reported"
> +        * frequency and is the most accurate one we have so far. This
> +        * is considered a known frequency.
> +        */
> +       setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> +
> +       /*
> +        * For Atom SoCs TSC is the only reliable clocksource.
> +        * Mark TSC reliable so no watchdog on it.
> +        */
> +       if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT)
> +               setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> +
>         return crystal_khz * ebx_numerator / eax_denominator;
>  }
>
> @@ -1043,18 +1057,20 @@ static void detect_art(void)
>         if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF)
>                 return;
>
> -       cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
> -             &art_to_tsc_numerator, unused, unused+1);
> -
> -       /* Don't enable ART in a VM, non-stop TSC required */
> +       /* Don't enable ART in a VM, non-stop TSC and TSC_ADJUST required */
>         if (boot_cpu_has(X86_FEATURE_HYPERVISOR) ||
>             !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) ||
> -           art_to_tsc_denominator < ART_MIN_DENOMINATOR)
> +           !boot_cpu_has(X86_FEATURE_TSC_ADJUST))
>                 return;
>
> -       if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset))
> +       cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator,
> +             &art_to_tsc_numerator, unused, unused+1);
> +
> +       if (art_to_tsc_denominator < ART_MIN_DENOMINATOR)
>                 return;
>
> +       rdmsrl(MSR_IA32_TSC_ADJUST, art_to_tsc_offset);
> +
>         /* Make this sticky over multiple CPU init calls */
>         setup_force_cpu_cap(X86_FEATURE_ART);
>  }
> @@ -1064,6 +1080,11 @@ static void detect_art(void)
>
>  static struct clocksource clocksource_tsc;
>
> +static void tsc_resume(struct clocksource *cs)
> +{
> +       tsc_verify_tsc_adjust(true);
> +}
> +
>  /*
>   * We used to compare the TSC to the cycle_last value in the clocksource
>   * structure to avoid a nasty time-warp. This can be observed in a
> @@ -1096,6 +1117,7 @@ static struct clocksource clocksource_tsc = {
>         .flags                  = CLOCK_SOURCE_IS_CONTINUOUS |
>                                   CLOCK_SOURCE_MUST_VERIFY,
>         .archdata               = { .vclock_mode = VCLOCK_TSC },
> +       .resume                 = tsc_resume,
>  };
>
>  void mark_tsc_unstable(char *reason)
> @@ -1283,10 +1305,10 @@ static int __init init_tsc_clocksource(void)
>                 clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP;
>
>         /*
> -        * Trust the results of the earlier calibration on systems
> -        * exporting a reliable TSC.
> +        * When TSC frequency is known (retrieved via MSR or CPUID), we skip
> +        * the refined calibration and directly register it as a clocksource.
>          */
> -       if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) {
> +       if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) {
>                 clocksource_register_khz(&clocksource_tsc, tsc_khz);
>                 return 0;
>         }
> @@ -1363,6 +1385,8 @@ void __init tsc_init(void)
>
>         if (unsynchronized_tsc())
>                 mark_tsc_unstable("TSCs unsynchronized");
> +       else
> +               tsc_store_and_check_tsc_adjust(true);
>
>         check_system_tsc_reliable();
>
> diff --git a/arch/x86/kernel/tsc_msr.c b/arch/x86/kernel/tsc_msr.c
> index 0fe720d64fef..19afdbd7d0a7 100644
> --- a/arch/x86/kernel/tsc_msr.c
> +++ b/arch/x86/kernel/tsc_msr.c
> @@ -100,5 +100,24 @@ unsigned long cpu_khz_from_msr(void)
>  #ifdef CONFIG_X86_LOCAL_APIC
>         lapic_timer_frequency = (freq * 1000) / HZ;
>  #endif
> +
> +       /*
> +        * TSC frequency determined by MSR is always considered "known"
> +        * because it is reported by HW.
> +        * Another fact is that on MSR capable platforms, PIT/HPET is
> +        * generally not available so calibration won't work at all.
> +        */
> +       setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> +
> +       /*
> +        * Unfortunately there is no way for hardware to tell whether the
> +        * TSC is reliable.  We were told by silicon design team that TSC
> +        * on Atom SoCs is always "reliable". TSC is also the only
> +        * reliable clocksource on these SoCs (HPET is either not present
> +        * or not functional) so mark TSC reliable which removes the
> +        * requirement for a watchdog clocksource.
> +        */
> +       setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
> +
>         return res;
>  }
> diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c
> index 78083bf23ed1..d0db011051a5 100644
> --- a/arch/x86/kernel/tsc_sync.c
> +++ b/arch/x86/kernel/tsc_sync.c
> @@ -14,18 +14,166 @@
>   * ( The serial nature of the boot logic and the CPU hotplug lock
>   *   protects against more than 2 CPUs entering this code. )
>   */
> +#include <linux/topology.h>
>  #include <linux/spinlock.h>
>  #include <linux/kernel.h>
>  #include <linux/smp.h>
>  #include <linux/nmi.h>
>  #include <asm/tsc.h>
>
> +struct tsc_adjust {
> +       s64             bootval;
> +       s64             adjusted;
> +       unsigned long   nextcheck;
> +       bool            warned;
> +};
> +
> +static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust);
> +
> +void tsc_verify_tsc_adjust(bool resume)
> +{
> +       struct tsc_adjust *adj = this_cpu_ptr(&tsc_adjust);
> +       s64 curval;
> +
> +       if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> +               return;
> +
> +       /* Rate limit the MSR check */
> +       if (!resume && time_before(jiffies, adj->nextcheck))
> +               return;
> +
> +       adj->nextcheck = jiffies + HZ;
> +
> +       rdmsrl(MSR_IA32_TSC_ADJUST, curval);
> +       if (adj->adjusted == curval)
> +               return;
> +
> +       /* Restore the original value */
> +       wrmsrl(MSR_IA32_TSC_ADJUST, adj->adjusted);
> +
> +       if (!adj->warned || resume) {
> +               pr_warn(FW_BUG "TSC ADJUST differs: CPU%u %lld --> %lld. Restoring\n",
> +                       smp_processor_id(), adj->adjusted, curval);
> +               adj->warned = true;
> +       }
> +}
> +
> +static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval,
> +                                  unsigned int cpu, bool bootcpu)
> +{
> +       /*
> +        * First online CPU in a package stores the boot value in the
> +        * adjustment value. This value might change later via the sync
> +        * mechanism. If that fails we can still yell about boot values not
> +        * being consistent.
> +        *
> +        * On the boot cpu we just force set the ADJUST value to 0 if it's
> +        * non zero. We don't do that on non boot cpus because physical
> +        * hotplug should have set the ADJUST register to a value > 0 so
> +        * the TSC is in sync with the already running cpus.
> +        *
> +        * But we always force positive ADJUST values. Otherwise the TSC
> +        * deadline timer creates an interrupt storm. We also have to
> +        * prevent values > 0x7FFFFFFF as those wreck the timer as well.
> +        */
> +       if ((bootcpu && bootval != 0) || (!bootcpu && bootval < 0) ||
> +           (bootval > 0x7FFFFFFF)) {
> +               pr_warn(FW_BUG "TSC ADJUST: CPU%u: %lld force to 0\n", cpu,
> +                       bootval);
> +               wrmsrl(MSR_IA32_TSC_ADJUST, 0);
> +               bootval = 0;
> +       }
> +       cur->adjusted = bootval;
> +}
> +
> +#ifndef CONFIG_SMP
> +bool __init tsc_store_and_check_tsc_adjust(bool bootcpu)
> +{
> +       struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
> +       s64 bootval;
> +
> +       if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> +               return false;
> +
> +       rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
> +       cur->bootval = bootval;
> +       cur->nextcheck = jiffies + HZ;
> +       tsc_sanitize_first_cpu(cur, bootval, smp_processor_id(), bootcpu);
> +       return false;
> +}
> +
> +#else /* !CONFIG_SMP */
> +
> +/*
> + * Store and check the TSC ADJUST MSR if available
> + */
> +bool tsc_store_and_check_tsc_adjust(bool bootcpu)
> +{
> +       struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust);
> +       unsigned int refcpu, cpu = smp_processor_id();
> +       struct cpumask *mask;
> +       s64 bootval;
> +
> +       if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> +               return false;
> +
> +       rdmsrl(MSR_IA32_TSC_ADJUST, bootval);
> +       cur->bootval = bootval;
> +       cur->nextcheck = jiffies + HZ;
> +       cur->warned = false;
> +
> +       /*
> +        * Check whether this CPU is the first in a package to come up. In
> +        * this case do not check the boot value against another package
> +        * because the new package might have been physically hotplugged,
> +        * where TSC_ADJUST is expected to be different. When called on the
> +        * boot CPU topology_core_cpumask() might not be available yet.
> +        */
> +       mask = topology_core_cpumask(cpu);
> +       refcpu = mask ? cpumask_any_but(mask, cpu) : nr_cpu_ids;
> +
> +       if (refcpu >= nr_cpu_ids) {
> +               tsc_sanitize_first_cpu(cur, bootval, smp_processor_id(),
> +                                      bootcpu);
> +               return false;
> +       }
> +
> +       ref = per_cpu_ptr(&tsc_adjust, refcpu);
> +       /*
> +        * Compare the boot value and complain if it differs in the
> +        * package.
> +        */
> +       if (bootval != ref->bootval) {
> +               pr_warn(FW_BUG "TSC ADJUST differs: Reference CPU%u: %lld CPU%u: %lld\n",
> +                       refcpu, ref->bootval, cpu, bootval);
> +       }
> +       /*
> +        * The TSC_ADJUST values in a package must be the same. If the boot
> +        * value on this newly upcoming CPU differs from the adjustment
> +        * value of the already online CPU in this package, set it to that
> +        * adjusted value.
> +        */
> +       if (bootval != ref->adjusted) {
> +               pr_warn("TSC ADJUST synchronize: Reference CPU%u: %lld CPU%u: %lld\n",
> +                       refcpu, ref->adjusted, cpu, bootval);
> +               cur->adjusted = ref->adjusted;
> +               wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted);
> +       }
> +       /*
> +        * We have the TSCs forced to be in sync on this package. Skip sync
> +        * test:
> +        */
> +       return true;
> +}
> +
>  /*
>   * Entry/exit counters that make sure that both CPUs
>   * run the measurement code at once:
>   */
>  static atomic_t start_count;
>  static atomic_t stop_count;
> +static atomic_t skip_test;
> +static atomic_t test_runs;
>
>  /*
>   * We use a raw spinlock in this exceptional case, because
> @@ -37,15 +185,16 @@ static arch_spinlock_t sync_lock = __ARCH_SPIN_LOCK_UNLOCKED;
>  static cycles_t last_tsc;
>  static cycles_t max_warp;
>  static int nr_warps;
> +static int random_warps;
>
>  /*
>   * TSC-warp measurement loop running on both CPUs.  This is not called
>   * if there is no TSC.
>   */
> -static void check_tsc_warp(unsigned int timeout)
> +static cycles_t check_tsc_warp(unsigned int timeout)
>  {
> -       cycles_t start, now, prev, end;
> -       int i;
> +       cycles_t start, now, prev, end, cur_max_warp = 0;
> +       int i, cur_warps = 0;
>
>         start = rdtsc_ordered();
>         /*
> @@ -85,13 +234,22 @@ static void check_tsc_warp(unsigned int timeout)
>                 if (unlikely(prev > now)) {
>                         arch_spin_lock(&sync_lock);
>                         max_warp = max(max_warp, prev - now);
> +                       cur_max_warp = max_warp;
> +                       /*
> +                        * Check whether this bounces back and forth. Only
> +                        * one CPU should observe time going backwards.
> +                        */
> +                       if (cur_warps != nr_warps)
> +                               random_warps++;
>                         nr_warps++;
> +                       cur_warps = nr_warps;
>                         arch_spin_unlock(&sync_lock);
>                 }
>         }
>         WARN(!(now-start),
>                 "Warning: zero tsc calibration delta: %Ld [max: %Ld]\n",
>                         now-start, end-start);
> +       return cur_max_warp;
>  }
>
>  /*
> @@ -136,15 +294,26 @@ void check_tsc_sync_source(int cpu)
>         }
>
>         /*
> -        * Reset it - in case this is a second bootup:
> +        * Set the maximum number of test runs to
> +        *  1 if the CPU does not provide the TSC_ADJUST MSR
> +        *  3 if the MSR is available, so the target can try to adjust
>          */
> -       atomic_set(&stop_count, 0);
> -
> +       if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> +               atomic_set(&test_runs, 1);
> +       else
> +               atomic_set(&test_runs, 3);
> +retry:
>         /*
> -        * Wait for the target to arrive:
> +        * Wait for the target to start or to skip the test:
>          */
> -       while (atomic_read(&start_count) != cpus-1)
> +       while (atomic_read(&start_count) != cpus - 1) {
> +               if (atomic_read(&skip_test) > 0) {
> +                       atomic_set(&skip_test, 0);
> +                       return;
> +               }
>                 cpu_relax();
> +       }
> +
>         /*
>          * Trigger the target to continue into the measurement too:
>          */
> @@ -155,21 +324,35 @@ void check_tsc_sync_source(int cpu)
>         while (atomic_read(&stop_count) != cpus-1)
>                 cpu_relax();
>
> -       if (nr_warps) {
> +       /*
> +        * If the test was successful set the number of runs to zero and
> +        * stop. If not, decrement the number of runs and check if we can
> +        * retry. In case of random warps no retry is attempted.
> +        */
> +       if (!nr_warps) {
> +               atomic_set(&test_runs, 0);
> +
> +               pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
> +                       smp_processor_id(), cpu);
> +
> +       } else if (atomic_dec_and_test(&test_runs) || random_warps) {
> +               /* Force it to 0 if random warps brought us here */
> +               atomic_set(&test_runs, 0);
> +
>                 pr_warning("TSC synchronization [CPU#%d -> CPU#%d]:\n",
>                         smp_processor_id(), cpu);
>                 pr_warning("Measured %Ld cycles TSC warp between CPUs, "
>                            "turning off TSC clock.\n", max_warp);
> +               if (random_warps)
> +                       pr_warning("TSC warped randomly between CPUs\n");
>                 mark_tsc_unstable("check_tsc_sync_source failed");
> -       } else {
> -               pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n",
> -                       smp_processor_id(), cpu);
>         }
>
>         /*
>          * Reset it - just in case we boot another CPU later:
>          */
>         atomic_set(&start_count, 0);
> +       random_warps = 0;
>         nr_warps = 0;
>         max_warp = 0;
>         last_tsc = 0;
> @@ -178,6 +361,12 @@ void check_tsc_sync_source(int cpu)
>          * Let the target continue with the bootup:
>          */
>         atomic_inc(&stop_count);
> +
> +       /*
> +        * Retry, if there is a chance to do so.
> +        */
> +       if (atomic_read(&test_runs) > 0)
> +               goto retry;
>  }
>
>  /*
> @@ -185,6 +374,9 @@ void check_tsc_sync_source(int cpu)
>   */
>  void check_tsc_sync_target(void)
>  {
> +       struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust);
> +       unsigned int cpu = smp_processor_id();
> +       cycles_t cur_max_warp, gbl_max_warp;
>         int cpus = 2;
>
>         /* Also aborts if there is no TSC. */
> @@ -192,6 +384,16 @@ void check_tsc_sync_target(void)
>                 return;
>
>         /*
> +        * Store, verify and sanitize the TSC adjust register. If
> +        * successful skip the test.
> +        */
> +       if (tsc_store_and_check_tsc_adjust(false)) {
> +               atomic_inc(&skip_test);
> +               return;
> +       }
> +
> +retry:
> +       /*
>          * Register this CPU's participation and wait for the
>          * source CPU to start the measurement:
>          */
> @@ -199,7 +401,12 @@ void check_tsc_sync_target(void)
>         while (atomic_read(&start_count) != cpus)
>                 cpu_relax();
>
> -       check_tsc_warp(loop_timeout(smp_processor_id()));
> +       cur_max_warp = check_tsc_warp(loop_timeout(cpu));
> +
> +       /*
> +        * Store the maximum observed warp value for a potential retry:
> +        */
> +       gbl_max_warp = max_warp;
>
>         /*
>          * Ok, we are done:
> @@ -211,4 +418,61 @@ void check_tsc_sync_target(void)
>          */
>         while (atomic_read(&stop_count) != cpus)
>                 cpu_relax();
> +
> +       /*
> +        * Reset it for the next sync test:
> +        */
> +       atomic_set(&stop_count, 0);
> +
> +       /*
> +        * Check the number of remaining test runs. If not zero, the test
> +        * failed and a retry with adjusted TSC is possible. If zero the
> +        * test was either successful or failed terminally.
> +        */
> +       if (!atomic_read(&test_runs))
> +               return;
> +
> +       /*
> +        * If the warp value of this CPU is 0, then the other CPU
> +        * observed time going backwards so this TSC was ahead and
> +        * needs to move backwards.
> +        */
> +       if (!cur_max_warp)
> +               cur_max_warp = -gbl_max_warp;
> +
> +       /*
> +        * Add the result to the previous adjustment value.
> +        *
> +        * The adjustment value is slightly off by the overhead of the
> +        * sync mechanism (observed values are ~200 TSC cycles), but this
> +        * really depends on CPU, node distance and frequency. So
> +        * compensating for this is hard to get right. Experiments show
> +        * that the warp is no longer detectable when the observed warp
> +        * value is used. In the worst case the adjustment needs to go
> +        * through a 3rd run for fine tuning.
> +        */
> +       cur->adjusted += cur_max_warp;
> +
> +       /*
> +        * TSC deadline timer stops working or creates an interrupt storm
> +        * with adjust values < 0 and > 0x7FFFFFFF.
> +        *
> +        * To allow adjust values > 0x7FFFFFFF we need to disable the
> +        * deadline timer and use the local APIC timer, but that requires
> +        * more intrusive changes and we do not have any useful information
> +        * from Intel about the underlying HW wreckage yet.
> +        */
> +       if (cur->adjusted < 0)
> +               cur->adjusted = 0;
> +       if (cur->adjusted > 0x7FFFFFFF)
> +               cur->adjusted = 0x7FFFFFFF;
> +
> +       pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n",
> +               cpu, cur_max_warp, cur->adjusted);
> +
> +       wrmsrl(MSR_IA32_TSC_ADJUST, cur->adjusted);
> +       goto retry;
> +
>  }
> +
> +#endif /* CONFIG_SMP */
> diff --git a/arch/x86/platform/intel-mid/mfld.c b/arch/x86/platform/intel-mid/mfld.c
> index 1eb47b6298c2..e793fe509971 100644
> --- a/arch/x86/platform/intel-mid/mfld.c
> +++ b/arch/x86/platform/intel-mid/mfld.c
> @@ -49,8 +49,13 @@ static unsigned long __init mfld_calibrate_tsc(void)
>         fast_calibrate = ratio * fsb;
>         pr_debug("read penwell tsc %lu khz\n", fast_calibrate);
>         lapic_timer_frequency = fsb * 1000 / HZ;
> -       /* mark tsc clocksource as reliable */
> -       set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE);
> +
> +       /*
> +        * TSC on Intel Atom SoCs is reliable and of known frequency.
> +        * See tsc_msr.c for details.
> +        */
> +       setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> +       setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>
>         return fast_calibrate;
>  }
> diff --git a/arch/x86/platform/intel-mid/mrfld.c b/arch/x86/platform/intel-mid/mrfld.c
> index 59253db41bbc..e0607c77a1bd 100644
> --- a/arch/x86/platform/intel-mid/mrfld.c
> +++ b/arch/x86/platform/intel-mid/mrfld.c
> @@ -78,8 +78,12 @@ static unsigned long __init tangier_calibrate_tsc(void)
>         pr_debug("Setting lapic_timer_frequency = %d\n",
>                         lapic_timer_frequency);
>
> -       /* mark tsc clocksource as reliable */
> -       set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE);
> +       /*
> +        * TSC on Intel Atom SoCs is reliable and of known frequency.
> +        * See tsc_msr.c for details.
> +        */
> +       setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ);
> +       setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE);
>
>         return fast_calibrate;
>  }
> diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c
> index 53cace2ec0e2..66ade16c7693 100644
> --- a/arch/x86/power/cpu.c
> +++ b/arch/x86/power/cpu.c
> @@ -252,6 +252,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt)
>         fix_processor_context();
>
>         do_fpu_end();
> +       tsc_verify_tsc_adjust(true);
>         x86_platform.restore_sched_clock_state();
>         mtrr_bp_restore();
>         perf_restore_debug_store();
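For clarity, the clamping rules that tsc_sanitize_first_cpu() applies above
can be written out as a plain model (this is an illustration, not the kernel
code; the inputs are hypothetical):

```python
def sanitize_adjust(bootval, bootcpu):
    """Model of the TSC_ADJUST sanitizing rules from the patch above:
    the boot CPU is forced to 0 on any non-zero value; other CPUs are
    forced to 0 only for negative values (physical hotplug may legitimately
    set a positive offset); and anything above 0x7FFFFFFF is forced to 0
    as well, since the TSC deadline timer misbehaves beyond that."""
    if (bootcpu and bootval != 0) or (not bootcpu and bootval < 0) \
            or bootval > 0x7FFFFFFF:
        return 0
    return bootval

# A warm-boot BIOS offset like the reported ones gets reset on the boot CPU:
print(sanitize_adjust(-6495898517158354, True))   # forced to 0
# A positive offset on a non-boot CPU (e.g. hotplug) is kept:
print(sanitize_adjust(1000, False))               # stays 1000
```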

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [GIT pull] x86/timers for 4.10
  2017-02-07  1:31 ` Olof Johansson
@ 2017-02-08 11:44   ` Thomas Gleixner
  2017-02-08 13:04     ` Mike Galbraith
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2017-02-08 11:44 UTC (permalink / raw)
  To: Olof Johansson
  Cc: Linus Torvalds, LKML, Andrew Morton, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra

On Mon, 6 Feb 2017, Olof Johansson wrote:
> [    0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> -6495898515190607 CPU1: -6495898517158354

Yay, another "clever" BIOS ....

> [    0.177104] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -6495898517158354
> [2325258.877496]   #2
> [    0.257232] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> -6495898515190607 CPU2: -6495898516849701
> [    0.257234] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -6495898516849701
> [2325258.957514]   #3
> 
> (Once SMP bringup is done, system settles down at the 232525x printk timestamps)


> 1) Timestamp jumps around during SMP bringup.
> 
> 2) Timestamp jumps forward a lot. That timestamp is ~26 days, which is
> likely the last cold boot of the system, similar to the original
> reports.
>
> I do find it somewhat annoying when printk timestamps aren't 0-based
> at boot, but can cope with it. Not sure if it's intended behavior
> though?
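As a sanity check on the "~26 days" estimate above: dividing the reported
ADJUST offset by the TSC frequency gives the power-on time the BIOS is
compensating for. The frequency below is an assumption (it is not in the
quoted log), so this is only a rough back-of-the-envelope calculation:

```python
adjust_offset = -6495898515190607   # cycles, from the dmesg quoted above
tsc_hz = 2.8e9                      # assumed TSC frequency, not from the log

# Negative ADJUST roughly compensates the time since cold boot:
uptime_days = -adjust_offset / tsc_hz / 86400
print(f"{uptime_days:.1f} days")    # in the ballpark of the ~26 day estimate
```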

More resulting behaviour than intended.

The first goal was to fixup that TSC_ADJUST mess because it causes
malfunction of the TSC deadline timer and we don't know which other side
effects the broken hardware implementation has.

Aside from preventing the wreckage, the TSC becomes usable on those
machines, which makes a massive performance difference, especially on such
big irons where otherwise all CPUs (in your case 56) compete for HPET
access....

> And the jumping around definitely seems to not be.

Bah. You got a working TSC and instead of enjoying the performance boost
you complain about timestamps in dmesg :)

But yes, it's not intended and I really did not pay attention to that.

Looking deeper, it's trivial to fixup the forward jump, but it would
require a lot of nasty hackery to make the "Firmware Bug" timestamps
behave.

The result with the patch below is:

[    0.397445] smp: Bringing up secondary CPUs ...
[    0.402100] x86: Booting SMP configuration:
[    0.406343] .... node  #0, CPUs:      #1
[1265776479.930667] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU1: -2978888639183101
[1265776479.944664] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -2978888639183101
[    0.508119]  #2
[1265776480.032346] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU2: -2978888639183677
[1265776480.044192] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -2978888639183677
[    0.607643]  #3
[1265776480.131874] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -2978888639075328 CPU3: -2978888639184530
[1265776480.143720] TSC ADJUST synchronize: Reference CPU0: 0 CPU3: -2978888639184530
[    0.707108] smp: Brought up 1 node, 4 CPUs
[    0.711271] smpboot: Total of 4 processors activated (21698.88 BogoMIPS)

and frankly I think it's not worth the trouble of buffering and delayed
printing just to fix up the timestamps of these [Firmware Bug] lines.

It's a firmware bug, so the timestamp being bonkers is just collateral
damage. Tell your hardware vendor to fix that crap.

> If someone cares, hardware is a Dell T7810 with 2x E5-2663 v3, BIOS
> date 03/09/2016.

Yet another Dell...

Thanks,

	tglx

8<----------------------------

diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index e41af597aed8..37e7cf544e51 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -1356,6 +1356,9 @@ void __init tsc_init(void)
 		(unsigned long)cpu_khz / 1000,
 		(unsigned long)cpu_khz % 1000);
 
+	/* Sanitize TSC ADJUST before cyc2ns gets initialized */
+	tsc_store_and_check_tsc_adjust(true);
+
 	/*
 	 * Secondary CPUs do not run through tsc_init(), so set up
 	 * all the scale factors for all CPUs, assuming the same
@@ -1386,8 +1389,6 @@ void __init tsc_init(void)
 
 	if (unsynchronized_tsc())
 		mark_tsc_unstable("TSCs unsynchronized");
-	else
-		tsc_store_and_check_tsc_adjust(true);
 
 	check_system_tsc_reliable();
 


* Re: [GIT pull] x86/timers for 4.10
  2017-02-08 11:44   ` Thomas Gleixner
@ 2017-02-08 13:04     ` Mike Galbraith
  2017-02-09 15:07       ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Galbraith @ 2017-02-08 13:04 UTC (permalink / raw)
  To: Thomas Gleixner, Olof Johansson
  Cc: Linus Torvalds, LKML, Andrew Morton, Ingo Molnar, H. Peter Anvin,
	Peter Zijlstra

On Wed, 2017-02-08 at 12:44 +0100, Thomas Gleixner wrote:
> On Mon, 6 Feb 2017, Olof Johansson wrote:
> > [    0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> > -6495898515190607 CPU1: -6495898517158354
> 
> Yay, another "clever" BIOS ....

Oh yeah, that reminds me...

I met one such box, and the adjustment code did salvage it, but I had
to cheat a little for it to do so reliably, as it would sometimes still
see a delta of 1 or 2 whole cycles, and hand me a useless wreck instead
quick like bunny big box.

	-Mike
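(The bandaid itself was not posted; a minimal sketch of the idea Mike
describes, with the tolerance of a couple of cycles being a guess, would be
to treat warps within measurement noise as still-synchronized TSCs:)

```python
WARP_TOLERANCE = 2  # cycles; hypothetical threshold for "microscopic" warps

def tsc_out_of_sync(max_warp):
    """Ignore a warp of a cycle or two, which can still show up after the
    adjust code has done its work, instead of declaring the TSC unstable."""
    return max_warp > WARP_TOLERANCE

print(tsc_out_of_sync(1))    # measurement noise, tolerated
print(tsc_out_of_sync(620))  # a real warp, TSC marked unstable
```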


* Re: [GIT pull] x86/timers for 4.10
  2017-02-08 13:04     ` Mike Galbraith
@ 2017-02-09 15:07       ` Thomas Gleixner
  2017-02-09 15:16         ` Mike Galbraith
  2017-02-23  8:20         ` Mike Galbraith
  0 siblings, 2 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-02-09 15:07 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Olof Johansson, Linus Torvalds, LKML, Andrew Morton, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra

On Wed, 8 Feb 2017, Mike Galbraith wrote:
> On Wed, 2017-02-08 at 12:44 +0100, Thomas Gleixner wrote:
> > On Mon, 6 Feb 2017, Olof Johansson wrote:
> > > [    0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> > > -6495898515190607 CPU1: -6495898517158354
> > 
> > Yay, another "clever" BIOS ....
> 
> Oh yeah, that reminds me...
> 
> I met one such box, and the adjustment code did salvage it, but I had
> to cheat a little for it to do so reliably, as it would sometimes still
> see a delta of 1 or 2 whole cycles, and hand me a useless wreck instead
> quick like bunny big box.

Can you share your cheatery ?

Thanks,

	tglx


* Re: [GIT pull] x86/timers for 4.10
  2017-02-09 15:07       ` Thomas Gleixner
@ 2017-02-09 15:16         ` Mike Galbraith
  2017-02-09 15:21           ` Thomas Gleixner
  2017-02-23  8:20         ` Mike Galbraith
  1 sibling, 1 reply; 13+ messages in thread
From: Mike Galbraith @ 2017-02-09 15:16 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Olof Johansson, Linus Torvalds, LKML, Andrew Morton, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra

On Thu, 2017-02-09 at 16:07 +0100, Thomas Gleixner wrote:
> On Wed, 8 Feb 2017, Mike Galbraith wrote:
> > On Wed, 2017-02-08 at 12:44 +0100, Thomas Gleixner wrote:
> > > On Mon, 6 Feb 2017, Olof Johansson wrote:
> > > > [    0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> > > > -6495898515190607 CPU1: -6495898517158354
> > > 
> > > Yay, another "clever" BIOS ....
> > 
> > Oh yeah, that reminds me...
> > 
> > I met one such box, and the adjustment code did salvage it, but I had
> > to cheat a little for it to do so reliably, as it would sometimes still
> > see a delta of 1 or 2 whole cycles, and hand me a useless wreck instead
> > quick like bunny big box.
> 
> Can you share your cheatery ?

I didn't keep it, it was just a bandaid for a fleeting use, dirt simple
ignore microscopic delta.

	-Mike


* Re: [GIT pull] x86/timers for 4.10
  2017-02-09 15:16         ` Mike Galbraith
@ 2017-02-09 15:21           ` Thomas Gleixner
  2017-02-09 15:25             ` Mike Galbraith
  0 siblings, 1 reply; 13+ messages in thread
From: Thomas Gleixner @ 2017-02-09 15:21 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Olof Johansson, Linus Torvalds, LKML, Andrew Morton, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra

On Thu, 9 Feb 2017, Mike Galbraith wrote:

> On Thu, 2017-02-09 at 16:07 +0100, Thomas Gleixner wrote:
> > On Wed, 8 Feb 2017, Mike Galbraith wrote:
> > > On Wed, 2017-02-08 at 12:44 +0100, Thomas Gleixner wrote:
> > > > On Mon, 6 Feb 2017, Olof Johansson wrote:
> > > > > [    0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> > > > > -6495898515190607 CPU1: -6495898517158354
> > > > 
> > > > Yay, another "clever" BIOS ....
> > > 
> > > Oh yeah, that reminds me...
> > > 
> > > I met one such box, and the adjustment code did salvage it, but I had
> > > to cheat a little for it to do so reliably, as it would sometimes still
> > > see a delta of 1 or 2 whole cycles, and hand me a useless wreck instead
> > > quick like bunny big box.
> > 
> > Can you share your cheatery ?
> 
> I didn't keep it, it was just a bandaid for a fleeting use, dirt simple
> ignore microscopic delta.

Can you send me the dmesg output of that box for a good and a bad case or
don't you have access to it anymore?

Thanks,

	tglx


* Re: [GIT pull] x86/timers for 4.10
  2017-02-09 15:21           ` Thomas Gleixner
@ 2017-02-09 15:25             ` Mike Galbraith
  2017-02-09 15:39               ` Thomas Gleixner
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Galbraith @ 2017-02-09 15:25 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Olof Johansson, Linus Torvalds, LKML, Andrew Morton, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra

On Thu, 2017-02-09 at 16:21 +0100, Thomas Gleixner wrote:
> On Thu, 9 Feb 2017, Mike Galbraith wrote:
> 
> > On Thu, 2017-02-09 at 16:07 +0100, Thomas Gleixner wrote:
> > > On Wed, 8 Feb 2017, Mike Galbraith wrote:
> > > > On Wed, 2017-02-08 at 12:44 +0100, Thomas Gleixner wrote:
> > > > > On Mon, 6 Feb 2017, Olof Johansson wrote:
> > > > > > [    0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> > > > > > -6495898515190607 CPU1: -6495898517158354
> > > > > 
> > > > > Yay, another "clever" BIOS ....
> > > > 
> > > > Oh yeah, that reminds me...
> > > > 
> > > > I met one such box, and the adjustment code did salvage it, but I had
> > > > to cheat a little for it to do so reliably, as it would sometimes still
> > > > see a delta of 1 or 2 whole cycles, and hand me a useless wreck instead
> > > > quick like bunny big box.
> > > 
> > > Can you share your cheatery ?
> > 
> > I didn't keep it, it was just a bandaid for a fleeting use, dirt simple
> > ignore microscopic delta.
> 
> Can you send me the dmesg output of that box for a good and a bad case or
> don't you have access to it anymore?

I don't even remember which box it was, but I can try to find it again
during idle moments.

	-Mike 


* Re: [GIT pull] x86/timers for 4.10
  2017-02-09 15:25             ` Mike Galbraith
@ 2017-02-09 15:39               ` Thomas Gleixner
  0 siblings, 0 replies; 13+ messages in thread
From: Thomas Gleixner @ 2017-02-09 15:39 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Olof Johansson, Linus Torvalds, LKML, Andrew Morton, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra

On Thu, 9 Feb 2017, Mike Galbraith wrote:
> 
> I don't even remember which box it was, but I can try to find it again
> during idle moments.

  sys_sched_setscheduler(PIDOF(Mike), IDLE);


* Re: [GIT pull] x86/timers for 4.10
  2017-02-09 15:07       ` Thomas Gleixner
  2017-02-09 15:16         ` Mike Galbraith
@ 2017-02-23  8:20         ` Mike Galbraith
  2017-02-23  9:17           ` Peter Zijlstra
  2017-02-23 10:26           ` Borislav Petkov
  1 sibling, 2 replies; 13+ messages in thread
From: Mike Galbraith @ 2017-02-23  8:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Olof Johansson, Linus Torvalds, LKML, Andrew Morton, Ingo Molnar,
	H. Peter Anvin, Peter Zijlstra

On Thu, 2017-02-09 at 16:07 +0100, Thomas Gleixner wrote:
> On Wed, 8 Feb 2017, Mike Galbraith wrote:
> > On Wed, 2017-02-08 at 12:44 +0100, Thomas Gleixner wrote:
> > > On Mon, 6 Feb 2017, Olof Johansson wrote:
> > > > [    0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0:
> > > > -6495898515190607 CPU1: -6495898517158354
> > > 
> > > Yay, another "clever" BIOS ....
> > 
> > Oh yeah, that reminds me...
> > 
> > I met one such box, and the adjustment code did salvage it, but I had
> > to cheat a little for it to do so reliably, as it would sometimes still
> > see a delta of 1 or 2 whole cycles, and hand me a useless wreck instead
> > quick like bunny big box.
> 
> Can you share your cheatery ?

I can do better than that... sorta ;-)

x86/tsc: Fix unreliable tsc adjust

On a 4 socket BIOS challenged box (4x18), the magic number '3' does
not work reliably, resulting in TSC being disabled more often than not.

Replace defective magic number '3' with functional magic number '5',
derived via scientific method: number of sockets in afflicted box, plus
one for good luck, and reboot box a lot to validate (poke 'n' hope).

Signed-off-by: Mike Galbraith <efault@gmx.de>
---
 arch/x86/kernel/tsc_sync.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

--- a/arch/x86/kernel/tsc_sync.c
+++ b/arch/x86/kernel/tsc_sync.c
@@ -294,7 +294,7 @@ void check_tsc_sync_source(int cpu)
 	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
 		atomic_set(&test_runs, 1);
 	else
-		atomic_set(&test_runs, 3);
+		atomic_set(&test_runs, 5);
 retry:
 	/*
 	 * Wait for the target to start or to skip the test:


* Re: [GIT pull] x86/timers for 4.10
  2017-02-23  8:20         ` Mike Galbraith
@ 2017-02-23  9:17           ` Peter Zijlstra
  2017-02-23 10:26           ` Borislav Petkov
  1 sibling, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2017-02-23  9:17 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Thomas Gleixner, Olof Johansson, Linus Torvalds, LKML,
	Andrew Morton, Ingo Molnar, H. Peter Anvin

On Thu, Feb 23, 2017 at 09:20:06AM +0100, Mike Galbraith wrote:
> On Thu, 2017-02-09 at 16:07 +0100, Thomas Gleixner wrote:

> > Can you share your cheatery ?
> 
> I can do better than that... sorta ;-)
> 
> x86/tsc: Fix unreliable tsc adjust
> 
> On a 4 socket BIOS challenged box (4x18), the magic number '3' does
> not work reliably, resulting in TSC being disabled more often than not.
> 
> Replace defective magic number '3' with functional magic number '5',
> derived via scientific method: number of sockets in afflicted box, plus
> one for good luck, and reboot box a lot to validate (poke 'n' hope).
> 
> Signed-off-by: Mike Galbraith <efault@gmx.de>
> ---
>  arch/x86/kernel/tsc_sync.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/arch/x86/kernel/tsc_sync.c
> +++ b/arch/x86/kernel/tsc_sync.c
> @@ -294,7 +294,7 @@ void check_tsc_sync_source(int cpu)
>  	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
>  		atomic_set(&test_runs, 1);
>  	else
> -		atomic_set(&test_runs, 3);
> +		atomic_set(&test_runs, 5);
>  retry:
>  	/*
>  	 * Wait for the target to start or to skip the test:

Confirmed on another box too; the default of 3 wasn't sufficient to
achieve sync. When upping it to 16, sync was achieved in around 4 runs
or so.


* Re: [GIT pull] x86/timers for 4.10
  2017-02-23  8:20         ` Mike Galbraith
  2017-02-23  9:17           ` Peter Zijlstra
@ 2017-02-23 10:26           ` Borislav Petkov
  2017-02-23 13:29             ` Mike Galbraith
  1 sibling, 1 reply; 13+ messages in thread
From: Borislav Petkov @ 2017-02-23 10:26 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Thomas Gleixner, Olof Johansson, Linus Torvalds, LKML,
	Andrew Morton, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Thu, Feb 23, 2017 at 09:20:06AM +0100, Mike Galbraith wrote:
> --- a/arch/x86/kernel/tsc_sync.c
> +++ b/arch/x86/kernel/tsc_sync.c
> @@ -294,7 +294,7 @@ void check_tsc_sync_source(int cpu)
>  	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
>  		atomic_set(&test_runs, 1);
>  	else
> -		atomic_set(&test_runs, 3);
> +		atomic_set(&test_runs, 5);

So
		atomic_set(&test_runs, max_t(unsigned int, 3, num_online_nodes() + 1));

:-)

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 


* Re: [GIT pull] x86/timers for 4.10
  2017-02-23 10:26           ` Borislav Petkov
@ 2017-02-23 13:29             ` Mike Galbraith
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Galbraith @ 2017-02-23 13:29 UTC (permalink / raw)
  To: Borislav Petkov
  Cc: Thomas Gleixner, Olof Johansson, Linus Torvalds, LKML,
	Andrew Morton, Ingo Molnar, H. Peter Anvin, Peter Zijlstra

On Thu, 2017-02-23 at 11:26 +0100, Borislav Petkov wrote:
> On Thu, Feb 23, 2017 at 09:20:06AM +0100, Mike Galbraith wrote:
> > --- a/arch/x86/kernel/tsc_sync.c
> > +++ b/arch/x86/kernel/tsc_sync.c
> > @@ -294,7 +294,7 @@ void check_tsc_sync_source(int cpu)
> >  	if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST))
> >  		atomic_set(&test_runs, 1);
> >  	else
> > -		atomic_set(&test_runs, 3);
> > +		atomic_set(&test_runs, 5);
> 
> So
> 		atomic_set(&test_runs, max_t(unsigned int, 3, num_online_nodes() + 1));

I don't know that there is a correlation, ergo the "log" wording ;-)

	-Mike


end of thread, other threads:[~2017-02-23 13:31 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
2016-12-18 20:06 [GIT pull] x86/timers for 4.10 Thomas Gleixner
2017-02-07  1:31 ` Olof Johansson
2017-02-08 11:44   ` Thomas Gleixner
2017-02-08 13:04     ` Mike Galbraith
2017-02-09 15:07       ` Thomas Gleixner
2017-02-09 15:16         ` Mike Galbraith
2017-02-09 15:21           ` Thomas Gleixner
2017-02-09 15:25             ` Mike Galbraith
2017-02-09 15:39               ` Thomas Gleixner
2017-02-23  8:20         ` Mike Galbraith
2017-02-23  9:17           ` Peter Zijlstra
2017-02-23 10:26           ` Borislav Petkov
2017-02-23 13:29             ` Mike Galbraith
