From mboxrd@z Thu Jan 1 00:00:00 1970
From: Olof Johansson
Date: Mon, 6 Feb 2017 17:31:35 -0800
Subject: Re: [GIT pull] x86/timers for 4.10
To: Thomas Gleixner
Cc: Linus Torvalds, LKML, Andrew Morton, Ingo Molnar, "H. Peter Anvin"
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
X-Originating-IP: [209.133.79.6]
Sender: linux-kernel-owner@vger.kernel.org
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Thomas,

I just now updated my build box from 4.9-rc to 4.10-rc, and picked up
these changes. My machine went from doing:

[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.060669] TSC deadline timer enabled
[ 0.142701] TSC synchronization [CPU#0 -> CPU#1]:
[ 0.142704] Measured 3127756 cycles TSC warp between CPUs, turning off TSC clock.
[ 0.142708] tsc: Marking TSC unstable due to check_tsc_sync_source failed

To:

[ 0.000000] clocksource: hpet: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 133484882848 ns
[ 0.000000] hpet clockevent registered
[ 0.000000] tsc: Fast TSC calibration using PIT
[ 0.000000] tsc: Detected 2793.624 MHz processor
[ 0.000000] [Firmware Bug]: TSC ADJUST: CPU0: -6495898515190607 force to 0
[2325258.699535] Calibrating delay loop (skipped), value calculated using timer frequency.. 5587.24 BogoMIPS (lpj=11174496)
[2325258.699537] pid_max: default: 32768 minimum: 301

[... SMP bringup and for each CPU:]

[ 0.177102] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -6495898515190607 CPU1: -6495898517158354
[ 0.177104] TSC ADJUST synchronize: Reference CPU0: 0 CPU1: -6495898517158354
[2325258.877496] #2
[ 0.257232] [Firmware Bug]: TSC ADJUST differs: Reference CPU0: -6495898515190607 CPU2: -6495898516849701
[ 0.257234] TSC ADJUST synchronize: Reference CPU0: 0 CPU2: -6495898516849701
[2325258.957514] #3

(Once SMP bringup is done, the system settles down at the 232525x printk
timestamps)

...

So, a couple of obvious notes:

1) The timestamp jumps around during SMP bringup.
2) The timestamp jumps forward a lot. It is ~26 days, which likely
   corresponds to the last cold boot of the system, similar to the
   original reports (6495898515190607 cycles at 2793.624 MHz is about
   2325258 seconds, i.e. ~26.9 days, matching the 232525x timestamps
   above).

I do find it somewhat annoying when printk timestamps aren't 0-based at
boot, but I can cope with it. I'm not sure whether that is intended
behavior, though? The jumping around definitely does not seem to be.

If someone cares, the hardware is a Dell T7810 with 2x E5-2663 v3, BIOS
date 03/09/2016.


-Olof

On Sun, Dec 18, 2016 at 12:06 PM, Thomas Gleixner wrote:
> Linus,
>
> please pull the latest x86-timers-for-linus git tree from:
>
>    git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git x86-timers-for-linus
>
> This is the last functional update from the tip tree for 4.10. It got
> delayed due to a newly reported and analyzed variant of a BIOS bug and
> the resulting wreckage:
>
>  - Separation of the TSC being marked reliable from the fact that the
>    platform provides the TSC frequency via CPUID/MSRs, and making use
>    of that for GOLDMONT.
>
>  - TSC adjust MSR validation and sanitizing:
>
>    The TSC adjust MSR contains the offset to the hardware counter. The
>    sum of the adjust MSR and the counter is the TSC value which is read
>    via RDTSC.
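(For reference: because RDTSC already returns counter + IA32_TSC_ADJUST,
a firmware-written negative offset like the one in the dmesg above is
directly visible from userspace via the msr driver. The sketch below is
only an illustration, not part of the series; it assumes the msr module
is loaded, root privileges, and CPU 0's /dev/cpu/0/msr node.)

/*
 * Illustration only: dump IA32_TSC_ADJUST (MSR 0x3b) next to the TSC for
 * one CPU.  A value outside 0..0x7FFFFFFF is the condition the series
 * below warns about and sanitizes.
 */
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <x86intrin.h>

#define MSR_IA32_TSC_ADJUST	0x3b

int main(int argc, char **argv)
{
	const char *path = argc > 1 ? argv[1] : "/dev/cpu/0/msr";
	long long adjust;
	int fd;

	fd = open(path, O_RDONLY);
	if (fd < 0 || pread(fd, &adjust, sizeof(adjust), MSR_IA32_TSC_ADJUST) != sizeof(adjust)) {
		perror(path);
		return EXIT_FAILURE;
	}
	close(fd);

	printf("TSC        = %llu\n", (unsigned long long)__rdtsc());
	printf("TSC_ADJUST = %lld%s\n", adjust,
	       (adjust < 0 || adjust > 0x7FFFFFFFLL) ?
	       "   (outside the 0..0x7FFFFFFF range the deadline timer copes with)" : "");

	return 0;
}

(Run once per /dev/cpu/N/msr node to compare the CPUs in a package.)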
> > On at least two machines from different vendors the BIOS sets the TSC > adjust MSR to negative values. This happens on cold and warm boot. While > on cold boot the offset is a few milliseconds, on warm boot it basically > compensates the power on time of the system. The BIOSes are not even > using the adjust MSR to set all CPUs in the package to the same > offset. The offsets are different which renders the TSC unusable, > > What's worse is that the TSC deadline timer has a HW feature^Wbug. It > malfunctions when the TSC adjust value is negative or greater equal > 0x80000000 resulting in silent boot failures, hard lockups or non firing > timers. This looks like some hardware internal 32/64bit issue with a > sign extension problem. Intel has been silent so far on the issue. > > The update contains sanity checks and keeps the adjust register within > working limits and in sync on the package. > > As it looks like this disease is spreading via BIOS crapware, we need to > address this urgently as the boot failures are hard to debug for users. > > > Thanks, > > tglx > > ------------------> > Bin Gao (4): > x86/tsc: Add X86_FEATURE_TSC_KNOWN_FREQ flag > x86/tsc: Mark TSC frequency determined by CPUID as known > x86/tsc: Mark Intel ATOM_GOLDMONT TSC reliable > x86/tsc: Set TSC_KNOWN_FREQ and TSC_RELIABLE flags on Intel Atom SoCs > > Thomas Gleixner (15): > x86/tsc: Finalize the split of the TSC_RELIABLE flag > x86/tsc: Use X86_FEATURE_TSC_ADJUST in detect_art() > x86/tsc: Detect random warps > x86/tsc: Store and check TSC ADJUST MSR > x86/tsc: Verify TSC_ADJUST from idle > x86/tsc: Sync test only for the first cpu in a package > x86/tsc: Move sync cleanup to a safe place > x86/tsc: Prepare warp test for TSC adjustment > x86/tsc: Try to adjust TSC if sync test fails > x86/tsc: Fix broken CONFIG_X86_TSC=n build > x86/tsc: Validate cpumask pointer before accessing it > x86/tsc: Validate TSC_ADJUST after resume > x86/tsc: Force TSC_ADJUST register to value >= zero > x86/tsc: Annotate printouts as firmware bug > x86/tsc: Limit the adjust value further > > > arch/x86/include/asm/cpufeatures.h | 1 + > arch/x86/include/asm/tsc.h | 9 ++ > arch/x86/kernel/Makefile | 2 +- > arch/x86/kernel/process.c | 1 + > arch/x86/kernel/tsc.c | 42 ++++-- > arch/x86/kernel/tsc_msr.c | 19 +++ > arch/x86/kernel/tsc_sync.c | 290 ++++++++++++++++++++++++++++++++++-- > arch/x86/platform/intel-mid/mfld.c | 9 +- > arch/x86/platform/intel-mid/mrfld.c | 8 +- > arch/x86/power/cpu.c | 1 + > 10 files changed, 355 insertions(+), 27 deletions(-) > > diff --git a/arch/x86/include/asm/cpufeatures.h b/arch/x86/include/asm/cpufeatures.h > index a39629206864..7f6a5f88d5ae 100644 > --- a/arch/x86/include/asm/cpufeatures.h > +++ b/arch/x86/include/asm/cpufeatures.h > @@ -106,6 +106,7 @@ > #define X86_FEATURE_APERFMPERF ( 3*32+28) /* APERFMPERF */ > #define X86_FEATURE_EAGER_FPU ( 3*32+29) /* "eagerfpu" Non lazy FPU restore */ > #define X86_FEATURE_NONSTOP_TSC_S3 ( 3*32+30) /* TSC doesn't stop in S3 state */ > +#define X86_FEATURE_TSC_KNOWN_FREQ ( 3*32+31) /* TSC has known frequency */ > > /* Intel-defined CPU features, CPUID level 0x00000001 (ecx), word 4 */ > #define X86_FEATURE_XMM3 ( 4*32+ 0) /* "pni" SSE-3 */ > diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h > index 33b6365c22fe..abb1fdcc545a 100644 > --- a/arch/x86/include/asm/tsc.h > +++ b/arch/x86/include/asm/tsc.h > @@ -45,8 +45,17 @@ extern int tsc_clocksource_reliable; > * Boot-time check whether the TSCs are synchronized across > * all CPUs/cores: > */ > 
+#ifdef CONFIG_X86_TSC > +extern bool tsc_store_and_check_tsc_adjust(bool bootcpu); > +extern void tsc_verify_tsc_adjust(bool resume); > extern void check_tsc_sync_source(int cpu); > extern void check_tsc_sync_target(void); > +#else > +static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; } > +static inline void tsc_verify_tsc_adjust(bool resume) { } > +static inline void check_tsc_sync_source(int cpu) { } > +static inline void check_tsc_sync_target(void) { } > +#endif > > extern int notsc_setup(char *); > extern void tsc_save_sched_clock_state(void); > diff --git a/arch/x86/kernel/Makefile b/arch/x86/kernel/Makefile > index 79076d75bdbf..c0ac317dd372 100644 > --- a/arch/x86/kernel/Makefile > +++ b/arch/x86/kernel/Makefile > @@ -75,7 +75,7 @@ apm-y := apm_32.o > obj-$(CONFIG_APM) += apm.o > obj-$(CONFIG_SMP) += smp.o > obj-$(CONFIG_SMP) += smpboot.o > -obj-$(CONFIG_SMP) += tsc_sync.o > +obj-$(CONFIG_X86_TSC) += tsc_sync.o > obj-$(CONFIG_SMP) += setup_percpu.o > obj-$(CONFIG_X86_MPPARSE) += mpparse.o > obj-y += apic/ > diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c > index 0888a879120f..a67e0f0cdaab 100644 > --- a/arch/x86/kernel/process.c > +++ b/arch/x86/kernel/process.c > @@ -277,6 +277,7 @@ void exit_idle(void) > > void arch_cpu_idle_enter(void) > { > + tsc_verify_tsc_adjust(false); > local_touch_nmi(); > enter_idle(); > } > diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c > index 46b2f41f8b05..0aed75a1e31b 100644 > --- a/arch/x86/kernel/tsc.c > +++ b/arch/x86/kernel/tsc.c > @@ -702,6 +702,20 @@ unsigned long native_calibrate_tsc(void) > } > } > > + /* > + * TSC frequency determined by CPUID is a "hardware reported" > + * frequency and is the most accurate one so far we have. This > + * is considered a known frequency. > + */ > + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); > + > + /* > + * For Atom SoCs TSC is the only reliable clocksource. > + * Mark TSC reliable so no watchdog on it. > + */ > + if (boot_cpu_data.x86_model == INTEL_FAM6_ATOM_GOLDMONT) > + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); > + > return crystal_khz * ebx_numerator / eax_denominator; > } > > @@ -1043,18 +1057,20 @@ static void detect_art(void) > if (boot_cpu_data.cpuid_level < ART_CPUID_LEAF) > return; > > - cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator, > - &art_to_tsc_numerator, unused, unused+1); > - > - /* Don't enable ART in a VM, non-stop TSC required */ > + /* Don't enable ART in a VM, non-stop TSC and TSC_ADJUST required */ > if (boot_cpu_has(X86_FEATURE_HYPERVISOR) || > !boot_cpu_has(X86_FEATURE_NONSTOP_TSC) || > - art_to_tsc_denominator < ART_MIN_DENOMINATOR) > + !boot_cpu_has(X86_FEATURE_TSC_ADJUST)) > return; > > - if (rdmsrl_safe(MSR_IA32_TSC_ADJUST, &art_to_tsc_offset)) > + cpuid(ART_CPUID_LEAF, &art_to_tsc_denominator, > + &art_to_tsc_numerator, unused, unused+1); > + > + if (art_to_tsc_denominator < ART_MIN_DENOMINATOR) > return; > > + rdmsrl(MSR_IA32_TSC_ADJUST, art_to_tsc_offset); > + > /* Make this sticky over multiple CPU init calls */ > setup_force_cpu_cap(X86_FEATURE_ART); > } > @@ -1064,6 +1080,11 @@ static void detect_art(void) > > static struct clocksource clocksource_tsc; > > +static void tsc_resume(struct clocksource *cs) > +{ > + tsc_verify_tsc_adjust(true); > +} > + > /* > * We used to compare the TSC to the cycle_last value in the clocksource > * structure to avoid a nasty time-warp. 
This can be observed in a > @@ -1096,6 +1117,7 @@ static struct clocksource clocksource_tsc = { > .flags = CLOCK_SOURCE_IS_CONTINUOUS | > CLOCK_SOURCE_MUST_VERIFY, > .archdata = { .vclock_mode = VCLOCK_TSC }, > + .resume = tsc_resume, > }; > > void mark_tsc_unstable(char *reason) > @@ -1283,10 +1305,10 @@ static int __init init_tsc_clocksource(void) > clocksource_tsc.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP; > > /* > - * Trust the results of the earlier calibration on systems > - * exporting a reliable TSC. > + * When TSC frequency is known (retrieved via MSR or CPUID), we skip > + * the refined calibration and directly register it as a clocksource. > */ > - if (boot_cpu_has(X86_FEATURE_TSC_RELIABLE)) { > + if (boot_cpu_has(X86_FEATURE_TSC_KNOWN_FREQ)) { > clocksource_register_khz(&clocksource_tsc, tsc_khz); > return 0; > } > @@ -1363,6 +1385,8 @@ void __init tsc_init(void) > > if (unsynchronized_tsc()) > mark_tsc_unstable("TSCs unsynchronized"); > + else > + tsc_store_and_check_tsc_adjust(true); > > check_system_tsc_reliable(); > > diff --git a/arch/x86/kernel/tsc_msr.c b/arch/x86/kernel/tsc_msr.c > index 0fe720d64fef..19afdbd7d0a7 100644 > --- a/arch/x86/kernel/tsc_msr.c > +++ b/arch/x86/kernel/tsc_msr.c > @@ -100,5 +100,24 @@ unsigned long cpu_khz_from_msr(void) > #ifdef CONFIG_X86_LOCAL_APIC > lapic_timer_frequency = (freq * 1000) / HZ; > #endif > + > + /* > + * TSC frequency determined by MSR is always considered "known" > + * because it is reported by HW. > + * Another fact is that on MSR capable platforms, PIT/HPET is > + * generally not available so calibration won't work at all. > + */ > + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); > + > + /* > + * Unfortunately there is no way for hardware to tell whether the > + * TSC is reliable. We were told by silicon design team that TSC > + * on Atom SoCs are always "reliable". TSC is also the only > + * reliable clocksource on these SoCs (HPET is either not present > + * or not functional) so mark TSC reliable which removes the > + * requirement for a watchdog clocksource. > + */ > + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); > + > return res; > } > diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c > index 78083bf23ed1..d0db011051a5 100644 > --- a/arch/x86/kernel/tsc_sync.c > +++ b/arch/x86/kernel/tsc_sync.c > @@ -14,18 +14,166 @@ > * ( The serial nature of the boot logic and the CPU hotplug lock > * protects against more than 2 CPUs entering this code. ) > */ > +#include > #include > #include > #include > #include > #include > > +struct tsc_adjust { > + s64 bootval; > + s64 adjusted; > + unsigned long nextcheck; > + bool warned; > +}; > + > +static DEFINE_PER_CPU(struct tsc_adjust, tsc_adjust); > + > +void tsc_verify_tsc_adjust(bool resume) > +{ > + struct tsc_adjust *adj = this_cpu_ptr(&tsc_adjust); > + s64 curval; > + > + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST)) > + return; > + > + /* Rate limit the MSR check */ > + if (!resume && time_before(jiffies, adj->nextcheck)) > + return; > + > + adj->nextcheck = jiffies + HZ; > + > + rdmsrl(MSR_IA32_TSC_ADJUST, curval); > + if (adj->adjusted == curval) > + return; > + > + /* Restore the original value */ > + wrmsrl(MSR_IA32_TSC_ADJUST, adj->adjusted); > + > + if (!adj->warned || resume) { > + pr_warn(FW_BUG "TSC ADJUST differs: CPU%u %lld --> %lld. 
Restoring\n", > + smp_processor_id(), adj->adjusted, curval); > + adj->warned = true; > + } > +} > + > +static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval, > + unsigned int cpu, bool bootcpu) > +{ > + /* > + * First online CPU in a package stores the boot value in the > + * adjustment value. This value might change later via the sync > + * mechanism. If that fails we still can yell about boot values not > + * being consistent. > + * > + * On the boot cpu we just force set the ADJUST value to 0 if it's > + * non zero. We don't do that on non boot cpus because physical > + * hotplug should have set the ADJUST register to a value > 0 so > + * the TSC is in sync with the already running cpus. > + * > + * But we always force positive ADJUST values. Otherwise the TSC > + * deadline timer creates an interrupt storm. We also have to > + * prevent values > 0x7FFFFFFF as those wreckage the timer as well. > + */ > + if ((bootcpu && bootval != 0) || (!bootcpu && bootval < 0) || > + (bootval > 0x7FFFFFFF)) { > + pr_warn(FW_BUG "TSC ADJUST: CPU%u: %lld force to 0\n", cpu, > + bootval); > + wrmsrl(MSR_IA32_TSC_ADJUST, 0); > + bootval = 0; > + } > + cur->adjusted = bootval; > +} > + > +#ifndef CONFIG_SMP > +bool __init tsc_store_and_check_tsc_adjust(bool bootcpu) > +{ > + struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust); > + s64 bootval; > + > + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST)) > + return false; > + > + rdmsrl(MSR_IA32_TSC_ADJUST, bootval); > + cur->bootval = bootval; > + cur->nextcheck = jiffies + HZ; > + tsc_sanitize_first_cpu(cur, bootval, smp_processor_id(), bootcpu); > + return false; > +} > + > +#else /* !CONFIG_SMP */ > + > +/* > + * Store and check the TSC ADJUST MSR if available > + */ > +bool tsc_store_and_check_tsc_adjust(bool bootcpu) > +{ > + struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust); > + unsigned int refcpu, cpu = smp_processor_id(); > + struct cpumask *mask; > + s64 bootval; > + > + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST)) > + return false; > + > + rdmsrl(MSR_IA32_TSC_ADJUST, bootval); > + cur->bootval = bootval; > + cur->nextcheck = jiffies + HZ; > + cur->warned = false; > + > + /* > + * Check whether this CPU is the first in a package to come up. In > + * this case do not check the boot value against another package > + * because the new package might have been physically hotplugged, > + * where TSC_ADJUST is expected to be different. When called on the > + * boot CPU topology_core_cpumask() might not be available yet. > + */ > + mask = topology_core_cpumask(cpu); > + refcpu = mask ? cpumask_any_but(mask, cpu) : nr_cpu_ids; > + > + if (refcpu >= nr_cpu_ids) { > + tsc_sanitize_first_cpu(cur, bootval, smp_processor_id(), > + bootcpu); > + return false; > + } > + > + ref = per_cpu_ptr(&tsc_adjust, refcpu); > + /* > + * Compare the boot value and complain if it differs in the > + * package. > + */ > + if (bootval != ref->bootval) { > + pr_warn(FW_BUG "TSC ADJUST differs: Reference CPU%u: %lld CPU%u: %lld\n", > + refcpu, ref->bootval, cpu, bootval); > + } > + /* > + * The TSC_ADJUST values in a package must be the same. If the boot > + * value on this newly upcoming CPU differs from the adjustment > + * value of the already online CPU in this package, set it to that > + * adjusted value. 
> + */ > + if (bootval != ref->adjusted) { > + pr_warn("TSC ADJUST synchronize: Reference CPU%u: %lld CPU%u: %lld\n", > + refcpu, ref->adjusted, cpu, bootval); > + cur->adjusted = ref->adjusted; > + wrmsrl(MSR_IA32_TSC_ADJUST, ref->adjusted); > + } > + /* > + * We have the TSCs forced to be in sync on this package. Skip sync > + * test: > + */ > + return true; > +} > + > /* > * Entry/exit counters that make sure that both CPUs > * run the measurement code at once: > */ > static atomic_t start_count; > static atomic_t stop_count; > +static atomic_t skip_test; > +static atomic_t test_runs; > > /* > * We use a raw spinlock in this exceptional case, because > @@ -37,15 +185,16 @@ static arch_spinlock_t sync_lock = __ARCH_SPIN_LOCK_UNLOCKED; > static cycles_t last_tsc; > static cycles_t max_warp; > static int nr_warps; > +static int random_warps; > > /* > * TSC-warp measurement loop running on both CPUs. This is not called > * if there is no TSC. > */ > -static void check_tsc_warp(unsigned int timeout) > +static cycles_t check_tsc_warp(unsigned int timeout) > { > - cycles_t start, now, prev, end; > - int i; > + cycles_t start, now, prev, end, cur_max_warp = 0; > + int i, cur_warps = 0; > > start = rdtsc_ordered(); > /* > @@ -85,13 +234,22 @@ static void check_tsc_warp(unsigned int timeout) > if (unlikely(prev > now)) { > arch_spin_lock(&sync_lock); > max_warp = max(max_warp, prev - now); > + cur_max_warp = max_warp; > + /* > + * Check whether this bounces back and forth. Only > + * one CPU should observe time going backwards. > + */ > + if (cur_warps != nr_warps) > + random_warps++; > nr_warps++; > + cur_warps = nr_warps; > arch_spin_unlock(&sync_lock); > } > } > WARN(!(now-start), > "Warning: zero tsc calibration delta: %Ld [max: %Ld]\n", > now-start, end-start); > + return cur_max_warp; > } > > /* > @@ -136,15 +294,26 @@ void check_tsc_sync_source(int cpu) > } > > /* > - * Reset it - in case this is a second bootup: > + * Set the maximum number of test runs to > + * 1 if the CPU does not provide the TSC_ADJUST MSR > + * 3 if the MSR is available, so the target can try to adjust > */ > - atomic_set(&stop_count, 0); > - > + if (!boot_cpu_has(X86_FEATURE_TSC_ADJUST)) > + atomic_set(&test_runs, 1); > + else > + atomic_set(&test_runs, 3); > +retry: > /* > - * Wait for the target to arrive: > + * Wait for the target to start or to skip the test: > */ > - while (atomic_read(&start_count) != cpus-1) > + while (atomic_read(&start_count) != cpus - 1) { > + if (atomic_read(&skip_test) > 0) { > + atomic_set(&skip_test, 0); > + return; > + } > cpu_relax(); > + } > + > /* > * Trigger the target to continue into the measurement too: > */ > @@ -155,21 +324,35 @@ void check_tsc_sync_source(int cpu) > while (atomic_read(&stop_count) != cpus-1) > cpu_relax(); > > - if (nr_warps) { > + /* > + * If the test was successful set the number of runs to zero and > + * stop. If not, decrement the number of runs an check if we can > + * retry. In case of random warps no retry is attempted. 
> + */ > + if (!nr_warps) { > + atomic_set(&test_runs, 0); > + > + pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n", > + smp_processor_id(), cpu); > + > + } else if (atomic_dec_and_test(&test_runs) || random_warps) { > + /* Force it to 0 if random warps brought us here */ > + atomic_set(&test_runs, 0); > + > pr_warning("TSC synchronization [CPU#%d -> CPU#%d]:\n", > smp_processor_id(), cpu); > pr_warning("Measured %Ld cycles TSC warp between CPUs, " > "turning off TSC clock.\n", max_warp); > + if (random_warps) > + pr_warning("TSC warped randomly between CPUs\n"); > mark_tsc_unstable("check_tsc_sync_source failed"); > - } else { > - pr_debug("TSC synchronization [CPU#%d -> CPU#%d]: passed\n", > - smp_processor_id(), cpu); > } > > /* > * Reset it - just in case we boot another CPU later: > */ > atomic_set(&start_count, 0); > + random_warps = 0; > nr_warps = 0; > max_warp = 0; > last_tsc = 0; > @@ -178,6 +361,12 @@ void check_tsc_sync_source(int cpu) > * Let the target continue with the bootup: > */ > atomic_inc(&stop_count); > + > + /* > + * Retry, if there is a chance to do so. > + */ > + if (atomic_read(&test_runs) > 0) > + goto retry; > } > > /* > @@ -185,6 +374,9 @@ void check_tsc_sync_source(int cpu) > */ > void check_tsc_sync_target(void) > { > + struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust); > + unsigned int cpu = smp_processor_id(); > + cycles_t cur_max_warp, gbl_max_warp; > int cpus = 2; > > /* Also aborts if there is no TSC. */ > @@ -192,6 +384,16 @@ void check_tsc_sync_target(void) > return; > > /* > + * Store, verify and sanitize the TSC adjust register. If > + * successful skip the test. > + */ > + if (tsc_store_and_check_tsc_adjust(false)) { > + atomic_inc(&skip_test); > + return; > + } > + > +retry: > + /* > * Register this CPU's participation and wait for the > * source CPU to start the measurement: > */ > @@ -199,7 +401,12 @@ void check_tsc_sync_target(void) > while (atomic_read(&start_count) != cpus) > cpu_relax(); > > - check_tsc_warp(loop_timeout(smp_processor_id())); > + cur_max_warp = check_tsc_warp(loop_timeout(cpu)); > + > + /* > + * Store the maximum observed warp value for a potential retry: > + */ > + gbl_max_warp = max_warp; > > /* > * Ok, we are done: > @@ -211,4 +418,61 @@ void check_tsc_sync_target(void) > */ > while (atomic_read(&stop_count) != cpus) > cpu_relax(); > + > + /* > + * Reset it for the next sync test: > + */ > + atomic_set(&stop_count, 0); > + > + /* > + * Check the number of remaining test runs. If not zero, the test > + * failed and a retry with adjusted TSC is possible. If zero the > + * test was either successful or failed terminally. > + */ > + if (!atomic_read(&test_runs)) > + return; > + > + /* > + * If the warp value of this CPU is 0, then the other CPU > + * observed time going backwards so this TSC was ahead and > + * needs to move backwards. > + */ > + if (!cur_max_warp) > + cur_max_warp = -gbl_max_warp; > + > + /* > + * Add the result to the previous adjustment value. > + * > + * The adjustement value is slightly off by the overhead of the > + * sync mechanism (observed values are ~200 TSC cycles), but this > + * really depends on CPU, node distance and frequency. So > + * compensating for this is hard to get right. Experiments show > + * that the warp is not longer detectable when the observed warp > + * value is used. In the worst case the adjustment needs to go > + * through a 3rd run for fine tuning. 
> + */ > + cur->adjusted += cur_max_warp; > + > + /* > + * TSC deadline timer stops working or creates an interrupt storm > + * with adjust values < 0 and > x07ffffff. > + * > + * To allow adjust values > 0x7FFFFFFF we need to disable the > + * deadline timer and use the local APIC timer, but that requires > + * more intrusive changes and we do not have any useful information > + * from Intel about the underlying HW wreckage yet. > + */ > + if (cur->adjusted < 0) > + cur->adjusted = 0; > + if (cur->adjusted > 0x7FFFFFFF) > + cur->adjusted = 0x7FFFFFFF; > + > + pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n", > + cpu, cur_max_warp, cur->adjusted); > + > + wrmsrl(MSR_IA32_TSC_ADJUST, cur->adjusted); > + goto retry; > + > } > + > +#endif /* CONFIG_SMP */ > diff --git a/arch/x86/platform/intel-mid/mfld.c b/arch/x86/platform/intel-mid/mfld.c > index 1eb47b6298c2..e793fe509971 100644 > --- a/arch/x86/platform/intel-mid/mfld.c > +++ b/arch/x86/platform/intel-mid/mfld.c > @@ -49,8 +49,13 @@ static unsigned long __init mfld_calibrate_tsc(void) > fast_calibrate = ratio * fsb; > pr_debug("read penwell tsc %lu khz\n", fast_calibrate); > lapic_timer_frequency = fsb * 1000 / HZ; > - /* mark tsc clocksource as reliable */ > - set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE); > + > + /* > + * TSC on Intel Atom SoCs is reliable and of known frequency. > + * See tsc_msr.c for details. > + */ > + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); > + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); > > return fast_calibrate; > } > diff --git a/arch/x86/platform/intel-mid/mrfld.c b/arch/x86/platform/intel-mid/mrfld.c > index 59253db41bbc..e0607c77a1bd 100644 > --- a/arch/x86/platform/intel-mid/mrfld.c > +++ b/arch/x86/platform/intel-mid/mrfld.c > @@ -78,8 +78,12 @@ static unsigned long __init tangier_calibrate_tsc(void) > pr_debug("Setting lapic_timer_frequency = %d\n", > lapic_timer_frequency); > > - /* mark tsc clocksource as reliable */ > - set_cpu_cap(&boot_cpu_data, X86_FEATURE_TSC_RELIABLE); > + /* > + * TSC on Intel Atom SoCs is reliable and of known frequency. > + * See tsc_msr.c for details. > + */ > + setup_force_cpu_cap(X86_FEATURE_TSC_KNOWN_FREQ); > + setup_force_cpu_cap(X86_FEATURE_TSC_RELIABLE); > > return fast_calibrate; > } > diff --git a/arch/x86/power/cpu.c b/arch/x86/power/cpu.c > index 53cace2ec0e2..66ade16c7693 100644 > --- a/arch/x86/power/cpu.c > +++ b/arch/x86/power/cpu.c > @@ -252,6 +252,7 @@ static void notrace __restore_processor_state(struct saved_context *ctxt) > fix_processor_context(); > > do_fpu_end(); > + tsc_verify_tsc_adjust(true); > x86_platform.restore_sched_clock_state(); > mtrr_bp_restore(); > perf_restore_debug_store();
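To tie the dmesg at the top of this mail back to the sanitizing code in
tsc_sync.c: condensed into standalone C, the rules the series enforces
look roughly like the sketch below. sanitize_first_cpu() mirrors
tsc_sanitize_first_cpu() and the package-sync step in
tsc_store_and_check_tsc_adjust(); the main() harness, the printfs and the
constants merely replay the values the T7810 reported -- this is a
sketch, not kernel code.

/*
 * Condensed restatement of the TSC_ADJUST sanitizing rules added by this
 * series, fed with the boot values from the Dell T7810 dmesg above.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

static int64_t sanitize_first_cpu(int64_t bootval, unsigned int cpu, bool bootcpu)
{
	/*
	 * Boot CPU: any non-zero value is forced to 0.  First CPU of another
	 * (possibly hotplugged) package: only negative values are forced to 0.
	 * Values above 0x7FFFFFFF are always rejected because they break the
	 * TSC deadline timer.
	 */
	if ((bootcpu && bootval != 0) || (!bootcpu && bootval < 0) ||
	    bootval > 0x7FFFFFFF) {
		printf("[Firmware Bug]: TSC ADJUST: CPU%u: %lld force to 0\n",
		       cpu, (long long)bootval);
		return 0;
	}
	return bootval;
}

int main(void)
{
	/* Boot values reported by the firmware on the T7810 above */
	int64_t cpu0_boot = -6495898515190607LL;
	int64_t cpu1_boot = -6495898517158354LL;
	int64_t ref_adjusted;

	/* CPU0 is the boot CPU and the reference for its package */
	ref_adjusted = sanitize_first_cpu(cpu0_boot, 0, true);

	/*
	 * CPU1 sits in the same package: warn about the differing boot value,
	 * then force its TSC_ADJUST to the reference CPU's adjusted value.
	 */
	if (cpu1_boot != cpu0_boot)
		printf("[Firmware Bug]: TSC ADJUST differs: Reference CPU0: %lld CPU1: %lld\n",
		       (long long)cpu0_boot, (long long)cpu1_boot);
	if (cpu1_boot != ref_adjusted)
		printf("TSC ADJUST synchronize: Reference CPU0: %lld CPU1: %lld\n",
		       (long long)ref_adjusted, (long long)cpu1_boot);

	return 0;
}

With the adjust registers pinned to the same non-negative, < 0x80000000
value per package, the warp test in check_tsc_sync_source()/_target() can
then be skipped or retried as described in the changelog above.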