From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S935294AbcLOK4A (ORCPT ); Thu, 15 Dec 2016 05:56:00 -0500 Received: from terminus.zytor.com ([198.137.202.10]:40192 "EHLO terminus.zytor.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752528AbcLOKz4 (ORCPT ); Thu, 15 Dec 2016 05:55:56 -0500 Date: Thu, 15 Dec 2016 02:53:05 -0800 From: tip-bot for Thomas Gleixner Message-ID: Cc: linux-kernel@vger.kernel.org, bp@alien8.de, bruce.schlobohm@intel.com, mingo@kernel.org, tglx@linutronix.de, allen_hung@dell.com, rscheidegger_lists@hispeed.ch, peterz@infradead.org, hpa@zytor.com, kevin.b.stanton@intel.com Reply-To: mingo@kernel.org, bruce.schlobohm@intel.com, bp@alien8.de, linux-kernel@vger.kernel.org, kevin.b.stanton@intel.com, peterz@infradead.org, hpa@zytor.com, rscheidegger_lists@hispeed.ch, tglx@linutronix.de, allen_hung@dell.com In-Reply-To: <20161213131211.397588033@linutronix.de> References: <20161213131211.397588033@linutronix.de> To: linux-tip-commits@vger.kernel.org Subject: [tip:x86/timers] x86/tsc: Force TSC_ADJUST register to value >= zero Git-Commit-ID: 5bae156241e05d25171b18ee43e49f103c3f8097 X-Mailer: tip-git-log-daemon Robot-ID: Robot-Unsubscribe: Contact to get blacklisted from these emails MIME-Version: 1.0 Content-Transfer-Encoding: 8bit Content-Type: text/plain; charset=UTF-8 Content-Disposition: inline Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org Commit-ID: 5bae156241e05d25171b18ee43e49f103c3f8097 Gitweb: http://git.kernel.org/tip/5bae156241e05d25171b18ee43e49f103c3f8097 Author: Thomas Gleixner AuthorDate: Tue, 13 Dec 2016 13:14:17 +0000 Committer: Thomas Gleixner CommitDate: Thu, 15 Dec 2016 11:44:29 +0100 x86/tsc: Force TSC_ADJUST register to value >= zero Roland reported that his DELL T5810 sports a value add BIOS which completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with random negative TSC_ADJUST values, different on all CPUs. That renders the TSC useless because the sycnchronization check fails. Roland tested the new TSC_ADJUST mechanism. While it manages to readjust the TSCs he needs to disable the TSC deadline timer, otherwise the machine just stops booting. Deeper investigation unearthed that the TSC deadline timer is sensitive to the TSC_ADJUST value. Writing TSC_ADJUST to a negative value results in an interrupt storm caused by the TSC deadline timer. This does not make any sense and it's hard to imagine what kind of hardware wreckage is behind that misfeature, but it's reliably reproducible on other systems which have TSC_ADJUST and TSC deadline timer. While it would be understandable that a big enough negative value which moves the resulting TSC readout into the negative space could have the described effect, this happens even with a adjust value of -1, which keeps the TSC readout definitely in the positive space. The compare register for the TSC deadline timer is set to a positive value larger than the TSC, but despite not having reached the deadline the interrupt is raised immediately. If this happens on the boot CPU, then the machine dies silently because this setup happens before the NMI watchdog is armed. Further experiments showed that any other adjustment of TSC_ADJUST works as expected as long as it stays in the positive range. The direction of the adjustment has no influence either. See the lkml link for further analysis. Yet another proof for the theory that timers are designed by janitors and the underlying (obviously undocumented) mechanisms which allow BIOSes to wreckage them are considered a feature. Well done Intel - NOT! To address this wreckage add the following sanity measures: - If the TSC_ADJUST value on the boot cpu is not 0, set it to 0 - If the TSC_ADJUST value on any cpu is negative, set it to 0 - Prevent the cross package synchronization mechanism from setting negative TSC_ADJUST values. Reported-and-tested-by: Roland Scheidegger Signed-off-by: Thomas Gleixner Cc: Bruce Schlobohm Cc: Kevin Stanton Cc: Peter Zijlstra Cc: Allen Hung Cc: Borislav Petkov Link: http://lkml.kernel.org/r/20161213131211.397588033@linutronix.de Signed-off-by: Thomas Gleixner --- arch/x86/include/asm/tsc.h | 4 ++-- arch/x86/kernel/tsc.c | 2 +- arch/x86/kernel/tsc_sync.c | 55 ++++++++++++++++++++++++++++++++-------------- 3 files changed, 42 insertions(+), 19 deletions(-) diff --git a/arch/x86/include/asm/tsc.h b/arch/x86/include/asm/tsc.h index 372ad0c..abb1fdc 100644 --- a/arch/x86/include/asm/tsc.h +++ b/arch/x86/include/asm/tsc.h @@ -46,12 +46,12 @@ extern int tsc_clocksource_reliable; * all CPUs/cores: */ #ifdef CONFIG_X86_TSC -extern bool tsc_store_and_check_tsc_adjust(void); +extern bool tsc_store_and_check_tsc_adjust(bool bootcpu); extern void tsc_verify_tsc_adjust(bool resume); extern void check_tsc_sync_source(int cpu); extern void check_tsc_sync_target(void); #else -static inline bool tsc_store_and_check_tsc_adjust(void) { return false; } +static inline bool tsc_store_and_check_tsc_adjust(bool bootcpu) { return false; } static inline void tsc_verify_tsc_adjust(bool resume) { } static inline void check_tsc_sync_source(int cpu) { } static inline void check_tsc_sync_target(void) { } diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c index bfb541a..0aed75a 100644 --- a/arch/x86/kernel/tsc.c +++ b/arch/x86/kernel/tsc.c @@ -1386,7 +1386,7 @@ void __init tsc_init(void) if (unsynchronized_tsc()) mark_tsc_unstable("TSCs unsynchronized"); else - tsc_store_and_check_tsc_adjust(); + tsc_store_and_check_tsc_adjust(true); check_system_tsc_reliable(); diff --git a/arch/x86/kernel/tsc_sync.c b/arch/x86/kernel/tsc_sync.c index 94f2ce5..9151f0c 100644 --- a/arch/x86/kernel/tsc_sync.c +++ b/arch/x86/kernel/tsc_sync.c @@ -58,8 +58,33 @@ void tsc_verify_tsc_adjust(bool resume) } } +static void tsc_sanitize_first_cpu(struct tsc_adjust *cur, s64 bootval, + unsigned int cpu, bool bootcpu) +{ + /* + * First online CPU in a package stores the boot value in the + * adjustment value. This value might change later via the sync + * mechanism. If that fails we still can yell about boot values not + * being consistent. + * + * On the boot cpu we just force set the ADJUST value to 0 if it's + * non zero. We don't do that on non boot cpus because physical + * hotplug should have set the ADJUST register to a value > 0 so + * the TSC is in sync with the already running cpus. + * + * But we always force positive ADJUST values. Otherwise the TSC + * deadline timer creates an interrupt storm. Sigh! + */ + if ((bootcpu && bootval != 0) || (!bootcpu && bootval < 0)) { + pr_warn("TSC ADJUST: CPU%u: %lld force to 0\n", cpu, bootval); + wrmsrl(MSR_IA32_TSC_ADJUST, 0); + bootval = 0; + } + cur->adjusted = bootval; +} + #ifndef CONFIG_SMP -bool __init tsc_store_and_check_tsc_adjust(void) +bool __init tsc_store_and_check_tsc_adjust(bool bootcpu) { struct tsc_adjust *cur = this_cpu_ptr(&tsc_adjust); s64 bootval; @@ -69,9 +94,8 @@ bool __init tsc_store_and_check_tsc_adjust(void) rdmsrl(MSR_IA32_TSC_ADJUST, bootval); cur->bootval = bootval; - cur->adjusted = bootval; cur->nextcheck = jiffies + HZ; - pr_info("TSC ADJUST: Boot CPU0: %lld\n", bootval); + tsc_sanitize_first_cpu(cur, bootval, smp_processor_id(), bootcpu); return false; } @@ -80,7 +104,7 @@ bool __init tsc_store_and_check_tsc_adjust(void) /* * Store and check the TSC ADJUST MSR if available */ -bool tsc_store_and_check_tsc_adjust(void) +bool tsc_store_and_check_tsc_adjust(bool bootcpu) { struct tsc_adjust *ref, *cur = this_cpu_ptr(&tsc_adjust); unsigned int refcpu, cpu = smp_processor_id(); @@ -98,22 +122,16 @@ bool tsc_store_and_check_tsc_adjust(void) /* * Check whether this CPU is the first in a package to come up. In * this case do not check the boot value against another package - * because the package might have been physically hotplugged, where - * TSC_ADJUST is expected to be different. When called on the boot - * CPU topology_core_cpumask() might not be available yet. + * because the new package might have been physically hotplugged, + * where TSC_ADJUST is expected to be different. When called on the + * boot CPU topology_core_cpumask() might not be available yet. */ mask = topology_core_cpumask(cpu); refcpu = mask ? cpumask_any_but(mask, cpu) : nr_cpu_ids; if (refcpu >= nr_cpu_ids) { - /* - * First online CPU in a package stores the boot value in - * the adjustment value. This value might change later via - * the sync mechanism. If that fails we still can yell - * about boot values not being consistent. - */ - cur->adjusted = bootval; - pr_info_once("TSC ADJUST: Boot CPU%u: %lld\n", cpu, bootval); + tsc_sanitize_first_cpu(cur, bootval, smp_processor_id(), + bootcpu); return false; } @@ -366,7 +384,7 @@ void check_tsc_sync_target(void) * Store, verify and sanitize the TSC adjust register. If * successful skip the test. */ - if (tsc_store_and_check_tsc_adjust()) { + if (tsc_store_and_check_tsc_adjust(false)) { atomic_inc(&skip_test); return; } @@ -429,8 +447,13 @@ retry: * that the warp is not longer detectable when the observed warp * value is used. In the worst case the adjustment needs to go * through a 3rd run for fine tuning. + * + * But we must make sure that the value doesn't become negative + * otherwise TSC deadline timer will create an interrupt storm. */ cur->adjusted += cur_max_warp; + if (cur->adjusted < 0) + cur->adjusted = 0; pr_warn("TSC ADJUST compensate: CPU%u observed %lld warp. Adjust: %lld\n", cpu, cur_max_warp, cur->adjusted);