Date: Fri, 16 Dec 2016 12:46:12 +0100 (CET)
From: Thomas Gleixner
To: LKML
cc: x86@kernel.org, Peter Zijlstra, Borislav Petkov, Bruce Schlobohm,
    Roland Scheidegger, Kevin Stanton, Allen Hung, stable@vger.kernel.org
Subject: Re: [patch 2/2] x86/tsc: Force TSC_ADJUST register to value >= zero
In-Reply-To: <20161213131211.397588033@linutronix.de>
References: <20161213131115.764824574@linutronix.de> <20161213131211.397588033@linutronix.de>
User-Agent: Alpine 2.20 (DEB 67 2015-01-07)
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 13 Dec 2016, Thomas Gleixner wrote:
> Roland reported that his DELL T5810 sports a value add BIOS which
> completely wreckages the TSC. The squirmware [(TM) Ingo Molnar] boots with
> random negative TSC_ADJUST values, different on all CPUs. That renders the
> TSC useless because the synchronization check fails.

While everyone assumed that this is the usual DELL squirmware problem, I
have to say it's not.

Just got my hands on a Skylake based Lenovo S510 box and it shows the same
feature:

     TSC ADJUST: CPU0: -10123656703215
                 CPU1: -10123656796701
                 CPU2: -10123656797460
                 CPU3: -10123656798366

Which causes the TSC to be out of sync on a stock upstream kernel and the
TSC deadline timer wreckage is happening on that machine as well.

I'm pretty sure that this well thought out feature to 'hide power on time'
from TSC has not been independently 'invented' by DELL and Lenovo BIOS
tinkerers.
I rather have the impression that this is an advisory or feature kit from
some other entity. Whoever came up with this misfeature at Intel and/or
Microsoft (sorry, I could not come up with any other suspects) should be
promoted to run the 'Linux on feature-plagued systems' hotline.

As this seems to be more widespread than we thought initially, we have to
think about a solution for stable kernels, especially 4.9. And distros will
have to think about that as well....

We have two options:

 1) Disable TSC deadline timer by default and force users with sane
    machines to enable it on the kernel command line.

    Upside:   Very small patch

    Downside: Degrades existing setups on sane machines, keeps TSC
              unusable on affected machines.

              We have no idea what other hidden side effects the
              TSC_ADJUST tinkering has. If there are any, they won't be
              nice ones.

 2) Push the whole TSC_ADJUST sanitizing machinery into stable

    Upside:   Does not affect sane machines and gives a benefit to users
              of affected machines

    Downside: Rather large patch, but not that risky either. Needs a few
              eyes and good test coverage though

Thoughts?

	tglx
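
As an illustration of the per-CPU values quoted in the mail above, here is
a minimal sketch (not part of the patch series) that checks a set of
TSC_ADJUST readings for the two symptoms described: negative values and
values that differ between CPUs. The function names are hypothetical. The
optional reader assumes the Linux msr driver is loaded, that
/dev/cpu/N/msr is readable (normally root only), and that IA32_TSC_ADJUST
is MSR 0x3b. The kernel's own synchronization check compares actual TSC
readings and is not reproduced here.

```python
import os
import struct

IA32_TSC_ADJUST = 0x3B  # MSR index of the TSC_ADJUST register


def read_tsc_adjust(cpu):
    """Read TSC_ADJUST for one CPU via the msr driver (assumption: the
    'msr' module is loaded and the caller may read /dev/cpu/N/msr)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        # The msr device returns 8 bytes at an offset equal to the MSR index.
        raw = os.pread(fd, 8, IA32_TSC_ADJUST)
    finally:
        os.close(fd)
    # Interpret as a signed 64-bit value so negative adjustments show up.
    return struct.unpack("<q", raw)[0]


def tsc_adjust_report(values):
    """Summarize a dict of {cpu: signed TSC_ADJUST value} (hypothetical helper)."""
    base = values[min(values)]  # lowest-numbered CPU as the reference
    return {
        "negative": sorted(cpu for cpu, v in values.items() if v < 0),
        "skew": {cpu: values[cpu] - base for cpu in sorted(values)},
        "consistent": len(set(values.values())) == 1,
    }


# Values quoted in the mail for the Lenovo S510.
S510 = {
    0: -10123656703215,
    1: -10123656796701,
    2: -10123656797460,
    3: -10123656798366,
}

if __name__ == "__main__":
    report = tsc_adjust_report(S510)
    print("CPUs with negative TSC_ADJUST:", report["negative"])
    print("Offset relative to CPU0:", report["skew"])
    print("Same value on all CPUs:", report["consistent"])
```

To check a live machine instead of the quoted numbers, something like
tsc_adjust_report({cpu: read_tsc_adjust(cpu) for cpu in range(os.cpu_count())})
should work, under the assumptions stated above.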