From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mail-wm0-f65.google.com ([74.125.82.65]:34970 "EHLO
	mail-wm0-f65.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S933832AbdBQOE4 (ORCPT );
	Fri, 17 Feb 2017 09:04:56 -0500
Date: Fri, 17 Feb 2017 15:04:51 +0100
From: Frederic Weisbecker
To: Thomas Gleixner
Cc: Linus Torvalds, Pavel Machek, wanpeng.li@hotmail.com,
	Peter Zijlstra, Rik van Riel, "# .39.x",
	"linux-pci@vger.kernel.org", Greg Kroah-Hartman, Alan Stern,
	Linux Kernel Mailing List, Bjorn Helgaas, USB list
Subject: Re: v4.10-rc8 (-rc6) boot regression on Intel desktop, does not
	boot after cold boots, boots after reboot
Message-ID: <20170217140449.GA4521@lerouge>
References: <20170215172303.GA15696@amd> <20170215232005.GA7877@amd>
	<20170216111144.GA12377@amd> <20170216172535.GA7868@amd>
	<20170216181353.GB4357@lerouge> <20170216183421.GC4357@lerouge>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
In-Reply-To:
Sender: linux-pci-owner@vger.kernel.org
List-ID:

On Thu, Feb 16, 2017 at 08:34:45PM +0100, Thomas Gleixner wrote:
> On Thu, 16 Feb 2017, Frederic Weisbecker wrote:
> > On Thu, Feb 16, 2017 at 10:20:14AM -0800, Linus Torvalds wrote:
> > > On Thu, Feb 16, 2017 at 10:13 AM, Frederic Weisbecker
> > > wrote:
> > > >
> > > > I haven't followed the discussion but this patch has a known issue which is fixed
> > > > with:
> > > >       7bdb59f1ad474bd7161adc8f923cdef10f2638d1
> > > >       "tick/nohz: Fix possible missing clock reprog after tick soft restart"
> > > >
> > > > I hope this fixes your issue.
> > >
> > > No, Pavel saw the problem with rc8 too, which already has that fix.
> > >
> > > So I think we'll just need to revert that original patch (and that
> > > means that we have to revert the commit you point to as well, since
> > > that ->next_tick field was added by the original commit).
> >
> > Aw too bad, but indeed that late we don't have the choice.
>
> Hint: Look for CPU hotplug interaction of these patches. I bet something
> becomes stale when the CPU goes down and does not get reset when it comes
> back online.

Indeed I should check that. But Pavel is seeing this on boot, where the only
hotplug operations that happen are CPU UP without preceding CPU DOWN that may
have retained stale values.

I think the value of ts->next_tick should be initially 0 for all CPUs. So
perhaps that 0 value confuses stuff. But looking at the code I don't see how.
It may be something more subtle.
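
To make the two hypotheses above concrete, here is a minimal user-space sketch
in C. It is not kernel code. It assumes the patch under discussion caches the
last programmed expiry in ts->next_tick and skips reprogramming the clock when
the same expiry is requested again while the tick is stopped. The names
struct tick_model, hw_event, model_stop_tick() and model_cpu_down() are
hypothetical and exist only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of per-CPU tick state; NOT the kernel's struct tick_sched. */
struct tick_model {
	bool     tick_stopped; /* true once the tick has been stopped */
	uint64_t next_tick;    /* cached expiry of the last programmed event */
	uint64_t hw_event;     /* expiry held by the modeled timer hardware, 0 = none */
};

/*
 * Assumed optimization: if the tick is already stopped and the requested
 * expiry equals the cached one, skip reprogramming.
 * Returns true if the modeled hardware was reprogrammed.
 */
bool model_stop_tick(struct tick_model *ts, uint64_t expires)
{
	if (ts->tick_stopped && expires == ts->next_tick)
		return false;

	ts->hw_event = expires;
	ts->next_tick = expires;
	ts->tick_stopped = true;
	return true;
}

/*
 * Assumed CPU DOWN behavior for this model: the hardware event is lost
 * while the cached software state is retained (the "stale" case).
 */
void model_cpu_down(struct tick_model *ts)
{
	ts->hw_event = 0;
}
```

In this model, a zero-initialized state (the boot case) always programs the
hardware on the first call, because tick_stopped is false, which matches the
observation that an initial next_tick of 0 does not obviously explain the bug.
After model_cpu_down(), a request for the same expiry is skipped and hw_event
stays empty, which is the kind of stale state the hotplug hint describes.
Whether the real code behaves this way is exactly what remains to be checked.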