From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1754788AbbAFCFd (ORCPT ); Mon, 5 Jan 2015 21:05:33 -0500 Received: from mail-qa0-f51.google.com ([209.85.216.51]:55193 "EHLO mail-qa0-f51.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1754060AbbAFCF3 (ORCPT ); Mon, 5 Jan 2015 21:05:29 -0500 MIME-Version: 1.0 In-Reply-To: References: <20141221223204.GA9618@codemonkey.org.uk> <20141222225725.GA8140@codemonkey.org.uk> <20141224030125.GA8725@codemonkey.org.uk> <20141226163410.GA25161@codemonkey.org.uk> <20141226181204.GA26527@codemonkey.org.uk> Date: Mon, 5 Jan 2015 18:05:28 -0800 Message-ID: Subject: Re: frequent lockups in 3.18rc4 From: John Stultz To: Linus Torvalds Cc: Dave Jones , Thomas Gleixner , Chris Mason , Mike Galbraith , Ingo Molnar , Peter Zijlstra , =?UTF-8?Q?D=C3=A2niel_Fraga?= , Sasha Levin , "Paul E. McKenney" , Linux Kernel Mailing List , Suresh Siddha , Oleg Nesterov , Peter Anvin Content-Type: text/plain; charset=UTF-8 Sender: linux-kernel-owner@vger.kernel.org List-ID: X-Mailing-List: linux-kernel@vger.kernel.org On Mon, Jan 5, 2015 at 5:25 PM, Linus Torvalds wrote: > On Mon, Jan 5, 2015 at 5:17 PM, John Stultz wrote: >> >> Anyway, It may be worth keeping the 50% margin (and dropping the 12% >> reduction to simplify things) > > Again, the 50% margin is only on the multiplication overflow. Not on the mask. Right, but we calculate the mult value based on the mask (or 10 mins, whichever is shorter). So then when we go back and calculate the max_cycles/max_idle_ns using the mult, we end up with a value smaller then the mask. So the scheduler shouldn't push idle times out beyond that and the debug logic in my patch should be able to catch strangely large values. > So it won't do anything at all for the case we actually care about, > namely a broken HPET, afaik. Yea, the case my code doesn't catch that yours did is for slightly broken clocksources (I'm thinking two cpus which virtual hpets embedded in them that are slightly off) where you could get negative deltas right after the update. In that case the capping on read is really needed since by the next update the stale value has grown large enough to look like a reasonable offset. The TSC has a similar issue, but its easier to check for negative values because it won't reasonably ever overflow. > > I'd much rather limit to 50% of the mask too. Ok, I'll try to rework the code to make this choice and make it more explicitly clear. > Also, why do we actually play games with ilog2 for that overflow > calculation? It seems pointless. This is for the setup code, doing a > real division there would seem to be a whole lot more straightforward, > and not need that big comment. And there's no performance issue. Am I > missing something? I feel like there was a time when this may have been called by some of the clocksource code if it they changed frequency (I think over suspend/resume), but I'm not seeing it in the current source. So yea, likely something to simplify. >> I've also got a capping patch that I'm testing that keeps time reads >> from passing that interval. The only thing I'm really cautious about >> with that change is that we have to make sure the hrtimer that >> triggers update_wall_clock is always set to expire within that cap (I >> need to review it again) or else we'll hang ourselves. > > Yeah, that thing is fragile. And quite possibly part of the problem. "Time is a flat circle..." and thus unfortunately requires some circular logic. :) thanks -john