From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1754194Ab0C2Vnv (ORCPT ); Mon, 29 Mar 2010 17:43:51 -0400
Received: from e6.ny.us.ibm.com ([32.97.182.146]:55703 "EHLO e6.ny.us.ibm.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1753648Ab0C2Vnt (ORCPT ); Mon, 29 Mar 2010 17:43:49 -0400
Subject: Re: [PATCH] hangcheck-timer is broken on x86
From: john stultz
To: Yury Polyanskiy
Cc: Joel Becker , linux-kernel@vger.kernel.org, Andrew Morton , Jan Glauber
In-Reply-To:
References: <20100323233611.6dcbe4f4@penta.localdomain>
	<20100326214648.GF9984@mail.oracle.com>
	<1269824436.1880.2.camel@work-vm>
	<20100329101106.3678a312@penta.localdomain>
	<1269881007.1857.18.camel@work-vm>
	<20100329130418.2b5c068c@penta.localdomain>
	<1269888291.3968.5.camel@localhost.localdomain>
Content-Type: text/plain; charset="UTF-8"
Date: Mon, 29 Mar 2010 14:43:44 -0700
Message-ID: <1269899024.3968.27.camel@localhost.localdomain>
Mime-Version: 1.0
X-Mailer: Evolution 2.28.1
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, 2010-03-29 at 17:08 -0400, Yury Polyanskiy wrote:
> >> > What I'm saying is that if you're using getrawmonotonic() to detect
> >> > hangs, you might miss them, as getrawmonotonic may wrap (and thus stop
> >> > continually increasing) if the timer interrupt is delayed. This does not
> >> > apply to systems using the TSC clocksource, but does apply to systems
> >> > using the acpi_pm.
> >>
> >> But if timer interrupt is delayed by more than acpi_pm wrap-around
> >> time, then the update_wall_time() is also screwed. Since it is not, we
> >> can rely on getrawmonotonic().
> >
> > Right, if the box hangs for longer then the clocksource can count for,
> > the timekeeping subsystem will be off by some multiple of that length.
> >
>
> Oh, I see.
> You mean that getrawmonotonic() wouldn't work under
> abnormal conditions. I understand now, sorry for the confusion. You
> are correct, of course.

And something else I thought of: while the TSC won't wrap, the
multiplication done to convert to nanoseconds will overflow when you hit
a large enough cycle delta. So even TSC systems are not guaranteed to
have timekeeping (and thus getrawmonotonic) work over infinite time
without accumulation. We try to establish this length via
timekeeping_max_deferment(), so that we make sure we don't go into
tickless mode for longer than the clocksource can handle.

> I personally don't like the idea of relying on read_persistent_clock()
> not only because of hwclock and ntp. In fact, my core interest in
> hangcheck-timer is to set a very low margin (1 to 3 jiffies for
> example) so that I would get a log message upon any kernel slow down
> or a tick-miss (as a hardware integrity check). I don't think
> read_persistent_clock() is precise enough for this purpose, is it?

read_persistent_clock() is a bit coarse, so for small intervals it would
not do. However, the current timeout range for the hangcheck timer is in
seconds, which should be fine for read_persistent_clock().

You might also have some trouble with small intervals, since things like
tickless systems or other advanced power-savings systems might try to
collate or push timers together to save battery. So ticks may be delayed
a small amount (timers are only guaranteed to fire AFTER the time
specified; there really is no promised bound on how late they may be).

Additionally, on -rt systems, you might have higher priority FIFO tasks
blocking the hangcheck timer from executing for a smallish amount of
time.

> Also, hooking to ntp update code complicates an otherwise simple
> driver.
> I propose to simply check on non-S390 if the clock source
> resolves to something other than TSC and dump a warning message on
> driver load (something like "Hangcheck: kernel using clocksource %s,
> which is not reliable for hang detection").

That requires the hangcheck code to parse the current clocksource, which
might change as the system runs, so it also has to track the clocksource
over time. So I'm not sure it's that much easier of a solution.

Something else to consider might be to look at the softlockup watchdog,
which is fairly similar but somewhat more deeply integrated into the
kernel. Maybe some of this could be merged?

thanks
-john
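
For reference, the two failure modes discussed in this mail (a narrow
counter such as acpi_pm wrapping between reads, and the cycles-to-ns
multiplication overflowing on a large delta) can be sketched in
userspace C. This is only an illustration, not the kernel's code: the
helper names cycle_delta(), cyc_to_ns() and max_safe_cycles() are
hypothetical, and the mult/shift conversion is assumed to have the
usual (cycles * mult) >> shift form with a nonzero mult.

```c
#include <stdint.h>

/*
 * Delta between two reads of a free-running counter that is only
 * mask+1 cycles wide (e.g. mask = 0xFFFFFF for a 24-bit counter).
 * One wrap between reads is handled correctly; if more than mask+1
 * cycles really elapsed, the extra wraps are silently lost.
 */
static uint64_t cycle_delta(uint64_t now, uint64_t last, uint64_t mask)
{
    return (now - last) & mask;
}

/*
 * Convert a cycle delta to nanoseconds with a mult/shift pair.
 * The 64-bit product wraps silently once cycles exceeds
 * max_safe_cycles(mult), even if the counter itself never wraps.
 */
static uint64_t cyc_to_ns(uint64_t cycles, uint32_t mult, uint32_t shift)
{
    return (cycles * mult) >> shift;
}

/* Largest cycle delta whose product with mult still fits in 64 bits.
 * Assumes mult != 0. */
static uint64_t max_safe_cycles(uint32_t mult)
{
    return UINT64_MAX / mult;
}
```

With these helpers, cyc_to_ns(max_safe_cycles(mult), mult, shift) gives
the longest interval that can be converted in one step, which is the
kind of bound a "maximum deferment" value would have to stay under. A
hang longer than either limit would be under-reported rather than
detected.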