On Mon, 29 Mar 2010 14:43:44 -0700 john stultz wrote:

> On Mon, 2010-03-29 at 17:08 -0400, Yury Polyanskiy wrote:
> > >> > What I'm saying is that if you're using getrawmonotonic() to detect
> > >> > hangs, you might miss them, as getrawmonotonic may wrap (and thus stop
> > >> > continually increasing) if the timer interrupt is delayed. This does not
> > >> > apply to systems using the TSC clocksource, but does apply to systems
> > >> > using the acpi_pm.
> And something else I thought of, while the TSC won't wrap, the
> multiplication done to convert to nanoseconds will overflow when you hit
> a large enough cycle delta. So even TSC systems are not guaranteed to
> have timekeeping (and thus getrawmonotonic) work over infinite time
> without accumulation.

Agreed (large clock->shift, right?), but for hangcheck-timer this would
hardly be a problem, since such a large overflow is very unlikely to land
inside the allowed interval around the pre-planned timer fire instant.

>
> You might also have some trouble with small intervals. Since things like
> tickless systems or other advanced power-savings systems might try to
> collate or push timers together to save battery. So ticks may be delayed
> a small amount (timers are only guaranteed to fire AFTER the time
> specified, there really is no promised bound on how late they may be).
>
> Additionally, on -rt systems, you might have higher priority FIFO tasks
> blocking the hangcheck timer from executing for a smallish amount of
> time.

Yes, these are the events I want to see logged. Essentially, I use the
hangcheck timer to check the stability of the kernel's heartbeat.

> > Also, hooking to ntp update code complicates an otherwise simple
> > driver. I propose to simply check on non-S390 if the clock source
> > resolves to something other than TSC and dump a warning message on
> > driver load (something like "Hangcheck: kernel using clocksource %s,
> > which is not reliable for hang detection").
>
> That requires the hangcheck code to parse the current clocksource, which
> might change as the system runs, so it also has to track the clocksource
> over time. So I'm not sure its that much easier of a solution.

Oh, shoot, you are right. So if compiled-in it would always complain.

> Something to also consider might also be to look at the softlockup
> watchdog, which is fairly similar but somewhat more deeply integrated
> into the kernel. Maybe some of this could be merged?

Yeah, for softlockup detection, I don't understand why one would prefer
hangcheck-timer to the watchdog. I am sure Joel has some reasons though.
For me, read_persistent_clock() is not a solution, and others would
perhaps indeed be using the softlockup watchdog, which leaves the
decision to Joel.

Best,
Y
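
P.S. To put rough numbers on the two failure modes discussed above (a
narrow counter wrapping, and the cycles-to-nanoseconds multiplication
overflowing), here is a minimal userspace sketch. It is not kernel code:
the function names are hypothetical, and the mult/shift pair in the note
below is illustrative, not read from any real clocksource. It only
assumes the usual (cycles * mult) >> shift conversion, and the ACPI PM
timer parameters (24 bits at 3.579545 MHz) as given in the ACPI
specification.

```c
#include <assert.h>
#include <stdint.h>

/* Largest cycle delta for which cycles * mult still fits in 64 bits.
 * Hypothetical helper, for illustration only. */
uint64_t max_safe_cycles(uint32_t mult)
{
	return mult ? UINT64_MAX / mult : UINT64_MAX;
}

/* Sketch of the (cycles * mult) >> shift conversion. The unsigned
 * 64-bit product wraps silently once cycles exceeds max_safe_cycles(). */
uint64_t cyc_to_ns_sketch(uint64_t cycles, uint32_t mult, uint32_t shift)
{
	return (cycles * mult) >> shift;
}

/* Seconds until a free-running counter of the given width wraps. */
double wrap_seconds(unsigned int bits, double hz)
{
	assert(bits < 64);	/* the shift below is undefined for 64 */
	return (double)(1ULL << bits) / hz;
}
```

With an illustrative mult = 1 << 22 and shift = 22, the product wraps at
a delta of 2^42 cycles, so cyc_to_ns_sketch(1ULL << 42, 1u << 22, 22)
comes back as 0 instead of 2^42 ns. wrap_seconds(24, 3579545.0) gives
roughly 4.7 seconds for a 24-bit PM timer, which is the kind of window
in which a delayed timer interrupt could hide a hang.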