linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH RFC] clocksource: Detect a watchdog overflow
@ 2016-03-15 18:50 Gratian Crisan
  2016-04-06 22:21 ` John Stultz
  0 siblings, 1 reply; 4+ messages in thread
From: Gratian Crisan @ 2016-03-15 18:50 UTC (permalink / raw)
  To: John Stultz, Thomas Gleixner; +Cc: linux-kernel, Gratian Crisan

The clocksource watchdog can falsely trigger and disable the main
clocksource when the watchdog wraps around.

The reason is that an interrupt storm and/or high priority (FIFO/RR) tasks
can preempt the timer softirq long enough for the watchdog to wrap around
if it has a limited number of bits available by comparison to the main
clocksource. One observed example is on a Intel Baytrail platform where TSC
is the main clocksource, HPET is disabled due to a hardware bug and acpi_pm
gets selected as the watchdog clocksource.

Calculate the maximum number of nanoseconds the watchdog clocksource can
represent without overflow and do not disqualify the main clocksource if
the delta since the last time we have checked exceeds the measurement
capabilities of the watchdog clocksource.

Signed-off-by: Gratian Crisan <gratian.crisan@ni.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/clocksource.c | 12 +++++++++---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 56ece14..597132e 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -170,7 +170,7 @@ static void clocksource_watchdog(unsigned long data)
 {
 	struct clocksource *cs;
 	cycle_t csnow, wdnow, cslast, wdlast, delta;
-	int64_t wd_nsec, cs_nsec;
+	int64_t wd_nsec, wd_max_nsec, cs_nsec;
 	int next_cpu, reset_pending;
 
 	spin_lock(&watchdog_lock);
@@ -178,6 +178,8 @@ static void clocksource_watchdog(unsigned long data)
 		goto out;
 
 	reset_pending = atomic_read(&watchdog_reset_pending);
+	wd_max_nsec = clocksource_cyc2ns(watchdog->max_cycles, watchdog->mult,
+					watchdog->shift);
 
 	list_for_each_entry(cs, &watchdog_list, wd_list) {
 
@@ -216,8 +218,12 @@ static void clocksource_watchdog(unsigned long data)
 		if (atomic_read(&watchdog_reset_pending))
 			continue;
 
-		/* Check the deviation from the watchdog clocksource. */
-		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD) {
+		/*
+		 * Check the deviation from the watchdog clocksource,
+		 * accounting for a possible watchdog overflow.
+		 */
+		if (abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD &&
+			cs_nsec < wd_max_nsec) {
 			pr_warn("timekeeping watchdog on CPU%d: Marking clocksource '%s' as unstable because the skew is too large:\n",
 				smp_processor_id(), cs->name);
 			pr_warn("                      '%s' wd_now: %llx wd_last: %llx mask: %llx\n",
-- 
2.7.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC] clocksource: Detect a watchdog overflow
  2016-03-15 18:50 [PATCH RFC] clocksource: Detect a watchdog overflow Gratian Crisan
@ 2016-04-06 22:21 ` John Stultz
  2016-04-07  8:14   ` Gratian Crisan
  0 siblings, 1 reply; 4+ messages in thread
From: John Stultz @ 2016-04-06 22:21 UTC (permalink / raw)
  To: Gratian Crisan; +Cc: Thomas Gleixner, lkml, Gratian Crisan

On Tue, Mar 15, 2016 at 11:50 AM, Gratian Crisan <gratian.crisan@ni.com> wrote:
> The clocksource watchdog can falsely trigger and disable the main
> clocksource when the watchdog wraps around.
>
> The reason is that an interrupt storm and/or high priority (FIFO/RR) tasks
> can preempt the timer softirq long enough for the watchdog to wrap around
> if it has a limited number of bits available by comparison to the main
> clocksource. One observed example is on a Intel Baytrail platform where TSC
> is the main clocksource, HPET is disabled due to a hardware bug and acpi_pm
> gets selected as the watchdog clocksource.
>
> Calculate the maximum number of nanoseconds the watchdog clocksource can
> represent without overflow and do not disqualify the main clocksource if
> the delta since the last time we have checked exceeds the measurement
> capabilities of the watchdog clocksource.

Sorry for not getting back to you sooner on this. You managed to send
these both out while I was at a conference and on vacation, and so
they were deep in the mail backlog. :)

So I'm sympathetic to this issue, because I remember seeing similar
problems w/ runaway SCHED_FIFO tasks w/ PREEMPT_RT.

However, its really difficult to create a solution without opening new
cases where bad clocksources will be mis-identified as good (which
your solution seems to suffer as well, measuring the time past with a
known bad clocksource can easily result in large deltas, which will be
ignored if the watchdog has a short interval).

A previous effort on this was made here, and there's a resulting
thread that didn't come to resolution:
    https://lkml.org/lkml/2015/8/17/542

Way back I had tried to come up with an approach where if the time
delta was large, it was divided by the watchdog interval, and then we
just compared the remainder with the current watchdog delta to see if
they matched (closely enough). Unfortunately this didn't work out for
me then, but perhaps it deserves a second try?

thanks
-john

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC] clocksource: Detect a watchdog overflow
  2016-04-06 22:21 ` John Stultz
@ 2016-04-07  8:14   ` Gratian Crisan
  2016-04-07 16:31     ` John Stultz
  0 siblings, 1 reply; 4+ messages in thread
From: Gratian Crisan @ 2016-04-07  8:14 UTC (permalink / raw)
  To: John Stultz; +Cc: Gratian Crisan, Thomas Gleixner, lkml, Gratian Crisan


John Stultz writes:

> On Tue, Mar 15, 2016 at 11:50 AM, Gratian Crisan <gratian.crisan@ni.com> wrote:
>> The clocksource watchdog can falsely trigger and disable the main
>> clocksource when the watchdog wraps around.
>>
>> The reason is that an interrupt storm and/or high priority (FIFO/RR) tasks
>> can preempt the timer softirq long enough for the watchdog to wrap around
>> if it has a limited number of bits available by comparison to the main
>> clocksource. One observed example is on a Intel Baytrail platform where TSC
>> is the main clocksource, HPET is disabled due to a hardware bug and acpi_pm
>> gets selected as the watchdog clocksource.
>>
>> Calculate the maximum number of nanoseconds the watchdog clocksource can
>> represent without overflow and do not disqualify the main clocksource if
>> the delta since the last time we have checked exceeds the measurement
>> capabilities of the watchdog clocksource.
>
> Sorry for not getting back to you sooner on this. You managed to send
> these both out while I was at a conference and on vacation, and so
> they were deep in the mail backlog. :)

No worries, I'm actually "on the road" this week too (ELC). I appreciate
the reply.

> So I'm sympathetic to this issue, because I remember seeing similar
> problems w/ runaway SCHED_FIFO tasks w/ PREEMPT_RT.

Yeah, a runaway rt thread can easily do it. That's just bad design. In
our case it was a bit more subtle bc. it was a combination of high
priority interrupts and rt threads that would occasionally stack up to
delay the timer softirq long enough to cause the watchdog wrap.

> However, its really difficult to create a solution without opening new
> cases where bad clocksources will be mis-identified as good (which
> your solution seems to suffer as well, measuring the time past with a
> known bad clocksource can easily result in large deltas, which will be
> ignored if the watchdog has a short interval).

Fair point. Ultimately you have to trust one of the clocksources. I
guess I was naive in thinking that the main clocksource can't drift more
than what the watchdog clocksource can measure within the
WATCHDOG_INTERVAL. I'm glad I don't have to deal with hardware that
lobotomized.

Would a simple solution that exposes the config option for the
clocksource wathchdog[1] (and defaults it to on) be an acceptable
alternative? It will work for us because we test the stability of the
main clocksource - part of the hardware bring-up.

> A previous effort on this was made here, and there's a resulting
> thread that didn't come to resolution:
>     https://lkml.org/lkml/2015/8/17/542

Sorry I've missed it.

> Way back I had tried to come up with an approach where if the time
> delta was large, it was divided by the watchdog interval, and then we
> just compared the remainder with the current watchdog delta to see if
> they matched (closely enough). Unfortunately this didn't work out for
> me then, but perhaps it deserves a second try?

I've entertained that idea too but I think I was trying to optimize
things too early and do everything with the mult/shift math. That first
attempt failed but I do need to try harder because it would be a better
general solution.

> thanks
> -john

Thanks,
-Gratian

[1]

>From e942ddaba439cd6711e9eed44ceae34167b864f8 Mon Sep 17 00:00:00 2001
From: Gratian Crisan <gratian.crisan@ni.com>
Date: Wed, 6 Apr 2016 21:20:15 -0700
Subject: [PATCH] time: Make the clocksource watchdog user configurable

The clocksource watchdog is used to detect instabilities in the current
clocksource. This is a beneficial feature on new/unknown hardware however
it can create problems by falsely triggering when the watchdog wraps. The
reason is that an interrupt storm and/or high priority (FIFO/RR) tasks can
preempt the timer softirq long enough for the watchdog to wrap if it has a
limited number of bits available by comparison with the main clocksource.

One observed example is on a Intel Baytrail platform where TSC is the main
clocksource, HPET is disabled due to a hardware bug and acpi_pm gets
selected as the watchdog clocksource.

Provide the option to disable the clocksource watchdog for hardware where
the clocksource stability has been validated.

Signed-off-by: Gratian Crisan <gratian.crisan@ni.com>
---
 arch/x86/Kconfig    |  2 +-
 kernel/time/Kconfig | 12 +++++++++++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 2dc18605..6da5d9e 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -54,7 +54,7 @@ config X86
 	select CLKEVT_I8253
 	select CLKSRC_I8253			if X86_32
 	select CLOCKSOURCE_VALIDATE_LAST_CYCLE
-	select CLOCKSOURCE_WATCHDOG
+	select HAVE_CLOCKSOURCE_WATCHDOG
 	select CLONE_BACKWARDS			if X86_32
 	select COMPAT_OLD_SIGACTION		if IA32_EMULATION
 	select DCACHE_WORD_ACCESS
diff --git a/kernel/time/Kconfig b/kernel/time/Kconfig
index 4008d9f..6707f1d 100644
--- a/kernel/time/Kconfig
+++ b/kernel/time/Kconfig
@@ -5,7 +5,7 @@
 # Options selectable by arch Kconfig
 
 # Watchdog function for clocksources to detect instabilities
-config CLOCKSOURCE_WATCHDOG
+config HAVE_CLOCKSOURCE_WATCHDOG
 	bool
 
 # Architecture has extra clocksource data
@@ -193,5 +193,15 @@ config HIGH_RES_TIMERS
 	  hardware is not capable then this option only increases
 	  the size of the kernel image.
 
+config CLOCKSOURCE_WATCHDOG
+	bool "Clocksource watchdog"
+	depends on HAVE_CLOCKSOURCE_WATCHDOG
+	default y
+	help
+	  This option enables the watchdog function for clocksources. It is
+	  used to detect instabilities in the currently selected clocksource.
+
+	  Say Y if you are unsure.
+
 endmenu
 endif
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH RFC] clocksource: Detect a watchdog overflow
  2016-04-07  8:14   ` Gratian Crisan
@ 2016-04-07 16:31     ` John Stultz
  0 siblings, 0 replies; 4+ messages in thread
From: John Stultz @ 2016-04-07 16:31 UTC (permalink / raw)
  To: Gratian Crisan; +Cc: Thomas Gleixner, lkml, Gratian Crisan

On Thu, Apr 7, 2016 at 1:14 AM, Gratian Crisan <gratian.crisan@ni.com> wrote:
> John Stultz writes:
>> So I'm sympathetic to this issue, because I remember seeing similar
>> problems w/ runaway SCHED_FIFO tasks w/ PREEMPT_RT.
>
> Yeah, a runaway rt thread can easily do it. That's just bad design. In
> our case it was a bit more subtle bc. it was a combination of high
> priority interrupts and rt threads that would occasionally stack up to
> delay the timer softirq long enough to cause the watchdog wrap.

So in the last discussion, I believe Thomas and others were skeptical
because we really shouldn't be blocking tasks from running for such a
long time. Instead of trying to turn off the watchdog, instead they
were suggesting we ensure we don't get into such a state where things
are delayed so unexpectedly long.


>> However, its really difficult to create a solution without opening new
>> cases where bad clocksources will be mis-identified as good (which
>> your solution seems to suffer as well, measuring the time past with a
>> known bad clocksource can easily result in large deltas, which will be
>> ignored if the watchdog has a short interval).
>
> Fair point. Ultimately you have to trust one of the clocksources. I
> guess I was naive in thinking that the main clocksource can't drift more
> than what the watchdog clocksource can measure within the
> WATCHDOG_INTERVAL. I'm glad I don't have to deal with hardware that
> lobotomized.

Another thought might be to try to add a third longer-running clock
into the mix. Possibly a very rough fallback check against something
like the RTC to see if the interval was really long enough to have the
watchdog wrap.

> Would a simple solution that exposes the config option for the
> clocksource wathchdog[1] (and defaults it to on) be an acceptable
> alternative? It will work for us because we test the stability of the
> main clocksource - part of the hardware bring-up.

So there is already the tsc=reliable boot option, which I believe
disables the watchdog. So I'm not sure the build time option makes the
most sense.

>> A previous effort on this was made here, and there's a resulting
>> thread that didn't come to resolution:
>>     https://lkml.org/lkml/2015/8/17/542
>
> Sorry I've missed it.
>
>> Way back I had tried to come up with an approach where if the time
>> delta was large, it was divided by the watchdog interval, and then we
>> just compared the remainder with the current watchdog delta to see if
>> they matched (closely enough). Unfortunately this didn't work out for
>> me then, but perhaps it deserves a second try?
>
> I've entertained that idea too but I think I was trying to optimize
> things too early and do everything with the mult/shift math. That first
> attempt failed but I do need to try harder because it would be a better
> general solution.

Yea. I'd much prefer a general solution.

thanks
-john

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2016-04-07 16:31 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-15 18:50 [PATCH RFC] clocksource: Detect a watchdog overflow Gratian Crisan
2016-04-06 22:21 ` John Stultz
2016-04-07  8:14   ` Gratian Crisan
2016-04-07 16:31     ` John Stultz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).