* [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2)
@ 2015-01-23  0:09 John Stultz
  2015-01-23  0:09 ` [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
                   ` (11 more replies)
  0 siblings, 12 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

So here's another round for this series, which is the result of
earlier discussions with Linus and his suggestions around
improvements to clocksource validation in the hope we can more
easily catch bad hardware.

There's also a few cleanups Linus suggested as well as a few I've been
meaning to get to for awhile.

I tried to address all the feedback that had been given, adding
the checks behind CONFIG_DEBUG_TIMEKEEPING.  I also sorted out a
sane way to print rate-limited warnings if we see cycle deltas that
are too large, or if they look like small underflows.

Let me know what you think. I'm still a bit hesitant to queue
any of this for 3.20 (particularly the last few cleanups), but
if there aren't many more objections I might see if it can
still make it.

thanks
-john

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>

John Stultz (12):
  clocksource: Simplify clocks_calc_max_nsecs logic
  clocksource: Simplify logic around clocksource wrapping safety margins
  clocksource: Remove clocksource_max_deferment()
  clocksource: Add max_cycles to clocksource structure
  time: Add debugging checks to warn if we see delays
  time: Add infrastructure to cap clocksource reads to the max_cycles
    value
  time: Try to catch clocksource delta underflows
  time: Add warnings when overflows or underflows are observed
  clocksource: Improve clocksource watchdog reporting
  clocksource: Mostly kill clocksource_register()
  sparc: Convert to using clocksource_register_hz()
  clocksource: Add some debug info about clocksources being registered

 arch/s390/kernel/time.c     |   2 +-
 arch/sparc/kernel/time_32.c |   6 +-
 include/linux/clocksource.h |  16 ++++-
 kernel/time/clocksource.c   | 164 +++++++++++++++++++-------------------------
 kernel/time/jiffies.c       |   5 +-
 kernel/time/sched_clock.c   |   6 +-
 kernel/time/timekeeping.c   | 113 ++++++++++++++++++++++++++----
 lib/Kconfig.debug           |  12 ++++
 8 files changed, 205 insertions(+), 119 deletions(-)

-- 
1.9.1


^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23  0:09 ` [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping safety margins John Stultz
                   ` (10 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

The previous clocks_calc_max_nsecs implementation used some
unnecessarily complex bit logic to find the maximum interval
that can be handled without a multiplication overflow. Since
this is not in the hot path, just do the divide to make it
easier to read.

The previous implementation also had a subtle issue:
it avoided overflowing into a signed 64-bit value, whereas
the intervals are always unsigned. This resulted in
overly conservative intervals, on top of which other safety
margins were then added, reducing the intended interval length.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index b79f39b..c14cd03 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -552,19 +552,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 
 	/*
 	 * Calculate the maximum number of cycles that we can pass to the
-	 * cyc2ns function without overflowing a 64-bit signed result. The
-	 * maximum number of cycles is equal to ULLONG_MAX/(mult+maxadj)
-	 * which is equivalent to the below.
-	 * max_cycles < (2^63)/(mult + maxadj)
-	 * max_cycles < 2^(log2((2^63)/(mult + maxadj)))
-	 * max_cycles < 2^(log2(2^63) - log2(mult + maxadj))
-	 * max_cycles < 2^(63 - log2(mult + maxadj))
-	 * max_cycles < 1 << (63 - log2(mult + maxadj))
-	 * Please note that we add 1 to the result of the log2 to account for
-	 * any rounding errors, ensure the above inequality is satisfied and
-	 * no overflow will occur.
+	 * cyc2ns function without overflowing a 64-bit result.
 	 */
-	max_cycles = 1ULL << (63 - (ilog2(mult + maxadj) + 1));
+	max_cycles = ULLONG_MAX;
+	do_div(max_cycles, mult+maxadj);
 
 	/*
 	 * The actual maximum number of cycles we can defer the clocksource is
-- 
1.9.1



* [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping safety margins
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
  2015-01-23  0:09 ` [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23  0:09 ` [PATCH 03/12] clocksource: Remove clocksource_max_deferment() John Stultz
                   ` (9 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

The clocksource logic has a number of places where we try to
include a safety margin. Most of these are 12.5% safety
margins, but they are inconsistently applied and sometimes
stacked on top of each other.

Additionally, the previous patch corrected an issue where we
unintentionally, in effect, created a 50% safety margin, on
top of which these 12.5% margins were then added.

To simplify the logic, this patch removes the various
12.5% margins and consolidates adding the margin in one place:
clocks_calc_max_nsecs().

Additionally, Linus prefers a 50% safety margin, as it allows
bad clock values to be more easily caught. This should have
no net effect, since the issue corrected earlier meant margins
greater than 50% were already being used without problems.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Acked-by: Stephen Boyd <sboyd@codeaurora.org> (for sched_clock.c bit)
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 26 ++++++++++++--------------
 kernel/time/sched_clock.c |  4 ++--
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c14cd03..e837ffd1 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -545,6 +545,9 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift:	cycle to nanosecond divisor (power of two)
  * @maxadj:	maximum adjustment value to mult (~11%)
  * @mask:	bitmask for two's complement subtraction of non 64 bit counters
+ *
+ * NOTE: This function includes a safety margin of 50%, so that bad clock values
+ * can be detected.
  */
 u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 {
@@ -566,11 +569,14 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	max_cycles = min(max_cycles, mask);
 	max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+	/* Return 50% of the actual maximum, so we can detect bad values */
+	max_nsecs >>= 1;
+
 	return max_nsecs;
 }
 
 /**
- * clocksource_max_deferment - Returns max time the clocksource can be deferred
+ * clocksource_max_deferment - Returns max time the clocksource should be deferred
  * @cs:         Pointer to clocksource
  *
  */
@@ -580,13 +586,7 @@ static u64 clocksource_max_deferment(struct clocksource *cs)
 
 	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
 					  cs->mask);
-	/*
-	 * To ensure that the clocksource does not wrap whilst we are idle,
-	 * limit the time the clocksource can be deferred by 12.5%. Please
-	 * note a margin of 12.5% is used because this can be computed with
-	 * a shift, versus say 10% which would require division.
-	 */
-	return max_nsecs - (max_nsecs >> 3);
+	return max_nsecs;
 }
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
@@ -735,10 +735,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	 * conversion precision. 10 minutes is still a reasonable
 	 * amount. That results in a shift value of 24 for a
 	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
-	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
-	 * margin as we do in clocksource_max_deferment()
+	 * ~ 0.06ppm granularity for NTP.
 	 */
-	sec = (cs->mask - (cs->mask >> 3));
+	sec = cs->mask;
 	do_div(sec, freq);
 	do_div(sec, scale);
 	if (!sec)
@@ -750,9 +749,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 			       NSEC_PER_SEC / scale, sec * scale);
 
 	/*
-	 * for clocksources that have large mults, to avoid overflow.
-	 * Since mult may be adjusted by ntp, add an safety extra margin
-	 *
+	 * Ensure clocksources that have large mults don't overflow
+	 * when adjusted.
 	 */
 	cs->maxadj = clocksource_max_adjustment(cs);
 	while ((cs->mult + cs->maxadj < cs->mult)
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 01d2d15..c794b84 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -125,9 +125,9 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 
 	new_mask = CLOCKSOURCE_MASK(bits);
 
-	/* calculate how many ns until we wrap */
+	/* calculate how many ns until we risk wrapping */
 	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
-	new_wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
+	new_wrap_kt = ns_to_ktime(wrap);
 
 	/* update epoch for new counter and update epoch_ns from old counter*/
 	new_epoch = read();
-- 
1.9.1



* [PATCH 03/12] clocksource: Remove clocksource_max_deferment()
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
  2015-01-23  0:09 ` [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
  2015-01-23  0:09 ` [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping safety margins John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23  0:09 ` [PATCH 04/12] clocksource: Add max_cycles to clocksource structure John Stultz
                   ` (8 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

clocksource_max_deferment() doesn't do anything useful
anymore, so zap it.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index e837ffd1..0696559 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -575,20 +575,6 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	return max_nsecs;
 }
 
-/**
- * clocksource_max_deferment - Returns max time the clocksource should be deferred
- * @cs:         Pointer to clocksource
- *
- */
-static u64 clocksource_max_deferment(struct clocksource *cs)
-{
-	u64 max_nsecs;
-
-	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
-					  cs->mask);
-	return max_nsecs;
-}
-
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
 
 static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur)
@@ -760,7 +746,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 		cs->maxadj = clocksource_max_adjustment(cs);
 	}
 
-	cs->max_idle_ns = clocksource_max_deferment(cs);
+	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
+						 cs->maxadj, cs->mask);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
@@ -807,7 +794,8 @@ int clocksource_register(struct clocksource *cs)
 		cs->name);
 
 	/* calculate max idle time permitted for this clocksource */
-	cs->max_idle_ns = clocksource_max_deferment(cs);
+	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
+						 cs->maxadj, cs->mask);
 
 	mutex_lock(&clocksource_mutex);
 	clocksource_enqueue(cs);
-- 
1.9.1



* [PATCH 04/12] clocksource: Add max_cycles to clocksource structure
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
                   ` (2 preceding siblings ...)
  2015-01-23  0:09 ` [PATCH 03/12] clocksource: Remove clocksource_max_deferment() John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23  0:09 ` [PATCH 05/12] time: Add debugging checks to warn if we see delays John Stultz
                   ` (7 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

In order to facilitate some clocksource validation,
add a max_cycles entry to the clocksource structure,
which will hold the maximum cycle value that can
safely be multiplied without causing an overflow.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/clocksource.h |  6 ++++--
 kernel/time/clocksource.c   | 15 ++++++++++++---
 kernel/time/sched_clock.c   |  2 +-
 3 files changed, 17 insertions(+), 6 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index abcafaa..32dced9 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -158,6 +158,7 @@ extern u64 timecounter_cyc2time(struct timecounter *tc,
  * @shift:		cycle to nanosecond divisor (power of two)
  * @max_idle_ns:	max idle time permitted by the clocksource (nsecs)
  * @maxadj:		maximum adjustment value to mult (~11%)
+ * @max_cycles:		maximum safe cycle value which won't overflow on mult
  * @flags:		flags describing special properties
  * @archdata:		arch-specific data
  * @suspend:		suspend function for the clocksource, if necessary
@@ -178,7 +179,7 @@ struct clocksource {
 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
 	struct arch_clocksource_data archdata;
 #endif
-
+	u64 max_cycles;
 	const char *name;
 	struct list_head list;
 	int rating;
@@ -291,7 +292,8 @@ extern struct clocksource * __init clocksource_default_clock(void);
 extern void clocksource_mark_unstable(struct clocksource *cs);
 
 extern u64
-clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask);
+clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask,
+							u64 *max_cycles);
 extern void
 clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec);
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 0696559..c2f8639 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -545,11 +545,14 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift:	cycle to nanosecond divisor (power of two)
  * @maxadj:	maximum adjustment value to mult (~11%)
  * @mask:	bitmask for two's complement subtraction of non 64 bit counters
+ * @max_cyc:    maximum cycle value before potential overflow (does not include
+ *		any safety margin)
  *
+ * NOTE: This function includes a safety margin of 50%, so that bad clock values
  * can be detected.
  */
-u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
+u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask,
+								u64 *max_cyc)
 {
 	u64 max_nsecs, max_cycles;
 
@@ -569,6 +572,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	max_cycles = min(max_cycles, mask);
 	max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+	/* return the max_cycles value as well if requested */
+	if (max_cyc)
+		*max_cyc = max_cycles;
+
 	/* Return 50% of the actual maximum, so we can detect bad values */
 	max_nsecs >>= 1;
 
@@ -747,7 +754,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	}
 
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
-						 cs->maxadj, cs->mask);
+						 cs->maxadj, cs->mask,
+						 &cs->max_cycles);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
@@ -795,7 +803,8 @@ int clocksource_register(struct clocksource *cs)
 
 	/* calculate max idle time permitted for this clocksource */
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
-						 cs->maxadj, cs->mask);
+						 cs->maxadj, cs->mask,
+						 &cs->max_cycles);
 
 	mutex_lock(&clocksource_mutex);
 	clocksource_enqueue(cs);
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index c794b84..d43855b 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -126,7 +126,7 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	new_mask = CLOCKSOURCE_MASK(bits);
 
 	/* calculate how many ns until we risk wrapping */
-	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
+	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask, NULL);
 	new_wrap_kt = ns_to_ktime(wrap);
 
 	/* update epoch for new counter and update epoch_ns from old counter*/
-- 
1.9.1



* [PATCH 05/12] time: Add debugging checks to warn if we see delays
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
                   ` (3 preceding siblings ...)
  2015-01-23  0:09 ` [PATCH 04/12] clocksource: Add max_cycles to clocksource structure John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23 14:27   ` Peter Zijlstra
  2015-01-23  0:09 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
                   ` (6 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

Recently there have been some requests for better sanity
checking in the time code, so that it's more clear when
something is going wrong, since timekeeping issues can
manifest in a large number of strange ways across various
subsystems.

Thus, this patch adds some extra infrastructure: a check
in update_wall_time() that prints warnings if we see the
call delayed beyond the max_cycles overflow point, or
beyond the clocksource max_idle_ns value, which is
currently 50% of the overflow point.

This extra infrastructure is conditionalized
behind a new CONFIG_DEBUG_TIMEKEEPING option
also added in this patch.

Tested this a bit by halting qemu for specified
lengths of time to trigger the warnings.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/jiffies.c     |  1 +
 kernel/time/timekeeping.c | 23 +++++++++++++++++++++++
 lib/Kconfig.debug         | 12 ++++++++++++
 3 files changed, 36 insertions(+)

diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index a6a5bf5..7e41390 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -71,6 +71,7 @@ static struct clocksource clocksource_jiffies = {
 	.mask		= 0xffffffff, /*32bits*/
 	.mult		= NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
 	.shift		= JIFFIES_SHIFT,
+	.max_cycles	= 10,
 };
 
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6a93185..daa0d43 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,6 +118,26 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 	tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+#ifdef CONFIG_DEBUG_TIMEKEEPING
+static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
+{
+
+	cycle_t max_cycles = tk->tkr.clock->max_cycles;
+	const char *name = tk->tkr.clock->name;
+
+	if (offset > max_cycles)
+		printk_deferred("ERROR: cycle offset (%lld) is larger than"
+			" allowed %s max_cycles (%lld)\n",
+			offset, name, max_cycles);
+	else if (offset > (max_cycles >> 1))
+		printk_deferred("WARNING: cycle offset (%lld) is past"
+			" the %s 50%% safety margin (%lld)\n",
+			offset, name, max_cycles>>1);
+}
+#else
+#define timekeeping_check_update(x, y)
+#endif
+
 /**
  * tk_setup_internals - Set up internals to use clocksource clock.
  *
@@ -1602,6 +1622,9 @@ void update_wall_time(void)
 	if (offset < real_tk->cycle_interval)
 		goto out;
 
+	/* Do some additional sanity checking */
+	timekeeping_check_update(real_tk, offset);
+
 	/*
 	 * With NO_HZ we may have to accumulate many cycle_intervals
 	 * (think "ticks") worth of time at once. To do this efficiently,
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index 5f2ce61..c7b040d 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -852,6 +852,18 @@ config SCHED_STACK_END_CHECK
 	  data corruption or a sporadic crash at a later stage once the region
 	  is examined. The runtime overhead introduced is minimal.
 
+config DEBUG_TIMEKEEPING
+	bool "Enable extra timekeeping sanity checking"
+	help
+	  This option will enable additional timekeeping sanity checks
	  which may be helpful when diagnosing issues where timekeeping
+	  problems are suspected.
+
+	  This may include checks in the timekeeping hotpaths, so this
	  option may have a performance impact on some workloads.
+
+	  If unsure, say N.
+
 config TIMER_STATS
 	bool "Collect kernel timers statistics"
 	depends on DEBUG_KERNEL && PROC_FS
-- 
1.9.1



* [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
                   ` (4 preceding siblings ...)
  2015-01-23  0:09 ` [PATCH 05/12] time: Add debugging checks to warn if we see delays John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23 14:28   ` Peter Zijlstra
  2015-01-24 17:35   ` Richard Cochran
  2015-01-23  0:09 ` [PATCH 07/12] time: Try to catch clocksource delta underflows John Stultz
                   ` (5 subsequent siblings)
  11 siblings, 2 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

When calculating the current delta since the last tick, we
currently have no hard protection against a multiplication
overflow occurring.

This patch introduces infrastructure to cap the read delta
value at the max_cycles value, which is where an overflow
would occur.

Since this is in the hot path, the extra checking is done
under CONFIG_DEBUG_TIMEKEEPING.

There was some concern that capping time like this could cause
problems, as we may stop expiring timers; this could go
circular if the timer that triggers time accumulation were
mis-scheduled too far in the future, which would cause time
to stop.

However, since a mult overflow would result in a smaller time
value, we would effectively have the same problem in that
case anyway.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index daa0d43..ae3945d 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -134,8 +134,31 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 			" the %s 50%% safety margin (%lld)\n",
 			offset, name, max_cycles>>1);
 }
+
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+	cycle_t cycle_now, delta;
+
+	/* read clocksource */
+	cycle_now = tkr->read(tkr->clock);
+
+	/* calculate the delta since the last update_wall_time */
+	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	if (unlikely(delta > tkr->clock->max_cycles))
+		delta = tkr->clock->max_cycles;
+
+	return delta;
+}
 #else
 #define timekeeping_check_update(x, y)
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+	/* calculate the delta since the last update_wall_time */
+	return clocksource_delta(tkr->read(tkr->clock), tkr->cycle_last,
+								tkr->mask);
+}
 #endif
 
 /**
@@ -213,14 +236,10 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t delta;
 	s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tkr->read(tkr->clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	delta = timekeeping_get_delta(tkr);
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;
@@ -232,14 +251,10 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 {
 	struct clocksource *clock = tk->tkr.clock;
-	cycle_t cycle_now, delta;
+	cycle_t delta;
 	s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tk->tkr.read(clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
+	delta = timekeeping_get_delta(&tk->tkr);
 
 	/* convert delta to nanoseconds. */
 	nsec = clocksource_cyc2ns(delta, clock->mult, clock->shift);
-- 
1.9.1



* [PATCH 07/12] time: Try to catch clocksource delta underflows
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
                   ` (5 preceding siblings ...)
  2015-01-23  0:09 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23  0:09 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
                   ` (4 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

In the case of a broken clocksource, where there are
multiple actual clocks that aren't perfectly aligned,
we may see small "negative" deltas when we subtract
cycle_last from the current cycle value.

The deltas are actually negative with respect to the
clocksource mask value, and not necessarily negative
if cast to an s64, but we can detect them by checking
whether the delta is a small (relative to the mask)
negative value.

If so, we assume we jumped backwards somehow and
instead use zero for our delta.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index ae3945d..568186c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -145,6 +145,13 @@ static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
+	/*
+	 * Try to catch underflows by checking if we are seeing small
+	 * mask-relative negative values.
+	 */
+	if (unlikely((~delta & tkr->mask) < (tkr->mask >> 3)))
+		delta = 0;
+
 	/* Cap delta value to the max_cycles values to avoid mult overflows */
 	if (unlikely(delta > tkr->clock->max_cycles))
 		delta = tkr->clock->max_cycles;
-- 
1.9.1



* [PATCH 08/12] time: Add warnings when overflows or underflows are observed
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
                   ` (6 preceding siblings ...)
  2015-01-23  0:09 ` [PATCH 07/12] time: Try to catch clocksource delta underflows John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23 14:30   ` Peter Zijlstra
  2015-01-23  0:09 ` [PATCH 09/12] clocksource: Improve clocksource watchdog reporting John Stultz
                   ` (3 subsequent siblings)
  11 siblings, 1 reply; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

It was suggested that the underflow/overflow protection
should probably emit some sort of warning, rather
than just silently fixing the issue.

So this patch adds some warnings here. The flag variables
used are not protected by locks, but since we can't print
from the reading functions, just being able to say we
saw an issue in the update interval is useful enough,
and can be slightly racy without real consequence.

The big complication is that we're only under a read
seqlock, so the data could shift under us during
our calculation to see if there was a problem. This
patch avoids this issue by nesting another seqlock
which allows us to atomically snapshot just the
required values. So we shouldn't see false positives.

I also added some basic ratelimiting here, since
on one build machine w/ skewed TSCs it was fairly
noisy at bootup.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 58 +++++++++++++++++++++++++++++++++++++++++------
 1 file changed, 51 insertions(+), 7 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 568186c..d216b0e 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -119,11 +119,23 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 }
 
 #ifdef CONFIG_DEBUG_TIMEKEEPING
+#define WARNINGFREQ (HZ*300) /* 5 minute rate-limiting */
+/*
+ * These simple flag variables are managed
+ * without locks, which is racy, but ok since
+ * we don't really care about being super
+ * precise about how many events were seen,
+ * just that a problem was observed.
+ */
+static int timekeeping_underflow_seen;
+static int timekeeping_overflow_seen;
+
 static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 
 	cycle_t max_cycles = tk->tkr.clock->max_cycles;
 	const char *name = tk->tkr.clock->name;
+	static long last_warning; /* we always hold write on timekeeper lock */
 
 	if (offset > max_cycles)
 		printk_deferred("ERROR: cycle offset (%lld) is larger then"
@@ -133,28 +145,60 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 		printk_deferred("WARNING: cycle offset (%lld) is past"
 			" the %s 50%% safety margin (%lld)\n",
 			offset, name, max_cycles>>1);
+
+	if (timekeeping_underflow_seen) {
+		if (jiffies - last_warning > WARNINGFREQ) {
+			printk_deferred("WARNING: Clocksource underflow observed\n");
+			last_warning = jiffies;
+		}
+		timekeeping_underflow_seen = 0;
+	}
+	if (timekeeping_overflow_seen) {
+		if (jiffies - last_warning > WARNINGFREQ) {
+			printk_deferred("WARNING: Clocksource overflow observed\n");
+			last_warning = jiffies;
+		}
+		timekeeping_overflow_seen = 0;
+	}
+
 }
 
 static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t now, last, mask, max, delta;
+	unsigned int seq;
 
-	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	/*
+	 * Since we're called holding a seqlock, the data may shift
+	 * under us while we're doing the calculation. This can cause
+	 * false positives, since we'd note a problem but throw the
+	 * results away. So nest another seqlock here to atomically
+	 * grab the points we are checking with.
+	 */
+	do {
+		seq = read_seqcount_begin(&tk_core.seq);
+		now = tkr->read(tkr->clock);
+		last = tkr->cycle_last;
+		mask = tkr->mask;
+		max = tkr->clock->max_cycles;
+	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	/* calculate the delta since the last update_wall_time */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	delta = clocksource_delta(now, last, mask);
 
 	/*
 	 * Try to catch underflows by checking if we are seeing small
 	 * mask-relative negative values.
 	 */
-	if (unlikely((~delta & tkr->mask) < (tkr->mask >> 3)))
+	if (unlikely((~delta & mask) < (mask >> 3))) {
+		timekeeping_underflow_seen = 1;
 		delta = 0;
+	}
 
 	/* Cap delta value to the max_cycles values to avoid mult overflows */
-	if (unlikely(delta > tkr->clock->max_cycles))
+	if (unlikely(delta > max)) {
+		timekeeping_overflow_seen = 1;
 		delta = tkr->clock->max_cycles;
+	}
 
 	return delta;
 }
-- 
1.9.1



* [PATCH 09/12] clocksource: Improve clocksource watchdog reporting
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
                   ` (7 preceding siblings ...)
  2015-01-23  0:09 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23  0:09 ` [PATCH 10/12] clocksource: Mostly kill clocksource_register() John Stultz
                   ` (2 subsequent siblings)
  11 siblings, 0 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List; +Cc: John Stultz

The clocksource watchdog reporting has been less helpful
than desired, as it just printed the delta between
the two clocksources. This prevents any useful analysis
of why the skew occurred.

Thus this patch tries to improve the output when we
mark a clocksource as unstable, printing out the cycle
last and now values for both the current clocksource
and the watchdog clocksource. This will allow us to see
if the result was due to a false positive caused by
a problematic watchdog.

Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c2f8639..4aca66d 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -218,13 +218,6 @@ static void __clocksource_unstable(struct clocksource *cs)
 		schedule_work(&watchdog_work);
 }
 
-static void clocksource_unstable(struct clocksource *cs, int64_t delta)
-{
-	printk(KERN_WARNING "Clocksource %s unstable (delta = %Ld ns)\n",
-	       cs->name, delta);
-	__clocksource_unstable(cs);
-}
-
 /**
  * clocksource_mark_unstable - mark clocksource unstable via watchdog
  * @cs:		clocksource to be marked unstable
@@ -250,7 +243,7 @@ void clocksource_mark_unstable(struct clocksource *cs)
 static void clocksource_watchdog(unsigned long data)
 {
 	struct clocksource *cs;
-	cycle_t csnow, wdnow, delta;
+	cycle_t csnow, wdnow, cslast, wdlast, delta;
 	int64_t wd_nsec, cs_nsec;
 	int next_cpu, reset_pending;
 
@@ -289,6 +282,8 @@ static void clocksource_watchdog(unsigned long data)
 
 		delta = clocksource_delta(csnow, cs->cs_last, cs->mask);
 		cs_nsec = clocksource_cyc2ns(delta, cs->mult, cs->shift);
+		wdlast = cs->wd_last; /* save these in case we print them */
+		cslast = cs->cs_last;
 		cs->cs_last = csnow;
 		cs->wd_last = wdnow;
 
@@ -297,7 +292,15 @@ static void clocksource_watchdog(unsigned long data)
 
 		/* Check the deviation from the watchdog clocksource. */
 		if ((abs(cs_nsec - wd_nsec) > WATCHDOG_THRESHOLD)) {
-			clocksource_unstable(cs, cs_nsec - wd_nsec);
+			pr_warn("Watchdog: clocksource %s unstable\n",
+				cs->name);
+			pr_warn("	"
+				"%s wd_now: %llx wd_last: %llx mask: %llx\n",
+				watchdog->name, wdnow, wdlast, watchdog->mask);
+			pr_warn("	"
+				"%s cs_now: %llx cs_last: %llx mask: %llx\n",
+				cs->name, csnow, cslast, cs->mask);
+			__clocksource_unstable(cs);
 			continue;
 		}
 
-- 
1.9.1



* [PATCH 10/12] clocksource: Mostly kill clocksource_register()
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
                   ` (8 preceding siblings ...)
  2015-01-23  0:09 ` [PATCH 09/12] clocksource: Improve clocksource watchdog reporting John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-23  0:09 ` [PATCH 11/12] sparc: Convert to using clocksource_register_hz() John Stultz
  2015-01-23  0:09 ` [PATCH 12/12] clocksource: Add some debug info about clocksources being registered John Stultz
  11 siblings, 0 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller, Martin Schwidefsky

A long running project has been to cleanup remaining uses
of clocksource_register(), replacing it with the simpler
clocksource_register_khz/hz().

However, there are a few cases where we need to self-define
our mult/shift values, so switch the function to a more
obviously internal __clocksource_register(), and consolidate
much of the internal logic so we don't have duplication.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/s390/kernel/time.c     |  2 +-
 arch/sparc/kernel/time_32.c |  2 +-
 include/linux/clocksource.h | 10 +++++-
 kernel/time/clocksource.c   | 83 +++++++++++++++++++--------------------------
 kernel/time/jiffies.c       |  4 +--
 5 files changed, 47 insertions(+), 54 deletions(-)

diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 20660dd..6c273cd 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -283,7 +283,7 @@ void __init time_init(void)
 	if (register_external_irq(EXT_IRQ_TIMING_ALERT, timing_alert_interrupt))
 		panic("Couldn't request external interrupt 0x1406");
 
-	if (clocksource_register(&clocksource_tod) != 0)
+	if (__clocksource_register(&clocksource_tod) != 0)
 		panic("Could not register TOD clock source");
 
 	/* Enable TOD clock interrupts on the boot cpu. */
diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index 2f80d23..a31c0c8 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -191,7 +191,7 @@ static __init int setup_timer_cs(void)
 	timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
 	                                    timer_cs.shift);
 
-	return clocksource_register(&timer_cs);
+	return __clocksource_register(&timer_cs);
 }
 
 #ifdef CONFIG_SMP
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 32dced9..61236f8 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -281,7 +281,6 @@ static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift)
 }
 
 
-extern int clocksource_register(struct clocksource*);
 extern int clocksource_unregister(struct clocksource*);
 extern void clocksource_touch_watchdog(void);
 extern struct clocksource* clocksource_get_next(void);
@@ -306,6 +305,15 @@ __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq);
 extern void
 __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq);
 
+/*
+ * Don't call this unless you're a default clocksource
+ * (AKA: jiffies) and absolutely have to.
+ */
+static inline int __clocksource_register(struct clocksource *cs)
+{
+	return __clocksource_register_scale(cs, 1, 0);
+}
+
 static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
 {
 	return __clocksource_register_scale(cs, 1, hz);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 4aca66d..b56f4b9 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -724,38 +724,52 @@ static void clocksource_enqueue(struct clocksource *cs)
 void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 {
 	u64 sec;
+
 	/*
-	 * Calc the maximum number of seconds which we can run before
-	 * wrapping around. For clocksources which have a mask > 32bit
-	 * we need to limit the max sleep time to have a good
-	 * conversion precision. 10 minutes is still a reasonable
-	 * amount. That results in a shift value of 24 for a
-	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
-	 * ~ 0.06ppm granularity for NTP.
+	 * Default clocksources are *special* and self-define their mult/shift.
+	 * But, you're not special, so you should specify a freq value.
 	 */
-	sec = cs->mask;
-	do_div(sec, freq);
-	do_div(sec, scale);
-	if (!sec)
-		sec = 1;
-	else if (sec > 600 && cs->mask > UINT_MAX)
-		sec = 600;
-
-	clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
-			       NSEC_PER_SEC / scale, sec * scale);
-
+	if (freq) {
+		/*
+		 * Calc the maximum number of seconds which we can run before
+		 * wrapping around. For clocksources which have a mask > 32bit
+		 * we need to limit the max sleep time to have a good
+		 * conversion precision. 10 minutes is still a reasonable
+		 * amount. That results in a shift value of 24 for a
+		 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
+		 * ~ 0.06ppm granularity for NTP.
+		 */
+		sec = cs->mask;
+		do_div(sec, freq);
+		do_div(sec, scale);
+		if (!sec)
+			sec = 1;
+		else if (sec > 600 && cs->mask > UINT_MAX)
+			sec = 600;
+
+		clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
+				       NSEC_PER_SEC / scale, sec * scale);
+	}
 	/*
 	 * Ensure clocksources that have large mults don't overflow
 	 * when adjusted.
 	 */
 	cs->maxadj = clocksource_max_adjustment(cs);
-	while ((cs->mult + cs->maxadj < cs->mult)
-		|| (cs->mult - cs->maxadj > cs->mult)) {
+	while (freq && ((cs->mult + cs->maxadj < cs->mult)
+		|| (cs->mult - cs->maxadj > cs->mult))) {
 		cs->mult >>= 1;
 		cs->shift--;
 		cs->maxadj = clocksource_max_adjustment(cs);
 	}
 
+	/*
+	 * Only warn for *special* clocksources that self-define
+	 * their mult/shift values and don't specify a freq.
+	 */
+	WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
+		"Clocksource %s might overflow on 11%% adjustment\n",
+		cs->name);
+
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
 						 cs->maxadj, cs->mask,
 						 &cs->max_cycles);
@@ -789,35 +803,6 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
 }
 EXPORT_SYMBOL_GPL(__clocksource_register_scale);
 
-
-/**
- * clocksource_register - Used to install new clocksources
- * @cs:		clocksource to be registered
- *
- * Returns -EBUSY if registration fails, zero otherwise.
- */
-int clocksource_register(struct clocksource *cs)
-{
-	/* calculate max adjustment for given mult/shift */
-	cs->maxadj = clocksource_max_adjustment(cs);
-	WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
-		"Clocksource %s might overflow on 11%% adjustment\n",
-		cs->name);
-
-	/* calculate max idle time permitted for this clocksource */
-	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
-						 cs->maxadj, cs->mask,
-						 &cs->max_cycles);
-
-	mutex_lock(&clocksource_mutex);
-	clocksource_enqueue(cs);
-	clocksource_enqueue_watchdog(cs);
-	clocksource_select();
-	mutex_unlock(&clocksource_mutex);
-	return 0;
-}
-EXPORT_SYMBOL(clocksource_register);
-
 static void __clocksource_change_rating(struct clocksource *cs, int rating)
 {
 	list_del(&cs->list);
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 7e41390..c4bb518 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -95,7 +95,7 @@ EXPORT_SYMBOL(jiffies);
 
 static int __init init_jiffies_clocksource(void)
 {
-	return clocksource_register(&clocksource_jiffies);
+	return __clocksource_register(&clocksource_jiffies);
 }
 
 core_initcall(init_jiffies_clocksource);
@@ -131,6 +131,6 @@ int register_refined_jiffies(long cycles_per_second)
 
 	refined_jiffies.mult = ((u32)nsec_per_tick) << JIFFIES_SHIFT;
 
-	clocksource_register(&refined_jiffies);
+	__clocksource_register(&refined_jiffies);
 	return 0;
 }
-- 
1.9.1



* [PATCH 11/12] sparc: Convert to using clocksource_register_hz()
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
                   ` (9 preceding siblings ...)
  2015-01-23  0:09 ` [PATCH 10/12] clocksource: Mostly kill clocksource_register() John Stultz
@ 2015-01-23  0:09 ` John Stultz
  2015-01-27  1:18   ` David Miller
  2015-01-23  0:09 ` [PATCH 12/12] clocksource: Add some debug info about clocksources being registered John Stultz
  11 siblings, 1 reply; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller

While cleaning up some clocksource code, I noticed the
time_32 implementation uses the hz2mult helper, but
doesn't use the clocksource_register_hz() method.

I don't believe the sparc clocksource is a default
clocksource, so we shouldn't need to self-define
the mult/shift pair.

So convert the time_32.c implementation to use
clocksource_register_hz().

Untested.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/sparc/kernel/time_32.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index a31c0c8..18147a5 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -181,17 +181,13 @@ static struct clocksource timer_cs = {
 	.rating	= 100,
 	.read	= timer_cs_read,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.shift	= 2,
 	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static __init int setup_timer_cs(void)
 {
 	timer_cs_enabled = 1;
-	timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
-	                                    timer_cs.shift);
-
-	return __clocksource_register(&timer_cs);
+	return clocksource_register_hz(&timer_cs, sparc_config.clock_rate);
 }
 
 #ifdef CONFIG_SMP
-- 
1.9.1



* [PATCH 12/12] clocksource: Add some debug info about clocksources being registered
  2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
                   ` (10 preceding siblings ...)
  2015-01-23  0:09 ` [PATCH 11/12] sparc: Convert to using clocksource_register_hz() John Stultz
@ 2015-01-23  0:09 ` John Stultz
  11 siblings, 0 replies; 20+ messages in thread
From: John Stultz @ 2015-01-23  0:09 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

Print the mask, max_cycles, and max_idle_ns values for clocksources
being registered.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index b56f4b9..ec7b552 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -773,6 +773,10 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
 						 cs->maxadj, cs->mask,
 						 &cs->max_cycles);
+
+	pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
+			cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
+
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
-- 
1.9.1



* Re: [PATCH 05/12] time: Add debugging checks to warn if we see delays
  2015-01-23  0:09 ` [PATCH 05/12] time: Add debugging checks to warn if we see delays John Stultz
@ 2015-01-23 14:27   ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2015-01-23 14:27 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Stephen Boyd,
	Ingo Molnar

On Thu, Jan 22, 2015 at 04:09:20PM -0800, John Stultz wrote:
> +#ifdef CONFIG_DEBUG_TIMEKEEPING
> +static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
> +{
> +
> +	cycle_t max_cycles = tk->tkr.clock->max_cycles;
> +	const char *name = tk->tkr.clock->name;
> +
> +	if (offset > max_cycles)
> +		printk_deferred("ERROR: cycle offset (%lld) is larger then"
> +			" allowed %s max_cycles (%lld)\n",
> +			offset, name, max_cycles);
> +	else if (offset > (max_cycles >> 1))
> +		printk_deferred("WARNING: cycle offset (%lld) is past"
> +			" the %s 50%% safety margin (%lld)\n",
> +			offset, name, max_cycles>>1);
> +}
> +#else
> +#define timekeeping_check_update(x, y)

static inline void timekeeping_check_update(struct timekeeper *tk, cycle_t offset) { }

Would preserve argument type checks. The only argument for using a macro
here would be to avoid evaluating the arguments, but that's not an issue
in this case afaict.

> +#endif


* Re: [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value
  2015-01-23  0:09 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
@ 2015-01-23 14:28   ` Peter Zijlstra
  2015-01-24 17:35   ` Richard Cochran
  1 sibling, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2015-01-23 14:28 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Stephen Boyd,
	Ingo Molnar

On Thu, Jan 22, 2015 at 04:09:21PM -0800, John Stultz wrote:
> +static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> +{
> +	cycle_t cycle_now, delta;
> +
> +	/* read clocksource */
> +	cycle_now = tkr->read(tkr->clock);
> +
> +	/* calculate the delta since the last update_wall_time */
> +	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
> +
> +	/* Cap delta value to the max_cycles values to avoid mult overflows */
> +	if (unlikely(delta > tkr->clock->max_cycles))
> +		delta = tkr->clock->max_cycles;


Should we not raise _something_ here? I know we cannot printk() here,
but bad (TM) things happened if this ever triggers.

> +
> +	return delta;
> +}


* Re: [PATCH 08/12] time: Add warnings when overflows or underflows are observed
  2015-01-23  0:09 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
@ 2015-01-23 14:30   ` Peter Zijlstra
  0 siblings, 0 replies; 20+ messages in thread
From: Peter Zijlstra @ 2015-01-23 14:30 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Stephen Boyd,
	Ingo Molnar

On Thu, Jan 22, 2015 at 04:09:23PM -0800, John Stultz wrote:
>  #ifdef CONFIG_DEBUG_TIMEKEEPING
> +#define WARNINGFREQ (HZ*300) /* 5 minute rate-limiting */
> +/*
> + * These simple flag variables are managed
> + * without locks, which is racy, but ok since
> + * we don't really care about being super
> + * precise about how many events were seen,
> + * just that a problem was observed.
> + */
> +static int timekeeping_underflow_seen;
> +static int timekeeping_overflow_seen;
> +
>  static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
>  {
>  
>  	cycle_t max_cycles = tk->tkr.clock->max_cycles;
>  	const char *name = tk->tkr.clock->name;
> +	static long last_warning; /* we always hold write on timekeeper lock */
>  
>  	if (offset > max_cycles)
>  		printk_deferred("ERROR: cycle offset (%lld) is larger then"
> @@ -133,28 +145,60 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
>  		printk_deferred("WARNING: cycle offset (%lld) is past"
>  			" the %s 50%% safety margin (%lld)\n",
>  			offset, name, max_cycles>>1);
> +
> +	if (timekeeping_underflow_seen) {
> +		if (jiffies - last_warning > WARNINGFREQ) {
> +			printk_deferred("WARNING: Clocksource underflow observed\n");
> +			last_warning = jiffies;
> +		}
> +		timekeeping_underflow_seen = 0;
> +	}
> +	if (timekeeping_overflow_seen) {
> +		if (jiffies - last_warning > WARNINGFREQ) {
> +			printk_deferred("WARNING: Clocksource overflow observed\n");
> +			last_warning = jiffies;
> +		}
> +		timekeeping_overflow_seen = 0;
> +	}
> +
>  }

Ah, ignore my last comment. Excellent!


* Re: [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value
  2015-01-23  0:09 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
  2015-01-23 14:28   ` Peter Zijlstra
@ 2015-01-24 17:35   ` Richard Cochran
  1 sibling, 0 replies; 20+ messages in thread
From: Richard Cochran @ 2015-01-24 17:35 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Thu, Jan 22, 2015 at 04:09:21PM -0800, John Stultz wrote:
> +static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> +{
> +	cycle_t cycle_now, delta;
> +
> +	/* read clocksource */
> +	cycle_now = tkr->read(tkr->clock);
> +
> +	/* calculate the delta since the last update_wall_time */
> +	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
> +
> +	/* Cap delta value to the max_cycles values to avoid mult overflows */
> +	if (unlikely(delta > tkr->clock->max_cycles))
> +		delta = tkr->clock->max_cycles;
> +
> +	return delta;
> +}
>  #else
>  #define timekeeping_check_update(x, y)
> +static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> +{
> +	/* calculate the delta since the last update_wall_time */
> +	return clocksource_delta(tkr->read(tkr->clock), tkr->cycle_last,
> +								tkr->mask);

Coding style is strange here.  It would look nicer it were like the
debug version, above.

> +}
>  #endif

Series looks fine to me.

Acked-by: Richard Cochran <richardcochran@gmail.com>


* Re: [PATCH 11/12] sparc: Convert to using clocksource_register_hz()
  2015-01-23  0:09 ` [PATCH 11/12] sparc: Convert to using clocksource_register_hz() John Stultz
@ 2015-01-27  1:18   ` David Miller
  0 siblings, 0 replies; 20+ messages in thread
From: David Miller @ 2015-01-27  1:18 UTC (permalink / raw)
  To: john.stultz
  Cc: linux-kernel, davej, torvalds, tglx, richardcochran, prarit,
	sboyd, mingo, peterz

From: John Stultz <john.stultz@linaro.org>
Date: Thu, 22 Jan 2015 16:09:26 -0800

> While cleaning up some clocksource code, I noticed the
> time_32 implementation uses the hz2mult helper, but
> doesn't use the clocksource_register_hz() method.
> 
> I don't believe the sparc clocksource is a default
> clocksource, so we shouldn't need to self-define
> the mult/shift pair.
> 
> So convert the time_32.c implementation to use
> clocksource_register_hz().
> 
> Untested.
> 
> Signed-off-by: John Stultz <john.stultz@linaro.org>

Acked-by: David S. Miller <davem@davemloft.net>


* Re: [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value
  2015-03-07  2:49 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
@ 2015-03-07  9:32   ` Ingo Molnar
  0 siblings, 0 replies; 20+ messages in thread
From: Ingo Molnar @ 2015-03-07  9:32 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Peter Zijlstra


* John Stultz <john.stultz@linaro.org> wrote:

> When calculating the current delta since the last tick, we
> currently have no hard protections to prevent a multiplciation

Typo.

> overflow from ocurring.

Typo.

> This patch introduces infrastructure to allow a capp that

Typo ...

> +static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
> +{
> +	/* calculate the delta since the last update_wall_time */
> +	return clocksource_delta(tkr->read(tkr->clock), tkr->cycle_last,
> +								tkr->mask);

Please don't do spurious col80 line breaks that don't improve the end 
result.

Thanks,

	Ingo


* [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value
  2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
@ 2015-03-07  2:49 ` John Stultz
  2015-03-07  9:32   ` Ingo Molnar
  0 siblings, 1 reply; 20+ messages in thread
From: John Stultz @ 2015-03-07  2:49 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

When calculating the current delta since the last tick, we
currently have no hard protections to prevent a multiplciation
overflow from ocurring.

This patch introduces infrastructure to allow a capp that
limits the read delta value to the max_cycles value, which is
where an overflow would occur.

Since this is in the hotpath, it adds the extra checking under
CONFIG_DEBUG_TIMEKEEPING.

There was some concern that capping time like this could cause
problems as we may stop expiring timers, which could go circular
if the timer that triggers time accumulation were mis-scheduled
too far in the future, which would cause time to stop.

However, since the mult overflow would result in a smaller time
value, we would effectively have the same problem there.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 39 +++++++++++++++++++++++++++------------
 1 file changed, 27 insertions(+), 12 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 7e9d433..8b9e328 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -134,11 +134,34 @@ static void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 			" the %s 50%% safety margin (%lld)\n",
 			offset, name, max_cycles>>1);
 }
+
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+	cycle_t cycle_now, delta;
+
+	/* read clocksource */
+	cycle_now = tkr->read(tkr->clock);
+
+	/* calculate the delta since the last update_wall_time */
+	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	if (unlikely(delta > tkr->clock->max_cycles))
+		delta = tkr->clock->max_cycles;
+
+	return delta;
+}
 #else
 static inline
 void timekeeping_check_update(struct timekeeper *tk, cycle_t offset)
 {
 }
+static inline cycle_t timekeeping_get_delta(struct tk_read_base *tkr)
+{
+	/* calculate the delta since the last update_wall_time */
+	return clocksource_delta(tkr->read(tkr->clock), tkr->cycle_last,
+								tkr->mask);
+}
 #endif
 
 /**
@@ -216,14 +239,10 @@ static inline u32 arch_gettimeoffset(void) { return 0; }
 
 static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 {
-	cycle_t cycle_now, delta;
+	cycle_t delta;
 	s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tkr->read(tkr->clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
+	delta = timekeeping_get_delta(tkr);
 
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;
@@ -235,14 +254,10 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 {
 	struct clocksource *clock = tk->tkr.clock;
-	cycle_t cycle_now, delta;
+	cycle_t delta;
 	s64 nsec;
 
-	/* read clocksource: */
-	cycle_now = tk->tkr.read(clock);
-
-	/* calculate the delta since the last update_wall_time: */
-	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
+	delta = timekeeping_get_delta(&tk->tkr);
 
 	/* convert delta to nanoseconds. */
 	nsec = clocksource_cyc2ns(delta, clock->mult, clock->shift);
-- 
1.9.1



end of thread, other threads:[~2015-03-07  9:32 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-23  0:09 [PATCH 00/12][RFC] Increased clocksource validation and cleanups (v2) John Stultz
2015-01-23  0:09 ` [PATCH 01/12] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
2015-01-23  0:09 ` [PATCH 02/12] clocksource: Simplify logic around clocksource wrapping saftey margins John Stultz
2015-01-23  0:09 ` [PATCH 03/12] clocksource: Remove clocksource_max_deferment() John Stultz
2015-01-23  0:09 ` [PATCH 04/12] clocksource: Add max_cycles to clocksource structure John Stultz
2015-01-23  0:09 ` [PATCH 05/12] time: Add debugging checks to warn if we see delays John Stultz
2015-01-23 14:27   ` Peter Zijlstra
2015-01-23  0:09 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
2015-01-23 14:28   ` Peter Zijlstra
2015-01-24 17:35   ` Richard Cochran
2015-01-23  0:09 ` [PATCH 07/12] time: Try to catch clocksource delta underflows John Stultz
2015-01-23  0:09 ` [PATCH 08/12] time: Add warnings when overflows or underflows are observed John Stultz
2015-01-23 14:30   ` Peter Zijlstra
2015-01-23  0:09 ` [PATCH 09/12] clocksource: Improve clocksource watchdog reporting John Stultz
2015-01-23  0:09 ` [PATCH 10/12] clocksource: Mostly kill clocksource_register() John Stultz
2015-01-23  0:09 ` [PATCH 11/12] sparc: Convert to using clocksource_register_hz() John Stultz
2015-01-27  1:18   ` David Miller
2015-01-23  0:09 ` [PATCH 12/12] clocksource: Add some debug info about clocksources being registered John Stultz
2015-03-07  2:49 [PATCH 00/12] Increased clocksource validation and cleanups (v3) John Stultz
2015-03-07  2:49 ` [PATCH 06/12] time: Add infrastructure to cap clocksource reads to the max_cycles value John Stultz
2015-03-07  9:32   ` Ingo Molnar
