[PATCH 00/10][RFC] Increased clocksource validation and cleanups

All of lore.kernel.org
 help / color / mirror / Atom feed

* [PATCH 00/10][RFC] Increased clocksource validation and cleanups
@ 2015-01-10  0:34 John Stultz
  2015-01-10  0:34 ` [PATCH 01/10] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
                   ` (10 more replies)
  0 siblings, 11 replies; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller, Martin Schwidefsky

So this series is the result of earlier discussions with Linus
and his suggestions around improvements to clocksource validation
in the hope we can more easily catch bad hardware.

There's also a few cleanups Linus suggested as well as a few I've been
meaning to get to for awhile.

I'm still feeling cautious about capping the clocksource reads to the
max_cycles values, and even more so with the underflow detection. So 
this isn't yet ready for merging, but I wanted to get some initial
review and feedback since its not blowing up in my initial testing.

So let me know what you think, and hopefully we can get this into
shape so it can get more testing and we can evaluate it for 3.20.

thanks
-john

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>

John Stultz (10):
  clocksource: Simplify clocks_calc_max_nsecs logic
  clocksource: Simplify logic around clocksource wrapping saftey margins
  clocksource: Remove clocksource_max_deferment()
  clocksource: Add max_cycles to clocksource structure
  time: Add debugging checks to warn if we see delays
  time: Cap clocksource reads to the clocksource max_cycles value
  time: Try to catch clocksource delta underflows
  clocksource: Mostly kill clocksource_register()
  sparc: Convert to using clocksource_register_hz()
  clocksource: Add some debug info about clocksources being registered

 arch/s390/kernel/time.c     |   2 +-
 arch/sparc/kernel/time_32.c |   6 +-
 include/linux/clocksource.h |  15 ++++-
 kernel/time/clocksource.c   | 143 ++++++++++++++++++--------------------------
 kernel/time/jiffies.c       |   5 +-
 kernel/time/sched_clock.c   |   6 +-
 kernel/time/timekeeping.c   |  34 +++++++++++
 7 files changed, 113 insertions(+), 98 deletions(-)

-- 
1.9.1

^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 01/10] clocksource: Simplify clocks_calc_max_nsecs logic
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-10  0:34 ` [PATCH 02/10] clocksource: Simplify logic around clocksource wrapping saftey margins John Stultz
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

The previous clocks_calc_max_nsecs had some unecessarily
complex bit logic to find the max interval that could cause
multiplication overflows. Since this is not in the hot
path, just do the divide to make it easier to read.

The previous implementation also had a subtle issue
that it avoided overflows into signed 64bit values, where
as the intervals are always unsigned. This resulted in
overly conservative intervals, which other saftey margins
were then added to, reducing the intended interval length.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 15 +++------------
 1 file changed, 3 insertions(+), 12 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index b79f39b..c14cd03 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -552,19 +552,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 
 	/*
 	 * Calculate the maximum number of cycles that we can pass to the
-	 * cyc2ns function without overflowing a 64-bit signed result. The
-	 * maximum number of cycles is equal to ULLONG_MAX/(mult+maxadj)
-	 * which is equivalent to the below.
-	 * max_cycles < (2^63)/(mult + maxadj)
-	 * max_cycles < 2^(log2((2^63)/(mult + maxadj)))
-	 * max_cycles < 2^(log2(2^63) - log2(mult + maxadj))
-	 * max_cycles < 2^(63 - log2(mult + maxadj))
-	 * max_cycles < 1 << (63 - log2(mult + maxadj))
-	 * Please note that we add 1 to the result of the log2 to account for
-	 * any rounding errors, ensure the above inequality is satisfied and
-	 * no overflow will occur.
+	 * cyc2ns function without overflowing a 64-bit result.
 	 */
-	max_cycles = 1ULL << (63 - (ilog2(mult + maxadj) + 1));
+	max_cycles = ULLONG_MAX;
+	do_div(max_cycles, mult+maxadj);
 
 	/*
 	 * The actual maximum number of cycles we can defer the clocksource is
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 02/10] clocksource: Simplify logic around clocksource wrapping saftey margins
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
  2015-01-10  0:34 ` [PATCH 01/10] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-10  2:03   ` Stephen Boyd
  2015-01-10  0:34 ` [PATCH 03/10] clocksource: Remove clocksource_max_deferment() John Stultz
                   ` (8 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

The clocksource logic has a number of places where we try to
include a safety margin. Most of these are 12% safety margins,
but they are inconsistently applied and sometimes are applied
on top of each other.

Additionally, in the previous patch, we corrected an issue
where we unintentionally in effect created a 50% saftey margin,
which these 12.5% margins where then added to.

So to siplify the logic here, this patch removes the various
12.5% margins, and consolidates adding the margin in one place:
clocks_calc_max_nsecs().

Addtionally, Linus prefers a 50% safety margin, as it allows
bad clock values to be more easily caught. This should really
have no net effect, due to the corrected issue earlier which
caused greater then 50% margins to be used w/o issue.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 26 ++++++++++++--------------
 kernel/time/sched_clock.c |  4 ++--
 2 files changed, 14 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c14cd03..e837ffd1 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -545,6 +545,9 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift:	cycle to nanosecond divisor (power of two)
  * @maxadj:	maximum adjustment value to mult (~11%)
  * @mask:	bitmask for two's complement subtraction of non 64 bit counters
+ *
+ * NOTE: This function includes a saftey margin of 50%, so that bad clock values
+ * can be detected.
  */
 u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 {
@@ -566,11 +569,14 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	max_cycles = min(max_cycles, mask);
 	max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+	/* Return 50% of the actual maximum, so we can detect bad values */
+	max_nsecs >>= 1;
+
 	return max_nsecs;
 }
 
 /**
- * clocksource_max_deferment - Returns max time the clocksource can be deferred
+ * clocksource_max_deferment - Returns max time the clocksource should be deferred
  * @cs:         Pointer to clocksource
  *
  */
@@ -580,13 +586,7 @@ static u64 clocksource_max_deferment(struct clocksource *cs)
 
 	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
 					  cs->mask);
-	/*
-	 * To ensure that the clocksource does not wrap whilst we are idle,
-	 * limit the time the clocksource can be deferred by 12.5%. Please
-	 * note a margin of 12.5% is used because this can be computed with
-	 * a shift, versus say 10% which would require division.
-	 */
-	return max_nsecs - (max_nsecs >> 3);
+	return max_nsecs;
 }
 
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
@@ -735,10 +735,9 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	 * conversion precision. 10 minutes is still a reasonable
 	 * amount. That results in a shift value of 24 for a
 	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
-	 * ~ 0.06ppm granularity for NTP. We apply the same 12.5%
-	 * margin as we do in clocksource_max_deferment()
+	 * ~ 0.06ppm granularity for NTP.
 	 */
-	sec = (cs->mask - (cs->mask >> 3));
+	sec = cs->mask;
 	do_div(sec, freq);
 	do_div(sec, scale);
 	if (!sec)
@@ -750,9 +749,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 			       NSEC_PER_SEC / scale, sec * scale);
 
 	/*
-	 * for clocksources that have large mults, to avoid overflow.
-	 * Since mult may be adjusted by ntp, add an safety extra margin
-	 *
+	 * Ensure clocksources that have large mults don't overflow
+	 * when adjusted.
 	 */
 	cs->maxadj = clocksource_max_adjustment(cs);
 	while ((cs->mult + cs->maxadj < cs->mult)
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index 01d2d15..c794b84 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -125,9 +125,9 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 
 	new_mask = CLOCKSOURCE_MASK(bits);
 
-	/* calculate how many ns until we wrap */
+	/* calculate how many ns until we risk wrapping */
 	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
-	new_wrap_kt = ns_to_ktime(wrap - (wrap >> 3));
+	new_wrap_kt = ns_to_ktime(wrap);
 
 	/* update epoch for new counter and update epoch_ns from old counter*/
 	new_epoch = read();
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 03/10] clocksource: Remove clocksource_max_deferment()
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
  2015-01-10  0:34 ` [PATCH 01/10] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
  2015-01-10  0:34 ` [PATCH 02/10] clocksource: Simplify logic around clocksource wrapping saftey margins John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-11 11:47   ` Richard Cochran
  2015-01-10  0:34 ` [PATCH 04/10] clocksource: Add max_cycles to clocksource structure John Stultz
                   ` (7 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

clocksource_max_deferment() doesn't do anything useful
anymore, so zap it.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 20 ++++----------------
 1 file changed, 4 insertions(+), 16 deletions(-)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index e837ffd1..0696559 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -575,20 +575,6 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	return max_nsecs;
 }
 
-/**
- * clocksource_max_deferment - Returns max time the clocksource should be deferred
- * @cs:         Pointer to clocksource
- *
- */
-static u64 clocksource_max_deferment(struct clocksource *cs)
-{
-	u64 max_nsecs;
-
-	max_nsecs = clocks_calc_max_nsecs(cs->mult, cs->shift, cs->maxadj,
-					  cs->mask);
-	return max_nsecs;
-}
-
 #ifndef CONFIG_ARCH_USES_GETTIMEOFFSET
 
 static struct clocksource *clocksource_find_best(bool oneshot, bool skipcur)
@@ -760,7 +746,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 		cs->maxadj = clocksource_max_adjustment(cs);
 	}
 
-	cs->max_idle_ns = clocksource_max_deferment(cs);
+	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
+						 cs->maxadj, cs->mask);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
@@ -807,7 +794,8 @@ int clocksource_register(struct clocksource *cs)
 		cs->name);
 
 	/* calculate max idle time permitted for this clocksource */
-	cs->max_idle_ns = clocksource_max_deferment(cs);
+	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
+						 cs->maxadj, cs->mask);
 
 	mutex_lock(&clocksource_mutex);
 	clocksource_enqueue(cs);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 04/10] clocksource: Add max_cycles to clocksource structure
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
                   ` (2 preceding siblings ...)
  2015-01-10  0:34 ` [PATCH 03/10] clocksource: Remove clocksource_max_deferment() John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-10  2:06   ` Stephen Boyd
  2015-01-10  0:34 ` [PATCH 05/10] time: Add debugging checks to warn if we see delays John Stultz
                   ` (6 subsequent siblings)
  10 siblings, 1 reply; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

In order to facilitate some clocksource validation,
add a max_cycles entry to the structure which will
hold the maximum cycle value that can safely be
multiplied without potentially causing an overflow.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/clocksource.h |  5 +++--
 kernel/time/clocksource.c   | 15 ++++++++++++---
 kernel/time/sched_clock.c   |  2 +-
 3 files changed, 16 insertions(+), 6 deletions(-)

diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index abcafaa..9b54cb9 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -178,7 +178,7 @@ struct clocksource {
 #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
 	struct arch_clocksource_data archdata;
 #endif
-
+	u64 max_cycles;
 	const char *name;
 	struct list_head list;
 	int rating;
@@ -291,7 +291,8 @@ extern struct clocksource * __init clocksource_default_clock(void);
 extern void clocksource_mark_unstable(struct clocksource *cs);
 
 extern u64
-clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask);
+clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask,
+							u64 *max_cycles);
 extern void
 clocks_calc_mult_shift(u32 *mult, u32 *shift, u32 from, u32 to, u32 minsec);
 
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 0696559..c2f8639 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -545,11 +545,14 @@ static u32 clocksource_max_adjustment(struct clocksource *cs)
  * @shift:	cycle to nanosecond divisor (power of two)
  * @maxadj:	maximum adjustment value to mult (~11%)
  * @mask:	bitmask for two's complement subtraction of non 64 bit counters
+ * @max_cyc:    maximum cycle value before potential overflow (does not include
+ *		any saftey margin)
  *
  * NOTE: This function includes a saftey margin of 50%, so that bad clock values
  * can be detected.
  */
-u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
+u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask,
+								u64 *max_cyc)
 {
 	u64 max_nsecs, max_cycles;
 
@@ -569,6 +572,10 @@ u64 clocks_calc_max_nsecs(u32 mult, u32 shift, u32 maxadj, u64 mask)
 	max_cycles = min(max_cycles, mask);
 	max_nsecs = clocksource_cyc2ns(max_cycles, mult - maxadj, shift);
 
+	/* return the max_cycles value as well if requested */
+	if (max_cyc)
+		*max_cyc = max_cycles;
+
 	/* Return 50% of the actual maximum, so we can detect bad values */
 	max_nsecs >>= 1;
 
@@ -747,7 +754,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	}
 
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
-						 cs->maxadj, cs->mask);
+						 cs->maxadj, cs->mask,
+						 &cs->max_cycles);
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
@@ -795,7 +803,8 @@ int clocksource_register(struct clocksource *cs)
 
 	/* calculate max idle time permitted for this clocksource */
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
-						 cs->maxadj, cs->mask);
+						 cs->maxadj, cs->mask,
+						 &cs->max_cycles);
 
 	mutex_lock(&clocksource_mutex);
 	clocksource_enqueue(cs);
diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
index c794b84..2910f00 100644
--- a/kernel/time/sched_clock.c
+++ b/kernel/time/sched_clock.c
@@ -126,7 +126,7 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
 	new_mask = CLOCKSOURCE_MASK(bits);
 
 	/* calculate how many ns until we risk wrapping */
-	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
+	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask, 0);
 	new_wrap_kt = ns_to_ktime(wrap);
 
 	/* update epoch for new counter and update epoch_ns from old counter*/
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 05/10] time: Add debugging checks to warn if we see delays
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
                   ` (3 preceding siblings ...)
  2015-01-10  0:34 ` [PATCH 04/10] clocksource: Add max_cycles to clocksource structure John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-10  0:34 ` [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value John Stultz
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

Recently there's been some request for better sanity
checking in the time code, so that its more clear
when something is going wrong since timekeeping issues
could manifest in a large number of strange ways with
various subsystems.

Thus, this patch adds some extra infrastructure to
add a check update_wall_time to print warnings if we
see the call delayed beyond the max_cycles overflow
point, or beyond the clocksource max_idle_ns value
which is currently 50% of the overflow point.

Tested this a bit by halting qemu for specified
lengths of time to trigger the warnings.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/jiffies.c     |  1 +
 kernel/time/timekeeping.c | 18 ++++++++++++++++++
 2 files changed, 19 insertions(+)

diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index a6a5bf5..7e41390 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -71,6 +71,7 @@ static struct clocksource clocksource_jiffies = {
 	.mask		= 0xffffffff, /*32bits*/
 	.mult		= NSEC_PER_JIFFY << JIFFIES_SHIFT, /* details above */
 	.shift		= JIFFIES_SHIFT,
+	.max_cycles	= 10,
 };
 
 __cacheline_aligned_in_smp DEFINE_SEQLOCK(jiffies_lock);
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6a93185..0dcceba 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -1319,6 +1319,22 @@ static int __init timekeeping_init_ops(void)
 }
 device_initcall(timekeeping_init_ops);
 
+static void timekeeping_check_offset(struct timekeeper *tk, cycle_t offset)
+{
+
+	cycle_t max_cycles = tk->tkr.clock->max_cycles;
+	const char *name = tk->tkr.clock->name;
+
+	if (offset > max_cycles)
+		printk("ERROR: cycle offset (%lld) is larger then"
+			" allowed %s max_cycles (%lld)\n",
+			offset, name, max_cycles);
+	else if (offset > (max_cycles >> 1))
+		printk("WARNING: cycle offset (%lld) is past"
+			" the %s 50%% safety margin (%lld)\n",
+			offset, name, max_cycles>>1);
+}
+
 /*
  * Apply a multiplier adjustment to the timekeeper
  */
@@ -1602,6 +1618,8 @@ void update_wall_time(void)
 	if (offset < real_tk->cycle_interval)
 		goto out;
 
+	timekeeping_check_offset(real_tk, offset);
+
 	/*
 	 * With NO_HZ we may have to accumulate many cycle_intervals
 	 * (think "ticks") worth of time at once. To do this efficiently,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
                   ` (4 preceding siblings ...)
  2015-01-10  0:34 ` [PATCH 05/10] time: Add debugging checks to warn if we see delays John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-11 12:41   ` Richard Cochran
  2015-01-13 11:11   ` Peter Zijlstra
  2015-01-10  0:34 ` [PATCH 07/10] time: Try to catch clocksource delta underflows John Stultz
                   ` (4 subsequent siblings)
  10 siblings, 2 replies; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

When calculating the current delta since the last tick, we
currently have no hard protections to prevent a multiplciation
overflow from ocurring.

This patch introduces such a cap that limits the read delta
value to the max_cycles value, which is where an overflow would
occur.

There was some concern that capping time like this could cause
problems as we may stop expiring timers, which could go circular
if the timer that triggers time accumulation were misscheduled
too far in the future, which would cause time to stop.

However, since the mult overflow would result in a smaller time
value, we would effectively have the same problem there.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 0dcceba..9740fd8 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
 	/* calculate the delta since the last update_wall_time: */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
 
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	delta = min(delta, tkr->clock->max_cycles);
+
 	nsec = delta * tkr->mult + tkr->xtime_nsec;
 	nsec >>= tkr->shift;
 
@@ -221,6 +224,9 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 	/* calculate the delta since the last update_wall_time: */
 	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
 
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	delta = min(delta, tk->tkr.clock->max_cycles);
+
 	/* convert delta to nanoseconds. */
 	nsec = clocksource_cyc2ns(delta, clock->mult, clock->shift);
 
@@ -482,6 +488,9 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
 	tk->tkr.cycle_last = cycle_now;
 
+	/* Cap delta value to the max_cycles values to avoid mult overflows */
+	delta = min(delta, tk->tkr.clock->max_cycles);
+
 	tk->tkr.xtime_nsec += delta * tk->tkr.mult;
 
 	/* If arch requires, add in get_arch_timeoffset() */
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 07/10] time: Try to catch clocksource delta underflows
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
                   ` (5 preceding siblings ...)
  2015-01-10  0:34 ` [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-10  0:34 ` [PATCH 08/10] clocksource: Mostly kill clocksource_register() John Stultz
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

In the case where there is a broken clocksource
where there are multiple actual clocks that
aren't perfectly aligned, we may see small "negative"
deltas when we subtract now from cycle_last.

The values are actually negative with respect to the
clocksource mask value, not necessarily negative
if cast to a s64, but we can check by checking the
delta see if it is a small (relative to the mask)
negative value (again negative relative to the mask).

If so, we assume we jumped backwards somehow and
instead use zero for our delta.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9740fd8..4acfc7f 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -224,6 +224,13 @@ static inline s64 timekeeping_get_ns_raw(struct timekeeper *tk)
 	/* calculate the delta since the last update_wall_time: */
 	delta = clocksource_delta(cycle_now, tk->tkr.cycle_last, tk->tkr.mask);
 
+	/*
+	 * Try to catch underflows by checking if we are seeing small
+	 * mask-relative negative values.
+	 */
+	if (((~delta+1) & tk->tkr.mask) < (tk->tkr.mask >> 3))
+		delta = 0;
+
 	/* Cap delta value to the max_cycles values to avoid mult overflows */
 	delta = min(delta, tk->tkr.clock->max_cycles);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 08/10] clocksource: Mostly kill clocksource_register()
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
                   ` (6 preceding siblings ...)
  2015-01-10  0:34 ` [PATCH 07/10] time: Try to catch clocksource delta underflows John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-10  0:34 ` [PATCH 09/10] sparc: Convert to using clocksource_register_hz() John Stultz
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller, Martin Schwidefsky

A long running project has been to cleanup remaining uses
of clocksource_register(), replacing it with the simpler
clocksource_register_khz/hz().

However, there are a few cases where we need to self-define
our mult/shift values, so switch the function to a more
obviously internal __clocksource_register(), and consolidate
much of the internal logic so we don't have duplication.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/s390/kernel/time.c     |  2 +-
 arch/sparc/kernel/time_32.c |  2 +-
 include/linux/clocksource.h | 10 +++++-
 kernel/time/clocksource.c   | 83 +++++++++++++++++++--------------------------
 kernel/time/jiffies.c       |  4 +--
 5 files changed, 47 insertions(+), 54 deletions(-)

diff --git a/arch/s390/kernel/time.c b/arch/s390/kernel/time.c
index 20660dd..6c273cd 100644
--- a/arch/s390/kernel/time.c
+++ b/arch/s390/kernel/time.c
@@ -283,7 +283,7 @@ void __init time_init(void)
 	if (register_external_irq(EXT_IRQ_TIMING_ALERT, timing_alert_interrupt))
 		panic("Couldn't request external interrupt 0x1406");
 
-	if (clocksource_register(&clocksource_tod) != 0)
+	if (__clocksource_register(&clocksource_tod) != 0)
 		panic("Could not register TOD clock source");
 
 	/* Enable TOD clock interrupts on the boot cpu. */
diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index 2f80d23..a31c0c8 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -191,7 +191,7 @@ static __init int setup_timer_cs(void)
 	timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
 	                                    timer_cs.shift);
 
-	return clocksource_register(&timer_cs);
+	return __clocksource_register(&timer_cs);
 }
 
 #ifdef CONFIG_SMP
diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
index 9b54cb9..ba25e70 100644
--- a/include/linux/clocksource.h
+++ b/include/linux/clocksource.h
@@ -280,7 +280,6 @@ static inline s64 clocksource_cyc2ns(cycle_t cycles, u32 mult, u32 shift)
 }
 
 
-extern int clocksource_register(struct clocksource*);
 extern int clocksource_unregister(struct clocksource*);
 extern void clocksource_touch_watchdog(void);
 extern struct clocksource* clocksource_get_next(void);
@@ -305,6 +304,15 @@ __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq);
 extern void
 __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq);
 
+/*
+ * Dont' call this unless you're a default clocksource
+ * (AKA: jiffies) and absolutely have to.
+ */
+static inline int __clocksource_register(struct clocksource *cs)
+{
+	return __clocksource_register_scale(cs, 1, 0);
+}
+
 static inline int clocksource_register_hz(struct clocksource *cs, u32 hz)
 {
 	return __clocksource_register_scale(cs, 1, hz);
diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index c2f8639..9a0b951 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -721,38 +721,52 @@ static void clocksource_enqueue(struct clocksource *cs)
 void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 {
 	u64 sec;
+
 	/*
-	 * Calc the maximum number of seconds which we can run before
-	 * wrapping around. For clocksources which have a mask > 32bit
-	 * we need to limit the max sleep time to have a good
-	 * conversion precision. 10 minutes is still a reasonable
-	 * amount. That results in a shift value of 24 for a
-	 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
-	 * ~ 0.06ppm granularity for NTP.
+	 * Default clocksources are *special* and self-define their mult/shift.
+	 * But, you're not special, so you should specify a freq value.
 	 */
-	sec = cs->mask;
-	do_div(sec, freq);
-	do_div(sec, scale);
-	if (!sec)
-		sec = 1;
-	else if (sec > 600 && cs->mask > UINT_MAX)
-		sec = 600;
-
-	clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
-			       NSEC_PER_SEC / scale, sec * scale);
-
+	if (freq) {
+		/*
+		 * Calc the maximum number of seconds which we can run before
+		 * wrapping around. For clocksources which have a mask > 32bit
+		 * we need to limit the max sleep time to have a good
+		 * conversion precision. 10 minutes is still a reasonable
+		 * amount. That results in a shift value of 24 for a
+		 * clocksource with mask >= 40bit and f >= 4GHz. That maps to
+		 * ~ 0.06ppm granularity for NTP.
+		 */
+		sec = cs->mask;
+		do_div(sec, freq);
+		do_div(sec, scale);
+		if (!sec)
+			sec = 1;
+		else if (sec > 600 && cs->mask > UINT_MAX)
+			sec = 600;
+
+		clocks_calc_mult_shift(&cs->mult, &cs->shift, freq,
+				       NSEC_PER_SEC / scale, sec * scale);
+	}
 	/*
 	 * Ensure clocksources that have large mults don't overflow
 	 * when adjusted.
 	 */
 	cs->maxadj = clocksource_max_adjustment(cs);
-	while ((cs->mult + cs->maxadj < cs->mult)
-		|| (cs->mult - cs->maxadj > cs->mult)) {
+	while (freq && ((cs->mult + cs->maxadj < cs->mult)
+		|| (cs->mult - cs->maxadj > cs->mult))) {
 		cs->mult >>= 1;
 		cs->shift--;
 		cs->maxadj = clocksource_max_adjustment(cs);
 	}
 
+	/*
+	 * Only warn for *special* clocksources that self-define
+	 * their mult/shift values and don't specify a freq.
+	 */
+	WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
+		"Clocksource %s might overflow on 11%% adjustment\n",
+		cs->name);
+
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
 						 cs->maxadj, cs->mask,
 						 &cs->max_cycles);
@@ -786,35 +800,6 @@ int __clocksource_register_scale(struct clocksource *cs, u32 scale, u32 freq)
 }
 EXPORT_SYMBOL_GPL(__clocksource_register_scale);
 
-
-/**
- * clocksource_register - Used to install new clocksources
- * @cs:		clocksource to be registered
- *
- * Returns -EBUSY if registration fails, zero otherwise.
- */
-int clocksource_register(struct clocksource *cs)
-{
-	/* calculate max adjustment for given mult/shift */
-	cs->maxadj = clocksource_max_adjustment(cs);
-	WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
-		"Clocksource %s might overflow on 11%% adjustment\n",
-		cs->name);
-
-	/* calculate max idle time permitted for this clocksource */
-	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
-						 cs->maxadj, cs->mask,
-						 &cs->max_cycles);
-
-	mutex_lock(&clocksource_mutex);
-	clocksource_enqueue(cs);
-	clocksource_enqueue_watchdog(cs);
-	clocksource_select();
-	mutex_unlock(&clocksource_mutex);
-	return 0;
-}
-EXPORT_SYMBOL(clocksource_register);
-
 static void __clocksource_change_rating(struct clocksource *cs, int rating)
 {
 	list_del(&cs->list);
diff --git a/kernel/time/jiffies.c b/kernel/time/jiffies.c
index 7e41390..c4bb518 100644
--- a/kernel/time/jiffies.c
+++ b/kernel/time/jiffies.c
@@ -95,7 +95,7 @@ EXPORT_SYMBOL(jiffies);
 
 static int __init init_jiffies_clocksource(void)
 {
-	return clocksource_register(&clocksource_jiffies);
+	return __clocksource_register(&clocksource_jiffies);
 }
 
 core_initcall(init_jiffies_clocksource);
@@ -131,6 +131,6 @@ int register_refined_jiffies(long cycles_per_second)
 
 	refined_jiffies.mult = ((u32)nsec_per_tick) << JIFFIES_SHIFT;
 
-	clocksource_register(&refined_jiffies);
+	__clocksource_register(&refined_jiffies);
 	return 0;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 09/10] sparc: Convert to using clocksource_register_hz()
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
                   ` (7 preceding siblings ...)
  2015-01-10  0:34 ` [PATCH 08/10] clocksource: Mostly kill clocksource_register() John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-10  0:34 ` [PATCH 10/10] clocksource: Add some debug info about clocksources being registered John Stultz
  2015-01-11 11:41 ` [PATCH 00/10][RFC] Increased clocksource validation and cleanups Richard Cochran
  10 siblings, 0 replies; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller

While cleaning up some clocksource code, I noticed the
time_32 impelementation uses the hz2mult helper, but
doesn't use the clocksource_register_hz() method.

I don't believe the sparc clocksource is a default
clocksource, so we shouldn't need to self-define
the mult/shift pair.

So convert the time_32.c implementation to use
clocksource_register_hz().

Untested.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: "David S. Miller" <davem@davemloft.net>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/sparc/kernel/time_32.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/arch/sparc/kernel/time_32.c b/arch/sparc/kernel/time_32.c
index a31c0c8..18147a5 100644
--- a/arch/sparc/kernel/time_32.c
+++ b/arch/sparc/kernel/time_32.c
@@ -181,17 +181,13 @@ static struct clocksource timer_cs = {
 	.rating	= 100,
 	.read	= timer_cs_read,
 	.mask	= CLOCKSOURCE_MASK(64),
-	.shift	= 2,
 	.flags	= CLOCK_SOURCE_IS_CONTINUOUS,
 };
 
 static __init int setup_timer_cs(void)
 {
 	timer_cs_enabled = 1;
-	timer_cs.mult = clocksource_hz2mult(sparc_config.clock_rate,
-	                                    timer_cs.shift);
-
-	return __clocksource_register(&timer_cs);
+	return clocksource_register_hz(&timer_cs, sparc_config.clock_rate);
 }
 
 #ifdef CONFIG_SMP
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 10/10] clocksource: Add some debug info about clocksources being registered
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
                   ` (8 preceding siblings ...)
  2015-01-10  0:34 ` [PATCH 09/10] sparc: Convert to using clocksource_register_hz() John Stultz
@ 2015-01-10  0:34 ` John Stultz
  2015-01-10  2:02   ` Stephen Boyd
  2015-01-11 11:41 ` [PATCH 00/10][RFC] Increased clocksource validation and cleanups Richard Cochran
  10 siblings, 1 reply; 33+ messages in thread
From: John Stultz @ 2015-01-10  0:34 UTC (permalink / raw)
  To: Linux Kernel Mailing List
  Cc: John Stultz, Dave Jones, Linus Torvalds, Thomas Gleixner,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

Print the mask, max_cycles, and max_idle_ns values for clocksources
being registered.

Cc: Dave Jones <davej@codemonkey.org.uk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <sboyd@codeaurora.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/clocksource.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
index 9a0b951..c641aa7 100644
--- a/kernel/time/clocksource.c
+++ b/kernel/time/clocksource.c
@@ -770,6 +770,10 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
 	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
 						 cs->maxadj, cs->mask,
 						 &cs->max_cycles);
+
+	pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
+			cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
+
 }
 EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/10] clocksource: Add some debug info about clocksources being registered
  2015-01-10  0:34 ` [PATCH 10/10] clocksource: Add some debug info about clocksources being registered John Stultz
@ 2015-01-10  2:02   ` Stephen Boyd
  2015-01-22  0:51     ` John Stultz
  0 siblings, 1 reply; 33+ messages in thread
From: Stephen Boyd @ 2015-01-10  2:02 UTC (permalink / raw)
  To: John Stultz, Linux Kernel Mailing List
  Cc: Dave Jones, Linus Torvalds, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Ingo Molnar, Peter Zijlstra

On 01/09/2015 04:34 PM, John Stultz wrote:
> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
> index 9a0b951..c641aa7 100644
> --- a/kernel/time/clocksource.c
> +++ b/kernel/time/clocksource.c
> @@ -770,6 +770,10 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>  	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
>  						 cs->maxadj, cs->mask,
>  						 &cs->max_cycles);
> +
> +	pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
> +			cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);

Is this intentionally info level? Or was it supposed to be debug level
per $subject?

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 02/10] clocksource: Simplify logic around clocksource wrapping saftey margins
  2015-01-10  0:34 ` [PATCH 02/10] clocksource: Simplify logic around clocksource wrapping saftey margins John Stultz
@ 2015-01-10  2:03   ` Stephen Boyd
  0 siblings, 0 replies; 33+ messages in thread
From: Stephen Boyd @ 2015-01-10  2:03 UTC (permalink / raw)
  To: John Stultz, Linux Kernel Mailing List
  Cc: Dave Jones, Linus Torvalds, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Ingo Molnar, Peter Zijlstra

On 01/09/2015 04:34 PM, John Stultz wrote:
> The clocksource logic has a number of places where we try to
> include a safety margin. Most of these are 12% safety margins,
> but they are inconsistently applied and sometimes are applied
> on top of each other.
>
> Additionally, in the previous patch, we corrected an issue
> where we unintentionally in effect created a 50% saftey margin,
> which these 12.5% margins where then added to.
>
> So to siplify the logic here, this patch removes the various
> 12.5% margins, and consolidates adding the margin in one place:
> clocks_calc_max_nsecs().
>
> Addtionally, Linus prefers a 50% safety margin, as it allows
> bad clock values to be more easily caught. This should really
> have no net effect, due to the corrected issue earlier which
> caused greater then 50% margins to be used w/o issue.
>
> Cc: Dave Jones <davej@codemonkey.org.uk>
> Cc: Linus Torvalds <torvalds@linux-foundation.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Stephen Boyd <sboyd@codeaurora.org>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---

For sched_clock.c part

Acked-by: Stephen Boyd <sboyd@codeaurora.org>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 04/10] clocksource: Add max_cycles to clocksource structure
  2015-01-10  0:34 ` [PATCH 04/10] clocksource: Add max_cycles to clocksource structure John Stultz
@ 2015-01-10  2:06   ` Stephen Boyd
  0 siblings, 0 replies; 33+ messages in thread
From: Stephen Boyd @ 2015-01-10  2:06 UTC (permalink / raw)
  To: John Stultz, Linux Kernel Mailing List
  Cc: Dave Jones, Linus Torvalds, Thomas Gleixner, Richard Cochran,
	Prarit Bhargava, Ingo Molnar, Peter Zijlstra

On 01/09/2015 04:34 PM, John Stultz wrote:
> diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h
> index abcafaa..9b54cb9 100644
> --- a/include/linux/clocksource.h
> +++ b/include/linux/clocksource.h
> @@ -178,7 +178,7 @@ struct clocksource {
>  #ifdef CONFIG_ARCH_CLOCKSOURCE_DATA
>  	struct arch_clocksource_data archdata;
>  #endif
> -
> +	u64 max_cycles;

Nitpick: Update kernel-doc for new field?

>  	const char *name;
>  	struct list_head list;
>  	int rating;
> @@ -291,7 +291,8 @@ extern struct clocksource * __init clocksource_default_clock(void);
> diff --git a/kernel/time/sched_clock.c b/kernel/time/sched_clock.c
> index c794b84..2910f00 100644
> --- a/kernel/time/sched_clock.c
> +++ b/kernel/time/sched_clock.c
> @@ -126,7 +126,7 @@ void __init sched_clock_register(u64 (*read)(void), int bits,
>  	new_mask = CLOCKSOURCE_MASK(bits);
>  
>  	/* calculate how many ns until we risk wrapping */
> -	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask);
> +	wrap = clocks_calc_max_nsecs(new_mult, new_shift, 0, new_mask, 0);

Should be NULL instead of 0? Otherwise I think sparse would complain.

Otherwise:

Acked-by: Stephen Boyd <sboyd@codeaurora.org>

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/10][RFC] Increased clocksource validation and cleanups
  2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
                   ` (9 preceding siblings ...)
  2015-01-10  0:34 ` [PATCH 10/10] clocksource: Add some debug info about clocksources being registered John Stultz
@ 2015-01-11 11:41 ` Richard Cochran
  2015-01-12 18:22   ` John Stultz
  10 siblings, 1 reply; 33+ messages in thread
From: Richard Cochran @ 2015-01-11 11:41 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller, Martin Schwidefsky

On Fri, Jan 09, 2015 at 04:34:18PM -0800, John Stultz wrote:
> So this series is the result of earlier discussions with Linus
> and his suggestions around improvements to clocksource validation
> in the hope we can more easily catch bad hardware.

Why penalize most users just because of a random hardware failure?
Why not make the "catching code" a compile time option?

Thanks,
Richard


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/10] clocksource: Remove clocksource_max_deferment()
  2015-01-10  0:34 ` [PATCH 03/10] clocksource: Remove clocksource_max_deferment() John Stultz
@ 2015-01-11 11:47   ` Richard Cochran
  2015-01-12 18:36     ` John Stultz
  0 siblings, 1 reply; 33+ messages in thread
From: Richard Cochran @ 2015-01-11 11:47 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra


This series added:

+	/* Return 50% of the actual maximum, so we can detect bad values */
+	max_nsecs >>= 1;

and then...

On Fri, Jan 09, 2015 at 04:34:21PM -0800, John Stultz wrote:
> @@ -760,7 +746,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>  		cs->maxadj = clocksource_max_adjustment(cs);
>  	}
>  
> -	cs->max_idle_ns = clocksource_max_deferment(cs);
> +	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
> +						 cs->maxadj, cs->mask);
>  }
>  EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
>  
> @@ -807,7 +794,8 @@ int clocksource_register(struct clocksource *cs)
>  		cs->name);
>  
>  	/* calculate max idle time permitted for this clocksource */
> -	cs->max_idle_ns = clocksource_max_deferment(cs);
> +	cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
> +						 cs->maxadj, cs->mask);

... the whole world's maximum idle time is artificially reduced by
half in order to catch some rare HW bug?  Not a very green solution.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-10  0:34 ` [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value John Stultz
@ 2015-01-11 12:41   ` Richard Cochran
  2015-01-12 18:54     ` John Stultz
  2015-01-13 11:11   ` Peter Zijlstra
  1 sibling, 1 reply; 33+ messages in thread
From: Richard Cochran @ 2015-01-11 12:41 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
> When calculating the current delta since the last tick, we
> currently have no hard protections to prevent a multiplciation
> overflow from ocurring.

This is just papering over the problem. The "hard protection" should
be having a tick scheduled before the range of the clock source is
exhausted.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/10][RFC] Increased clocksource validation and cleanups
  2015-01-11 11:41 ` [PATCH 00/10][RFC] Increased clocksource validation and cleanups Richard Cochran
@ 2015-01-12 18:22   ` John Stultz
  2015-01-12 20:45     ` Richard Cochran
  0 siblings, 1 reply; 33+ messages in thread
From: John Stultz @ 2015-01-12 18:22 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller, Martin Schwidefsky

On Sun, Jan 11, 2015 at 3:41 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Fri, Jan 09, 2015 at 04:34:18PM -0800, John Stultz wrote:
>> So this series is the result of earlier discussions with Linus
>> and his suggestions around improvements to clocksource validation
>> in the hope we can more easily catch bad hardware.
>
> Why penalize most users just because of a random hardware failure?
> Why not make the "catching code" a compile time option?

Yea, I've not looked at the actual performance impact yet, but things
like the read-time capping (which is in the hot path) could be put
under a debug config. Thanks for the suggestion!
-john

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/10] clocksource: Remove clocksource_max_deferment()
  2015-01-11 11:47   ` Richard Cochran
@ 2015-01-12 18:36     ` John Stultz
  2015-01-12 20:16       ` Richard Cochran
  0 siblings, 1 reply; 33+ messages in thread
From: John Stultz @ 2015-01-12 18:36 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Sun, Jan 11, 2015 at 3:47 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
>
> This series added:
>
> +       /* Return 50% of the actual maximum, so we can detect bad values */
> +       max_nsecs >>= 1;
>
> and then...
>
> On Fri, Jan 09, 2015 at 04:34:21PM -0800, John Stultz wrote:
>> @@ -760,7 +746,8 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>>               cs->maxadj = clocksource_max_adjustment(cs);
>>       }
>>
>> -     cs->max_idle_ns = clocksource_max_deferment(cs);
>> +     cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
>> +                                              cs->maxadj, cs->mask);
>>  }
>>  EXPORT_SYMBOL_GPL(__clocksource_updatefreq_scale);
>>
>> @@ -807,7 +794,8 @@ int clocksource_register(struct clocksource *cs)
>>               cs->name);
>>
>>       /* calculate max idle time permitted for this clocksource */
>> -     cs->max_idle_ns = clocksource_max_deferment(cs);
>> +     cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
>> +                                              cs->maxadj, cs->mask);
>
> ... the whole world's maximum idle time is artificially reduced by
> half in order to catch some rare HW bug?  Not a very green solution.

So, the first patch had a cleanup which removed case where the max
mult value was being calculated assuming nanoseconds was a s64 instead
of a u64, which resulted in the max_idle_ns to be halved. So this
doesn't actually cost us more over what the current kernel does.

In fact, the cleanup removed the extra 12.5% reductions that were
applied, so for the HPET in qemu on my system, the max idle goes from
~16 secs to ~21 secs with this patchset (if I'm remembering
correctly).

But I'm open to put this under a debug config if its justified (part
of Linus' concern w/ clocksource code is that its had a few spots that
were needlessly complex, so some weighing of performance/power vs
simplified logic should be done). Do you have a specific case in mind
that you're worried about?

thanks
-john

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-11 12:41   ` Richard Cochran
@ 2015-01-12 18:54     ` John Stultz
  2015-01-12 19:02       ` Linus Torvalds
  2015-01-12 20:30       ` Richard Cochran
  0 siblings, 2 replies; 33+ messages in thread
From: John Stultz @ 2015-01-12 18:54 UTC (permalink / raw)
  To: Richard Cochran
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Sun, Jan 11, 2015 at 4:41 AM, Richard Cochran
<richardcochran@gmail.com> wrote:
> On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
>> When calculating the current delta since the last tick, we
>> currently have no hard protections to prevent a multiplciation
>> overflow from ocurring.
>
> This is just papering over the problem. The "hard protection" should
> be having a tick scheduled before the range of the clock source is
> exhausted.

So I disagree this is papering over the problem.

You say the tick should be scheduled before the clocksource wraps -
but we have logic to do that.

However there are many ways that can still go wrong.  Virtualization
can delay interrupts for long periods of time, the timer/irq code
isn't the simplest and there can be bugs, or timer hardware itself can
have issues. The difficulty is that when something has gone wrong, the
only thing we have to measure the problem may become corrupted.  And
worse, once the timekeeping code is having problems,  that can result
in bugs that manifest in all sorts of strange ways that are very
difficult to debug (you can't trust your log timestamps, etc).

So I think having some extra measures of protection is useful here.

I'll admit that its always difficult to manage, since we have to layer
our checks, we have circular dependencies (timer code needs
timekeeping to be correct, timekeeping code needs timer code to be
correct), and hardware problems are rampant - so we get trouble like
the clocksource watchdog which uses more trustworthy clocksources to
watch less trustworthy ones, but then hardware starts adding bugs to
the trustworthy ones which cause false positives, etc.   And these
checks make the code add complexity to the code that we'd be happier
without, but we can't throw out supporting the majority of hardware
that have some quirk and imperfection, so I'm not sure what the
alternative should be.

thanks
-john

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-12 18:54     ` John Stultz
@ 2015-01-12 19:02       ` Linus Torvalds
  2015-01-12 20:37         ` Richard Cochran
  2015-01-12 20:30       ` Richard Cochran
  1 sibling, 1 reply; 33+ messages in thread
From: Linus Torvalds @ 2015-01-12 19:02 UTC (permalink / raw)
  To: John Stultz
  Cc: Richard Cochran, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Tue, Jan 13, 2015 at 7:54 AM, John Stultz <john.stultz@linaro.org> wrote:
>
> So I disagree this is papering over the problem.

Indeed. It's making things more robust in the face of _known_ issues.
Even with a perfectly designed timer (which we so far have never
seen), interrupts get delayed etc, so trying to stretch it to the
limit of the timer is simply not a good idea. Quite the reverse.

More importantly, if the timer is actually any good, the safety margin
won't actually matter, since the timer cycle is so long that 50% of
essentially infinite is still essentially infinite.

And if the timer isn't very good, then some slop for safety is just
being robust.

At no point is it "papering over" anything at all.

           Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 03/10] clocksource: Remove clocksource_max_deferment()
  2015-01-12 18:36     ` John Stultz
@ 2015-01-12 20:16       ` Richard Cochran
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Cochran @ 2015-01-12 20:16 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Mon, Jan 12, 2015 at 10:36:58AM -0800, John Stultz wrote:
> So, the first patch had a cleanup which removed case where the max
> mult value was being calculated assuming nanoseconds was a s64 instead
> of a u64, which resulted in the max_idle_ns to be halved. So this
> doesn't actually cost us more over what the current kernel does.

Ok, fine, missed that.

Thanks
Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-12 18:54     ` John Stultz
  2015-01-12 19:02       ` Linus Torvalds
@ 2015-01-12 20:30       ` Richard Cochran
  2015-01-12 20:49         ` Richard Cochran
  1 sibling, 1 reply; 33+ messages in thread
From: Richard Cochran @ 2015-01-12 20:30 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Mon, Jan 12, 2015 at 10:54:50AM -0800, John Stultz wrote:
> On Sun, Jan 11, 2015 at 4:41 AM, Richard Cochran
> <richardcochran@gmail.com> wrote:
> > On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
> >> When calculating the current delta since the last tick, we
> >> currently have no hard protections to prevent a multiplciation
> >> overflow from ocurring.
> >
> > This is just papering over the problem. The "hard protection" should
> > be having a tick scheduled before the range of the clock source is
> > exhausted.
> 
> So I disagree this is papering over the problem.
> 
> You say the tick should be scheduled before the clocksource wraps -
> but we have logic to do that.

Well that is a shame.  To my way of thinking, having a reliable
watchdog (clock readout) at half the period would be a real solution.
Yes, I do mean providing some sort of "soft real time" guarantee.

What is the use case here?  I thought we are trying to fix unreliable
clocks with random jumps.  It is hard to see how substituting
MAX_DURATION for RANDOM_JUMP_VALUE is helping to catch bad hardware.
 
> However there are many ways that can still go wrong.  Virtualization
> can delay interrupts for long periods of time,

fixable with some soft RT?

> the timer/irq code isn't the simplest and there can be bugs,

simplify and fix?

> or timer hardware itself can have issues.

for this we can have a compile time timer validation module, just like
we have for locks, mutexs, rcu, etc.

> The difficulty is that when something has gone wrong, the
> only thing we have to measure the problem may become corrupted.  And
> worse, once the timekeeping code is having problems,  that can result
> in bugs that manifest in all sorts of strange ways that are very
> difficult to debug (you can't trust your log timestamps, etc).

But this this patch make the timestamps trustworthy?  Not really.
 
Thanks,
Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-12 19:02       ` Linus Torvalds
@ 2015-01-12 20:37         ` Richard Cochran
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Cochran @ 2015-01-12 20:37 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: John Stultz, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Tue, Jan 13, 2015 at 08:02:53AM +1300, Linus Torvalds wrote:
> Indeed. It's making things more robust in the face of _known_ issues.
> Even with a perfectly designed timer (which we so far have never
> seen), interrupts get delayed etc, so trying to stretch it to the
> limit of the timer is simply not a good idea. Quite the reverse.

So is this patch supposed to fix the case when a tick just missed the
range of the clock?  Or is this to deal with really broken sources?

> More importantly, if the timer is actually any good, the safety margin
> won't actually matter, since the timer cycle is so long that 50% of
> essentially infinite is still essentially infinite.
> 
> And if the timer isn't very good, then some slop for safety is just
> being robust.

Here "isn't very good" means that the clock rolls over too frequently.
Well, if you cannot be sure to sample the clock in time, then you
shouldn't use that clock source at all.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 00/10][RFC] Increased clocksource validation and cleanups
  2015-01-12 18:22   ` John Stultz
@ 2015-01-12 20:45     ` Richard Cochran
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Cochran @ 2015-01-12 20:45 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra, David S. Miller, Martin Schwidefsky

On Mon, Jan 12, 2015 at 10:22:11AM -0800, John Stultz wrote:
> Yea, I've not looked at the actual performance impact yet, but things
> like the read-time capping (which is in the hot path) could be put
> under a debug config. Thanks for the suggestion!

Having a broken clock is like having a broken memory card.  Both will
make your system behave very strangely.  Once you start having really
strange issues, you run a dedicated memory test.  That is why I would
expect to have clock validation as compile time option.

Thanks,
Richard

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-12 20:30       ` Richard Cochran
@ 2015-01-12 20:49         ` Richard Cochran
  0 siblings, 0 replies; 33+ messages in thread
From: Richard Cochran @ 2015-01-12 20:49 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Prarit Bhargava, Stephen Boyd, Ingo Molnar,
	Peter Zijlstra

On Mon, Jan 12, 2015 at 09:30:07PM +0100, Richard Cochran wrote:
> On Mon, Jan 12, 2015 at 10:54:50AM -0800, John Stultz wrote:
> > You say the tick should be scheduled before the clocksource wraps -
> > but we have logic to do that.
> 
> Well that is a shame.

Arg, I mean, not a shame that we have logic for that, but rather that
the logic is not reliable enough!

Sorry
Richard



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-10  0:34 ` [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value John Stultz
  2015-01-11 12:41   ` Richard Cochran
@ 2015-01-13 11:11   ` Peter Zijlstra
  2015-01-13 21:33     ` John Stultz
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2015-01-13 11:11 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Stephen Boyd,
	Ingo Molnar

On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
> When calculating the current delta since the last tick, we
> currently have no hard protections to prevent a multiplciation
> overflow from ocurring.
> 
> This patch introduces such a cap that limits the read delta
> value to the max_cycles value, which is where an overflow would
> occur.

> +++ b/kernel/time/timekeeping.c
> @@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>  	/* calculate the delta since the last update_wall_time: */
>  	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
>  
> +	/* Cap delta value to the max_cycles values to avoid mult overflows */
> +	delta = min(delta, tkr->clock->max_cycles);
> +
>  	nsec = delta * tkr->mult + tkr->xtime_nsec;
>  	nsec >>= tkr->shift;
>  

So while I appreciate stuff can be broken, should we not at least keep
track of this brokenness? That is, we all agree bad things happened IF
we actually hit this, right? So should we then not inform people that
bad things did happen?

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-13 11:11   ` Peter Zijlstra
@ 2015-01-13 21:33     ` John Stultz
  2015-01-13 22:51       ` Linus Torvalds
  2015-01-14  9:35       ` Peter Zijlstra
  0 siblings, 2 replies; 33+ messages in thread
From: John Stultz @ 2015-01-13 21:33 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Stephen Boyd,
	Ingo Molnar

On Tue, Jan 13, 2015 at 3:11 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
>> When calculating the current delta since the last tick, we
>> currently have no hard protections to prevent a multiplciation
>> overflow from ocurring.
>>
>> This patch introduces such a cap that limits the read delta
>> value to the max_cycles value, which is where an overflow would
>> occur.
>
>> +++ b/kernel/time/timekeeping.c
>> @@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>>       /* calculate the delta since the last update_wall_time: */
>>       delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
>>
>> +     /* Cap delta value to the max_cycles values to avoid mult overflows */
>> +     delta = min(delta, tkr->clock->max_cycles);
>> +
>>       nsec = delta * tkr->mult + tkr->xtime_nsec;
>>       nsec >>= tkr->shift;
>>
>
> So while I appreciate stuff can be broken, should we not at least keep
> track of this brokenness? That is, we all agree bad things happened IF
> we actually hit this, right? So should we then not inform people that
> bad things did happen?

So since this is a time reading function, this could be called
anywhere. So I'm hesitant to try to printk anything in such a hot
path. Though, if we catch such a large delta during the timekeeping
update function, we will print a warning (which is done in an earlier
patch in the series).

Were you thinking of something else maybe? I guess we could set a flag
and then print later (if there is a later), but we'd lose much of the
context of what went wrong.

thanks
-john

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-13 21:33     ` John Stultz
@ 2015-01-13 22:51       ` Linus Torvalds
  2015-01-14  9:35       ` Peter Zijlstra
  1 sibling, 0 replies; 33+ messages in thread
From: Linus Torvalds @ 2015-01-13 22:51 UTC (permalink / raw)
  To: John Stultz
  Cc: Peter Zijlstra, Linux Kernel Mailing List, Dave Jones,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Stephen Boyd,
	Ingo Molnar

On Wed, Jan 14, 2015 at 10:33 AM, John Stultz <john.stultz@linaro.org> wrote:
>
> So since this is a time reading function, this could be called
> anywhere.

Indeed. Could, and is.

>From within the scheduler, with some very core locks held. From within
printk itself (more really core locks held). From the tracer (which in
turn can be from pretty much anything else). From various lockless and
critical places. And from very early in the boot sequence when almost
nothing is set up yet, etc etc.

You can't print from that context, you can't even disable preemption
(because the boot sequence seems to get unhappy when you re-enable
preemption before things are really set up), you can't do much at all.
It's why my hacky patch just read the state and updated an error flag
(and possibly wrote a backtrace), to be printed out later.

I think we could possibly do more if being very careful. But the code
word really is "very careful indeed". It would probably be ok to use a
raw irq-disable spinlock. Maybe the preempt-disable works too if you
end up using some raw version that still works during boot etc etc.

             Linus

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-13 21:33     ` John Stultz
  2015-01-13 22:51       ` Linus Torvalds
@ 2015-01-14  9:35       ` Peter Zijlstra
  2015-01-22 20:55         ` John Stultz
  1 sibling, 1 reply; 33+ messages in thread
From: Peter Zijlstra @ 2015-01-14  9:35 UTC (permalink / raw)
  To: John Stultz
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Stephen Boyd,
	Ingo Molnar

On Tue, Jan 13, 2015 at 01:33:29PM -0800, John Stultz wrote:
> On Tue, Jan 13, 2015 at 3:11 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> > On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
> >> When calculating the current delta since the last tick, we
> >> currently have no hard protections to prevent a multiplciation
> >> overflow from ocurring.
> >>
> >> This patch introduces such a cap that limits the read delta
> >> value to the max_cycles value, which is where an overflow would
> >> occur.
> >
> >> +++ b/kernel/time/timekeeping.c
> >> @@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
> >>       /* calculate the delta since the last update_wall_time: */
> >>       delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
> >>
> >> +     /* Cap delta value to the max_cycles values to avoid mult overflows */
> >> +     delta = min(delta, tkr->clock->max_cycles);
> >> +
> >>       nsec = delta * tkr->mult + tkr->xtime_nsec;
> >>       nsec >>= tkr->shift;
> >>
> >
> > So while I appreciate stuff can be broken, should we not at least keep
> > track of this brokenness? That is, we all agree bad things happened IF
> > we actually hit this, right? So should we then not inform people that
> > bad things did happen?
> 
> So since this is a time reading function, this could be called
> anywhere. So I'm hesitant to try to printk anything in such a hot
> path. Though, if we catch such a large delta during the timekeeping
> update function, we will print a warning (which is done in an earlier
> patch in the series).
> 
> Were you thinking of something else maybe? I guess we could set a flag
> and then print later (if there is a later), but we'd lose much of the
> context of what went wrong.

Maybe a stats counter? In any case, keeping it silent seems the wrong
thing.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/10] clocksource: Add some debug info about clocksources being registered
  2015-01-10  2:02   ` Stephen Boyd
@ 2015-01-22  0:51     ` John Stultz
  2015-01-22 12:27       ` Prarit Bhargava
  0 siblings, 1 reply; 33+ messages in thread
From: John Stultz @ 2015-01-22  0:51 UTC (permalink / raw)
  To: Stephen Boyd
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Ingo Molnar,
	Peter Zijlstra

On Fri, Jan 9, 2015 at 6:02 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
> On 01/09/2015 04:34 PM, John Stultz wrote:
>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>> index 9a0b951..c641aa7 100644
>> --- a/kernel/time/clocksource.c
>> +++ b/kernel/time/clocksource.c
>> @@ -770,6 +770,10 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>>       cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
>>                                                cs->maxadj, cs->mask,
>>                                                &cs->max_cycles);
>> +
>> +     pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
>> +                     cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
>
> Is this intentionally info level? Or was it supposed to be debug level
> per $subject?

Sorry for not getting back on this sooner. Since this is a
one-time-at-bootup message, I think enabling the pr_debug output is a
little too finicky for most folks to boot with. So I left it at
pr_info. If folks object, I can change it, but it just seems less
likely we'd get any useful data if it was so hidden by default.

thanks
-john

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 10/10] clocksource: Add some debug info about clocksources being registered
  2015-01-22  0:51     ` John Stultz
@ 2015-01-22 12:27       ` Prarit Bhargava
  0 siblings, 0 replies; 33+ messages in thread
From: Prarit Bhargava @ 2015-01-22 12:27 UTC (permalink / raw)
  To: John Stultz
  Cc: Stephen Boyd, Linux Kernel Mailing List, Dave Jones,
	Linus Torvalds, Thomas Gleixner, Richard Cochran, Ingo Molnar,
	Peter Zijlstra



On 01/21/2015 07:51 PM, John Stultz wrote:
> On Fri, Jan 9, 2015 at 6:02 PM, Stephen Boyd <sboyd@codeaurora.org> wrote:
>> On 01/09/2015 04:34 PM, John Stultz wrote:
>>> diff --git a/kernel/time/clocksource.c b/kernel/time/clocksource.c
>>> index 9a0b951..c641aa7 100644
>>> --- a/kernel/time/clocksource.c
>>> +++ b/kernel/time/clocksource.c
>>> @@ -770,6 +770,10 @@ void __clocksource_updatefreq_scale(struct clocksource *cs, u32 scale, u32 freq)
>>>       cs->max_idle_ns = clocks_calc_max_nsecs(cs->mult, cs->shift,
>>>                                                cs->maxadj, cs->mask,
>>>                                                &cs->max_cycles);
>>> +
>>> +     pr_info("clocksource %s: mask: 0x%llx max_cycles: 0x%llx, max_idle_ns: %lld ns\n",
>>> +                     cs->name, cs->mask, cs->max_cycles, cs->max_idle_ns);
>>
>> Is this intentionally info level? Or was it supposed to be debug level
>> per $subject?
> 
> Sorry for not getting back on this sooner. Since this is a
> one-time-at-bootup message, I think enabling the pr_debug output is a
> little too finicky for most folks to boot with. So I left it at
> pr_info. If folks object, I can change it, but it just seems less
> likely we'd get any useful data if it was so hidden by default.
> 

John, I've found in the past that when a user has a problem with the kernel the
first thing that support tells them to do is boot with "ignore_loglevel" so that
support can take a look at the full boot log.  So while I understand your
comment that this is a "one-time-at-bootup message" I wonder if this really
should be debug.  It only will be useful in debug scenarios and doesn't really
provide any true information to the user about their system.

... IMO.

P.

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value
  2015-01-14  9:35       ` Peter Zijlstra
@ 2015-01-22 20:55         ` John Stultz
  0 siblings, 0 replies; 33+ messages in thread
From: John Stultz @ 2015-01-22 20:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linux Kernel Mailing List, Dave Jones, Linus Torvalds,
	Thomas Gleixner, Richard Cochran, Prarit Bhargava, Stephen Boyd,
	Ingo Molnar

On Wed, Jan 14, 2015 at 1:35 AM, Peter Zijlstra <peterz@infradead.org> wrote:
> On Tue, Jan 13, 2015 at 01:33:29PM -0800, John Stultz wrote:
>> On Tue, Jan 13, 2015 at 3:11 AM, Peter Zijlstra <peterz@infradead.org> wrote:
>> > On Fri, Jan 09, 2015 at 04:34:24PM -0800, John Stultz wrote:
>> >> When calculating the current delta since the last tick, we
>> >> currently have no hard protections to prevent a multiplciation
>> >> overflow from ocurring.
>> >>
>> >> This patch introduces such a cap that limits the read delta
>> >> value to the max_cycles value, which is where an overflow would
>> >> occur.
>> >
>> >> +++ b/kernel/time/timekeeping.c
>> >> @@ -202,6 +202,9 @@ static inline s64 timekeeping_get_ns(struct tk_read_base *tkr)
>> >>       /* calculate the delta since the last update_wall_time: */
>> >>       delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
>> >>
>> >> +     /* Cap delta value to the max_cycles values to avoid mult overflows */
>> >> +     delta = min(delta, tkr->clock->max_cycles);
>> >> +
>> >>       nsec = delta * tkr->mult + tkr->xtime_nsec;
>> >>       nsec >>= tkr->shift;
>> >>
>> >
>> > So while I appreciate stuff can be broken, should we not at least keep
>> > track of this brokenness? That is, we all agree bad things happened IF
>> > we actually hit this, right? So should we then not inform people that
>> > bad things did happen?
>>
>> So since this is a time reading function, this could be called
>> anywhere. So I'm hesitant to try to printk anything in such a hot
>> path. Though, if we catch such a large delta during the timekeeping
>> update function, we will print a warning (which is done in an earlier
>> patch in the series).
>>
>> Were you thinking of something else maybe? I guess we could set a flag
>> and then print later (if there is a later), but we'd lose much of the
>> context of what went wrong.
>
> Maybe a stats counter? In any case, keeping it silent seems the wrong
> thing.

So I spent a bit of time looking into if we could track if we were
actually seeing underflows, and to do some rate limited printing in a
different context to provide some warning.

Luckily this found one bug with my underflow checking logic (falsely
triggering if delta was zero), but even then I was occasionally seeing
reports of underflows on my test systems using the hpet (as well as
other clocksources). I added a bunch of extra debug logic and started
to see really strange underflow behavior, where hardware reads were
returning values slightly before cycle_last.

Then after banging my head on it a bit with a knot in my stomach that
I've uncovered a bigger issue, I realized that we're in a seqlock
loop. Duh. So the data reads are done optimistically and we throw away
any reads that might have raced with the writer. So I was doing a
hardware read, then having an interrupt land and update the cycle_last
state with even newer time, then go back see the negative delta, flag
to report a warning, then throw all the data away because we noticed
the seqlock was updated.

So this makes it a little hard to have the protective capping /
underflow logic also trigger a warning, because they may trigger in
many cases where the data is all junk and we're going to repeat the
calculation anyway. Its also a bit nested down a few functions below
where we actually take the seqlock, so we'd probably have to leave the
capping alone and then find some way to capture and sanity check the
data points after we left the lock. That gets a bit ugly.

I'll see if I can't sort out something better, but I'll probably drop
the reporting patch I was working on before resending this series.

thanks
-john

^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2015-01-22 20:55 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-01-10  0:34 [PATCH 00/10][RFC] Increased clocksource validation and cleanups John Stultz
2015-01-10  0:34 ` [PATCH 01/10] clocksource: Simplify clocks_calc_max_nsecs logic John Stultz
2015-01-10  0:34 ` [PATCH 02/10] clocksource: Simplify logic around clocksource wrapping saftey margins John Stultz
2015-01-10  2:03   ` Stephen Boyd
2015-01-10  0:34 ` [PATCH 03/10] clocksource: Remove clocksource_max_deferment() John Stultz
2015-01-11 11:47   ` Richard Cochran
2015-01-12 18:36     ` John Stultz
2015-01-12 20:16       ` Richard Cochran
2015-01-10  0:34 ` [PATCH 04/10] clocksource: Add max_cycles to clocksource structure John Stultz
2015-01-10  2:06   ` Stephen Boyd
2015-01-10  0:34 ` [PATCH 05/10] time: Add debugging checks to warn if we see delays John Stultz
2015-01-10  0:34 ` [PATCH 06/10] time: Cap clocksource reads to the clocksource max_cycles value John Stultz
2015-01-11 12:41   ` Richard Cochran
2015-01-12 18:54     ` John Stultz
2015-01-12 19:02       ` Linus Torvalds
2015-01-12 20:37         ` Richard Cochran
2015-01-12 20:30       ` Richard Cochran
2015-01-12 20:49         ` Richard Cochran
2015-01-13 11:11   ` Peter Zijlstra
2015-01-13 21:33     ` John Stultz
2015-01-13 22:51       ` Linus Torvalds
2015-01-14  9:35       ` Peter Zijlstra
2015-01-22 20:55         ` John Stultz
2015-01-10  0:34 ` [PATCH 07/10] time: Try to catch clocksource delta underflows John Stultz
2015-01-10  0:34 ` [PATCH 08/10] clocksource: Mostly kill clocksource_register() John Stultz
2015-01-10  0:34 ` [PATCH 09/10] sparc: Convert to using clocksource_register_hz() John Stultz
2015-01-10  0:34 ` [PATCH 10/10] clocksource: Add some debug info about clocksources being registered John Stultz
2015-01-10  2:02   ` Stephen Boyd
2015-01-22  0:51     ` John Stultz
2015-01-22 12:27       ` Prarit Bhargava
2015-01-11 11:41 ` [PATCH 00/10][RFC] Increased clocksource validation and cleanups Richard Cochran
2015-01-12 18:22   ` John Stultz
2015-01-12 20:45     ` Richard Cochran

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.