linux-kernel.vger.kernel.org archive mirror
* [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs
@ 2017-05-27  3:33 John Stultz
  2017-05-27  3:33 ` [RFC][PATCH 1/4] time: Fix clock->read(clock) race around clocksource changes John Stultz
                   ` (5 more replies)
  0 siblings, 6 replies; 19+ messages in thread
From: John Stultz @ 2017-05-27  3:33 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Daniel Mentz

As part of the Linaro Linux Kernel Functional Test (LKFT)
effort, test failures from kselftest/timer's
inconsistency-check were reported connected to
CLOCK_MONOTONIC_RAW, on the HiKey platform.

Digging in, I found an old issue with how sub-ns accounting is
handled for the RAW time: it was fixed long ago for the
CLOCK_MONOTONIC/REALTIME clock ids, but the fix was missed for
RAW time.

Additionally, running further tests, I uncovered an issue with
how the clocksource read function is handled when clocksources
are changed, which can cause crashes.

Neither of these issues was uncovered in x86-based testing,
since x86 does not use the vDSO to accelerate
CLOCK_MONOTONIC_RAW, and none of the x86 clocksources make use
of the clocksource argument passed to the read function. By
contrast, the HiKey's arch_timer clocksource is fast to access
but increments slowly enough that multiple reads can return the
same counter value, which helps uncover time handling issues.

This patchset addresses these two issues. 

Thanks so much to Will Deacon for his help in getting the
needed adjustments to the arm64 vDSO in place. Also thanks to
Daniel Mentz, who properly diagnosed the MONOTONIC_RAW issue in
parallel while I was working on this patchset.

I wanted to send these out for some initial review. I'm
unfortunately still chasing a third issue I've uncovered on
HiKey (inconsistencies triggered during extreme frequency
adjustments), which doesn't seem to be related to the previous
two.

As always feedback would be appreciated!

thanks
-john

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Daniel Mentz <danielmentz@google.com>


John Stultz (3):
  time: Fix clock->read(clock) race around clocksource changes
  time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
  time: Clean up CLOCK_MONOTONIC_RAW time handling

Will Deacon (1):
  arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW

 arch/arm64/kernel/vdso.c              |  5 +-
 arch/arm64/kernel/vdso/gettimeofday.S |  1 -
 include/linux/timekeeper_internal.h   |  8 +--
 kernel/time/timekeeping.c             | 96 ++++++++++++++++++++++-------------
 4 files changed, 66 insertions(+), 44 deletions(-)

-- 
2.7.4

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC][PATCH 1/4] time: Fix clock->read(clock) race around clocksource changes
  2017-05-27  3:33 [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs John Stultz
@ 2017-05-27  3:33 ` John Stultz
  2017-05-27  7:31   ` Ingo Molnar
  2017-05-27  3:33 ` [RFC][PATCH 2/4] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting John Stultz
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2017-05-27  3:33 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Daniel Mentz

In some testing on arm64 platforms, I was seeing null ptr
crashes in the kselftest/timers clocksource-switch test.

This was happening in a read function like:
u64 clocksource_mmio_readl_down(struct clocksource *c)
{
    return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
}

Where the callers enter the seqlock, and then call something
like:
    cycle_now = tkr->read(tkr->clock);

The problem seems to be that since the read and clock
references are fetched separately, it's possible the
clocksource change happens in between and we end up calling the
old read with the new clocksource (or vice versa), which causes
the to_mmio_clksrc() in the read function to run off into space.
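To make the failure mode concrete, here is a hypothetical, userspace-sized
sketch of the race; the struct fields and helper names below are simplified
stand-ins, not the real kernel definitions:

```c
typedef unsigned long long u64;

/* Simplified stand-ins for the kernel structures (hypothetical). */
struct clocksource {
	u64 (*read)(struct clocksource *cs);
	u64 counter;			/* pretend hardware counter */
};

struct tk_read_base {
	struct clocksource *clock;
	u64 (*read)(struct clocksource *cs);	/* cached copy of ->read */
};

static u64 read_b(struct clocksource *cs) { return cs->counter * 2; }
static u64 read_a(struct clocksource *cs) { return cs->counter; }

/* Old pattern: ->read and ->clock are loaded separately, so a
 * clocksource switch between the two loads pairs the old read()
 * with the new clocksource (or vice versa). */
static u64 old_style(struct tk_read_base *tkr)
{
	return tkr->read(tkr->clock);
}

/* Fixed pattern (tk_clock_read() in the patch): load the clock
 * pointer once (READ_ONCE() in the kernel) and call the read()
 * that belongs to that same clocksource. */
static u64 fixed_style(struct tk_read_base *tkr)
{
	struct clocksource *clock = tkr->clock;

	return clock->read(clock);
}
```

With a "torn" tkr whose ->read still points at the old clocksource's
function, old_style() computes a value with the wrong read function; in the
kernel the container_of()-style to_mmio_clksrc() math then dereferences
garbage instead of just returning a wrong number.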

This patch tries to address the issue by providing a helper
function that atomically reads the clock value and then calls
the clock->read(clock) call so that we always call the read
function with the appropriate clocksource and don't accidentally
mix them.

The one exception where this helper isn't necessary is for the
fast-timekeepers, which use their own locking and update logic
for the tkr structures.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Daniel Mentz <danielmentz@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 40 +++++++++++++++++++++++++++++-----------
 1 file changed, 29 insertions(+), 11 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 9652bc5..abc1968 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -118,6 +118,26 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
 	tk->offs_boot = ktime_add(tk->offs_boot, delta);
 }
 
+/*
+ * tk_clock_read - atomic clocksource read() helper
+ *
+ * This helper is necessary to use in the read paths as, while the seqlock
+ * ensures we don't return a bad value while structures are updated, it
+ * doesn't protect from potential crashes. There is the possibility that
+ * the tkr's clocksource may change between the read reference, and the
+ * clock refernce passed to the read function.  This can cause crashes if
+ * the wrong clocksource is passed to the wrong read function.
+ * This isn't necessary to use when holding the timekeeper_lock or doing
+ * a read of the fast-timekeeper tkrs (which is protected by its own locking
+ * and update logic).
+ */
+static inline u64 tk_clock_read(struct tk_read_base *tkr)
+{
+	struct clocksource *clock = READ_ONCE(tkr->clock);
+
+	return clock->read(clock);
+}
+
 #ifdef CONFIG_DEBUG_TIMEKEEPING
 #define WARNING_FREQ (HZ*300) /* 5 minute rate-limiting */
 
@@ -175,7 +195,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	 */
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-		now = tkr->read(tkr->clock);
+		now = tk_clock_read(tkr);
 		last = tkr->cycle_last;
 		mask = tkr->mask;
 		max = tkr->clock->max_cycles;
@@ -209,7 +229,7 @@ static inline u64 timekeeping_get_delta(struct tk_read_base *tkr)
 	u64 cycle_now, delta;
 
 	/* read clocksource */
-	cycle_now = tkr->read(tkr->clock);
+	cycle_now = tk_clock_read(tkr);
 
 	/* calculate the delta since the last update_wall_time */
 	delta = clocksource_delta(cycle_now, tkr->cycle_last, tkr->mask);
@@ -240,7 +260,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	tk->tkr_mono.clock = clock;
 	tk->tkr_mono.read = clock->read;
 	tk->tkr_mono.mask = clock->mask;
-	tk->tkr_mono.cycle_last = tk->tkr_mono.read(clock);
+	tk->tkr_mono.cycle_last = tk_clock_read(&tk->tkr_mono);
 
 	tk->tkr_raw.clock = clock;
 	tk->tkr_raw.read = clock->read;
@@ -477,7 +497,7 @@ static void halt_fast_timekeeper(struct timekeeper *tk)
 	struct tk_read_base *tkr = &tk->tkr_mono;
 
 	memcpy(&tkr_dummy, tkr, sizeof(tkr_dummy));
-	cycles_at_suspend = tkr->read(tkr->clock);
+	cycles_at_suspend = tk_clock_read(tkr);
 	tkr_dummy.read = dummy_clock_read;
 	update_fast_timekeeper(&tkr_dummy, &tk_fast_mono);
 
@@ -649,11 +669,10 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
  */
 static void timekeeping_forward_now(struct timekeeper *tk)
 {
-	struct clocksource *clock = tk->tkr_mono.clock;
 	u64 cycle_now, delta;
 	u64 nsec;
 
-	cycle_now = tk->tkr_mono.read(clock);
+	cycle_now = tk_clock_read(&tk->tkr_mono);
 	delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 	tk->tkr_mono.cycle_last = cycle_now;
 	tk->tkr_raw.cycle_last  = cycle_now;
@@ -929,8 +948,7 @@ void ktime_get_snapshot(struct system_time_snapshot *systime_snapshot)
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
-
-		now = tk->tkr_mono.read(tk->tkr_mono.clock);
+		now = tk_clock_read(&tk->tkr_mono);
 		systime_snapshot->cs_was_changed_seq = tk->cs_was_changed_seq;
 		systime_snapshot->clock_was_set_seq = tk->clock_was_set_seq;
 		base_real = ktime_add(tk->tkr_mono.base,
@@ -1108,7 +1126,7 @@ int get_device_system_crosststamp(int (*get_time_fn)
 		 * Check whether the system counter value provided by the
 		 * device driver is on the current timekeeping interval.
 		 */
-		now = tk->tkr_mono.read(tk->tkr_mono.clock);
+		now = tk_clock_read(&tk->tkr_mono);
 		interval_start = tk->tkr_mono.cycle_last;
 		if (!cycle_between(interval_start, cycles, now)) {
 			clock_was_set_seq = tk->clock_was_set_seq;
@@ -1629,7 +1647,7 @@ void timekeeping_resume(void)
 	 * The less preferred source will only be tried if there is no better
 	 * usable source. The rtc part is handled separately in rtc core code.
 	 */
-	cycle_now = tk->tkr_mono.read(clock);
+	cycle_now = tk_clock_read(&tk->tkr_mono);
 	if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
 		cycle_now > tk->tkr_mono.cycle_last) {
 		u64 nsec, cyc_delta;
@@ -2030,7 +2048,7 @@ void update_wall_time(void)
 #ifdef CONFIG_ARCH_USES_GETTIMEOFFSET
 	offset = real_tk->cycle_interval;
 #else
-	offset = clocksource_delta(tk->tkr_mono.read(tk->tkr_mono.clock),
+	offset = clocksource_delta(tk_clock_read(&tk->tkr_mono),
 				   tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
 #endif
 
-- 
2.7.4


* [RFC][PATCH 2/4] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
  2017-05-27  3:33 [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs John Stultz
  2017-05-27  3:33 ` [RFC][PATCH 1/4] time: Fix clock->read(clock) race around clocksource changes John Stultz
@ 2017-05-27  3:33 ` John Stultz
  2017-05-27  7:36   ` Ingo Molnar
  2017-05-27  3:33 ` [RFC][PATCH 3/4] arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW John Stultz
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2017-05-27  3:33 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Will Deacon, Daniel Mentz

Due to how the MONOTONIC_RAW accumulation logic was handled,
there is the potential for a 1ns discontinuity when we do
accumulations. This small discontinuity has for the most part
gone unnoticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
in its vDSO clock_gettime implementation, we've seen failures
with the inconsistency-check test in kselftest.

This patch addresses the issue by using the same sub-ns
accumulation handling that CLOCK_MONOTONIC uses, which avoids
the issue for in-kernel users.

Since the ARM64 vDSO implementation has its own clock_gettime
calculation logic, this patch reduces the frequency of errors,
but failures are still seen. The ARM64 vDSO will need to be
updated to include the sub-nanosecond xtime_nsec values in its
calculation for this issue to be completely fixed.
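The truncation can be illustrated with toy numbers (the mult/shift values
below are made up for the example, not real clocksource parameters): if each
interval's shifted nanoseconds are shifted down and the sub-ns remainder
thrown away, the accumulated base falls behind what a reader computes with
full shifted arithmetic, which is where the ~1ns inconsistency comes from.

```c
typedef unsigned long long u64;

/* Old raw accumulation: shift down per interval, discarding the
 * sub-ns remainder every time. */
static u64 accumulate_truncated(u64 cycles_per_interval, u64 mult,
				unsigned int shift, int intervals)
{
	u64 ns = 0;
	int i;

	for (i = 0; i < intervals; i++)
		ns += (cycles_per_interval * mult) >> shift;
	return ns;
}

/* Fixed accumulation (what CLOCK_MONOTONIC already did): keep the
 * shifted nanoseconds across intervals, shift down only at the end. */
static u64 accumulate_shifted(u64 cycles_per_interval, u64 mult,
			      unsigned int shift, int intervals)
{
	u64 xtime_nsec = 0;	/* shifted ns, like tkr_raw.xtime_nsec */
	int i;

	for (i = 0; i < intervals; i++)
		xtime_nsec += cycles_per_interval * mult;
	return xtime_nsec >> shift;
}
```

With 3 cycles per interval, mult=5, shift=2 (1.25ns per cycle), four
intervals really span 15ns, but the truncated version accumulates only 12ns.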

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Mentz <danielmentz@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 include/linux/timekeeper_internal.h |  4 ++--
 kernel/time/timekeeping.c           | 21 ++++++++++++---------
 2 files changed, 14 insertions(+), 11 deletions(-)

diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 110f453..528cc86 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -58,7 +58,7 @@ struct tk_read_base {
  *			interval.
  * @xtime_remainder:	Shifted nano seconds left over when rounding
  *			@cycle_interval
- * @raw_interval:	Raw nano seconds accumulated per NTP interval.
+ * @raw_interval:	Shifted raw nano seconds accumulated per NTP interval.
  * @ntp_error:		Difference between accumulated time and NTP time in ntp
  *			shifted nano seconds.
  * @ntp_error_shift:	Shift conversion between clock shifted nano seconds and
@@ -100,7 +100,7 @@ struct timekeeper {
 	u64			cycle_interval;
 	u64			xtime_interval;
 	s64			xtime_remainder;
-	u32			raw_interval;
+	u64			raw_interval;
 	/* The ntp_tick_length() value currently being used.
 	 * This cached copy ensures we consistently apply the tick
 	 * length for an entire tick, as ntp_tick_length may change
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index abc1968..35d3ba3 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -282,7 +282,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	/* Go back from cycles -> shifted ns */
 	tk->xtime_interval = interval * clock->mult;
 	tk->xtime_remainder = ntpinterval - tk->xtime_interval;
-	tk->raw_interval = (interval * clock->mult) >> clock->shift;
+	tk->raw_interval = interval * clock->mult;
 
 	 /* if changing clocks, convert xtime_nsec shift units */
 	if (old_clock) {
@@ -1994,7 +1994,7 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
 				    u32 shift, unsigned int *clock_set)
 {
 	u64 interval = tk->cycle_interval << shift;
-	u64 raw_nsecs;
+	u64 nsecps;
 
 	/* If the offset is smaller than a shifted interval, do nothing */
 	if (offset < interval)
@@ -2009,14 +2009,17 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
 	*clock_set |= accumulate_nsecs_to_secs(tk);
 
 	/* Accumulate raw time */
-	raw_nsecs = (u64)tk->raw_interval << shift;
-	raw_nsecs += tk->raw_time.tv_nsec;
-	if (raw_nsecs >= NSEC_PER_SEC) {
-		u64 raw_secs = raw_nsecs;
-		raw_nsecs = do_div(raw_secs, NSEC_PER_SEC);
-		tk->raw_time.tv_sec += raw_secs;
+	tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec
+					<< tk->tkr_raw.shift;
+	tk->tkr_raw.xtime_nsec += tk->raw_interval << shift;
+	nsecps = (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
+	while (tk->tkr_raw.xtime_nsec >= nsecps) {
+		tk->tkr_raw.xtime_nsec -= nsecps;
+		tk->raw_time.tv_sec++;
 	}
-	tk->raw_time.tv_nsec = raw_nsecs;
+	tk->raw_time.tv_nsec = tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift;
+	tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec
+					<< tk->tkr_raw.shift;
 
 	/* Accumulate error between NTP and clock interval */
 	tk->ntp_error += tk->ntp_tick << shift;
-- 
2.7.4


* [RFC][PATCH 3/4] arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
  2017-05-27  3:33 [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs John Stultz
  2017-05-27  3:33 ` [RFC][PATCH 1/4] time: Fix clock->read(clock) race around clocksource changes John Stultz
  2017-05-27  3:33 ` [RFC][PATCH 2/4] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting John Stultz
@ 2017-05-27  3:33 ` John Stultz
  2017-05-30  9:38   ` Will Deacon
  2017-05-27  3:33 ` [RFC][PATCH 4/4] time: Clean up CLOCK_MONOTONIC_RAW time handling John Stultz
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2017-05-27  3:33 UTC (permalink / raw)
  To: lkml
  Cc: Will Deacon, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Daniel Mentz, John Stultz

From: Will Deacon <will.deacon@arm.com>

Commit 45a7905fc48f ("arm64: vdso: defer shifting of nanosecond
component of timespec") fixed sub-ns inaccuracies in our vDSO
clock_gettime implementation by deferring the right-shift of the
nanoseconds components until after the timespec addition, which
operates on left-shifted values. That worked nicely until
support for CLOCK_MONOTONIC_RAW was added in 49eea433b326
("arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime()
vDSO"). Since the core timekeeping code never set
tkr_raw.xtime_nsec, the vDSO implementation didn't bother
exposing it via the data page and instead took the unshifted
tk->raw_time.tv_nsec value, which was then immediately shifted
left in the vDSO code.

Now that the core code is actually setting tkr_raw.xtime_nsec,
we need to take that into account in the vDSO by adding it to
the shifted raw_time value. Rather than do that at each use (and
expand the data page in the process), instead perform the
shift/addition operation when populating the data page and
remove the shift from the vDSO code entirely.
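A sketch of the two halves of that scheme (hypothetical helper names; the
real code lives in update_vsyscall() and the vDSO assembly):

```c
typedef unsigned long long u64;

/* Kernel side: publish the raw nanoseconds to the vDSO data page
 * already left-shifted, with the sub-ns remainder folded in. */
static u64 publish_raw_nsec(u64 raw_tv_nsec, u64 raw_xtime_nsec,
			    unsigned int shift)
{
	return (raw_tv_nsec << shift) + raw_xtime_nsec;
}

/* vDSO side: all arithmetic stays in shifted units and is shifted
 * right exactly once at the end, so no sub-ns bits are lost
 * mid-computation. */
static u64 vdso_raw_nsec(u64 published_nsec, u64 cycle_delta,
			 u64 raw_mult, unsigned int shift)
{
	return (published_nsec + cycle_delta * raw_mult) >> shift;
}
```

This is why the `lsl x14, x14, x12` is dropped from gettimeofday.S below:
the published value is already in shifted units.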

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Mentz <danielmentz@google.com>
Reported-by: John Stultz <john.stultz@linaro.org>
Acked-by: Kevin Brodsky <kevin.brodsky@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
[jstultz: minor whitespace tweak]
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/arm64/kernel/vdso.c              | 5 +++--
 arch/arm64/kernel/vdso/gettimeofday.S | 1 -
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index 41b6e31..d0cb007 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -221,10 +221,11 @@ void update_vsyscall(struct timekeeper *tk)
 		/* tkr_mono.cycle_last == tkr_raw.cycle_last */
 		vdso_data->cs_cycle_last	= tk->tkr_mono.cycle_last;
 		vdso_data->raw_time_sec		= tk->raw_time.tv_sec;
-		vdso_data->raw_time_nsec	= tk->raw_time.tv_nsec;
+		vdso_data->raw_time_nsec	= (tk->raw_time.tv_nsec <<
+						   tk->tkr_raw.shift) +
+						  tk->tkr_raw.xtime_nsec;
 		vdso_data->xtime_clock_sec	= tk->xtime_sec;
 		vdso_data->xtime_clock_nsec	= tk->tkr_mono.xtime_nsec;
-		/* tkr_raw.xtime_nsec == 0 */
 		vdso_data->cs_mono_mult		= tk->tkr_mono.mult;
 		vdso_data->cs_raw_mult		= tk->tkr_raw.mult;
 		/* tkr_mono.shift == tkr_raw.shift */
diff --git a/arch/arm64/kernel/vdso/gettimeofday.S b/arch/arm64/kernel/vdso/gettimeofday.S
index e00b467..76320e9 100644
--- a/arch/arm64/kernel/vdso/gettimeofday.S
+++ b/arch/arm64/kernel/vdso/gettimeofday.S
@@ -256,7 +256,6 @@ monotonic_raw:
 	seqcnt_check fail=monotonic_raw
 
 	/* All computations are done with left-shifted nsecs. */
-	lsl	x14, x14, x12
 	get_nsec_per_sec res=x9
 	lsl	x9, x9, x12
 
-- 
2.7.4


* [RFC][PATCH 4/4] time: Clean up CLOCK_MONOTONIC_RAW time handling
  2017-05-27  3:33 [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs John Stultz
                   ` (2 preceding siblings ...)
  2017-05-27  3:33 ` [RFC][PATCH 3/4] arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW John Stultz
@ 2017-05-27  3:33 ` John Stultz
  2017-08-25 13:40   ` Chris Wilson
  2017-05-27  7:38 ` [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs Ingo Molnar
       [not found] ` <CAE2F3rBuOJqLs5Cu7A9wEruZj1Vmnpy6qAYW=U9FVAOEP73pdg@mail.gmail.com>
  5 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2017-05-27  3:33 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Will Deacon, Daniel Mentz

Now that we've fixed the sub-ns handling for
CLOCK_MONOTONIC_RAW, remove the duplicative tk->raw_time.tv_nsec,
which can be stored in tk->tkr_raw.xtime_nsec (similar to how
it's handled for monotonic time).
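As a sketch of the resulting representation (simplified types, hypothetical
helper name): the raw clock reduces to a seconds count plus shifted
nanoseconds, read back the same way CLOCK_MONOTONIC reads
xtime_sec/tkr_mono.xtime_nsec.

```c
typedef unsigned long long u64;

struct raw_ts {
	u64 tv_sec;
	u64 tv_nsec;
};

/* Reconstruct a timespec-style raw time from raw_sec and the shifted
 * nanoseconds kept in tkr_raw.xtime_nsec. */
static struct raw_ts raw_clock_sketch(u64 raw_sec, u64 xtime_nsec,
				      unsigned int shift)
{
	struct raw_ts ts;

	ts.tv_sec = raw_sec;
	ts.tv_nsec = xtime_nsec >> shift;	/* drop sub-ns remainder */
	return ts;
}
```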

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Mentz <danielmentz@google.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 arch/arm64/kernel/vdso.c            |  6 ++---
 include/linux/timekeeper_internal.h |  4 ++--
 kernel/time/timekeeping.c           | 47 ++++++++++++++++++++-----------------
 3 files changed, 29 insertions(+), 28 deletions(-)

diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
index d0cb007..7492d90 100644
--- a/arch/arm64/kernel/vdso.c
+++ b/arch/arm64/kernel/vdso.c
@@ -220,10 +220,8 @@ void update_vsyscall(struct timekeeper *tk)
 	if (!use_syscall) {
 		/* tkr_mono.cycle_last == tkr_raw.cycle_last */
 		vdso_data->cs_cycle_last	= tk->tkr_mono.cycle_last;
-		vdso_data->raw_time_sec		= tk->raw_time.tv_sec;
-		vdso_data->raw_time_nsec	= (tk->raw_time.tv_nsec <<
-						   tk->tkr_raw.shift) +
-						  tk->tkr_raw.xtime_nsec;
+		vdso_data->raw_time_sec         = tk->raw_sec;
+		vdso_data->raw_time_nsec        = tk->tkr_raw.xtime_nsec;
 		vdso_data->xtime_clock_sec	= tk->xtime_sec;
 		vdso_data->xtime_clock_nsec	= tk->tkr_mono.xtime_nsec;
 		vdso_data->cs_mono_mult		= tk->tkr_mono.mult;
diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
index 528cc86..8abf6df 100644
--- a/include/linux/timekeeper_internal.h
+++ b/include/linux/timekeeper_internal.h
@@ -52,7 +52,7 @@ struct tk_read_base {
  * @clock_was_set_seq:	The sequence number of clock was set events
  * @cs_was_changed_seq:	The sequence number of clocksource change events
  * @next_leap_ktime:	CLOCK_MONOTONIC time value of a pending leap-second
- * @raw_time:		Monotonic raw base time in timespec64 format
+ * @raw_sec:		CLOCK_MONOTONIC_RAW  time in seconds
  * @cycle_interval:	Number of clock cycles in one NTP interval
  * @xtime_interval:	Number of clock shifted nano seconds in one NTP
  *			interval.
@@ -94,7 +94,7 @@ struct timekeeper {
 	unsigned int		clock_was_set_seq;
 	u8			cs_was_changed_seq;
 	ktime_t			next_leap_ktime;
-	struct timespec64	raw_time;
+	u64			raw_sec;
 
 	/* The following members are for timekeeping internal use */
 	u64			cycle_interval;
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 35d3ba3..0c3b8c1 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -72,6 +72,10 @@ static inline void tk_normalize_xtime(struct timekeeper *tk)
 		tk->tkr_mono.xtime_nsec -= (u64)NSEC_PER_SEC << tk->tkr_mono.shift;
 		tk->xtime_sec++;
 	}
+	while (tk->tkr_raw.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_raw.shift)) {
+		tk->tkr_raw.xtime_nsec -= (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
+		tk->raw_sec++;
+	}
 }
 
 static inline struct timespec64 tk_xtime(struct timekeeper *tk)
@@ -287,12 +291,14 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
 	 /* if changing clocks, convert xtime_nsec shift units */
 	if (old_clock) {
 		int shift_change = clock->shift - old_clock->shift;
-		if (shift_change < 0)
+		if (shift_change < 0) {
 			tk->tkr_mono.xtime_nsec >>= -shift_change;
-		else
+			tk->tkr_raw.xtime_nsec >>= -shift_change;
+		} else {
 			tk->tkr_mono.xtime_nsec <<= shift_change;
+			tk->tkr_raw.xtime_nsec <<= shift_change;
+		}
 	}
-	tk->tkr_raw.xtime_nsec = 0;
 
 	tk->tkr_mono.shift = clock->shift;
 	tk->tkr_raw.shift = clock->shift;
@@ -617,9 +623,6 @@ static inline void tk_update_ktime_data(struct timekeeper *tk)
 	nsec = (u32) tk->wall_to_monotonic.tv_nsec;
 	tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
 
-	/* Update the monotonic raw base */
-	tk->tkr_raw.base = timespec64_to_ktime(tk->raw_time);
-
 	/*
 	 * The sum of the nanoseconds portions of xtime and
 	 * wall_to_monotonic can be greater/equal one second. Take
@@ -629,6 +632,11 @@ static inline void tk_update_ktime_data(struct timekeeper *tk)
 	if (nsec >= NSEC_PER_SEC)
 		seconds++;
 	tk->ktime_sec = seconds;
+
+	/* Update the monotonic raw base */
+	seconds = tk->raw_sec;
+	nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
+	tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
 }
 
 /* must hold timekeeper_lock */
@@ -670,7 +678,6 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
 static void timekeeping_forward_now(struct timekeeper *tk)
 {
 	u64 cycle_now, delta;
-	u64 nsec;
 
 	cycle_now = tk_clock_read(&tk->tkr_mono);
 	delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
@@ -682,10 +689,13 @@ static void timekeeping_forward_now(struct timekeeper *tk)
 	/* If arch requires, add in get_arch_timeoffset() */
 	tk->tkr_mono.xtime_nsec += (u64)arch_gettimeoffset() << tk->tkr_mono.shift;
 
-	tk_normalize_xtime(tk);
 
-	nsec = clocksource_cyc2ns(delta, tk->tkr_raw.mult, tk->tkr_raw.shift);
-	timespec64_add_ns(&tk->raw_time, nsec);
+	tk->tkr_raw.xtime_nsec += delta * tk->tkr_raw.mult;
+
+	/* If arch requires, add in get_arch_timeoffset() */
+	tk->tkr_raw.xtime_nsec += (u64)arch_gettimeoffset() << tk->tkr_raw.shift;
+
+	tk_normalize_xtime(tk);
 }
 
 /**
@@ -1371,19 +1381,18 @@ int timekeeping_notify(struct clocksource *clock)
 void getrawmonotonic64(struct timespec64 *ts)
 {
 	struct timekeeper *tk = &tk_core.timekeeper;
-	struct timespec64 ts64;
 	unsigned long seq;
 	u64 nsecs;
 
 	do {
 		seq = read_seqcount_begin(&tk_core.seq);
+		ts->tv_sec = tk->raw_sec;
 		nsecs = timekeeping_get_ns(&tk->tkr_raw);
-		ts64 = tk->raw_time;
 
 	} while (read_seqcount_retry(&tk_core.seq, seq));
 
-	timespec64_add_ns(&ts64, nsecs);
-	*ts = ts64;
+	ts->tv_nsec = 0;
+	timespec64_add_ns(ts, nsecs);
 }
 EXPORT_SYMBOL(getrawmonotonic64);
 
@@ -1507,8 +1516,7 @@ void __init timekeeping_init(void)
 	tk_setup_internals(tk, clock);
 
 	tk_set_xtime(tk, &now);
-	tk->raw_time.tv_sec = 0;
-	tk->raw_time.tv_nsec = 0;
+	tk->raw_sec = 0;
 	if (boot.tv_sec == 0 && boot.tv_nsec == 0)
 		boot = tk_xtime(tk);
 
@@ -2009,17 +2017,12 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
 	*clock_set |= accumulate_nsecs_to_secs(tk);
 
 	/* Accumulate raw time */
-	tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec
-					<< tk->tkr_raw.shift;
 	tk->tkr_raw.xtime_nsec += tk->raw_interval << shift;
 	nsecps = (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
 	while (tk->tkr_raw.xtime_nsec >= nsecps) {
 		tk->tkr_raw.xtime_nsec -= nsecps;
-		tk->raw_time.tv_sec++;
+		tk->raw_sec++;
 	}
-	tk->raw_time.tv_nsec = tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift;
-	tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec
-					<< tk->tkr_raw.shift;
 
 	/* Accumulate error between NTP and clock interval */
 	tk->ntp_error += tk->ntp_tick << shift;
-- 
2.7.4


* Re: [RFC][PATCH 1/4] time: Fix clock->read(clock) race around clocksource changes
  2017-05-27  3:33 ` [RFC][PATCH 1/4] time: Fix clock->read(clock) race around clocksource changes John Stultz
@ 2017-05-27  7:31   ` Ingo Molnar
  0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2017-05-27  7:31 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Thomas Gleixner, Miroslav Lichvar, Richard Cochran,
	Prarit Bhargava, Stephen Boyd, Daniel Mentz


* John Stultz <john.stultz@linaro.org> wrote:

> In some testing on arm64 platforms, I was seeing null ptr
> crashes in the kselftest/timers clocksource-switch test.
> 
> This was happening in a read function like:
> u64 clocksource_mmio_readl_down(struct clocksource *c)
> {
>     return ~(u64)readl_relaxed(to_mmio_clksrc(c)->reg) & c->mask;
> }
> 
> Where the callers enter the seqlock, and then call something
> like:
>     cycle_now = tkr->read(tkr->clock);
> 
> The problem seeming to be that since the read and clock
> references are happening separately, its possible the
> clocksource change happens in between and we end up calling the
> old read with the new clocksource, (or vice-versa) which causes
> the to_mmio_clksrc() in the read function to run off into space.

Maybe:

  s/clock/->clock pointer
  s/read/old ->read() function

for increased clarity - because both are ambiguous in an English sentence.

> 
> This patch tries to address the issue by providing a helper
> function that atomically reads the clock value and then calls
> the clock->read(clock) call so that we always call the read
> funciton with the appropriate clocksource and don't accidentally
> mix them.

s/call/function

because 'call' is used both as a verb and as noun here, making for a difficult 
read.

> 
> The one exception where this helper isn't necessary is for the
> fast-timekepers which use their own locking and update logic
> to the tkr structures.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Miroslav Lichvar <mlichvar@redhat.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Stephen Boyd <stephen.boyd@linaro.org>
> Cc: Daniel Mentz <danielmentz@google.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  kernel/time/timekeeping.c | 40 +++++++++++++++++++++++++++++-----------
>  1 file changed, 29 insertions(+), 11 deletions(-)
> 
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 9652bc5..abc1968 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -118,6 +118,26 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
>  	tk->offs_boot = ktime_add(tk->offs_boot, delta);
>  }
>  
> +/*
> + * tk_clock_read - atomic clocksource read() helper
> + *
> + * This helper is necessary to use in the read paths as, while the seqlock
> + * ensures we don't return a bad value while structures are updated, it
> + * doesn't protect from potential crashes. There is the possibility that
> + * the tkr's clocksource may change between the read reference, and the
> + * clock refernce passed to the read function.  This can cause crashes if
> + * the wrong clocksource is passed to the wrong read function.
> + * This isn't necessary to use when holding the timekeeper_lock or doing
> + * a read of the fast-timekeeper tkrs (which is protected by its own locking
> + * and update logic).

typo:

  s/refernce/reference

I'd also do:

  s/as/because

because while 'as' is not wrong, it's somewhat harder to parse on first reading.

Other than that:

  Acked-by: Ingo Molnar <mingo@kernel.org>

Thanks,

	Ingo


* Re: [RFC][PATCH 2/4] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
  2017-05-27  3:33 ` [RFC][PATCH 2/4] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting John Stultz
@ 2017-05-27  7:36   ` Ingo Molnar
  2017-05-30 18:42     ` Daniel Mentz
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2017-05-27  7:36 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Thomas Gleixner, Miroslav Lichvar, Richard Cochran,
	Prarit Bhargava, Stephen Boyd, Kevin Brodsky, Will Deacon,
	Daniel Mentz


* John Stultz <john.stultz@linaro.org> wrote:

> Due to how the MONOTONIC_RAW accumulation logic was handled,
> there is the potential for a 1ns discontinuity when we do
> accumulations. This small discontinuity has for the most part
> gone un-noticed, but since ARM64 enabled CLOCK_MONOTONIC_RAW
> in their vDSO clock_gettime implementation, we've seen failures
> with the inconsistency-check test in kselftest.
> 
> This patch addresses the issue by using the same sub-ns
> accumulation handling that CLOCK_MONOTONIC uses, which avoids
> the issue for in-kernel users.
> 
> Since the ARM64 vDSO implementation has its own clock_gettime
> calculation logic, this patch reduces the frequency of errors,
> but failures are still seen. The ARM64 vDSO will need to be
> updated to include the sub-nanosecond xtime_nsec values in its
> calculation for this issue to be completely fixed.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Miroslav Lichvar <mlichvar@redhat.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Stephen Boyd <stephen.boyd@linaro.org>
> Cc: Kevin Brodsky <kevin.brodsky@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Daniel Mentz <danielmentz@google.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  include/linux/timekeeper_internal.h |  4 ++--
>  kernel/time/timekeeping.c           | 21 ++++++++++++---------
>  2 files changed, 14 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
> index 110f453..528cc86 100644
> --- a/include/linux/timekeeper_internal.h
> +++ b/include/linux/timekeeper_internal.h
> @@ -58,7 +58,7 @@ struct tk_read_base {
>   *			interval.
>   * @xtime_remainder:	Shifted nano seconds left over when rounding
>   *			@cycle_interval
> - * @raw_interval:	Raw nano seconds accumulated per NTP interval.
> + * @raw_interval:	Shifted raw nano seconds accumulated per NTP interval.
>   * @ntp_error:		Difference between accumulated time and NTP time in ntp
>   *			shifted nano seconds.
>   * @ntp_error_shift:	Shift conversion between clock shifted nano seconds and
> @@ -100,7 +100,7 @@ struct timekeeper {
>  	u64			cycle_interval;
>  	u64			xtime_interval;
>  	s64			xtime_remainder;
> -	u32			raw_interval;
> +	u64			raw_interval;
>  	/* The ntp_tick_length() value currently being used.
>  	 * This cached copy ensures we consistently apply the tick
>  	 * length for an entire tick, as ntp_tick_length may change
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index abc1968..35d3ba3 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -282,7 +282,7 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
>  	/* Go back from cycles -> shifted ns */
>  	tk->xtime_interval = interval * clock->mult;
>  	tk->xtime_remainder = ntpinterval - tk->xtime_interval;
> -	tk->raw_interval = (interval * clock->mult) >> clock->shift;
> +	tk->raw_interval = interval * clock->mult;
>  
>  	 /* if changing clocks, convert xtime_nsec shift units */
>  	if (old_clock) {
> @@ -1994,7 +1994,7 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
>  				    u32 shift, unsigned int *clock_set)
>  {
>  	u64 interval = tk->cycle_interval << shift;
> -	u64 raw_nsecs;
> +	u64 nsecps;

What does the 'ps' postfix stand for? It's not obvious (to me).

> +	tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec
> +					<< tk->tkr_raw.shift;
> +	tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec
> +					<< tk->tkr_raw.shift;

Please don't break the line in such an ugly, random way, just write:

	tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;
	tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec << tk->tkr_raw.shift;

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs
  2017-05-27  3:33 [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs John Stultz
                   ` (3 preceding siblings ...)
  2017-05-27  3:33 ` [RFC][PATCH 4/4] time: Clean up CLOCK_MONOTONIC_RAW time handling John Stultz
@ 2017-05-27  7:38 ` Ingo Molnar
  2017-05-27 16:16   ` John Stultz
       [not found] ` <CAE2F3rBuOJqLs5Cu7A9wEruZj1Vmnpy6qAYW=U9FVAOEP73pdg@mail.gmail.com>
  5 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2017-05-27  7:38 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Thomas Gleixner, Miroslav Lichvar, Richard Cochran,
	Prarit Bhargava, Stephen Boyd, Daniel Mentz


* John Stultz <john.stultz@linaro.org> wrote:

> As part of the Linaro Linux Kernel Functional Test (LKFT)
> effort, test failures from kselftest/timer's
> inconsistency-check were reported connected to
> CLOCK_MONOTONIC_RAW, on the HiKey platform.
> 
> Digging in I found that an old issue with how sub-ns accounting
> is handled with the RAW time which was fixed long ago with the
> CLOCK_MONOTONIC/REALTIME ids, but missed with RAW time, was
> present.
> 
> Additionally, running further tests, I uncovered an issue with
> how the clocksource read function is handled when clocksources
> are changed, which can cause crashes.
> 
> Both of these issues have not been uncovered in x86 based
> testing due to x86 not using vDSO to accelerate
> CLOCK_MONOTONIC_RAW, combined with the HiKey's arch_timer
> clocksource being fast to access but incrementing slowly enough
> to get multiple reads using the same counter value (which helps
> uncover time handing issues), along with the fact that none of
> the x86 clocksources making use of the clocksource argument
> passed to the read function.
> 
> This patchset addresses these two issues. 

AFAICS only the first two patches are fixes, the other two patches are 
cleanups/simplifications that resulted from the debugging effort, right?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs
  2017-05-27  7:38 ` [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs Ingo Molnar
@ 2017-05-27 16:16   ` John Stultz
  2017-05-28  8:54     ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: John Stultz @ 2017-05-27 16:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: lkml, Thomas Gleixner, Miroslav Lichvar, Richard Cochran,
	Prarit Bhargava, Stephen Boyd, Daniel Mentz

On Sat, May 27, 2017 at 12:38 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> As part of the Linaro Linux Kernel Functional Test (LKFT)
>> effort, test failures from kselftest/timer's
>> inconsistency-check were reported connected to
>> CLOCK_MONOTONIC_RAW, on the HiKey platform.
>>
>> Digging in I found that an old issue with how sub-ns accounting
>> is handled with the RAW time which was fixed long ago with the
>> CLOCK_MONOTONIC/REALTIME ids, but missed with RAW time, was
>> present.
>>
>> Additionally, running further tests, I uncovered an issue with
>> how the clocksource read function is handled when clocksources
>> are changed, which can cause crashes.
>>
>> Both of these issues have not been uncovered in x86 based
>> testing due to x86 not using vDSO to accelerate
>> CLOCK_MONOTONIC_RAW, combined with the HiKey's arch_timer
>> clocksource being fast to access but incrementing slowly enough
>> to get multiple reads using the same counter value (which helps
>> uncover time handing issues), along with the fact that none of
>> the x86 clocksources making use of the clocksource argument
>> passed to the read function.
>>
>> This patchset addresses these two issues.
>
> AFAICS only the first two patches are fixes, the other two patches are
> cleanups/simplifications that resulted out of the debugging effort, right?

Actually the first three are fixes (ARM64 still sees discontinuities
until the vDSO is fixed), the last one is a cleanup.

thanks
-john

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs
  2017-05-27 16:16   ` John Stultz
@ 2017-05-28  8:54     ` Ingo Molnar
  0 siblings, 0 replies; 19+ messages in thread
From: Ingo Molnar @ 2017-05-28  8:54 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Thomas Gleixner, Miroslav Lichvar, Richard Cochran,
	Prarit Bhargava, Stephen Boyd, Daniel Mentz


* John Stultz <john.stultz@linaro.org> wrote:

> On Sat, May 27, 2017 at 12:38 AM, Ingo Molnar <mingo@kernel.org> wrote:
> >
> > * John Stultz <john.stultz@linaro.org> wrote:
> >
> >> As part of the Linaro Linux Kernel Functional Test (LKFT)
> >> effort, test failures from kselftest/timer's
> >> inconsistency-check were reported connected to
> >> CLOCK_MONOTONIC_RAW, on the HiKey platform.
> >>
> >> Digging in I found that an old issue with how sub-ns accounting
> >> is handled with the RAW time which was fixed long ago with the
> >> CLOCK_MONOTONIC/REALTIME ids, but missed with RAW time, was
> >> present.
> >>
> >> Additionally, running further tests, I uncovered an issue with
> >> how the clocksource read function is handled when clocksources
> >> are changed, which can cause crashes.
> >>
> >> Both of these issues have not been uncovered in x86 based
> >> testing due to x86 not using vDSO to accelerate
> >> CLOCK_MONOTONIC_RAW, combined with the HiKey's arch_timer
> >> clocksource being fast to access but incrementing slowly enough
> >> to get multiple reads using the same counter value (which helps
> >> uncover time handing issues), along with the fact that none of
> >> the x86 clocksources making use of the clocksource argument
> >> passed to the read function.
> >>
> >> This patchset addresses these two issues.
> >
> > AFAICS only the first two patches are fixes, the other two patches are
> > cleanups/simplifications that resulted out of the debugging effort, right?
> 
> Actually the first three are fixes (ARM64 still sees discontinuities
> until the vDSO is fixed), the last one is a cleanup.

Ok, please make it more apparent in the changelog of the third patch what user 
observable mis-behavior is actually fixed by it: corrupted time, non-monotonic 
time, or something else?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH 3/4] arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
  2017-05-27  3:33 ` [RFC][PATCH 3/4] arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW John Stultz
@ 2017-05-30  9:38   ` Will Deacon
  0 siblings, 0 replies; 19+ messages in thread
From: Will Deacon @ 2017-05-30  9:38 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Daniel Mentz

On Fri, May 26, 2017 at 08:33:54PM -0700, John Stultz wrote:
> From: Will Deacon <will.deacon@arm.com>
> 
> Commit 45a7905fc48f ("arm64: vdso: defer shifting of nanosecond
> component of timespec") fixed sub-ns inaccuracies in our vDSO
> clock_gettime implementation by deferring the right-shift of the
> nanoseconds components until after the timespec addition, which
> operates on left-shifted values. That worked nicely until
> support for CLOCK_MONOTONIC_RAW was added in 49eea433b326
> ("arm64: Add support for CLOCK_MONOTONIC_RAW in clock_gettime()
> vDSO"). Noticing that the core timekeeping code never set
> tkr_raw.xtime_nsec, the vDSO implementation didn't bother
> exposing it via the data page and instead took the unshifted
> tk->raw_time.tv_nsec value which was then immediately shifted
> left in the vDSO code.
> 
> Now that the core code is actually setting tkr_raw.xtime_nsec,
> we need to take that into account in the vDSO by adding it to
> the shifted raw_time value. Rather than do that at each use (and
> expand the data page in the process), instead perform the
> shift/addition operation when populating the data page and
> remove the shift from the vDSO code entirely.
> 
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Miroslav Lichvar <mlichvar@redhat.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Stephen Boyd <stephen.boyd@linaro.org>
> Cc: Kevin Brodsky <kevin.brodsky@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Daniel Mentz <danielmentz@google.com>
> Reported-by: John Stultz <john.stultz@linaro.org>
> Acked-by: Acked-by: Kevin Brodsky <kevin.brodsky@arm.com>

I don't think Kevin liked it *that* much ^^

Will

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH 2/4] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
  2017-05-27  7:36   ` Ingo Molnar
@ 2017-05-30 18:42     ` Daniel Mentz
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Mentz @ 2017-05-30 18:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: John Stultz, lkml, Thomas Gleixner, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Will Deacon

On Sat, May 27, 2017 at 12:36 AM, Ingo Molnar <mingo@kernel.org> wrote:
>
> * John Stultz <john.stultz@linaro.org> wrote:
>
>> +     u64 nsecps;
>
> What does the 'ps' postfix stand for? It's not obvious (to me).
>

I guess that nsecps stands for "nanoseconds per second", although the
code appears to be storing that value left shifted by
tk->tkr_raw.shift.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs
       [not found] ` <CAE2F3rBuOJqLs5Cu7A9wEruZj1Vmnpy6qAYW=U9FVAOEP73pdg@mail.gmail.com>
@ 2017-05-31  0:11   ` Daniel Mentz
  0 siblings, 0 replies; 19+ messages in thread
From: Daniel Mentz @ 2017-05-31  0:11 UTC (permalink / raw)
  To: John Stultz
  Cc: lkml, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd

I successfully tested these four patches on a HiKey (ARMv8) board
using a 4.9 kernel. I cherry picked various commits from upstream that
touched kernel/time/timekeeping.c to ensure that John's patches apply
cleanly. The inconsistency-check from kselftest that previously failed
because of CLOCK_MONOTONIC_RAW jumping backwards by 1 ns is now
passing.

Tested-by: Daniel Mentz <danielmentz@google.com>

On Tue, May 30, 2017 at 5:10 PM, Daniel Mentz <danielmentz@google.com> wrote:
> I successfully tested these four patches on a HiKey (ARMv8) board using a
> 4.9 kernel. I cherry picked various commits from upstream that touched
> kernel/time/timekeeping.c to ensure that John's patches apply cleanly. The
> inconsistency-check from kselftest that previously failed because of
> CLOCK_MONOTONIC_RAW jumping backwards by 1 ns is now passing.
>
> Tested-by: Daniel Mentz <danielmentz@google.com>
>
> On Fri, May 26, 2017 at 8:33 PM, John Stultz <john.stultz@linaro.org> wrote:
>>
>> As part of the Linaro Linux Kernel Functional Test (LKFT)
>> effort, test failures from kselftest/timer's
>> inconsistency-check were reported connected to
>> CLOCK_MONOTONIC_RAW, on the HiKey platform.
>>
>> Digging in I found that an old issue with how sub-ns accounting
>> is handled with the RAW time which was fixed long ago with the
>> CLOCK_MONOTONIC/REALTIME ids, but missed with RAW time, was
>> present.
>>
>> Additionally, running further tests, I uncovered an issue with
>> how the clocksource read function is handled when clocksources
>> are changed, which can cause crashes.
>>
>> Both of these issues have not been uncovered in x86 based
>> testing due to x86 not using vDSO to accelerate
>> CLOCK_MONOTONIC_RAW, combined with the HiKey's arch_timer
>> clocksource being fast to access but incrementing slowly enough
>> to get multiple reads using the same counter value (which helps
>> uncover time handing issues), along with the fact that none of
>> the x86 clocksources making use of the clocksource argument
>> passed to the read function.
>>
>> This patchset addresses these two issues.
>>
>> Thanks so much to help from Will Deacon in getting the needed
>> adjustments to the arm64 vDSO in place. Also thanks to Daniel
>> Mentz who also properly diagnosed the MONOTONIC_RAW issue in
>> parallel while I was woking on this patchset.
>>
>> I wanted to send these out for some initial review. I'm
>> unfortunately still chasing a third issue (related to
>> inconsistencies triggered during extreme freq adjustments) I've
>> uncovered on HiKey, which doesn't seem to be related to the
>> previous two.
>>
>> As always feedback would be appreciated!
>>
>> thanks
>> -john
>>
>> Cc: Thomas Gleixner <tglx@linutronix.de>
>> Cc: Ingo Molnar <mingo@kernel.org>
>> Cc: Miroslav Lichvar <mlichvar@redhat.com>
>> Cc: Richard Cochran <richardcochran@gmail.com>
>> Cc: Prarit Bhargava <prarit@redhat.com>
>> Cc: Stephen Boyd <stephen.boyd@linaro.org>
>> Cc: Daniel Mentz <danielmentz@google.com>
>>
>>
>> John Stultz (3):
>>   time: Fix clock->read(clock) race around clocksource changes
>>   time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting
>>   time: Clean up CLOCK_MONOTONIC_RAW time handling
>>
>> Will Deacon (1):
>>   arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW
>>
>>  arch/arm64/kernel/vdso.c              |  5 +-
>>  arch/arm64/kernel/vdso/gettimeofday.S |  1 -
>>  include/linux/timekeeper_internal.h   |  8 +--
>>  kernel/time/timekeeping.c             | 96
>> ++++++++++++++++++++++-------------
>>  4 files changed, 66 insertions(+), 44 deletions(-)
>>
>> --
>> 2.7.4
>>
>

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH 4/4] time: Clean up CLOCK_MONOTONIC_RAW time handling
  2017-05-27  3:33 ` [RFC][PATCH 4/4] time: Clean up CLOCK_MONOTONIC_RAW time handling John Stultz
@ 2017-08-25 13:40   ` Chris Wilson
  2017-08-25 18:55     ` John Stultz
  2017-08-25 22:57     ` [RFC][PATCH] time: Fix ktime_get_raw() issues caused by incorrect base accumulation John Stultz
  0 siblings, 2 replies; 19+ messages in thread
From: Chris Wilson @ 2017-08-25 13:40 UTC (permalink / raw)
  To: John Stultz, lkml
  Cc: John Stultz, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Will Deacon, Daniel Mentz

Quoting John Stultz (2017-05-27 04:33:55)
> Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
> remove the duplicitive tk->raw_time.tv_nsec, which can be
> stored in tk->tkr_raw.xtime_nsec (similarly to how its handled
> for monotonic time).

This patch breaks tkr_raw pretty badly. It is now accumulating 2x faster
than tkr_mono, with the occasional wacky jump between two reads.

e.g:

@@ -838,6 +840,11 @@ ktime_t ktime_get_raw(void)
 
        } while (read_seqcount_retry(&tk_core.seq, seq));
 
+       pr_err("%s: raw.base=%llu, mono.base=%llu: nsecs=%llu, mono.nsecs=%llu\n",
+                       __func__,
+                       ktime_to_ns(base), ktime_to_ns(tk->tkr_mono.base),
+                       nsecs, timekeeping_get_ns(&tk->tkr_mono));
+

gave
	ktime_get_raw: raw.base=40255680121, mono.base=39940532364: nsecs=261321742, mono.nsecs=318630514
	ktime_get_raw: raw.base=40339680124, mono.base=39940532364: nsecs=345836800, mono.nsecs=403109209

https://bugs.freedesktop.org/show_bug.cgi?id=102336

The patch still reverts cleanly and aiui was not part of the bugfixes,
but a cleanup afterwards; so can be reapplied at leisure.

> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Ingo Molnar <mingo@kernel.org>
> Cc: Miroslav Lichvar <mlichvar@redhat.com>
> Cc: Richard Cochran <richardcochran@gmail.com>
> Cc: Prarit Bhargava <prarit@redhat.com>
> Cc: Stephen Boyd <stephen.boyd@linaro.org>
> Cc: Kevin Brodsky <kevin.brodsky@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: Daniel Mentz <danielmentz@google.com>
> Signed-off-by: John Stultz <john.stultz@linaro.org>
> ---
>  arch/arm64/kernel/vdso.c            |  6 ++---
>  include/linux/timekeeper_internal.h |  4 ++--
>  kernel/time/timekeeping.c           | 47 ++++++++++++++++++++-----------------
>  3 files changed, 29 insertions(+), 28 deletions(-)
> 
> diff --git a/arch/arm64/kernel/vdso.c b/arch/arm64/kernel/vdso.c
> index d0cb007..7492d90 100644
> --- a/arch/arm64/kernel/vdso.c
> +++ b/arch/arm64/kernel/vdso.c
> @@ -220,10 +220,8 @@ void update_vsyscall(struct timekeeper *tk)
>         if (!use_syscall) {
>                 /* tkr_mono.cycle_last == tkr_raw.cycle_last */
>                 vdso_data->cs_cycle_last        = tk->tkr_mono.cycle_last;
> -               vdso_data->raw_time_sec         = tk->raw_time.tv_sec;
> -               vdso_data->raw_time_nsec        = (tk->raw_time.tv_nsec <<
> -                                                  tk->tkr_raw.shift) +
> -                                                 tk->tkr_raw.xtime_nsec;
> +               vdso_data->raw_time_sec         = tk->raw_sec;
> +               vdso_data->raw_time_nsec        = tk->tkr_raw.xtime_nsec;
>                 vdso_data->xtime_clock_sec      = tk->xtime_sec;
>                 vdso_data->xtime_clock_nsec     = tk->tkr_mono.xtime_nsec;
>                 vdso_data->cs_mono_mult         = tk->tkr_mono.mult;
> diff --git a/include/linux/timekeeper_internal.h b/include/linux/timekeeper_internal.h
> index 528cc86..8abf6df 100644
> --- a/include/linux/timekeeper_internal.h
> +++ b/include/linux/timekeeper_internal.h
> @@ -52,7 +52,7 @@ struct tk_read_base {
>   * @clock_was_set_seq: The sequence number of clock was set events
>   * @cs_was_changed_seq:        The sequence number of clocksource change events
>   * @next_leap_ktime:   CLOCK_MONOTONIC time value of a pending leap-second
> - * @raw_time:          Monotonic raw base time in timespec64 format
> + * @raw_sec:           CLOCK_MONOTONIC_RAW  time in seconds
>   * @cycle_interval:    Number of clock cycles in one NTP interval
>   * @xtime_interval:    Number of clock shifted nano seconds in one NTP
>   *                     interval.
> @@ -94,7 +94,7 @@ struct timekeeper {
>         unsigned int            clock_was_set_seq;
>         u8                      cs_was_changed_seq;
>         ktime_t                 next_leap_ktime;
> -       struct timespec64       raw_time;
> +       u64                     raw_sec;
>  
>         /* The following members are for timekeeping internal use */
>         u64                     cycle_interval;
> diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
> index 35d3ba3..0c3b8c1 100644
> --- a/kernel/time/timekeeping.c
> +++ b/kernel/time/timekeeping.c
> @@ -72,6 +72,10 @@ static inline void tk_normalize_xtime(struct timekeeper *tk)
>                 tk->tkr_mono.xtime_nsec -= (u64)NSEC_PER_SEC << tk->tkr_mono.shift;
>                 tk->xtime_sec++;
>         }
> +       while (tk->tkr_raw.xtime_nsec >= ((u64)NSEC_PER_SEC << tk->tkr_raw.shift)) {
> +               tk->tkr_raw.xtime_nsec -= (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
> +               tk->raw_sec++;
> +       }
>  }
>  
>  static inline struct timespec64 tk_xtime(struct timekeeper *tk)
> @@ -287,12 +291,14 @@ static void tk_setup_internals(struct timekeeper *tk, struct clocksource *clock)
>          /* if changing clocks, convert xtime_nsec shift units */
>         if (old_clock) {
>                 int shift_change = clock->shift - old_clock->shift;
> -               if (shift_change < 0)
> +               if (shift_change < 0) {
>                         tk->tkr_mono.xtime_nsec >>= -shift_change;
> -               else
> +                       tk->tkr_raw.xtime_nsec >>= -shift_change;
> +               } else {
>                         tk->tkr_mono.xtime_nsec <<= shift_change;
> +                       tk->tkr_raw.xtime_nsec <<= shift_change;
> +               }
>         }
> -       tk->tkr_raw.xtime_nsec = 0;
>  
>         tk->tkr_mono.shift = clock->shift;
>         tk->tkr_raw.shift = clock->shift;
> @@ -617,9 +623,6 @@ static inline void tk_update_ktime_data(struct timekeeper *tk)
>         nsec = (u32) tk->wall_to_monotonic.tv_nsec;
>         tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
>  
> -       /* Update the monotonic raw base */
> -       tk->tkr_raw.base = timespec64_to_ktime(tk->raw_time);
> -
>         /*
>          * The sum of the nanoseconds portions of xtime and
>          * wall_to_monotonic can be greater/equal one second. Take
> @@ -629,6 +632,11 @@ static inline void tk_update_ktime_data(struct timekeeper *tk)
>         if (nsec >= NSEC_PER_SEC)
>                 seconds++;
>         tk->ktime_sec = seconds;
> +
> +       /* Update the monotonic raw base */
> +       seconds = tk->raw_sec;
> +       nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
> +       tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
>  }
>  
>  /* must hold timekeeper_lock */
> @@ -670,7 +678,6 @@ static void timekeeping_update(struct timekeeper *tk, unsigned int action)
>  static void timekeeping_forward_now(struct timekeeper *tk)
>  {
>         u64 cycle_now, delta;
> -       u64 nsec;
>  
>         cycle_now = tk_clock_read(&tk->tkr_mono);
>         delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last, tk->tkr_mono.mask);
> @@ -682,10 +689,13 @@ static void timekeeping_forward_now(struct timekeeper *tk)
>         /* If arch requires, add in get_arch_timeoffset() */
>         tk->tkr_mono.xtime_nsec += (u64)arch_gettimeoffset() << tk->tkr_mono.shift;
>  
> -       tk_normalize_xtime(tk);
>  
> -       nsec = clocksource_cyc2ns(delta, tk->tkr_raw.mult, tk->tkr_raw.shift);
> -       timespec64_add_ns(&tk->raw_time, nsec);
> +       tk->tkr_raw.xtime_nsec += delta * tk->tkr_raw.mult;
> +
> +       /* If arch requires, add in get_arch_timeoffset() */
> +       tk->tkr_raw.xtime_nsec += (u64)arch_gettimeoffset() << tk->tkr_raw.shift;
> +
> +       tk_normalize_xtime(tk);
>  }
>  
>  /**
> @@ -1371,19 +1381,18 @@ int timekeeping_notify(struct clocksource *clock)
>  void getrawmonotonic64(struct timespec64 *ts)
>  {
>         struct timekeeper *tk = &tk_core.timekeeper;
> -       struct timespec64 ts64;
>         unsigned long seq;
>         u64 nsecs;
>  
>         do {
>                 seq = read_seqcount_begin(&tk_core.seq);
> +               ts->tv_sec = tk->raw_sec;
>                 nsecs = timekeeping_get_ns(&tk->tkr_raw);
> -               ts64 = tk->raw_time;
>  
>         } while (read_seqcount_retry(&tk_core.seq, seq));
>  
> -       timespec64_add_ns(&ts64, nsecs);
> -       *ts = ts64;
> +       ts->tv_nsec = 0;
> +       timespec64_add_ns(ts, nsecs);
>  }
>  EXPORT_SYMBOL(getrawmonotonic64);
>  
> @@ -1507,8 +1516,7 @@ void __init timekeeping_init(void)
>         tk_setup_internals(tk, clock);
>  
>         tk_set_xtime(tk, &now);
> -       tk->raw_time.tv_sec = 0;
> -       tk->raw_time.tv_nsec = 0;
> +       tk->raw_sec = 0;
>         if (boot.tv_sec == 0 && boot.tv_nsec == 0)
>                 boot = tk_xtime(tk);
>  
> @@ -2009,17 +2017,12 @@ static u64 logarithmic_accumulation(struct timekeeper *tk, u64 offset,
>         *clock_set |= accumulate_nsecs_to_secs(tk);
>  
>         /* Accumulate raw time */
> -       tk->tkr_raw.xtime_nsec += (u64)tk->raw_time.tv_nsec
> -                                       << tk->tkr_raw.shift;
>         tk->tkr_raw.xtime_nsec += tk->raw_interval << shift;
>         nsecps = (u64)NSEC_PER_SEC << tk->tkr_raw.shift;
>         while (tk->tkr_raw.xtime_nsec >= nsecps) {
>                 tk->tkr_raw.xtime_nsec -= nsecps;
> -               tk->raw_time.tv_sec++;
> +               tk->raw_sec++;
>         }
> -       tk->raw_time.tv_nsec = tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift;
> -       tk->tkr_raw.xtime_nsec -= (u64)tk->raw_time.tv_nsec
> -                                       << tk->tkr_raw.shift;
>  
>         /* Accumulate error between NTP and clock interval */
>         tk->ntp_error += tk->ntp_tick << shift;
> -- 
> 2.7.4
> 

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH 4/4] time: Clean up CLOCK_MONOTONIC_RAW time handling
  2017-08-25 13:40   ` Chris Wilson
@ 2017-08-25 18:55     ` John Stultz
  2017-08-25 21:16       ` John Stultz
  2017-08-25 22:57     ` [RFC][PATCH] time: Fix ktime_get_raw() issues caused by incorrect base accumulation John Stultz
  1 sibling, 1 reply; 19+ messages in thread
From: John Stultz @ 2017-08-25 18:55 UTC (permalink / raw)
  To: Chris Wilson
  Cc: lkml, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Will Deacon, Daniel Mentz

On Fri, Aug 25, 2017 at 6:40 AM, Chris Wilson <chris@chris-wilson.co.uk> wrote:
> Quoting John Stultz (2017-05-27 04:33:55)
>> Now that we fixed the sub-ns handling for CLOCK_MONOTONIC_RAW,
>> remove the duplicitive tk->raw_time.tv_nsec, which can be
>> stored in tk->tkr_raw.xtime_nsec (similarly to how its handled
>> for monotonic time).
>
> This patch breaks tkr_raw pretty badly. It is now accumulating 2x faster
> than tkr_mono, with the occasional wacky jump between two reads.
>
> e.g:
>
> @@ -838,6 +840,11 @@ ktime_t ktime_get_raw(void)
>
>         } while (read_seqcount_retry(&tk_core.seq, seq));
>
> +       pr_err("%s: raw.base=%llu, mono.base=%llu: nsecs=%llu, mono.nsecs=%llu\n",
> +                       __func__,
> +                       ktime_to_ns(base), ktime_to_ns(tk->tkr_mono.base),
> +                       nsecs, timekeeping_get_ns(&tk->tkr_mono));
> +
>
> gave
>         ktime_get_raw: raw.base=40255680121, mono.base=39940532364: nsecs=261321742, mono.nsecs=318630514
>         ktime_get_raw: raw.base=40339680124, mono.base=39940532364: nsecs=345836800, mono.nsecs=403109209
>
> https://bugs.freedesktop.org/show_bug.cgi?id=102336
>
> The patch still reverts cleanly and aiui was not part of the bugfixes,
> but a cleanup afterwards; so can be reapplied at leisure.

Thanks for the bug report!

Its odd, as I'm not seeing such behavior in my testing (using the
raw_skew or change_skew tests in kselftest/timers).  Can you try
running those tests to see if they fail on your hardware?

From the bug report, it says it's BYT specific, but I'm not sure what
that means. Are there any hardware specific details you can share?
What clocksource is being used? Can you send a dmesg log?

I'll look over the code again to see if I can catch anything by
review. Worst case, if we can't get any traction on this in a day or so,
I'll submit a revert.

thanks
-john

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH 4/4] time: Clean up CLOCK_MONOTONIC_RAW time handling
  2017-08-25 18:55     ` John Stultz
@ 2017-08-25 21:16       ` John Stultz
  0 siblings, 0 replies; 19+ messages in thread
From: John Stultz @ 2017-08-25 21:16 UTC (permalink / raw)
  To: Chris Wilson
  Cc: lkml, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Will Deacon, Daniel Mentz

On Fri, Aug 25, 2017 at 11:55 AM, John Stultz <john.stultz@linaro.org> wrote:
> I'll look over the code again to see if I can catch anything by
> review. Worse case if we can't get any traction on this in a day or so
> I'll submit a revert.

I think I found the issue. In tk_update_ktime_data() I add the raw_sec
and the shifted-down tk->tkr_raw.xtime_nsec to the base. But we already
add tk->tkr_raw.xtime_nsec to the offset and shift it all down in
timekeeping_delta_to_ns, called from ktime_get_raw(), so we are
effectively accumulating the nsecs portion faster than we should.

This only crops up for internal ktime_get_raw() users, not for
getrawmonotonic64(), which uses the timespec generation path rather
than the ktime method, which is why this wasn't seen by userspace time tests.

I'll send a patch for testing shortly.

thanks
-john

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [RFC][PATCH] time: Fix ktime_get_raw() issues caused by incorrect base accumulation
  2017-08-25 13:40   ` Chris Wilson
  2017-08-25 18:55     ` John Stultz
@ 2017-08-25 22:57     ` John Stultz
  2017-08-26 10:20       ` Chris Wilson
  2017-08-26 14:10       ` [tip:timers/urgent] time: Fix ktime_get_raw() " tip-bot for John Stultz
  1 sibling, 2 replies; 19+ messages in thread
From: John Stultz @ 2017-08-25 22:57 UTC (permalink / raw)
  To: lkml
  Cc: John Stultz, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Will Deacon, Daniel Mentz, Chris Wilson

In commit fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time
handling"), I mistakenly added the following:

 /* Update the monotonic raw base */
 seconds = tk->raw_sec;
 nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
 tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

This adds the raw_sec value and the shifted-down raw xtime_nsec
to the base value.

This is problematic because when calling ktime_get_raw(), we add
tk->tkr_raw.xtime_nsec and the current offset, shift it down, and
add it to the raw base.

This results in the shifted down tk->tkr_raw.xtime_nsec being
added twice.

My mistake was that I was matching the monotonic base logic
above:

 seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
 nsec = (u32) tk->wall_to_monotonic.tv_nsec;
 tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

This adds the wall_to_monotonic.tv_nsec value, but not the
tk->tkr_mono.xtime_nsec value, to the base.

The result of this is that ktime_get_raw() users (which are all
internal users) see the raw time move faster than it should (at a
rate that can vary with the current size of tkr_raw.xtime_nsec),
which has caused problems with at least graphics rendering
performance.

To fix this, we simplify the tkr_raw.base accumulation to only
accumulate the raw_sec portion, and do not include the
tkr_raw.xtime_nsec portion, which will be added at read time.

Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Daniel Mentz <danielmentz@google.com>
Cc: Chris Wilson <chris@chris-wilson.co.uk>
Reported-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: John Stultz <john.stultz@linaro.org>
---
 kernel/time/timekeeping.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cedafa0..7e7e61c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -637,9 +637,7 @@ static inline void tk_update_ktime_data(struct timekeeper *tk)
 	tk->ktime_sec = seconds;
 
 	/* Update the monotonic raw base */
-	seconds = tk->raw_sec;
-	nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
-	tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
+	tk->tkr_raw.base = ns_to_ktime(tk->raw_sec * NSEC_PER_SEC);
 }
 
 /* must hold timekeeper_lock */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [RFC][PATCH] time: Fix ktime_get_raw() issues caused by incorrect base accumulation
  2017-08-25 22:57     ` [RFC][PATCH] time: Fix ktime_get_raw() issues caused by incorrect base accumulation John Stultz
@ 2017-08-26 10:20       ` Chris Wilson
  2017-08-26 14:10       ` [tip:timers/urgent] time: Fix ktime_get_raw() " tip-bot for John Stultz
  1 sibling, 0 replies; 19+ messages in thread
From: Chris Wilson @ 2017-08-26 10:20 UTC (permalink / raw)
  To: John Stultz, lkml
  Cc: John Stultz, Thomas Gleixner, Ingo Molnar, Miroslav Lichvar,
	Richard Cochran, Prarit Bhargava, Stephen Boyd, Kevin Brodsky,
	Will Deacon, Daniel Mentz

Quoting John Stultz (2017-08-25 23:57:04)
> In commit fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time
> handling"), I mistakenly added the following:
> 
>  /* Update the monotonic raw base */
>  seconds = tk->raw_sec;
>  nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
>  tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
> 
> Which adds the raw_sec value and the shifted down raw xtime_nsec
> to the base value.
> 
> This is problematic as when calling ktime_get_raw(), we add the
> tk->tkr_raw.xtime_nsec and current offset, shift it down and add
> it to the raw base.
> 
> This results in the shifted down tk->tkr_raw.xtime_nsec being
> added twice.
> 
> My mistake was that I was matching the monotonic base logic
> above:
> 
>  seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
>  nsec = (u32) tk->wall_to_monotonic.tv_nsec;
>  tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
> 
> Which adds the wall_to_monotonic.tv_nsec value, but not the
> tk->tkr_mono.xtime_nsec value to the base.
> 
> The result of this is that ktime_get_raw() users (which are all
> internal users) see the raw time move faster than it should (at a
> rate that can vary with the current size of tkr_raw.xtime_nsec),
> which has caused problems with at least graphics rendering
> performance.
> 
> To fix this, we simplify the tkr_raw.base accumulation to only
> accumulate the raw_sec portion, and do not include the
> tkr_raw.xtime_nsec portion, which will be added at read time.

This fixes our testcase of using ktime_get_raw() deltas. Thanks,
Tested-by: Chris Wilson <chris@chris-wilson.co.uk>
-Chris

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [tip:timers/urgent] time: Fix ktime_get_raw() incorrect base accumulation
  2017-08-25 22:57     ` [RFC][PATCH] time: Fix ktime_get_raw() issues caused by incorrect base accumulation John Stultz
  2017-08-26 10:20       ` Chris Wilson
@ 2017-08-26 14:10       ` tip-bot for John Stultz
  1 sibling, 0 replies; 19+ messages in thread
From: tip-bot for John Stultz @ 2017-08-26 14:10 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: prarit, hpa, kevin.brodsky, tglx, mingo, richardcochran, chris,
	john.stultz, will.deacon, stephen.boyd, mlichvar, linux-kernel,
	danielmentz

Commit-ID:  0bcdc0987cce9880436b70836c6a92bb8e744fd1
Gitweb:     http://git.kernel.org/tip/0bcdc0987cce9880436b70836c6a92bb8e744fd1
Author:     John Stultz <john.stultz@linaro.org>
AuthorDate: Fri, 25 Aug 2017 15:57:04 -0700
Committer:  Thomas Gleixner <tglx@linutronix.de>
CommitDate: Sat, 26 Aug 2017 16:06:12 +0200

time: Fix ktime_get_raw() incorrect base accumulation

In commit fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time
handling"), the following code got mistakenly added to the update of the
raw timekeeper:

 /* Update the monotonic raw base */
 seconds = tk->raw_sec;
 nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
 tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

This adds the raw_sec value and the shifted-down raw xtime_nsec to the
base value.

But the read function adds the shifted-down tk->tkr_raw.xtime_nsec value
another time. The result of this is that ktime_get_raw() users (which are
all internal users) see the raw time move faster than it should (at a rate
that can vary with the current size of tkr_raw.xtime_nsec), which has
caused problems with at least graphics rendering performance.

The change tried to match the monotonic base update logic:

 seconds = (u64)(tk->xtime_sec + tk->wall_to_monotonic.tv_sec);
 nsec = (u32) tk->wall_to_monotonic.tv_nsec;
 tk->tkr_mono.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);

This adds the wall_to_monotonic.tv_nsec value, but not the
tk->tkr_mono.xtime_nsec value, to the base.

To fix this, simplify the tkr_raw.base accumulation to only accumulate the
raw_sec portion, and do not include the tkr_raw.xtime_nsec portion, which
will be added at read time.

Fixes: fc6eead7c1e2 ("time: Clean up CLOCK_MONOTONIC_RAW time handling")
Reported-and-tested-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: John Stultz <john.stultz@linaro.org>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Cc: Prarit Bhargava <prarit@redhat.com>
Cc: Kevin Brodsky <kevin.brodsky@arm.com>
Cc: Richard Cochran <richardcochran@gmail.com>
Cc: Stephen Boyd <stephen.boyd@linaro.org>
Cc: Will Deacon <will.deacon@arm.com>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
Cc: Daniel Mentz <danielmentz@google.com>
Link: http://lkml.kernel.org/r/1503701824-1645-1-git-send-email-john.stultz@linaro.org

---
 kernel/time/timekeeping.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index cedafa0..7e7e61c 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -637,9 +637,7 @@ static inline void tk_update_ktime_data(struct timekeeper *tk)
 	tk->ktime_sec = seconds;
 
 	/* Update the monotonic raw base */
-	seconds = tk->raw_sec;
-	nsec = (u32)(tk->tkr_raw.xtime_nsec >> tk->tkr_raw.shift);
-	tk->tkr_raw.base = ns_to_ktime(seconds * NSEC_PER_SEC + nsec);
+	tk->tkr_raw.base = ns_to_ktime(tk->raw_sec * NSEC_PER_SEC);
 }
 
 /* must hold timekeeper_lock */

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2017-08-26 14:14 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-05-27  3:33 [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs John Stultz
2017-05-27  3:33 ` [RFC][PATCH 1/4] time: Fix clock->read(clock) race around clocksource changes John Stultz
2017-05-27  7:31   ` Ingo Molnar
2017-05-27  3:33 ` [RFC][PATCH 2/4] time: Fix CLOCK_MONOTONIC_RAW sub-nanosecond accounting John Stultz
2017-05-27  7:36   ` Ingo Molnar
2017-05-30 18:42     ` Daniel Mentz
2017-05-27  3:33 ` [RFC][PATCH 3/4] arm64: vdso: Fix nsec handling for CLOCK_MONOTONIC_RAW John Stultz
2017-05-30  9:38   ` Will Deacon
2017-05-27  3:33 ` [RFC][PATCH 4/4] time: Clean up CLOCK_MONOTONIC_RAW time handling John Stultz
2017-08-25 13:40   ` Chris Wilson
2017-08-25 18:55     ` John Stultz
2017-08-25 21:16       ` John Stultz
2017-08-25 22:57     ` [RFC][PATCH] time: Fix ktime_get_raw() issues caused by incorrect base accumulation John Stultz
2017-08-26 10:20       ` Chris Wilson
2017-08-26 14:10       ` [tip:timers/urgent] time: Fix ktime_get_raw() " tip-bot for John Stultz
2017-05-27  7:38 ` [RFC][PATCH 0/4] Fixes for two recently found timekeeping bugs Ingo Molnar
2017-05-27 16:16   ` John Stultz
2017-05-28  8:54     ` Ingo Molnar
     [not found] ` <CAE2F3rBuOJqLs5Cu7A9wEruZj1Vmnpy6qAYW=U9FVAOEP73pdg@mail.gmail.com>
2017-05-31  0:11   ` Daniel Mentz

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).