All of lore.kernel.org
* 'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity"
@ 2017-03-16 14:01 Arnaldo Carvalho de Melo
  2017-03-16 14:53 ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-16 14:01 UTC (permalink / raw)
  To: Peter Zijlstra, Adrian Hunter
  Cc: Jiri Olsa, Namhyung Kim, Wang Nan, Linux Kernel Mailing List

Hi, this entry has been failing for a while:

[root@jouet ~]# perf test -v tsc
55: Convert perf time to TSC                   :
--- start ---
test child forked, pid 3008
mmap size 528384B
1st event perf time 93133455486631 tsc 15369449468752
rdtsc          time 93133464598760 tsc 15369473104358
2nd event perf time 93133455506961 tsc 15369449521485
test child finished with -1
---- end ----
Convert perf time to TSC: FAILED!
[root@jouet ~]#

I bisected it to the following kernel change, ideas?

[acme@felicio linux]$ git bisect good
5680d8094ffa9e5cfc81afdd865027ee6417c263 is the first bad commit
commit 5680d8094ffa9e5cfc81afdd865027ee6417c263
Author: Peter Zijlstra <peterz@infradead.org>
Date:   Thu Dec 15 13:36:17 2016 +0100

    sched/clock: Provide better clock continuity
    
    When switching between the unstable and stable variants it is
    currently possible that clock discontinuities occur.
    
    And while these will mostly be 'small', attempt to do better.
    
    As observed on my IVB-EP, the sched_clock() is ~1.5s ahead of the
    ktime_get_ns() based timeline at the point of switchover
    (sched_clock_init_late()) after SMP bringup.
    
    Equally, when the TSC is later found to be unstable -- typically
    because SMM tries to hide its SMI latencies by mucking with the TSC --
    we want to avoid large jumps.
    
    Since the clocksource watchdog reports the issue after the fact we
    cannot exactly fix up time, but since SMI latencies are typically
    small (~10ns range), the discontinuity is mainly due to drift between
    sched_clock() and ktime_get_ns() (which on my desktop is ~79s over
    24days).
    
    I dislike this patch because it adds overhead to the good case in
    favour of dealing with badness. But given the widespread failure of
    TSC stability this is worth it.
    
    Note that in case the TSC makes drastic jumps after SMP bringup we're
    still hosed. There's just not much we can do in that case without
    stupid overhead.
    
    If we were to somehow expose tsc_clocksource_reliable (which is hard
    because this code is also used on ia64 and parisc) we could avoid some
    of the newly introduced overhead.
    
    Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
    Cc: Linus Torvalds <torvalds@linux-foundation.org>
    Cc: Mike Galbraith <efault@gmx.de>
    Cc: Peter Zijlstra <peterz@infradead.org>
    Cc: Thomas Gleixner <tglx@linutronix.de>
    Cc: linux-kernel@vger.kernel.org
    Signed-off-by: Ingo Molnar <mingo@kernel.org>

:040000 040000 152545abe3b879aaa3cf053cdd58ef998c285529 3afcd0a5bc643fdd0fc994ee11cbfd87cfe4c30f M	kernel
[acme@felicio linux]$ 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: 'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity"
  2017-03-16 14:01 'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity" Arnaldo Carvalho de Melo
@ 2017-03-16 14:53 ` Peter Zijlstra
  2017-03-16 18:11   ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2017-03-16 14:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, Jiri Olsa, Namhyung Kim, Wang Nan,
	Linux Kernel Mailing List

On Thu, Mar 16, 2017 at 11:01:03AM -0300, Arnaldo Carvalho de Melo wrote:
> Hi, this entry has been failing for a while:
> 
> [root@jouet ~]# perf test -v tsc
> 55: Convert perf time to TSC                   :
> --- start ---
> test child forked, pid 3008
> mmap size 528384B
> 1st event perf time 93133455486631 tsc 15369449468752
> rdtsc          time 93133464598760 tsc 15369473104358
> 2nd event perf time 93133455506961 tsc 15369449521485
> test child finished with -1
> ---- end ----
> Convert perf time to TSC: FAILED!
> [root@jouet ~]#
> 
> I bisected it to the following kernel change, ideas?
> 
> [acme@felicio linux]$ git bisect good
> 5680d8094ffa9e5cfc81afdd865027ee6417c263 is the first bad commit
> commit 5680d8094ffa9e5cfc81afdd865027ee6417c263
> Author: Peter Zijlstra <peterz@infradead.org>
> Date:   Thu Dec 15 13:36:17 2016 +0100
> 
>     sched/clock: Provide better clock continuity

Right, ahunter also complained about this. I had a half arsed fugly
patch in the works. Let me see if I can improve and finish.


* Re: 'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity"
  2017-03-16 14:53 ` Peter Zijlstra
@ 2017-03-16 18:11   ` Peter Zijlstra
  2017-03-16 18:36     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2017-03-16 18:11 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, Jiri Olsa, Namhyung Kim, Wang Nan,
	Linux Kernel Mailing List

On Thu, Mar 16, 2017 at 03:53:11PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 16, 2017 at 11:01:03AM -0300, Arnaldo Carvalho de Melo wrote:
> > Hi, this entry has been failing for a while:
> > 
> > [root@jouet ~]# perf test -v tsc
> > 55: Convert perf time to TSC                   :
> > --- start ---
> > test child forked, pid 3008
> > mmap size 528384B
> > 1st event perf time 93133455486631 tsc 15369449468752
> > rdtsc          time 93133464598760 tsc 15369473104358
> > 2nd event perf time 93133455506961 tsc 15369449521485
> > test child finished with -1
> > ---- end ----
> > Convert perf time to TSC: FAILED!
> > [root@jouet ~]#
> > 
> > I bisected it to the following kernel change, ideas?
> > 
> > [acme@felicio linux]$ git bisect good
> > 5680d8094ffa9e5cfc81afdd865027ee6417c263 is the first bad commit
> > commit 5680d8094ffa9e5cfc81afdd865027ee6417c263
> > Author: Peter Zijlstra <peterz@infradead.org>
> > Date:   Thu Dec 15 13:36:17 2016 +0100
> > 
> >     sched/clock: Provide better clock continuity
> 
> Right, ahunter also complained about this. I had a half arsed fugly
> patch in the works. Let me see if I can improve and finish.

The below seems to cure things for me...


---
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2244,6 +2244,7 @@ void arch_perf_update_userpage(struct pe
 			       struct perf_event_mmap_page *userpg, u64 now)
 {
 	struct cyc2ns_data *data;
+	u64 offset;
 
 	userpg->cap_user_time = 0;
 	userpg->cap_user_time_zero = 0;
@@ -2251,11 +2252,13 @@ void arch_perf_update_userpage(struct pe
 		!!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED);
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
-	if (!sched_clock_stable())
+	if (!using_native_sched_clock() || !sched_clock_stable())
 		return;
 
 	data = cyc2ns_read_begin();
 
+	offset = data->cyc2ns_offset + __sched_clock_offset;
+
 	/*
 	 * Internal timekeeping for enabled/running/stopped times
 	 * is always in the local_clock domain.
@@ -2263,7 +2266,7 @@ void arch_perf_update_userpage(struct pe
 	userpg->cap_user_time = 1;
 	userpg->time_mult = data->cyc2ns_mul;
 	userpg->time_shift = data->cyc2ns_shift;
-	userpg->time_offset = data->cyc2ns_offset - now;
+	userpg->time_offset = offset - now;
 
 	/*
 	 * cap_user_time_zero doesn't make sense when we're using a different
@@ -2271,7 +2274,7 @@ void arch_perf_update_userpage(struct pe
 	 */
 	if (!event->attr.use_clockid) {
 		userpg->cap_user_time_zero = 1;
-		userpg->time_zero = data->cyc2ns_offset;
+		userpg->time_zero = offset;
 	}
 
 	cyc2ns_read_end(data);
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -12,6 +12,8 @@ extern int recalibrate_cpu_khz(void);
 
 extern int no_timer_check;
 
+extern bool using_native_sched_clock(void);
+
 /*
  * We use the full linear equation: f(x) = a + b*x, in order to allow
  * a continuous function in the face of dynamic freq changes.
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -328,7 +328,7 @@ unsigned long long sched_clock(void)
 	return paravirt_sched_clock();
 }
 
-static inline bool using_native_sched_clock(void)
+bool using_native_sched_clock(void)
 {
 	return pv_time_ops.sched_clock == native_sched_clock;
 }
@@ -336,7 +336,7 @@ static inline bool using_native_sched_cl
 unsigned long long
 sched_clock(void) __attribute__((alias("native_sched_clock")));
 
-static inline bool using_native_sched_clock(void) { return true; }
+bool using_native_sched_clock(void) { return true; }
 #endif
 
 int check_tsc_unstable(void)
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -54,15 +54,16 @@ static inline u64 local_clock(void)
 }
 #else
 extern void sched_clock_init_late(void);
-/*
- * Architectures can set this to 1 if they have specified
- * CONFIG_HAVE_UNSTABLE_SCHED_CLOCK in their arch Kconfig,
- * but then during bootup it turns out that sched_clock()
- * is reliable after all:
- */
 extern int sched_clock_stable(void);
 extern void clear_sched_clock_stable(void);
 
+/*
+ * When sched_clock_stable(), __sched_clock_offset provides the offset
+ * between local_clock() and sched_clock().
+ */
+extern u64 __sched_clock_offset;
+
+
 extern void sched_clock_tick(void);
 extern void sched_clock_idle_sleep_event(void);
 extern void sched_clock_idle_wakeup_event(u64 delta_ns);
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -96,10 +96,10 @@ static DEFINE_STATIC_KEY_FALSE(__sched_c
 static int __sched_clock_stable_early = 1;
 
 /*
- * We want: ktime_get_ns() + gtod_offset == sched_clock() + raw_offset
+ * We want: ktime_get_ns() + __gtod_offset == sched_clock() + __sched_clock_offset
  */
-static __read_mostly u64 raw_offset;
-static __read_mostly u64 gtod_offset;
+__read_mostly u64 __sched_clock_offset;
+static __read_mostly u64 __gtod_offset;
 
 struct sched_clock_data {
 	u64			tick_raw;
@@ -131,11 +131,11 @@ static void __set_sched_clock_stable(voi
 	/*
 	 * Attempt to make the (initial) unstable->stable transition continuous.
 	 */
-	raw_offset = (scd->tick_gtod + gtod_offset) - (scd->tick_raw);
+	__sched_clock_offset = (scd->tick_gtod + __gtod_offset) - (scd->tick_raw);
 
 	printk(KERN_INFO "sched_clock: Marking stable (%lld, %lld)->(%lld, %lld)\n",
-			scd->tick_gtod, gtod_offset,
-			scd->tick_raw,  raw_offset);
+			scd->tick_gtod, __gtod_offset,
+			scd->tick_raw,  __sched_clock_offset);
 
 	static_branch_enable(&__sched_clock_stable);
 	tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);
@@ -161,11 +161,11 @@ static void __clear_sched_clock_stable(v
 	 *
 	 * Still do what we can.
 	 */
-	gtod_offset = (scd->tick_raw + raw_offset) - (scd->tick_gtod);
+	__gtod_offset = (scd->tick_raw + __sched_clock_offset) - (scd->tick_gtod);
 
 	printk(KERN_INFO "sched_clock: Marking unstable (%lld, %lld)<-(%lld, %lld)\n",
-			scd->tick_gtod, gtod_offset,
-			scd->tick_raw,  raw_offset);
+			scd->tick_gtod, __gtod_offset,
+			scd->tick_raw,  __sched_clock_offset);
 
 	tick_dep_set(TICK_DEP_BIT_CLOCK_UNSTABLE);
 
@@ -238,7 +238,7 @@ static u64 sched_clock_local(struct sche
 	 *		      scd->tick_gtod + TICK_NSEC);
 	 */
 
-	clock = scd->tick_gtod + gtod_offset + delta;
+	clock = scd->tick_gtod + __gtod_offset + delta;
 	min_clock = wrap_max(scd->tick_gtod, old_clock);
 	max_clock = wrap_max(old_clock, scd->tick_gtod + TICK_NSEC);
 
@@ -324,7 +324,7 @@ u64 sched_clock_cpu(int cpu)
 	u64 clock;
 
 	if (sched_clock_stable())
-		return sched_clock() + raw_offset;
+		return sched_clock() + __sched_clock_offset;
 
 	if (unlikely(!sched_clock_running))
 		return 0ull;


* Re: 'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity"
  2017-03-16 18:11   ` Peter Zijlstra
@ 2017-03-16 18:36     ` Arnaldo Carvalho de Melo
  2017-03-16 19:21       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-16 18:36 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Jiri Olsa, Namhyung Kim, Wang Nan,
	Linux Kernel Mailing List

On Thu, Mar 16, 2017 at 07:11:21PM +0100, Peter Zijlstra wrote:
> On Thu, Mar 16, 2017 at 03:53:11PM +0100, Peter Zijlstra wrote:
> > On Thu, Mar 16, 2017 at 11:01:03AM -0300, Arnaldo Carvalho de Melo wrote:
> > > [root@jouet ~]# perf test -v tsc
> > > 55: Convert perf time to TSC                   :
> > > Convert perf time to TSC: FAILED!
> > >     sched/clock: Provide better clock continuity

> > Right, ahunter also complained about this. I had a half arsed fugly
> > patch in the works. Let me see if I can improve and finish.
 
> The below seems to cure things for me...

Building...

- Arnaldo


* Re: 'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity"
  2017-03-16 18:36     ` Arnaldo Carvalho de Melo
@ 2017-03-16 19:21       ` Arnaldo Carvalho de Melo
  2017-03-16 19:22         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-16 19:21 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Adrian Hunter, Jiri Olsa, Namhyung Kim, Wang Nan,
	Linux Kernel Mailing List

On Thu, Mar 16, 2017 at 03:36:17PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Mar 16, 2017 at 07:11:21PM +0100, Peter Zijlstra wrote:
> > On Thu, Mar 16, 2017 at 03:53:11PM +0100, Peter Zijlstra wrote:
> > > On Thu, Mar 16, 2017 at 11:01:03AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > [root@jouet ~]# perf test -v tsc
> > > > 55: Convert perf time to TSC                   :
> > > > Convert perf time to TSC: FAILED!
> > > >     sched/clock: Provide better clock continuity
> 
> > > Right, ahunter also complained about this. I had a half arsed fugly
> > > patch in the works. Let me see if I can improve and finish.
>  
> > The below seems to cure things for me...
> 
> Building...

Didn't apply to perf/urgent:

[acme@jouet linux]$ patch -p1 < /wb/1.patch 
patching file arch/x86/events/core.c
Hunk #1 succeeded at 2243 (offset -1 lines).
Hunk #2 succeeded at 2251 (offset -1 lines).
Hunk #3 succeeded at 2265 (offset -1 lines).
Hunk #4 succeeded at 2273 (offset -1 lines).
patching file arch/x86/include/asm/timer.h
patching file arch/x86/kernel/tsc.c
Hunk #1 FAILED at 328.
Hunk #2 FAILED at 336.
2 out of 2 hunks FAILED -- saving rejects to file arch/x86/kernel/tsc.c.rej
can't find file to patch at input line 182
Perhaps you used the wrong -p or --strip option?
The text leading up to this was:
--------------------------
|--- a/include/linux/sched/clock.h
|+++ b/include/linux/sched/clock.h
--------------------------
File to patch: ^C
[acme@jouet linux]$ 
[acme@jouet linux]$

Trying with perf/core...


* Re: 'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity"
  2017-03-16 19:21       ` Arnaldo Carvalho de Melo
@ 2017-03-16 19:22         ` Arnaldo Carvalho de Melo
  2017-03-20 13:20           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-16 19:22 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Peter Zijlstra, Adrian Hunter, Jiri Olsa, Namhyung Kim, Wang Nan,
	Linux Kernel Mailing List

On Thu, Mar 16, 2017 at 04:21:38PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Mar 16, 2017 at 03:36:17PM -0300, Arnaldo Carvalho de Melo wrote:
> > On Thu, Mar 16, 2017 at 07:11:21PM +0100, Peter Zijlstra wrote:
> > > On Thu, Mar 16, 2017 at 03:53:11PM +0100, Peter Zijlstra wrote:
> > > > On Thu, Mar 16, 2017 at 11:01:03AM -0300, Arnaldo Carvalho de Melo wrote:
> > > > > [root@jouet ~]# perf test -v tsc
> > > > > 55: Convert perf time to TSC                   :
> > > > > Convert perf time to TSC: FAILED!
> > > > >     sched/clock: Provide better clock continuity
> > 
> > > > Right, ahunter also complained about this. I had a half arsed fugly
> > > > patch in the works. Let me see if I can improve and finish.
> >  
> > > The below seems to cure things for me...
> > 
> > Building...
> 
> Didn't apply to perf/urgent:
> 
> [acme@jouet linux]$ patch -p1 < /wb/1.patch 
> patching file arch/x86/events/core.c
> Hunk #1 succeeded at 2243 (offset -1 lines).
> Hunk #2 succeeded at 2251 (offset -1 lines).
> Hunk #3 succeeded at 2265 (offset -1 lines).
> Hunk #4 succeeded at 2273 (offset -1 lines).
> patching file arch/x86/include/asm/timer.h
> patching file arch/x86/kernel/tsc.c
> Hunk #1 FAILED at 328.

perf/core applied almost cleanly:

[acme@felicio linux]$ patch -p1 < /wb/1.patch 
patching file arch/x86/events/core.c
patching file arch/x86/include/asm/timer.h
patching file arch/x86/kernel/tsc.c
patching file include/linux/sched/clock.h
patching file kernel/sched/clock.c
Hunk #3 succeeded at 154 with fuzz 2 (offset -7 lines).
Hunk #4 succeeded at 231 (offset -7 lines).
Hunk #5 succeeded at 317 (offset -7 lines).
[acme@felicio linux]$

Building...


* Re: 'perf test tsc' failing, bisected to "sched/clock: Provide better clock continuity"
  2017-03-16 19:22         ` Arnaldo Carvalho de Melo
@ 2017-03-20 13:20           ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 7+ messages in thread
From: Arnaldo Carvalho de Melo @ 2017-03-20 13:20 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Jiri Olsa, Namhyung Kim, Wang Nan,
	Linux Kernel Mailing List

On Thu, Mar 16, 2017 at 04:22:31PM -0300, Arnaldo Carvalho de Melo wrote:
> On Thu, Mar 16, 2017 at 04:21:38PM -0300, Arnaldo Carvalho de Melo wrote:
> > Didn't apply to perf/urgent:
> > 
> > [acme@jouet linux]$ patch -p1 < /wb/1.patch 
> > patching file arch/x86/events/core.c
> > Hunk #1 succeeded at 2243 (offset -1 lines).
> > Hunk #2 succeeded at 2251 (offset -1 lines).
> > Hunk #3 succeeded at 2265 (offset -1 lines).
> > Hunk #4 succeeded at 2273 (offset -1 lines).
> > patching file arch/x86/include/asm/timer.h
> > patching file arch/x86/kernel/tsc.c
> > Hunk #1 FAILED at 328.
 
> perf/core applied almost cleanly:
 
> [acme@felicio linux]$ patch -p1 < /wb/1.patch 
> patching file arch/x86/events/core.c
> patching file arch/x86/include/asm/timer.h
> patching file arch/x86/kernel/tsc.c
> patching file include/linux/sched/clock.h
> patching file kernel/sched/clock.c
> Hunk #3 succeeded at 154 with fuzz 2 (offset -7 lines).
> Hunk #4 succeeded at 231 (offset -7 lines).
> Hunk #5 succeeded at 317 (offset -7 lines).
> [acme@felicio linux]$
> 
> Building...

Thanks, the test is happy now, see below. Please add my:

Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>

[acme@felicio linux]$ uname -a
Linux felicio.ghostprotocols.net 4.11.0-rc2+ #30 SMP Mon Mar 20 09:47:16 BRT 2017 x86_64 x86_64 x86_64 GNU/Linux
[acme@felicio ~]$ perf test tsc
54: Convert perf time to TSC                   : Ok
[acme@felicio ~]$ perf test -v tsc
54: Convert perf time to TSC                   :
--- start ---
test child forked, pid 10096
mmap size 528384B
1st event perf time 791386772903 tsc 2539203231623
rdtsc          time 791386773842 tsc 2539203234547
2nd event perf time 791386774062 tsc 2539203235229
test child finished with 0
---- end ----
Convert perf time to TSC: Ok
[acme@felicio ~]$

Patch as applied here, i.e. after those fuzzy hunks were applied on top of
tip/perf/core:


diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index 349d4d17aa7f..ed7e4a10c293 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -2244,6 +2244,7 @@ void arch_perf_update_userpage(struct perf_event *event,
 			       struct perf_event_mmap_page *userpg, u64 now)
 {
 	struct cyc2ns_data *data;
+	u64 offset;
 
 	userpg->cap_user_time = 0;
 	userpg->cap_user_time_zero = 0;
@@ -2251,11 +2252,13 @@ void arch_perf_update_userpage(struct perf_event *event,
 		!!(event->hw.flags & PERF_X86_EVENT_RDPMC_ALLOWED);
 	userpg->pmc_width = x86_pmu.cntval_bits;
 
-	if (!sched_clock_stable())
+	if (!using_native_sched_clock() || !sched_clock_stable())
 		return;
 
 	data = cyc2ns_read_begin();
 
+	offset = data->cyc2ns_offset + __sched_clock_offset;
+
 	/*
 	 * Internal timekeeping for enabled/running/stopped times
 	 * is always in the local_clock domain.
@@ -2263,7 +2266,7 @@ void arch_perf_update_userpage(struct perf_event *event,
 	userpg->cap_user_time = 1;
 	userpg->time_mult = data->cyc2ns_mul;
 	userpg->time_shift = data->cyc2ns_shift;
-	userpg->time_offset = data->cyc2ns_offset - now;
+	userpg->time_offset = offset - now;
 
 	/*
 	 * cap_user_time_zero doesn't make sense when we're using a different
@@ -2271,7 +2274,7 @@ void arch_perf_update_userpage(struct perf_event *event,
 	 */
 	if (!event->attr.use_clockid) {
 		userpg->cap_user_time_zero = 1;
-		userpg->time_zero = data->cyc2ns_offset;
+		userpg->time_zero = offset;
 	}
 
 	cyc2ns_read_end(data);
diff --git a/arch/x86/include/asm/timer.h b/arch/x86/include/asm/timer.h
index a04eabd43d06..27e9f9d769b8 100644
--- a/arch/x86/include/asm/timer.h
+++ b/arch/x86/include/asm/timer.h
@@ -12,6 +12,8 @@
 
 extern int no_timer_check;
 
+extern bool using_native_sched_clock(void);
+
 /*
  * We use the full linear equation: f(x) = a + b*x, in order to allow
  * a continuous function in the face of dynamic freq changes.
diff --git a/arch/x86/kernel/tsc.c b/arch/x86/kernel/tsc.c
index 4f7a9833d8e5..de38a141a52a 100644
--- a/arch/x86/kernel/tsc.c
+++ b/arch/x86/kernel/tsc.c
@@ -328,7 +328,7 @@ unsigned long long sched_clock(void)
 	return paravirt_sched_clock();
 }
 
-static inline bool using_native_sched_clock(void)
+bool using_native_sched_clock(void)
 {
 	return pv_time_ops.sched_clock == native_sched_clock;
 }
@@ -336,7 +336,7 @@ static inline bool using_native_sched_clock(void)
 unsigned long long
 sched_clock(void) __attribute__((alias("native_sched_clock")));
 
-static inline bool using_native_sched_clock(void) { return true; }
+bool using_native_sched_clock(void) { return true; }
 #endif
 
 int check_tsc_unstable(void)
diff --git a/include/linux/sched/clock.h b/include/linux/sched/clock.h
index 4a68c6791207..34fe92ce1ebd 100644
--- a/include/linux/sched/clock.h
+++ b/include/linux/sched/clock.h
@@ -54,15 +54,16 @@ static inline u64 local_clock(void)
 }
 #else
 extern void sched_clock_init_late(void);
-/*
- * Architectures can set this to 1 if they have specified
- * CONFIG_HAVE_UNSTABLE_SCHED_CLOCK in their arch Kconfig,
- * but then during bootup it turns out that sched_clock()
- * is reliable after all:
- */
 extern int sched_clock_stable(void);
 extern void clear_sched_clock_stable(void);
 
+/*
+ * When sched_clock_stable(), __sched_clock_offset provides the offset
+ * between local_clock() and sched_clock().
+ */
+extern u64 __sched_clock_offset;
+
+
 extern void sched_clock_tick(void);
 extern void sched_clock_idle_sleep_event(void);
 extern void sched_clock_idle_wakeup_event(u64 delta_ns);
diff --git a/kernel/sched/clock.c b/kernel/sched/clock.c
index a08795e21628..5448d188b8d3 100644
--- a/kernel/sched/clock.c
+++ b/kernel/sched/clock.c
@@ -96,10 +96,10 @@ void sched_clock_init(void)
 static int __sched_clock_stable_early = 1;
 
 /*
- * We want: ktime_get_ns() + gtod_offset == sched_clock() + raw_offset
+ * We want: ktime_get_ns() + __gtod_offset == sched_clock() + __sched_clock_offset
  */
-static __read_mostly u64 raw_offset;
-static __read_mostly u64 gtod_offset;
+__read_mostly u64 __sched_clock_offset;
+static __read_mostly u64 __gtod_offset;
 
 struct sched_clock_data {
 	u64			tick_raw;
@@ -131,11 +131,11 @@ static void __set_sched_clock_stable(void)
 	/*
 	 * Attempt to make the (initial) unstable->stable transition continuous.
 	 */
-	raw_offset = (scd->tick_gtod + gtod_offset) - (scd->tick_raw);
+	__sched_clock_offset = (scd->tick_gtod + __gtod_offset) - (scd->tick_raw);
 
 	printk(KERN_INFO "sched_clock: Marking stable (%lld, %lld)->(%lld, %lld)\n",
-			scd->tick_gtod, gtod_offset,
-			scd->tick_raw,  raw_offset);
+			scd->tick_gtod, __gtod_offset,
+			scd->tick_raw,  __sched_clock_offset);
 
 	static_branch_enable(&__sched_clock_stable);
 	tick_dep_clear(TICK_DEP_BIT_CLOCK_UNSTABLE);
@@ -154,11 +154,11 @@ static void __clear_sched_clock_stable(struct work_struct *work)
 	 *
 	 * Still do what we can.
 	 */
-	gtod_offset = (scd->tick_raw + raw_offset) - (scd->tick_gtod);
+	__gtod_offset = (scd->tick_raw + __sched_clock_offset) - (scd->tick_gtod);
 
 	printk(KERN_INFO "sched_clock: Marking unstable (%lld, %lld)<-(%lld, %lld)\n",
-			scd->tick_gtod, gtod_offset,
-			scd->tick_raw,  raw_offset);
+			scd->tick_gtod, __gtod_offset,
+			scd->tick_raw,  __sched_clock_offset);
 
 	static_branch_disable(&__sched_clock_stable);
 	tick_dep_set(TICK_DEP_BIT_CLOCK_UNSTABLE);
@@ -231,7 +231,7 @@ static u64 sched_clock_local(struct sched_clock_data *scd)
 	 *		      scd->tick_gtod + TICK_NSEC);
 	 */
 
-	clock = scd->tick_gtod + gtod_offset + delta;
+	clock = scd->tick_gtod + __gtod_offset + delta;
 	min_clock = wrap_max(scd->tick_gtod, old_clock);
 	max_clock = wrap_max(old_clock, scd->tick_gtod + TICK_NSEC);
 
@@ -317,7 +317,7 @@ u64 sched_clock_cpu(int cpu)
 	u64 clock;
 
 	if (sched_clock_stable())
-		return sched_clock() + raw_offset;
+		return sched_clock() + __sched_clock_offset;
 
 	if (unlikely(!sched_clock_running))
 		return 0ull;

