All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-03-20  9:35 ` Leo Yan
  0 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2020-03-20  9:35 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark
  Cc: Leo Yan

Between the system powering on and kernel's sched clock registration,
the arch timer usually has been enabled at the early time and its
counter is incremented during the period of the booting up.  Thus the
arch timer's counter is not completely accounted into the sched clock,
and has a delta between the arch timer's counter and sched clock.  This
delta value should be stored into userpg->time_offset, which later can
be retrieved by Perf tool in the user space for sample timestamp
calculation.

Now userpg->time_offset is assigned to the negative sched clock with
'-now', this value cannot reflect the delta between arch timer's counter
and sched clock, so Perf cannot use it to calculate the sample time.

To fix this issue, this patch calculate the delta between the arch
timer's and sched clock and assign the delta to userpg->time_offset.
The detailed steps are firstly to convert counter to nanoseconds 'ns',
then the offset is calculated as 'now' minus 'ns'.

        |<------------------- 'ns' ---------------------->|
                                |<-------- 'now' -------->|
        |<---- time_offset ---->|
        |-----------------------|-------------------------|
        ^                       ^                         ^
  Power on system     sched clock registration      Perf starts

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index e40b65645c86..226d25d77072 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event *event,
 {
 	u32 freq;
 	u32 shift;
+	u64 count, ns, quot, rem;
 
 	/*
 	 * Internal timekeeping for enabled/running/stopped times
@@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
 		userpg->time_mult >>= 1;
 	}
 	userpg->time_shift = (u16)shift;
-	userpg->time_offset = -now;
+
+	/*
+	 * Since arch timer is enabled ealier than sched clock registration,
+	 * compuate the delta (in nanosecond unit) between the arch timer
+	 * counter and sched clock, assign the delta to time_offset and
+	 * perf tool can use it for timestamp calculation.
+	 *
+	 * The formula for conversion arch timer cycle to ns is:
+	 *   quot = (cyc >> time_shift);
+	 *   rem  = cyc & ((1 << time_shift) - 1);
+	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
+	 */
+	count = arch_timer_read_counter();
+	quot = count >> shift;
+	rem = count & ((1 << shift) - 1);
+	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
+	userpg->time_offset = now - ns;
 }
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-03-20  9:35 ` Leo Yan
  0 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2020-03-20  9:35 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark
  Cc: Leo Yan

Between the system powering on and kernel's sched clock registration,
the arch timer usually has been enabled at the early time and its
counter is incremented during the period of the booting up.  Thus the
arch timer's counter is not completely accounted into the sched clock,
and has a delta between the arch timer's counter and sched clock.  This
delta value should be stored into userpg->time_offset, which later can
be retrieved by Perf tool in the user space for sample timestamp
calculation.

Now userpg->time_offset is assigned to the negative sched clock with
'-now', this value cannot reflect the delta between arch timer's counter
and sched clock, so Perf cannot use it to calculate the sample time.

To fix this issue, this patch calculate the delta between the arch
timer's and sched clock and assign the delta to userpg->time_offset.
The detailed steps are firstly to convert counter to nanoseconds 'ns',
then the offset is calculated as 'now' minus 'ns'.

        |<------------------- 'ns' ---------------------->|
                                |<-------- 'now' -------->|
        |<---- time_offset ---->|
        |-----------------------|-------------------------|
        ^                       ^                         ^
  Power on system     sched clock registration      Perf starts

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
 1 file changed, 18 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index e40b65645c86..226d25d77072 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event *event,
 {
 	u32 freq;
 	u32 shift;
+	u64 count, ns, quot, rem;
 
 	/*
 	 * Internal timekeeping for enabled/running/stopped times
@@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
 		userpg->time_mult >>= 1;
 	}
 	userpg->time_shift = (u16)shift;
-	userpg->time_offset = -now;
+
+	/*
+	 * Since arch timer is enabled ealier than sched clock registration,
+	 * compuate the delta (in nanosecond unit) between the arch timer
+	 * counter and sched clock, assign the delta to time_offset and
+	 * perf tool can use it for timestamp calculation.
+	 *
+	 * The formula for conversion arch timer cycle to ns is:
+	 *   quot = (cyc >> time_shift);
+	 *   rem  = cyc & ((1 << time_shift) - 1);
+	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
+	 */
+	count = arch_timer_read_counter();
+	quot = count >> shift;
+	rem = count & ((1 << shift) - 1);
+	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
+	userpg->time_offset = now - ns;
 }
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-03-20  9:35 ` Leo Yan
@ 2020-04-01  1:24   ` Leo Yan
  -1 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2020-04-01  1:24 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark

On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
> Between the system powering on and kernel's sched clock registration,
> the arch timer usually has been enabled at the early time and its
> counter is incremented during the period of the booting up.  Thus the
> arch timer's counter is not completely accounted into the sched clock,
> and has a delta between the arch timer's counter and sched clock.  This
> delta value should be stored into userpg->time_offset, which later can
> be retrieved by Perf tool in the user space for sample timestamp
> calculation.
> 
> Now userpg->time_offset is assigned to the negative sched clock with
> '-now', this value cannot reflect the delta between arch timer's counter
> and sched clock, so Perf cannot use it to calculate the sample time.
> 
> To fix this issue, this patch calculate the delta between the arch
> timer's and sched clock and assign the delta to userpg->time_offset.
> The detailed steps are firstly to convert counter to nanoseconds 'ns',
> then the offset is calculated as 'now' minus 'ns'.
> 
>         |<------------------- 'ns' ---------------------->|
>                                 |<-------- 'now' -------->|
>         |<---- time_offset ---->|
>         |-----------------------|-------------------------|
>         ^                       ^                         ^
>   Power on system     sched clock registration      Perf starts
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)

Gentle ping ...

Hi Mike R., Peter,

If possible, could you give a look for this patch?

Thank you,
Leo

> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index e40b65645c86..226d25d77072 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event *event,
>  {
>  	u32 freq;
>  	u32 shift;
> +	u64 count, ns, quot, rem;
>  
>  	/*
>  	 * Internal timekeeping for enabled/running/stopped times
> @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
>  		userpg->time_mult >>= 1;
>  	}
>  	userpg->time_shift = (u16)shift;
> -	userpg->time_offset = -now;
> +
> +	/*
> +	 * Since arch timer is enabled ealier than sched clock registration,
> +	 * compuate the delta (in nanosecond unit) between the arch timer
> +	 * counter and sched clock, assign the delta to time_offset and
> +	 * perf tool can use it for timestamp calculation.
> +	 *
> +	 * The formula for conversion arch timer cycle to ns is:
> +	 *   quot = (cyc >> time_shift);
> +	 *   rem  = cyc & ((1 << time_shift) - 1);
> +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
> +	 */
> +	count = arch_timer_read_counter();
> +	quot = count >> shift;
> +	rem = count & ((1 << shift) - 1);
> +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
> +	userpg->time_offset = now - ns;
>  }
> -- 
> 2.17.1
> 

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-04-01  1:24   ` Leo Yan
  0 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2020-04-01  1:24 UTC (permalink / raw)
  To: Will Deacon, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark

On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
> Between the system powering on and kernel's sched clock registration,
> the arch timer usually has been enabled at the early time and its
> counter is incremented during the period of the booting up.  Thus the
> arch timer's counter is not completely accounted into the sched clock,
> and has a delta between the arch timer's counter and sched clock.  This
> delta value should be stored into userpg->time_offset, which later can
> be retrieved by Perf tool in the user space for sample timestamp
> calculation.
> 
> Now userpg->time_offset is assigned to the negative sched clock with
> '-now', this value cannot reflect the delta between arch timer's counter
> and sched clock, so Perf cannot use it to calculate the sample time.
> 
> To fix this issue, this patch calculate the delta between the arch
> timer's and sched clock and assign the delta to userpg->time_offset.
> The detailed steps are firstly to convert counter to nanoseconds 'ns',
> then the offset is calculated as 'now' minus 'ns'.
> 
>         |<------------------- 'ns' ---------------------->|
>                                 |<-------- 'now' -------->|
>         |<---- time_offset ---->|
>         |-----------------------|-------------------------|
>         ^                       ^                         ^
>   Power on system     sched clock registration      Perf starts
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)

Gentle ping ...

Hi Mike R., Peter,

If possible, could you give a look for this patch?

Thank you,
Leo

> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index e40b65645c86..226d25d77072 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event *event,
>  {
>  	u32 freq;
>  	u32 shift;
> +	u64 count, ns, quot, rem;
>  
>  	/*
>  	 * Internal timekeeping for enabled/running/stopped times
> @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
>  		userpg->time_mult >>= 1;
>  	}
>  	userpg->time_shift = (u16)shift;
> -	userpg->time_offset = -now;
> +
> +	/*
> +	 * Since arch timer is enabled ealier than sched clock registration,
> +	 * compuate the delta (in nanosecond unit) between the arch timer
> +	 * counter and sched clock, assign the delta to time_offset and
> +	 * perf tool can use it for timestamp calculation.
> +	 *
> +	 * The formula for conversion arch timer cycle to ns is:
> +	 *   quot = (cyc >> time_shift);
> +	 *   rem  = cyc & ((1 << time_shift) - 1);
> +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
> +	 */
> +	count = arch_timer_read_counter();
> +	quot = count >> shift;
> +	rem = count & ((1 << shift) - 1);
> +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
> +	userpg->time_offset = now - ns;
>  }
> -- 
> 2.17.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-03-20  9:35 ` Leo Yan
@ 2020-04-30 14:58   ` Will Deacon
  -1 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-04-30 14:58 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, maz, tglx

Hi Leo,

[+Maz and tglx in case I'm barking up the wrong tree]

On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
> Between the system powering on and kernel's sched clock registration,
> the arch timer usually has been enabled at the early time and its
> counter is incremented during the period of the booting up.  Thus the
> arch timer's counter is not completely accounted into the sched clock,
> and has a delta between the arch timer's counter and sched clock.  This
> delta value should be stored into userpg->time_offset, which later can
> be retrieved by Perf tool in the user space for sample timestamp
> calculation.
> 
> Now userpg->time_offset is assigned to the negative sched clock with
> '-now', this value cannot reflect the delta between arch timer's counter
> and sched clock, so Perf cannot use it to calculate the sample time.
> 
> To fix this issue, this patch calculate the delta between the arch
> timer's and sched clock and assign the delta to userpg->time_offset.
> The detailed steps are firstly to convert counter to nanoseconds 'ns',
> then the offset is calculated as 'now' minus 'ns'.
> 
>         |<------------------- 'ns' ---------------------->|
>                                 |<-------- 'now' -------->|
>         |<---- time_offset ---->|
>         |-----------------------|-------------------------|
>         ^                       ^                         ^
>   Power on system     sched clock registration      Perf starts

FWIW, I'm /really/ struggling to understand the problem here.

If I've grokked it correctly (big 'if'), then you can't just factor in
what you call "time_offset" in the diagram above, because there isn't
a guarantee that the counter is zero-initialised at the start.

> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index e40b65645c86..226d25d77072 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event *event,
>  {
>  	u32 freq;
>  	u32 shift;
> +	u64 count, ns, quot, rem;
>  
>  	/*
>  	 * Internal timekeeping for enabled/running/stopped times
> @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
>  		userpg->time_mult >>= 1;
>  	}
>  	userpg->time_shift = (u16)shift;
> -	userpg->time_offset = -now;
> +
> +	/*
> +	 * Since arch timer is enabled ealier than sched clock registration,
> +	 * compuate the delta (in nanosecond unit) between the arch timer
> +	 * counter and sched clock, assign the delta to time_offset and
> +	 * perf tool can use it for timestamp calculation.
> +	 *
> +	 * The formula for conversion arch timer cycle to ns is:
> +	 *   quot = (cyc >> time_shift);
> +	 *   rem  = cyc & ((1 << time_shift) - 1);
> +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
> +	 */
> +	count = arch_timer_read_counter();
> +	quot = count >> shift;
> +	rem = count & ((1 << shift) - 1);
> +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
> +	userpg->time_offset = now - ns;

Hmm, reading the counter and calculating the delta feels horribly
approximate to me. It would be much better if we could get hold of the
initial epoch cycles from the point at which sched_clock was initialised
using the counter. This represents the true cycle delta between the counter
and what sched_clock uses for 0 ns.

Unfortunately, I can't see a straightforward way to grab that information.
It looks like x86 pulls this directly from the TSC driver.

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-04-30 14:58   ` Will Deacon
  0 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-04-30 14:58 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Al Grant, Mathieu Poirier, Peter Zijlstra,
	Catalin Marinas, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, James Clark, maz, Namhyung Kim,
	tglx, Jiri Olsa, linux-arm-kernel, Mike Leach

Hi Leo,

[+Maz and tglx in case I'm barking up the wrong tree]

On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
> Between the system powering on and kernel's sched clock registration,
> the arch timer usually has been enabled at the early time and its
> counter is incremented during the period of the booting up.  Thus the
> arch timer's counter is not completely accounted into the sched clock,
> and has a delta between the arch timer's counter and sched clock.  This
> delta value should be stored into userpg->time_offset, which later can
> be retrieved by Perf tool in the user space for sample timestamp
> calculation.
> 
> Now userpg->time_offset is assigned to the negative sched clock with
> '-now', this value cannot reflect the delta between arch timer's counter
> and sched clock, so Perf cannot use it to calculate the sample time.
> 
> To fix this issue, this patch calculate the delta between the arch
> timer's and sched clock and assign the delta to userpg->time_offset.
> The detailed steps are firstly to convert counter to nanoseconds 'ns',
> then the offset is calculated as 'now' minus 'ns'.
> 
>         |<------------------- 'ns' ---------------------->|
>                                 |<-------- 'now' -------->|
>         |<---- time_offset ---->|
>         |-----------------------|-------------------------|
>         ^                       ^                         ^
>   Power on system     sched clock registration      Perf starts

FWIW, I'm /really/ struggling to understand the problem here.

If I've grokked it correctly (big 'if'), then you can't just factor in
what you call "time_offset" in the diagram above, because there isn't
a guarantee that the counter is zero-initialised at the start.

> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index e40b65645c86..226d25d77072 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event *event,
>  {
>  	u32 freq;
>  	u32 shift;
> +	u64 count, ns, quot, rem;
>  
>  	/*
>  	 * Internal timekeeping for enabled/running/stopped times
> @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
>  		userpg->time_mult >>= 1;
>  	}
>  	userpg->time_shift = (u16)shift;
> -	userpg->time_offset = -now;
> +
> +	/*
> +	 * Since arch timer is enabled ealier than sched clock registration,
> +	 * compuate the delta (in nanosecond unit) between the arch timer
> +	 * counter and sched clock, assign the delta to time_offset and
> +	 * perf tool can use it for timestamp calculation.
> +	 *
> +	 * The formula for conversion arch timer cycle to ns is:
> +	 *   quot = (cyc >> time_shift);
> +	 *   rem  = cyc & ((1 << time_shift) - 1);
> +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
> +	 */
> +	count = arch_timer_read_counter();
> +	quot = count >> shift;
> +	rem = count & ((1 << shift) - 1);
> +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
> +	userpg->time_offset = now - ns;

Hmm, reading the counter and calculating the delta feels horribly
approximate to me. It would be much better if we could get hold of the
initial epoch cycles from the point at which sched_clock was initialised
using the counter. This represents the true cycle delta between the counter
and what sched_clock uses for 0 ns.

Unfortunately, I can't see a straightforward way to grab that information.
It looks like x86 pulls this directly from the TSC driver.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-04-30 14:58   ` Will Deacon
@ 2020-04-30 15:29     ` Marc Zyngier
  -1 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2020-04-30 15:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: Leo Yan, Mark Rutland, Peter Zijlstra, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, tglx

On 2020-04-30 15:58, Will Deacon wrote:
> Hi Leo,
> 
> [+Maz and tglx in case I'm barking up the wrong tree]
> 
> On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
>> Between the system powering on and kernel's sched clock registration,
>> the arch timer usually has been enabled at the early time and its
>> counter is incremented during the period of the booting up.  Thus the
>> arch timer's counter is not completely accounted into the sched clock,
>> and has a delta between the arch timer's counter and sched clock.  
>> This
>> delta value should be stored into userpg->time_offset, which later can
>> be retrieved by Perf tool in the user space for sample timestamp
>> calculation.
>> 
>> Now userpg->time_offset is assigned to the negative sched clock with
>> '-now', this value cannot reflect the delta between arch timer's 
>> counter
>> and sched clock, so Perf cannot use it to calculate the sample time.
>> 
>> To fix this issue, this patch calculate the delta between the arch
>> timer's and sched clock and assign the delta to userpg->time_offset.
>> The detailed steps are firstly to convert counter to nanoseconds 'ns',
>> then the offset is calculated as 'now' minus 'ns'.
>> 
>>         |<------------------- 'ns' ---------------------->|
>>                                 |<-------- 'now' -------->|
>>         |<---- time_offset ---->|
>>         |-----------------------|-------------------------|
>>         ^                       ^                         ^
>>   Power on system     sched clock registration      Perf starts
> 
> FWIW, I'm /really/ struggling to understand the problem here.
> 
> If I've grokked it correctly (big 'if'), then you can't just factor in
> what you call "time_offset" in the diagram above, because there isn't
> a guarantee that the counter is zero-initialised at the start.

Even if it was, we have no idea of *when* that was. Think kexec, for a
start. Or spending some variable in firmware because of $REASON.

> 
>> Signed-off-by: Leo Yan <leo.yan@linaro.org>
>> ---
>>  arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm64/kernel/perf_event.c 
>> b/arch/arm64/kernel/perf_event.c
>> index e40b65645c86..226d25d77072 100644
>> --- a/arch/arm64/kernel/perf_event.c
>> +++ b/arch/arm64/kernel/perf_event.c
>> @@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event 
>> *event,
>>  {
>>  	u32 freq;
>>  	u32 shift;
>> +	u64 count, ns, quot, rem;
>> 
>>  	/*
>>  	 * Internal timekeeping for enabled/running/stopped times
>> @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct 
>> perf_event *event,
>>  		userpg->time_mult >>= 1;
>>  	}
>>  	userpg->time_shift = (u16)shift;
>> -	userpg->time_offset = -now;
>> +
>> +	/*
>> +	 * Since arch timer is enabled ealier than sched clock registration,
>> +	 * compuate the delta (in nanosecond unit) between the arch timer
>> +	 * counter and sched clock, assign the delta to time_offset and
>> +	 * perf tool can use it for timestamp calculation.
>> +	 *
>> +	 * The formula for conversion arch timer cycle to ns is:
>> +	 *   quot = (cyc >> time_shift);
>> +	 *   rem  = cyc & ((1 << time_shift) - 1);
>> +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
>> +	 */
>> +	count = arch_timer_read_counter();
>> +	quot = count >> shift;
>> +	rem = count & ((1 << shift) - 1);
>> +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> 
>> shift);
>> +	userpg->time_offset = now - ns;
> 
> Hmm, reading the counter and calculating the delta feels horribly
> approximate to me. It would be much better if we could get hold of the
> initial epoch cycles from the point at which sched_clock was 
> initialised
> using the counter. This represents the true cycle delta between the 
> counter
> and what sched_clock uses for 0 ns.

I think this is a sensible solution if you want an epoch that starts at 
0 with
sched_clock being initialized. The other question is whether it is 
possible to
use a different timestamping source for perf that wouldn't need to be 
offset.

> Unfortunately, I can't see a straightforward way to grab that 
> information.
> It looks like x86 pulls this directly from the TSC driver.

I wonder if we could/should make __sched_clock_offset available even 
when
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
help with this particular can or worm...

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-04-30 15:29     ` Marc Zyngier
  0 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2020-04-30 15:29 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Al Grant, Mathieu Poirier, Peter Zijlstra,
	Catalin Marinas, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, James Clark, Leo Yan,
	Namhyung Kim, tglx, Jiri Olsa, linux-arm-kernel, Mike Leach

On 2020-04-30 15:58, Will Deacon wrote:
> Hi Leo,
> 
> [+Maz and tglx in case I'm barking up the wrong tree]
> 
> On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
>> Between the system powering on and kernel's sched clock registration,
>> the arch timer usually has been enabled at the early time and its
>> counter is incremented during the period of the booting up.  Thus the
>> arch timer's counter is not completely accounted into the sched clock,
>> and has a delta between the arch timer's counter and sched clock.  
>> This
>> delta value should be stored into userpg->time_offset, which later can
>> be retrieved by Perf tool in the user space for sample timestamp
>> calculation.
>> 
>> Now userpg->time_offset is assigned to the negative sched clock with
>> '-now', this value cannot reflect the delta between arch timer's 
>> counter
>> and sched clock, so Perf cannot use it to calculate the sample time.
>> 
>> To fix this issue, this patch calculate the delta between the arch
>> timer's and sched clock and assign the delta to userpg->time_offset.
>> The detailed steps are firstly to convert counter to nanoseconds 'ns',
>> then the offset is calculated as 'now' minus 'ns'.
>> 
>>         |<------------------- 'ns' ---------------------->|
>>                                 |<-------- 'now' -------->|
>>         |<---- time_offset ---->|
>>         |-----------------------|-------------------------|
>>         ^                       ^                         ^
>>   Power on system     sched clock registration      Perf starts
> 
> FWIW, I'm /really/ struggling to understand the problem here.
> 
> If I've grokked it correctly (big 'if'), then you can't just factor in
> what you call "time_offset" in the diagram above, because there isn't
> a guarantee that the counter is zero-initialised at the start.

Even if it was, we have no idea of *when* that was. Think kexec, for a
start. Or spending some variable in firmware because of $REASON.

> 
>> Signed-off-by: Leo Yan <leo.yan@linaro.org>
>> ---
>>  arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm64/kernel/perf_event.c 
>> b/arch/arm64/kernel/perf_event.c
>> index e40b65645c86..226d25d77072 100644
>> --- a/arch/arm64/kernel/perf_event.c
>> +++ b/arch/arm64/kernel/perf_event.c
>> @@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event 
>> *event,
>>  {
>>  	u32 freq;
>>  	u32 shift;
>> +	u64 count, ns, quot, rem;
>> 
>>  	/*
>>  	 * Internal timekeeping for enabled/running/stopped times
>> @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct 
>> perf_event *event,
>>  		userpg->time_mult >>= 1;
>>  	}
>>  	userpg->time_shift = (u16)shift;
>> -	userpg->time_offset = -now;
>> +
>> +	/*
>> +	 * Since arch timer is enabled ealier than sched clock registration,
>> +	 * compuate the delta (in nanosecond unit) between the arch timer
>> +	 * counter and sched clock, assign the delta to time_offset and
>> +	 * perf tool can use it for timestamp calculation.
>> +	 *
>> +	 * The formula for conversion arch timer cycle to ns is:
>> +	 *   quot = (cyc >> time_shift);
>> +	 *   rem  = cyc & ((1 << time_shift) - 1);
>> +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
>> +	 */
>> +	count = arch_timer_read_counter();
>> +	quot = count >> shift;
>> +	rem = count & ((1 << shift) - 1);
>> +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> 
>> shift);
>> +	userpg->time_offset = now - ns;
> 
> Hmm, reading the counter and calculating the delta feels horribly
> approximate to me. It would be much better if we could get hold of the
> initial epoch cycles from the point at which sched_clock was 
> initialised
> using the counter. This represents the true cycle delta between the 
> counter
> and what sched_clock uses for 0 ns.

I think this is a sensible solution if you want an epoch that starts at 
0 with
sched_clock being initialized. The other question is whether it is 
possible to
use a different timestamping source for perf that wouldn't need to be 
offset.

> Unfortunately, I can't see a straightforward way to grab that 
> information.
> It looks like x86 pulls this directly from the TSC driver.

I wonder if we could/should make __sched_clock_offset available even 
when
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
help with this particular can or worm...

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-04-30 15:29     ` Marc Zyngier
@ 2020-04-30 16:04       ` Peter Zijlstra
  -1 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2020-04-30 16:04 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Will Deacon, Leo Yan, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, tglx

On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:

> I wonder if we could/should make __sched_clock_offset available even when
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
> help with this particular can or worm...

Errrgh. __sched_clock_offset is only needed on x86 because we transition
from one clock device to another on boot. It really shouldn't exist on
anything sane.

Let me try and understand your particular problem better.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-04-30 16:04       ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2020-04-30 16:04 UTC (permalink / raw)
  To: Marc Zyngier
  Cc: Mark Rutland, Catalin Marinas, Mathieu Poirier, James Clark,
	Alexander Shishkin, Jiri Olsa, linux-kernel,
	Arnaldo Carvalho de Melo, Ingo Molnar, Al Grant, Leo Yan,
	Namhyung Kim, tglx, Will Deacon, linux-arm-kernel, Mike Leach

On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:

> I wonder if we could/should make __sched_clock_offset available even when
> CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
> help with this particular can or worm...

Errrgh. __sched_clock_offset is only needed on x86 because we transition
from one clock device to another on boot. It really shouldn't exist on
anything sane.

Let me try and understand your particular problem better.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-04-30 16:04       ` Peter Zijlstra
@ 2020-04-30 16:18         ` Will Deacon
  -1 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-04-30 16:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Marc Zyngier, Leo Yan, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, tglx

On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
> 
> > I wonder if we could/should make __sched_clock_offset available even when
> > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
> > help with this particular can or worm...
> 
> Errrgh. __sched_clock_offset is only needed on x86 because we transition
> from one clock device to another on boot. It really shouldn't exist on
> anything sane.

I think we still transition from jiffies on arm64, because we don't register
with sched_clock until the timer driver probes. Marc, is that right?

> Let me try and understand your particular problem better.

I think the long and short of it is that userspace needs a way to convert
the raw counter cycles into a ns value that can be compared against values
coming out of sched_clock. To do this accurately, I think it needs the
cycles value at the point when sched_clock was initialised.

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-04-30 16:18         ` Will Deacon
  0 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-04-30 16:18 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, Catalin Marinas, Mathieu Poirier, James Clark,
	Alexander Shishkin, Marc Zyngier, linux-kernel,
	Arnaldo Carvalho de Melo, Ingo Molnar, Al Grant, Leo Yan,
	Namhyung Kim, tglx, Jiri Olsa, linux-arm-kernel, Mike Leach

On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
> 
> > I wonder if we could/should make __sched_clock_offset available even when
> > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
> > help with this particular can or worm...
> 
> Errrgh. __sched_clock_offset is only needed on x86 because we transition
> from one clock device to another on boot. It really shouldn't exist on
> anything sane.

I think we still transition from jiffies on arm64, because we don't register
with sched_clock until the timer driver probes. Marc, is that right?

> Let me try and understand your particular problem better.

I think the long and short of it is that userspace needs a way to convert
the raw counter cycles into a ns value that can be compared against values
coming out of sched_clock. To do this accurately, I think it needs the
cycles value at the point when sched_clock was initialised.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-04-30 14:58   ` Will Deacon
@ 2020-04-30 16:27     ` Peter Zijlstra
  -1 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2020-04-30 16:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: Leo Yan, Mark Rutland, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Catalin Marinas,
	linux-arm-kernel, linux-kernel, Mathieu Poirier, Mike Leach,
	Al Grant, James Clark, maz, tglx

On Thu, Apr 30, 2020 at 03:58:24PM +0100, Will Deacon wrote:
> On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:

> > @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
> >  		userpg->time_mult >>= 1;
> >  	}
> >  	userpg->time_shift = (u16)shift;
> > -	userpg->time_offset = -now;
> > +
> > +	/*
> > +	 * Since arch timer is enabled ealier than sched clock registration,
> > +	 * compuate the delta (in nanosecond unit) between the arch timer
> > +	 * counter and sched clock, assign the delta to time_offset and
> > +	 * perf tool can use it for timestamp calculation.
> > +	 *
> > +	 * The formula for conversion arch timer cycle to ns is:
> > +	 *   quot = (cyc >> time_shift);
> > +	 *   rem  = cyc & ((1 << time_shift) - 1);
> > +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
> > +	 */
> > +	count = arch_timer_read_counter();
> > +	quot = count >> shift;
> > +	rem = count & ((1 << shift) - 1);
> > +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
> > +	userpg->time_offset = now - ns;
> 
> Hmm, reading the counter and calculating the delta feels horribly
> approximate to me. It would be much better if we could get hold of the
> initial epoch cycles from the point at which sched_clock was initialised
> using the counter. This represents the true cycle delta between the counter
> and what sched_clock uses for 0 ns.
> 
> Unfortunately, I can't see a straightforward way to grab that information.
> It looks like x86 pulls this directly from the TSC driver.

Yeah, and I'm thinking you should do the same. IIRC ARM uses this
kernel/time/sched_clock.c thing, and if I read that right, the struct
clock_data there has all the bits you need here.

So I'm thinking that you might want to add a helper function here to get
you the good stuff.

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-04-30 16:27     ` Peter Zijlstra
  0 siblings, 0 replies; 28+ messages in thread
From: Peter Zijlstra @ 2020-04-30 16:27 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, maz, Al Grant, Mathieu Poirier, Alexander Shishkin,
	Catalin Marinas, linux-kernel, Arnaldo Carvalho de Melo,
	Ingo Molnar, James Clark, Leo Yan, Namhyung Kim, tglx, Jiri Olsa,
	linux-arm-kernel, Mike Leach

On Thu, Apr 30, 2020 at 03:58:24PM +0100, Will Deacon wrote:
> On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:

> > @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
> >  		userpg->time_mult >>= 1;
> >  	}
> >  	userpg->time_shift = (u16)shift;
> > -	userpg->time_offset = -now;
> > +
> > +	/*
> > +	 * Since arch timer is enabled ealier than sched clock registration,
> > +	 * compuate the delta (in nanosecond unit) between the arch timer
> > +	 * counter and sched clock, assign the delta to time_offset and
> > +	 * perf tool can use it for timestamp calculation.
> > +	 *
> > +	 * The formula for conversion arch timer cycle to ns is:
> > +	 *   quot = (cyc >> time_shift);
> > +	 *   rem  = cyc & ((1 << time_shift) - 1);
> > +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
> > +	 */
> > +	count = arch_timer_read_counter();
> > +	quot = count >> shift;
> > +	rem = count & ((1 << shift) - 1);
> > +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
> > +	userpg->time_offset = now - ns;
> 
> Hmm, reading the counter and calculating the delta feels horribly
> approximate to me. It would be much better if we could get hold of the
> initial epoch cycles from the point at which sched_clock was initialised
> using the counter. This represents the true cycle delta between the counter
> and what sched_clock uses for 0 ns.
> 
> Unfortunately, I can't see a straightforward way to grab that information.
> It looks like x86 pulls this directly from the TSC driver.

Yeah, and I'm thinking you should do the same. IIRC ARM uses this
kernel/time/sched_clock.c thing, and if I read that right, the struct
clock_data there has all the bits you need here.

So I'm thinking that you might want to add a helper function here to get
you the good stuff.

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-04-30 16:27     ` Peter Zijlstra
@ 2020-04-30 16:45       ` Will Deacon
  -1 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-04-30 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Leo Yan, Mark Rutland, Ingo Molnar, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Jiri Olsa, Namhyung Kim, Catalin Marinas,
	linux-arm-kernel, linux-kernel, Mathieu Poirier, Mike Leach,
	Al Grant, James Clark, maz, tglx

On Thu, Apr 30, 2020 at 06:27:50PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 30, 2020 at 03:58:24PM +0100, Will Deacon wrote:
> > On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
> > > +	/*
> > > +	 * Since arch timer is enabled ealier than sched clock registration,
> > > +	 * compuate the delta (in nanosecond unit) between the arch timer
> > > +	 * counter and sched clock, assign the delta to time_offset and
> > > +	 * perf tool can use it for timestamp calculation.
> > > +	 *
> > > +	 * The formula for conversion arch timer cycle to ns is:
> > > +	 *   quot = (cyc >> time_shift);
> > > +	 *   rem  = cyc & ((1 << time_shift) - 1);
> > > +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
> > > +	 */
> > > +	count = arch_timer_read_counter();
> > > +	quot = count >> shift;
> > > +	rem = count & ((1 << shift) - 1);
> > > +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
> > > +	userpg->time_offset = now - ns;
> > 
> > Hmm, reading the counter and calculating the delta feels horribly
> > approximate to me. It would be much better if we could get hold of the
> > initial epoch cycles from the point at which sched_clock was initialised
> > using the counter. This represents the true cycle delta between the counter
> > and what sched_clock uses for 0 ns.
> > 
> > Unfortunately, I can't see a straightforward way to grab that information.
> > It looks like x86 pulls this directly from the TSC driver.
> 
> Yeah, and I'm thinking you should do the same. IIRC ARM uses this
> kernel/time/sched_clock.c thing, and if I read that right, the struct
> clock_data there has all the bits you need here.
> 
> So I'm thinking that you might want to add a helper function here to get
> you the good stuff.

Thanks, Peter.

Leo -- do you think you could look at implementing this as part of a v2,
please?

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-04-30 16:45       ` Will Deacon
  0 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-04-30 16:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Mark Rutland, maz, Al Grant, Mathieu Poirier, Alexander Shishkin,
	Catalin Marinas, linux-kernel, Arnaldo Carvalho de Melo,
	Ingo Molnar, James Clark, Leo Yan, Namhyung Kim, tglx, Jiri Olsa,
	linux-arm-kernel, Mike Leach

On Thu, Apr 30, 2020 at 06:27:50PM +0200, Peter Zijlstra wrote:
> On Thu, Apr 30, 2020 at 03:58:24PM +0100, Will Deacon wrote:
> > On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
> > > +	/*
> > > +	 * Since arch timer is enabled ealier than sched clock registration,
> > > +	 * compuate the delta (in nanosecond unit) between the arch timer
> > > +	 * counter and sched clock, assign the delta to time_offset and
> > > +	 * perf tool can use it for timestamp calculation.
> > > +	 *
> > > +	 * The formula for conversion arch timer cycle to ns is:
> > > +	 *   quot = (cyc >> time_shift);
> > > +	 *   rem  = cyc & ((1 << time_shift) - 1);
> > > +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
> > > +	 */
> > > +	count = arch_timer_read_counter();
> > > +	quot = count >> shift;
> > > +	rem = count & ((1 << shift) - 1);
> > > +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
> > > +	userpg->time_offset = now - ns;
> > 
> > Hmm, reading the counter and calculating the delta feels horribly
> > approximate to me. It would be much better if we could get hold of the
> > initial epoch cycles from the point at which sched_clock was initialised
> > using the counter. This represents the true cycle delta between the counter
> > and what sched_clock uses for 0 ns.
> > 
> > Unfortunately, I can't see a straightforward way to grab that information.
> > It looks like x86 pulls this directly from the TSC driver.
> 
> Yeah, and I'm thinking you should do the same. IIRC ARM uses this
> kernel/time/sched_clock.c thing, and if I read that right, the struct
> clock_data there has all the bits you need here.
> 
> So I'm thinking that you might want to add a helper function here to get
> you the good stuff.

Thanks, Peter.

Leo -- do you think you could look at implementing this as part of a v2,
please?

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-04-30 16:18         ` Will Deacon
@ 2020-04-30 17:33           ` Marc Zyngier
  -1 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2020-04-30 17:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Leo Yan, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, tglx

On 2020-04-30 17:18, Will Deacon wrote:
> On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
>> On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
>> 
>> > I wonder if we could/should make __sched_clock_offset available even when
>> > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
>> > help with this particular can or worm...
>> 
>> Errrgh. __sched_clock_offset is only needed on x86 because we 
>> transition
>> from one clock device to another on boot. It really shouldn't exist on
>> anything sane.
> 
> I think we still transition from jiffies on arm64, because we don't 
> register
> with sched_clock until the timer driver probes. Marc, is that right?

Indeed. The clocksource is only available relatively late, as we need to
discover the details of the platform and enable the various workarounds
(because nobody can get a simple 64bit counter right). So it is only at
that stage that we transition to it.

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-04-30 17:33           ` Marc Zyngier
  0 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2020-04-30 17:33 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Al Grant, Mathieu Poirier, Peter Zijlstra,
	Catalin Marinas, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, James Clark, Leo Yan,
	Namhyung Kim, tglx, Jiri Olsa, linux-arm-kernel, Mike Leach

On 2020-04-30 17:18, Will Deacon wrote:
> On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
>> On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
>> 
>> > I wonder if we could/should make __sched_clock_offset available even when
>> > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
>> > help with this particular can or worm...
>> 
>> Errrgh. __sched_clock_offset is only needed on x86 because we 
>> transition
>> from one clock device to another on boot. It really shouldn't exist on
>> anything sane.
> 
> I think we still transition from jiffies on arm64, because we don't 
> register
> with sched_clock until the timer driver probes. Marc, is that right?

Indeed. The clocksource is only available relatively late, as we need to
discover the details of the platform and enable the various workarounds
(because nobody can get a simple 64bit counter right). So it is only at
that stage that we transition to it.

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-04-30 16:18         ` Will Deacon
@ 2020-05-01 15:14           ` Leo Yan
  -1 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2020-05-01 15:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Marc Zyngier, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, tglx

Hi all,

On Thu, Apr 30, 2020 at 05:18:15PM +0100, Will Deacon wrote:
> On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
> > 
> > > I wonder if we could/should make __sched_clock_offset available even when
> > > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
> > > help with this particular can or worm...
> > 
> > Errrgh. __sched_clock_offset is only needed on x86 because we transition
> > from one clock device to another on boot. It really shouldn't exist on
> > anything sane.
> 
> I think we still transition from jiffies on arm64, because we don't register
> with sched_clock until the timer driver probes. Marc, is that right?
> 
> > Let me try and understand your particular problem better.
> 
> I think the long and short of it is that userspace needs a way to convert
> the raw counter cycles into a ns value that can be compared against values
> coming out of sched_clock. To do this accurately, I think it needs the
> cycles value at the point when sched_clock was initialised.

Will's understanding is exactly what I want to resolve in this patch.

The background info is for the ARM SPE [1] decoding with perf tool, if
the timestamp is enabled, it uses the generic timer's counter as
timestamp source.  SPE trace data only contains the raw counter cycles,
as Will mentioned, the perf tool needs to convert it to a coordinate
value with sched_clock.  This is why this patch tries to calculate the
offset between the raw counter's ns value and sched_clock, eventually
this offset value will be used by SPE's decoding code in Perf tool to
calibrate a 'correct' timestamp.

Based on your suggestions, I will use __sched_clock_offset to resolve
the accuracy issue in patch v2.  (I noticed Peter suggested to use a
new API for wrapping clock_data structure, IIUC, __sched_clock_offset
is more straightforward for this case).

Please correct if I miss anything.  Thank you for reviewing and
suggestions!

Thanks,
Leo

[1] https://lkml.org/lkml/2020/1/23/571

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-05-01 15:14           ` Leo Yan
  0 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2020-05-01 15:14 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Al Grant, Mathieu Poirier, Peter Zijlstra,
	Marc Zyngier, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, James Clark, Catalin Marinas,
	Namhyung Kim, tglx, Jiri Olsa, linux-arm-kernel, Mike Leach

Hi all,

On Thu, Apr 30, 2020 at 05:18:15PM +0100, Will Deacon wrote:
> On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
> > On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
> > 
> > > I wonder if we could/should make __sched_clock_offset available even when
> > > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
> > > help with this particular can or worm...
> > 
> > Errrgh. __sched_clock_offset is only needed on x86 because we transition
> > from one clock device to another on boot. It really shouldn't exist on
> > anything sane.
> 
> I think we still transition from jiffies on arm64, because we don't register
> with sched_clock until the timer driver probes. Marc, is that right?
> 
> > Let me try and understand your particular problem better.
> 
> I think the long and short of it is that userspace needs a way to convert
> the raw counter cycles into a ns value that can be compared against values
> coming out of sched_clock. To do this accurately, I think it needs the
> cycles value at the point when sched_clock was initialised.

Will's understanding is exactly what I want to resolve in this patch.

The background info is for the ARM SPE [1] decoding with perf tool, if
the timestamp is enabled, it uses the generic timer's counter as
timestamp source.  SPE trace data only contains the raw counter cycles,
as Will mentioned, the perf tool needs to convert it to a coordinate
value with sched_clock.  This is why this patch tries to calculate the
offset between the raw counter's ns value and sched_clock, eventually
this offset value will be used by SPE's decoding code in Perf tool to
calibrate a 'correct' timestamp.

Based on your suggestions, I will use __sched_clock_offset to resolve
the accuracy issue in patch v2.  (I noticed Peter suggested to use a
new API for wrapping clock_data structure, IIUC, __sched_clock_offset
is more straightforward for this case).

Please correct if I miss anything.  Thank you for reviewing and
suggestions!

Thanks,
Leo

[1] https://lkml.org/lkml/2020/1/23/571

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-05-01 15:14           ` Leo Yan
@ 2020-05-01 15:26             ` Will Deacon
  -1 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-05-01 15:26 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, Marc Zyngier, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, tglx

On Fri, May 01, 2020 at 11:14:48PM +0800, Leo Yan wrote:
> On Thu, Apr 30, 2020 at 05:18:15PM +0100, Will Deacon wrote:
> > On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
> > > On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
> > > 
> > > > I wonder if we could/should make __sched_clock_offset available even when
> > > > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
> > > > help with this particular can or worm...
> > > 
> > > Errrgh. __sched_clock_offset is only needed on x86 because we transition
> > > from one clock device to another on boot. It really shouldn't exist on
> > > anything sane.
> > 
> > I think we still transition from jiffies on arm64, because we don't register
> > with sched_clock until the timer driver probes. Marc, is that right?
> > 
> > > Let me try and understand your particular problem better.
> > 
> > I think the long and short of it is that userspace needs a way to convert
> > the raw counter cycles into a ns value that can be compared against values
> > coming out of sched_clock. To do this accurately, I think it needs the
> > cycles value at the point when sched_clock was initialised.
> 
> Will's understanding is exactly what I want to resolve in this patch.
> 
> The background info is for the ARM SPE [1] decoding with perf tool, if
> the timestamp is enabled, it uses the generic timer's counter as
> timestamp source.  SPE trace data only contains the raw counter cycles,
> as Will mentioned, the perf tool needs to convert it to a coordinate
> value with sched_clock.  This is why this patch tries to calculate the
> offset between the raw counter's ns value and sched_clock, eventually
> this offset value will be used by SPE's decoding code in Perf tool to
> calibrate a 'correct' timestamp.
> 
> Based on your suggestions, I will use __sched_clock_offset to resolve
> the accuracy issue in patch v2.  (I noticed Peter suggested to use a
> new API for wrapping clock_data structure, IIUC, __sched_clock_offset
> is more straightforward for this case).
> 
> Please correct if I miss anything.  Thank you for reviewing and
> suggestions!

I don't think you can use __sched_clock_offset without selecting
HAVE_UNSTABLE_SCHED_CLOCK, and we really don't want to do that just
for this. So Peter's idea about exposing what we need is better, although
you'll probably need to take care with the switch-over from jiffies.

It needs some thought, but one possibility would be to introduce a new
variant of sthe ched_clock_register() function that returns the cycle
offset, and then we could fish that out of the timer driver. If we're
crossing all the 'i's and dotting all the 't's then we'd want to disable the
perf userpage if sched_clock changes clocksource too (a bit like we do for
the vDSO).

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-05-01 15:26             ` Will Deacon
  0 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-05-01 15:26 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Al Grant, Mathieu Poirier, Peter Zijlstra,
	Marc Zyngier, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, James Clark, Catalin Marinas,
	Namhyung Kim, tglx, Jiri Olsa, linux-arm-kernel, Mike Leach

On Fri, May 01, 2020 at 11:14:48PM +0800, Leo Yan wrote:
> On Thu, Apr 30, 2020 at 05:18:15PM +0100, Will Deacon wrote:
> > On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
> > > On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
> > > 
> > > > I wonder if we could/should make __sched_clock_offset available even when
> > > > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
> > > > help with this particular can or worm...
> > > 
> > > Errrgh. __sched_clock_offset is only needed on x86 because we transition
> > > from one clock device to another on boot. It really shouldn't exist on
> > > anything sane.
> > 
> > I think we still transition from jiffies on arm64, because we don't register
> > with sched_clock until the timer driver probes. Marc, is that right?
> > 
> > > Let me try and understand your particular problem better.
> > 
> > I think the long and short of it is that userspace needs a way to convert
> > the raw counter cycles into a ns value that can be compared against values
> > coming out of sched_clock. To do this accurately, I think it needs the
> > cycles value at the point when sched_clock was initialised.
> 
> Will's understanding is exactly what I want to resolve in this patch.
> 
> The background info is for the ARM SPE [1] decoding with perf tool, if
> the timestamp is enabled, it uses the generic timer's counter as
> timestamp source.  SPE trace data only contains the raw counter cycles,
> as Will mentioned, the perf tool needs to convert it to a coordinate
> value with sched_clock.  This is why this patch tries to calculate the
> offset between the raw counter's ns value and sched_clock, eventually
> this offset value will be used by SPE's decoding code in Perf tool to
> calibrate a 'correct' timestamp.
> 
> Based on your suggestions, I will use __sched_clock_offset to resolve
> the accuracy issue in patch v2.  (I noticed Peter suggested to use a
> new API for wrapping clock_data structure, IIUC, __sched_clock_offset
> is more straightforward for this case).
> 
> Please correct if I miss anything.  Thank you for reviewing and
> suggestions!

I don't think you can use __sched_clock_offset without selecting
HAVE_UNSTABLE_SCHED_CLOCK, and we really don't want to do that just
for this. So Peter's idea about exposing what we need is better, although
you'll probably need to take care with the switch-over from jiffies.

It needs some thought, but one possibility would be to introduce a new
variant of sthe ched_clock_register() function that returns the cycle
offset, and then we could fish that out of the timer driver. If we're
crossing all the 'i's and dotting all the 't's then we'd want to disable the
perf userpage if sched_clock changes clocksource too (a bit like we do for
the vDSO).

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-05-01 15:14           ` Leo Yan
@ 2020-05-01 15:29             ` Marc Zyngier
  -1 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2020-05-01 15:29 UTC (permalink / raw)
  To: Leo Yan
  Cc: Will Deacon, Peter Zijlstra, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, tglx

On 2020-05-01 16:14, Leo Yan wrote:
> Hi all,
> 
> On Thu, Apr 30, 2020 at 05:18:15PM +0100, Will Deacon wrote:
>> On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
>> > On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
>> >
>> > > I wonder if we could/should make __sched_clock_offset available even when
>> > > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
>> > > help with this particular can or worm...
>> >
>> > Errrgh. __sched_clock_offset is only needed on x86 because we transition
>> > from one clock device to another on boot. It really shouldn't exist on
>> > anything sane.
>> 
>> I think we still transition from jiffies on arm64, because we don't 
>> register
>> with sched_clock until the timer driver probes. Marc, is that right?
>> 
>> > Let me try and understand your particular problem better.
>> 
>> I think the long and short of it is that userspace needs a way to 
>> convert
>> the raw counter cycles into a ns value that can be compared against 
>> values
>> coming out of sched_clock. To do this accurately, I think it needs the
>> cycles value at the point when sched_clock was initialised.
> 
> Will's understanding is exactly what I want to resolve in this patch.
> 
> The background info is for the ARM SPE [1] decoding with perf tool, if
> the timestamp is enabled, it uses the generic timer's counter as
> timestamp source.  SPE trace data only contains the raw counter cycles,
> as Will mentioned, the perf tool needs to convert it to a coordinate
> value with sched_clock.  This is why this patch tries to calculate the
> offset between the raw counter's ns value and sched_clock, eventually
> this offset value will be used by SPE's decoding code in Perf tool to
> calibrate a 'correct' timestamp.
> 
> Based on your suggestions, I will use __sched_clock_offset to resolve
> the accuracy issue in patch v2.  (I noticed Peter suggested to use a
> new API for wrapping clock_data structure, IIUC, __sched_clock_offset
> is more straightforward for this case).

I think Peter's point was *not* to use __sched_clock_offset which
appears to be only there for the purpose of an x86 workaround (and not
available to other architecture), but to instead expose the relevant
field (epoch_cyc) to the perf subsystem.

I feel it makes sense to make this a generic API, and see whether x86
can move over to it rather than the other way around.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-05-01 15:29             ` Marc Zyngier
  0 siblings, 0 replies; 28+ messages in thread
From: Marc Zyngier @ 2020-05-01 15:29 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Al Grant, Mathieu Poirier, Peter Zijlstra,
	Jiri Olsa, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, James Clark, Catalin Marinas,
	Namhyung Kim, tglx, Will Deacon, linux-arm-kernel, Mike Leach

On 2020-05-01 16:14, Leo Yan wrote:
> Hi all,
> 
> On Thu, Apr 30, 2020 at 05:18:15PM +0100, Will Deacon wrote:
>> On Thu, Apr 30, 2020 at 06:04:36PM +0200, Peter Zijlstra wrote:
>> > On Thu, Apr 30, 2020 at 04:29:23PM +0100, Marc Zyngier wrote:
>> >
>> > > I wonder if we could/should make __sched_clock_offset available even when
>> > > CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
>> > > help with this particular can or worm...
>> >
>> > Errrgh. __sched_clock_offset is only needed on x86 because we transition
>> > from one clock device to another on boot. It really shouldn't exist on
>> > anything sane.
>> 
>> I think we still transition from jiffies on arm64, because we don't 
>> register
>> with sched_clock until the timer driver probes. Marc, is that right?
>> 
>> > Let me try and understand your particular problem better.
>> 
>> I think the long and short of it is that userspace needs a way to 
>> convert
>> the raw counter cycles into a ns value that can be compared against 
>> values
>> coming out of sched_clock. To do this accurately, I think it needs the
>> cycles value at the point when sched_clock was initialised.
> 
> Will's understanding is exactly what I want to resolve in this patch.
> 
> The background info is for the ARM SPE [1] decoding with perf tool, if
> the timestamp is enabled, it uses the generic timer's counter as
> timestamp source.  SPE trace data only contains the raw counter cycles,
> as Will mentioned, the perf tool needs to convert it to a coordinate
> value with sched_clock.  This is why this patch tries to calculate the
> offset between the raw counter's ns value and sched_clock, eventually
> this offset value will be used by SPE's decoding code in Perf tool to
> calibrate a 'correct' timestamp.
> 
> Based on your suggestions, I will use __sched_clock_offset to resolve
> the accuracy issue in patch v2.  (I noticed Peter suggested to use a
> new API for wrapping clock_data structure, IIUC, __sched_clock_offset
> is more straightforward for this case).

I think Peter's point was *not* to use __sched_clock_offset which
appears to be only there for the purpose of an x86 workaround (and not
available to other architecture), but to instead expose the relevant
field (epoch_cyc) to the perf subsystem.

I feel it makes sense to make this a generic API, and see whether x86
can move over to it rather than the other way around.

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-05-01 15:26             ` Will Deacon
@ 2020-05-01 16:10               ` Leo Yan
  -1 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2020-05-01 16:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Peter Zijlstra, Marc Zyngier, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, tglx

On Fri, May 01, 2020 at 04:26:09PM +0100, Will Deacon wrote:

[...]

> > > > Let me try and understand your particular problem better.
> > > 
> > > I think the long and short of it is that userspace needs a way to convert
> > > the raw counter cycles into a ns value that can be compared against values
> > > coming out of sched_clock. To do this accurately, I think it needs the
> > > cycles value at the point when sched_clock was initialised.
> > 
> > Will's understanding is exactly what I want to resolve in this patch.
> > 
> > The background info is for the ARM SPE [1] decoding with perf tool, if
> > the timestamp is enabled, it uses the generic timer's counter as
> > timestamp source.  SPE trace data only contains the raw counter cycles,
> > as Will mentioned, the perf tool needs to convert it to a coordinate
> > value with sched_clock.  This is why this patch tries to calculate the
> > offset between the raw counter's ns value and sched_clock, eventually
> > this offset value will be used by SPE's decoding code in Perf tool to
> > calibrate a 'correct' timestamp.
> > 
> > Based on your suggestions, I will use __sched_clock_offset to resolve
> > the accuracy issue in patch v2.  (I noticed Peter suggested to use a
> > new API for wrapping clock_data structure, IIUC, __sched_clock_offset
> > is more straightforward for this case).
> > 
> > Please correct if I miss anything.  Thank you for reviewing and
> > suggestions!
> 
> I don't think you can use __sched_clock_offset without selecting
> HAVE_UNSTABLE_SCHED_CLOCK, and we really don't want to do that just
> for this. So Peter's idea about exposing what we need is better, although
> you'll probably need to take care with the switch-over from jiffies.
> 
> It needs some thought, but one possibility would be to introduce a new
> variant of sthe ched_clock_register() function that returns the cycle
> offset, and then we could fish that out of the timer driver.

Thanks a lot for you and Marc for correction.

> If we're
> crossing all the 'i's and dotting all the 't's then we'd want to disable the
> perf userpage if sched_clock changes clocksource too (a bit like we do for
> the vDSO).

To be honest, one thing is not clear for me is how the perf tool to
update the arch timer's parameters in the middle of tracing after
disable and re-enable per userpage.  I will note for this and look
into detailed implementation for this part.

Thanks for sharing comprehensive thoughts!

Leo

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-05-01 16:10               ` Leo Yan
  0 siblings, 0 replies; 28+ messages in thread
From: Leo Yan @ 2020-05-01 16:10 UTC (permalink / raw)
  To: Will Deacon
  Cc: Mark Rutland, Al Grant, Mathieu Poirier, Peter Zijlstra,
	Marc Zyngier, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, James Clark, Catalin Marinas,
	Namhyung Kim, tglx, Jiri Olsa, linux-arm-kernel, Mike Leach

On Fri, May 01, 2020 at 04:26:09PM +0100, Will Deacon wrote:

[...]

> > > > Let me try and understand your particular problem better.
> > > 
> > > I think the long and short of it is that userspace needs a way to convert
> > > the raw counter cycles into a ns value that can be compared against values
> > > coming out of sched_clock. To do this accurately, I think it needs the
> > > cycles value at the point when sched_clock was initialised.
> > 
> > Will's understanding is exactly what I want to resolve in this patch.
> > 
> > The background info is for the ARM SPE [1] decoding with perf tool, if
> > the timestamp is enabled, it uses the generic timer's counter as
> > timestamp source.  SPE trace data only contains the raw counter cycles,
> > as Will mentioned, the perf tool needs to convert it to a coordinate
> > value with sched_clock.  This is why this patch tries to calculate the
> > offset between the raw counter's ns value and sched_clock, eventually
> > this offset value will be used by SPE's decoding code in Perf tool to
> > calibrate a 'correct' timestamp.
> > 
> > Based on your suggestions, I will use __sched_clock_offset to resolve
> > the accuracy issue in patch v2.  (I noticed Peter suggested to use a
> > new API for wrapping clock_data structure, IIUC, __sched_clock_offset
> > is more straightforward for this case).
> > 
> > Please correct if I miss anything.  Thank you for reviewing and
> > suggestions!
> 
> I don't think you can use __sched_clock_offset without selecting
> HAVE_UNSTABLE_SCHED_CLOCK, and we really don't want to do that just
> for this. So Peter's idea about exposing what we need is better, although
> you'll probably need to take care with the switch-over from jiffies.
> 
> It needs some thought, but one possibility would be to introduce a new
> variant of sthe ched_clock_register() function that returns the cycle
> offset, and then we could fish that out of the timer driver.

Thanks a lot for you and Marc for correction.

> If we're
> crossing all the 'i's and dotting all the 't's then we'd want to disable the
> perf userpage if sched_clock changes clocksource too (a bit like we do for
> the vDSO).

To be honest, one thing is not clear for me is how the perf tool to
update the arch timer's parameters in the middle of tracing after
disable and re-enable per userpage.  I will note for this and look
into detailed implementation for this part.

Thanks for sharing comprehensive thoughts!

Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
  2020-05-01 16:10               ` Leo Yan
@ 2020-05-01 17:13                 ` Will Deacon
  -1 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-05-01 17:13 UTC (permalink / raw)
  To: Leo Yan
  Cc: Peter Zijlstra, Marc Zyngier, Mark Rutland, Ingo Molnar,
	Arnaldo Carvalho de Melo, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, Catalin Marinas, linux-arm-kernel, linux-kernel,
	Mathieu Poirier, Mike Leach, Al Grant, James Clark, tglx

On Sat, May 02, 2020 at 12:10:50AM +0800, Leo Yan wrote:
> On Fri, May 01, 2020 at 04:26:09PM +0100, Will Deacon wrote:
> 
> [...]
> 
> > > > > Let me try and understand your particular problem better.
> > > > 
> > > > I think the long and short of it is that userspace needs a way to convert
> > > > the raw counter cycles into a ns value that can be compared against values
> > > > coming out of sched_clock. To do this accurately, I think it needs the
> > > > cycles value at the point when sched_clock was initialised.
> > > 
> > > Will's understanding is exactly what I want to resolve in this patch.
> > > 
> > > The background info is for the ARM SPE [1] decoding with perf tool, if
> > > the timestamp is enabled, it uses the generic timer's counter as
> > > timestamp source.  SPE trace data only contains the raw counter cycles,
> > > as Will mentioned, the perf tool needs to convert it to a coordinate
> > > value with sched_clock.  This is why this patch tries to calculate the
> > > offset between the raw counter's ns value and sched_clock, eventually
> > > this offset value will be used by SPE's decoding code in Perf tool to
> > > calibrate a 'correct' timestamp.
> > > 
> > > Based on your suggestions, I will use __sched_clock_offset to resolve
> > > the accuracy issue in patch v2.  (I noticed Peter suggested to use a
> > > new API for wrapping clock_data structure, IIUC, __sched_clock_offset
> > > is more straightforward for this case).
> > > 
> > > Please correct if I miss anything.  Thank you for reviewing and
> > > suggestions!
> > 
> > I don't think you can use __sched_clock_offset without selecting
> > HAVE_UNSTABLE_SCHED_CLOCK, and we really don't want to do that just
> > for this. So Peter's idea about exposing what we need is better, although
> > you'll probably need to take care with the switch-over from jiffies.
> > 
> > It needs some thought, but one possibility would be to introduce a new
> > variant of sthe ched_clock_register() function that returns the cycle
> > offset, and then we could fish that out of the timer driver.
> 
> Thanks a lot for you and Marc for correction.
> 
> > If we're
> > crossing all the 'i's and dotting all the 't's then we'd want to disable the
> > perf userpage if sched_clock changes clocksource too (a bit like we do for
> > the vDSO).
> 
> To be honest, one thing is not clear for me is how the perf tool to
> update the arch timer's parameters in the middle of tracing after
> disable and re-enable per userpage.  I will note for this and look
> into detailed implementation for this part.

I don't fully understand the concern but, generally, the seqlock should
take care of any inconsistencies in the data page.

Will

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
@ 2020-05-01 17:13                 ` Will Deacon
  0 siblings, 0 replies; 28+ messages in thread
From: Will Deacon @ 2020-05-01 17:13 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, Al Grant, Mathieu Poirier, Peter Zijlstra,
	Marc Zyngier, linux-kernel, Arnaldo Carvalho de Melo,
	Alexander Shishkin, Ingo Molnar, James Clark, Catalin Marinas,
	Namhyung Kim, tglx, Jiri Olsa, linux-arm-kernel, Mike Leach

On Sat, May 02, 2020 at 12:10:50AM +0800, Leo Yan wrote:
> On Fri, May 01, 2020 at 04:26:09PM +0100, Will Deacon wrote:
> 
> [...]
> 
> > > > > Let me try and understand your particular problem better.
> > > > 
> > > > I think the long and short of it is that userspace needs a way to convert
> > > > the raw counter cycles into a ns value that can be compared against values
> > > > coming out of sched_clock. To do this accurately, I think it needs the
> > > > cycles value at the point when sched_clock was initialised.
> > > 
> > > Will's understanding is exactly what I want to resolve in this patch.
> > > 
> > > The background info is for the ARM SPE [1] decoding with perf tool, if
> > > the timestamp is enabled, it uses the generic timer's counter as
> > > timestamp source.  SPE trace data only contains the raw counter cycles,
> > > as Will mentioned, the perf tool needs to convert it to a coordinate
> > > value with sched_clock.  This is why this patch tries to calculate the
> > > offset between the raw counter's ns value and sched_clock, eventually
> > > this offset value will be used by SPE's decoding code in Perf tool to
> > > calibrate a 'correct' timestamp.
> > > 
> > > Based on your suggestions, I will use __sched_clock_offset to resolve
> > > the accuracy issue in patch v2.  (I noticed Peter suggested to use a
> > > new API for wrapping clock_data structure, IIUC, __sched_clock_offset
> > > is more straightforward for this case).
> > > 
> > > Please correct if I miss anything.  Thank you for reviewing and
> > > suggestions!
> > 
> > I don't think you can use __sched_clock_offset without selecting
> > HAVE_UNSTABLE_SCHED_CLOCK, and we really don't want to do that just
> > for this. So Peter's idea about exposing what we need is better, although
> > you'll probably need to take care with the switch-over from jiffies.
> > 
> > It needs some thought, but one possibility would be to introduce a new
> > variant of sthe ched_clock_register() function that returns the cycle
> > offset, and then we could fish that out of the timer driver.
> 
> Thanks a lot for you and Marc for correction.
> 
> > If we're
> > crossing all the 'i's and dotting all the 't's then we'd want to disable the
> > perf userpage if sched_clock changes clocksource too (a bit like we do for
> > the vDSO).
> 
> To be honest, one thing is not clear for me is how the perf tool to
> update the arch timer's parameters in the middle of tracing after
> disable and re-enable per userpage.  I will note for this and look
> into detailed implementation for this part.

I don't fully understand the concern but, generally, the seqlock should
take care of any inconsistencies in the data page.

Will

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2020-05-01 17:14 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-03-20  9:35 [PATCH] arm64: perf_event: Fix time_offset for arch timer Leo Yan
2020-03-20  9:35 ` Leo Yan
2020-04-01  1:24 ` Leo Yan
2020-04-01  1:24   ` Leo Yan
2020-04-30 14:58 ` Will Deacon
2020-04-30 14:58   ` Will Deacon
2020-04-30 15:29   ` Marc Zyngier
2020-04-30 15:29     ` Marc Zyngier
2020-04-30 16:04     ` Peter Zijlstra
2020-04-30 16:04       ` Peter Zijlstra
2020-04-30 16:18       ` Will Deacon
2020-04-30 16:18         ` Will Deacon
2020-04-30 17:33         ` Marc Zyngier
2020-04-30 17:33           ` Marc Zyngier
2020-05-01 15:14         ` Leo Yan
2020-05-01 15:14           ` Leo Yan
2020-05-01 15:26           ` Will Deacon
2020-05-01 15:26             ` Will Deacon
2020-05-01 16:10             ` Leo Yan
2020-05-01 16:10               ` Leo Yan
2020-05-01 17:13               ` Will Deacon
2020-05-01 17:13                 ` Will Deacon
2020-05-01 15:29           ` Marc Zyngier
2020-05-01 15:29             ` Marc Zyngier
2020-04-30 16:27   ` Peter Zijlstra
2020-04-30 16:27     ` Peter Zijlstra
2020-04-30 16:45     ` Will Deacon
2020-04-30 16:45       ` Will Deacon

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.