From: Will Deacon <will@kernel.org>
To: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	Mike Leach <mike.leach@linaro.org>, Al Grant <Al.Grant@arm.com>,
	James Clark <James.Clark@arm.com>,
	maz@kernel.org, tglx@linutronix.de
Subject: Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
Date: Thu, 30 Apr 2020 15:58:24 +0100
Message-ID: <20200430145823.GA25258@willie-the-truck>
In-Reply-To: <20200320093545.28227-1-leo.yan@linaro.org>

Hi Leo,

[+Maz and tglx in case I'm barking up the wrong tree]

On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
> Between the system powering on and kernel's sched clock registration,
> the arch timer usually has been enabled at the early time and its
> counter is incremented during the period of the booting up.  Thus the
> arch timer's counter is not completely accounted into the sched clock,
> and has a delta between the arch timer's counter and sched clock.  This
> delta value should be stored into userpg->time_offset, which later can
> be retrieved by Perf tool in the user space for sample timestamp
> calculation.
> 
> Now userpg->time_offset is assigned to the negative sched clock with
> '-now', this value cannot reflect the delta between arch timer's counter
> and sched clock, so Perf cannot use it to calculate the sample time.
> 
> To fix this issue, this patch calculates the delta between the arch
> timer's counter and sched clock and assigns the delta to userpg->time_offset.
> The detailed steps are firstly to convert counter to nanoseconds 'ns',
> then the offset is calculated as 'now' minus 'ns'.
> 
>         |<------------------- 'ns' ---------------------->|
>                                 |<-------- 'now' -------->|
>         |<---- time_offset ---->|
>         |-----------------------|-------------------------|
>         ^                       ^                         ^
>   Power on system     sched clock registration      Perf starts

FWIW, I'm /really/ struggling to understand the problem here.

If I've grokked it correctly (big 'if'), then you can't just factor in
what you call "time_offset" in the diagram above, because there isn't
a guarantee that the counter is zero-initialised at the start.

> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
>  1 file changed, 18 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
> index e40b65645c86..226d25d77072 100644
> --- a/arch/arm64/kernel/perf_event.c
> +++ b/arch/arm64/kernel/perf_event.c
> @@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event *event,
>  {
>  	u32 freq;
>  	u32 shift;
> +	u64 count, ns, quot, rem;
>  
>  	/*
>  	 * Internal timekeeping for enabled/running/stopped times
> @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
>  		userpg->time_mult >>= 1;
>  	}
>  	userpg->time_shift = (u16)shift;
> -	userpg->time_offset = -now;
> +
> +	/*
> +	 * Since arch timer is enabled earlier than sched clock registration,
> +	 * compute the delta (in nanosecond unit) between the arch timer
> +	 * counter and sched clock, assign the delta to time_offset and
> +	 * perf tool can use it for timestamp calculation.
> +	 *
> +	 * The formula for conversion arch timer cycle to ns is:
> +	 *   quot = (cyc >> time_shift);
> +	 *   rem  = cyc & ((1 << time_shift) - 1);
> +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
> +	 */
> +	count = arch_timer_read_counter();
> +	quot = count >> shift;
> +	rem = count & ((1 << shift) - 1);
> +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
> +	userpg->time_offset = now - ns;

Hmm, reading the counter and calculating the delta feels horribly
approximate to me. It would be much better if we could get hold of the
initial epoch cycles from the point at which sched_clock was initialised
using the counter. This represents the true cycle delta between the counter
and what sched_clock uses for 0 ns.

Unfortunately, I can't see a straightforward way to grab that information.
It looks like x86 pulls this directly from the TSC driver.
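
To make the difference between the two approaches concrete, here is a
minimal sketch in plain C. It assumes that sched_clock reads 0 ns at the
counter value captured when it was registered; `epoch_cyc` and all helper
names below are hypothetical and are not existing kernel interfaces. Only
the mult/shift formula is taken from the patch comment quoted above.

```c
#include <assert.h>
#include <stdint.h>

/* Cycles to nanoseconds, using the mult/shift formula from the patch comment. */
static uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult, uint32_t shift)
{
	uint64_t quot, rem;

	/* Keeps the shifts defined and rem * mult within 64 bits. */
	assert(shift <= 32);
	quot = cyc >> shift;
	rem = cyc & (((uint64_t)1 << shift) - 1);
	return quot * mult + ((rem * mult) >> shift);
}

/*
 * Hypothetical: offset derived from the counter value at which sched_clock
 * was registered (epoch_cyc). Assumes sched_clock reads 0 ns at that point.
 * Unsigned wrap-around is intentional; time_offset is an unsigned field.
 */
static uint64_t offset_from_epoch(uint64_t epoch_cyc, uint32_t mult,
				  uint32_t shift)
{
	return 0 - cyc_to_ns(epoch_cyc, mult, shift);
}

/*
 * The approach in the patch: sample sched_clock (now_ns) and the counter
 * (count) separately. Any cycles elapsed between the two reads end up as
 * an error in the offset.
 */
static uint64_t offset_from_sample(uint64_t now_ns, uint64_t count,
				   uint32_t mult, uint32_t shift)
{
	return now_ns - cyc_to_ns(count, mult, shift);
}

/* What user space would compute from a raw counter value. */
static uint64_t user_time(uint64_t cyc, uint64_t time_offset, uint32_t mult,
			  uint32_t shift)
{
	return time_offset + cyc_to_ns(cyc, mult, shift);
}
```

With an illustrative 50 MHz counter (mult = 20480, shift = 10, so exactly
20 ns per cycle) and epoch_cyc = 1000, a counter value of 1500 maps to
10000 ns via the epoch-based offset. If the counter is instead sampled 5
cycles after sched_clock, the same counter value maps to 9900 ns. Neither
variant depends on the counter having started from zero.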

Will
