linux-kernel.vger.kernel.org archive mirror
From: Marc Zyngier <maz@kernel.org>
To: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>, Mark Rutland <mark.rutland@arm.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	Catalin Marinas <catalin.marinas@arm.com>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	Mike Leach <mike.leach@linaro.org>, Al Grant <Al.Grant@arm.com>,
	James Clark <James.Clark@arm.com>,
	tglx@linutronix.de
Subject: Re: [PATCH] arm64: perf_event: Fix time_offset for arch timer
Date: Thu, 30 Apr 2020 16:29:23 +0100
Message-ID: <4d924f705245c797a19d3a73eb0c1ba0@kernel.org>
In-Reply-To: <20200430145823.GA25258@willie-the-truck>

On 2020-04-30 15:58, Will Deacon wrote:
> Hi Leo,
> 
> [+Maz and tglx in case I'm barking up the wrong tree]
> 
> On Fri, Mar 20, 2020 at 05:35:45PM +0800, Leo Yan wrote:
>> Between the system powering on and kernel's sched clock registration,
>> the arch timer usually has been enabled at the early time and its
>> counter is incremented during the period of the booting up.  Thus the
>> arch timer's counter is not completely accounted into the sched clock,
>> and has a delta between the arch timer's counter and sched clock.  This
>> delta value should be stored into userpg->time_offset, which later can
>> be retrieved by Perf tool in the user space for sample timestamp
>> calculation.
>> 
>> Now userpg->time_offset is assigned to the negative sched clock with
>> '-now', this value cannot reflect the delta between arch timer's counter
>> and sched clock, so Perf cannot use it to calculate the sample time.
>> 
>> To fix this issue, this patch calculate the delta between the arch
>> timer's and sched clock and assign the delta to userpg->time_offset.
>> The detailed steps are firstly to convert counter to nanoseconds 'ns',
>> then the offset is calculated as 'now' minus 'ns'.
>> 
>>         |<------------------- 'ns' ---------------------->|
>>                                 |<-------- 'now' -------->|
>>         |<---- time_offset ---->|
>>         |-----------------------|-------------------------|
>>         ^                       ^                         ^
>>   Power on system     sched clock registration      Perf starts
> 
> FWIW, I'm /really/ struggling to understand the problem here.
> 
> If I've grokked it correctly (big 'if'), then you can't just factor in
> what you call "time_offset" in the diagram above, because there isn't
> a guarantee that the counter is zero-initialised at the start.

Even if it was, we have no idea of *when* that was. Think kexec, for a
start. Or spending some variable amount of time in firmware because of
$REASON.

> 
>> Signed-off-by: Leo Yan <leo.yan@linaro.org>
>> ---
>>  arch/arm64/kernel/perf_event.c | 19 ++++++++++++++++++-
>>  1 file changed, 18 insertions(+), 1 deletion(-)
>> 
>> diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
>> index e40b65645c86..226d25d77072 100644
>> --- a/arch/arm64/kernel/perf_event.c
>> +++ b/arch/arm64/kernel/perf_event.c
>> @@ -1143,6 +1143,7 @@ void arch_perf_update_userpage(struct perf_event *event,
>>  {
>>  	u32 freq;
>>  	u32 shift;
>> +	u64 count, ns, quot, rem;
>> 
>>  	/*
>>  	 * Internal timekeeping for enabled/running/stopped times
>> @@ -1164,5 +1165,21 @@ void arch_perf_update_userpage(struct perf_event *event,
>>  		userpg->time_mult >>= 1;
>>  	}
>>  	userpg->time_shift = (u16)shift;
>> -	userpg->time_offset = -now;
>> +
>> +	/*
>> +	 * Since arch timer is enabled earlier than sched clock registration,
>> +	 * compute the delta (in nanosecond unit) between the arch timer
>> +	 * counter and sched clock, assign the delta to time_offset and
>> +	 * perf tool can use it for timestamp calculation.
>> +	 *
>> +	 * The formula for conversion arch timer cycle to ns is:
>> +	 *   quot = (cyc >> time_shift);
>> +	 *   rem  = cyc & ((1 << time_shift) - 1);
>> +	 *   ns   = quot * time_mult + ((rem * time_mult) >> time_shift);
>> +	 */
>> +	count = arch_timer_read_counter();
>> +	quot = count >> shift;
>> +	rem = count & ((1 << shift) - 1);
>> +	ns = quot * userpg->time_mult + ((rem * userpg->time_mult) >> shift);
>> +	userpg->time_offset = now - ns;
> 
> Hmm, reading the counter and calculating the delta feels horribly
> approximate to me. It would be much better if we could get hold of the
> initial epoch cycles from the point at which sched_clock was initialised
> using the counter. This represents the true cycle delta between the counter
> and what sched_clock uses for 0 ns.

I think this is a sensible solution if you want an epoch that starts at 0 with
sched_clock being initialized. The other question is whether it is possible to
use a different timestamping source for perf that wouldn't need to be offset.
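
To make the epoch idea concrete, here is a rough user-space sketch, not
kernel code. It assumes we somehow have the counter value latched at the
point where sched_clock reads 0 ns (called epoch_cyc below, a name made up
for illustration), and it reuses the quot/rem conversion from the patch
comment. offset_from_epoch() is a hypothetical helper, and the mult/shift
values mentioned afterwards are chosen only so that the arithmetic is exact.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert counter cycles to nanoseconds with the mult/shift scheme
 * quoted in the patch comment. Splitting into quot/rem keeps the
 * intermediate product from overflowing for large counter values.
 */
static uint64_t cyc_to_ns(uint64_t cyc, uint32_t mult, uint16_t shift)
{
	uint64_t quot, rem;

	assert(shift < 64);	/* a wider shift would be undefined in C */

	quot = cyc >> shift;
	rem = cyc & ((UINT64_C(1) << shift) - 1);

	return quot * mult + ((rem * mult) >> shift);
}

/*
 * Hypothetical helper: derive the offset from the counter value that
 * corresponds to sched_clock == 0, instead of re-reading the counter
 * later. The result wraps as an unsigned 64-bit value, so adding it to
 * a later cyc_to_ns() result yields nanoseconds since the epoch.
 */
static uint64_t offset_from_epoch(uint64_t epoch_cyc, uint32_t mult,
				  uint16_t shift)
{
	return (uint64_t)0 - cyc_to_ns(epoch_cyc, mult, shift);
}
```

With an assumed 50 MHz counter (mult = 20480, shift = 10, i.e. 20 ns per
cycle) and epoch_cyc = 5000000, a later reading of 6000000 cycles gives
offset_from_epoch(...) + cyc_to_ns(...) = 20000000 ns, regardless of what
the counter held at power-on or how long was spent before the epoch.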

> Unfortunately, I can't see a straightforward way to grab that information.
> It looks like x86 pulls this directly from the TSC driver.

I wonder if we could/should make __sched_clock_offset available even when
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK isn't defined. It feels like it would
help with this particular can of worms...

         M.
-- 
Jazz is not dead. It just smells funny...
