From: Stephane Eranian <eranian@google.com>
To: John Stultz <john.stultz@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"mingo@elte.hu" <mingo@elte.hu>,
	Paul Mackerras <paulus@samba.org>,
	Anton Blanchard <anton@samba.org>,
	Will Deacon <will.deacon@arm.com>,
	"ak@linux.intel.com" <ak@linux.intel.com>,
	Pekka Enberg <penberg@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Robert Richter <robert.richter@amd.com>,
	tglx <tglx@linutronix.de>
Subject: Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples
Date: Sun, 11 Nov 2012 21:32:43 +0100
Message-ID: <CABPqkBRwAEDU3g0D7JH-iMq2JVH63h1pydKOdSWhYwkX27pesA@mail.gmail.com>
In-Reply-To: <509DB632.7070305@linaro.org>

On Sat, Nov 10, 2012 at 3:04 AM, John Stultz <john.stultz@linaro.org> wrote:
> On 10/16/2012 10:23 AM, Peter Zijlstra wrote:
>>
>> On Tue, 2012-10-16 at 12:13 +0200, Stephane Eranian wrote:
>>>
>>> Hi,
>>>
>>> There are many situations where we want to correlate events happening at
>>> the user level with samples recorded in the perf_event kernel sampling
>>> buffer. For instance, we might want to correlate a function call or the
>>> creation of a file with samples. Similarly, when we want to monitor a JVM
>>> with jitted code, we need to be able to correlate jitted code mappings
>>> with perf event samples for symbolization.
>>>
>>> Perf_events allows timestamping of samples with PERF_SAMPLE_TIME.
>>> That causes each PERF_RECORD_SAMPLE to include a timestamp
>>> generated by calling the local_clock() -> sched_clock_cpu() function.
>>>
>>> To make correlating user vs. kernel samples easy, we would need to
>>> access that sched_clock() functionality. However, none of the existing
>>> clock calls permit this at this point. They all return timestamps which
>>> do not use the same source and/or offset as sched_clock().
>>>
>>> I believe a similar issue exists with the ftrace subsystem.
>>>
>>> The problem needs to be addressed in a portable manner. Solutions
>>> based on reading the TSC at the user level to reconstruct sched_clock()
>>> don't seem appropriate to me.
>>>
>>> One possibility to address this limitation would be to extend
>>> clock_gettime() with a new clock ID, e.g., CLOCK_PERF.
>>>
>>> However, I understand that sched_clock_cpu() provides ordering guarantees
>>> only when invoked repeatedly on the same CPU, i.e., it's not globally
>>> synchronized. But we already have to deal with this problem when merging
>>> samples obtained from different per-CPU sampling buffers in per-thread
>>> mode. So this is not necessarily a showstopper.
>>>
>>> An alternative would be to use uprobes, but that is less practical to set up.
>>>
>>> Anyone with better ideas?
>>
>> You forgot to CC the time people ;-)
>>
>> I've no problem with adding CLOCK_PERF (or another/better name).
>
> Hrm. I'm not excited about exporting that sort of internal kernel detail to
> userland.
>
> The behavior and expectations of sched_clock() have changed over the years,
> so I'm not sure it's wise to export it, since we'd have to preserve its
> behavior from then on.
>
It's not just about exposing sched_clock(). We need to expose a time source
that is exactly equivalent to what perf_event uses internally. If sched_clock()
changes, then the perf_event clock will change too, and so would that new time
source for clock_gettime(). As long as everything remains consistent, we are
good.

> Also I worry that it will be abused in the same way that direct TSC access
> is, where its seemingly better performance compared to the more
> careful/correct CLOCK_MONOTONIC would cause developers to write fragile
> userland code that will break when moved from one machine to the next.
>
The only goal for this new time source is to correlate user-level samples
with kernel-level samples, i.e., application-level events with a PMU counter
overflow, for instance. Anybody trying anything else would be on their own.
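As a rough illustration of that correlation step, here is a minimal C sketch. It assumes that user-level events (for instance JIT code mappings) and decoded PERF_RECORD_SAMPLE records both carry a u64 nanosecond timestamp taken from the same clock, which is exactly what the proposed time source would guarantee. The struct and function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical records: a user-level event (e.g. a JIT code mapping) and a
 * decoded kernel sample. Both timestamps are ASSUMED to come from the same
 * clock; without that, the comparison below is meaningless. */
struct user_event    { uint64_t time; const char *what; };
struct kernel_sample { uint64_t time; uint64_t ip; };

/* For every kernel sample, store in out[i] the index of the most recent user
 * event whose timestamp is <= the sample's timestamp, or -1 if none exists.
 * Both arrays must already be sorted by time, so one linear pass suffices. */
static void correlate(const struct user_event *ev, size_t nev,
                      const struct kernel_sample *s, size_t ns, long *out)
{
    size_t j = 0;

    for (size_t i = 0; i < ns; i++) {
        /* Advance past every user event that happened at or before s[i]. */
        while (j < nev && ev[j].time <= s[i].time)
            j++;
        out[i] = (j == 0) ? -1 : (long)(j - 1);
    }
}
```

With both streams sorted, a single pass attributes each sample to the latest preceding user event, e.g. the code mapping that was live when a PMU overflow fired. The result is only trustworthy if both timestamps belong to one time domain.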

clock_gettime(CLOCK_PERF): guaranteed to return timestamps from the same time
source as that used by the perf_event subsystem to timestamp samples when
PERF_SAMPLE_TIME is requested in attr->sample_type.
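A minimal sketch of the user-level side, under stated assumptions: CLOCK_PERF is only the name proposed in this thread and does not exist in <time.h>, so the helper takes the clock ID as a parameter. CLOCK_MONOTONIC is used in the check purely as a stand-in and is not guaranteed to match perf_event timestamps. The helper name user_timestamp_ns() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Read the given clock and convert it to a u64 nanosecond value, the same
 * unit as the PERF_SAMPLE_TIME field. The caller would pass the proposed
 * CLOCK_PERF if it existed; any valid clockid_t works for this sketch. */
static int user_timestamp_ns(clockid_t clk, uint64_t *ns)
{
    struct timespec ts;

    if (clock_gettime(clk, &ts) != 0)
        return -1; /* e.g. the clock ID is not supported by this kernel */

    *ns = (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
    return 0;
}
```

Returning the raw u64 nanosecond value lets user events be compared directly against sample timestamps, with no conversion on the tool side, provided the clock really is the one perf_event uses.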

> I'd probably rather have perf output timestamps to userland using sane
> clocks (CLOCK_MONOTONIC) than try to introduce a new time domain to
> userland. But I probably could be convinced I'm wrong.
>
Can you get CLOCK_MONOTONIC efficiently and in ALL circumstances without
grabbing any locks, given that this would need to run from NMI context?
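To make the NMI concern concrete, here is a toy C11 model (hypothetical, not the kernel's timekeeping code) of a sequence-lock style reader, roughly the kind of retry loop timekeeping reads rely on. If such a reader runs in NMI context and has interrupted the writer on the same CPU, the writer cannot finish until the NMI returns, so an unbounded retry loop would hang. The sketch bounds the retries only so that the failure can be observed.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy sequence-counter-protected time value. The writer makes seq odd while
 * an update is in progress and even again once it is complete. */
struct seq_time {
    atomic_uint seq;
    _Atomic uint64_t ns;
};

static void write_begin(struct seq_time *t) { atomic_fetch_add(&t->seq, 1); } /* seq becomes odd */
static void write_end(struct seq_time *t)   { atomic_fetch_add(&t->seq, 1); } /* seq becomes even */

/* Reader with a retry budget. A real seqlock reader loops until it gets a
 * consistent snapshot; from NMI context on the CPU that is mid-update, that
 * never happens, which is why a lock-free, non-waiting read is needed. */
static bool try_read(struct seq_time *t, uint64_t *out, int max_tries)
{
    for (int i = 0; i < max_tries; i++) {
        unsigned s1 = atomic_load(&t->seq);
        if (s1 & 1)
            continue;                 /* update in progress: retry */
        uint64_t v = atomic_load(&t->ns);
        if (atomic_load(&t->seq) == s1) {
            *out = v;                 /* no writer intervened: consistent */
            return true;
        }
    }
    return false;                     /* writer never completed: give up */
}
```

Calling try_read() between write_begin() and write_end() on the same thread mimics an NMI landing mid-update: every attempt sees an odd sequence count and the read fails, whereas the same call succeeds once write_end() has run.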

