From: John Stultz <john.stultz@linaro.org>
To: Stephane Eranian <eranian@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"mingo@elte.hu" <mingo@elte.hu>,
	Paul Mackerras <paulus@samba.org>,
	Anton Blanchard <anton@samba.org>,
	Will Deacon <will.deacon@arm.com>,
	"ak@linux.intel.com" <ak@linux.intel.com>,
	Pekka Enberg <penberg@gmail.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Robert Richter <robert.richter@amd.com>,
	tglx <tglx@linutronix.de>
Subject: Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples
Date: Mon, 12 Nov 2012 14:39:22 -0800
Message-ID: <50A17A9A.1060400@linaro.org> (raw)
In-Reply-To: <CABPqkBQonkddt0mfkowM-oFcOPBTA2gNAju+PPrpcfqEs-L=KA@mail.gmail.com>

On 11/12/2012 12:54 PM, Stephane Eranian wrote:
> On Mon, Nov 12, 2012 at 7:53 PM, John Stultz <john.stultz@linaro.org> wrote:
>> On 11/11/2012 12:32 PM, Stephane Eranian wrote:
>>> On Sat, Nov 10, 2012 at 3:04 AM, John Stultz <john.stultz@linaro.org>
>>> wrote:
>>>> Also I worry that it will be abused in the same way that direct TSC
>>>> access
>>>> is, where the seemingly better performance from the more careful/correct
>>>> CLOCK_MONOTONIC would cause developers to write fragile userland code
>>>> that
>>>> will break when moved from one machine to the next.
>>>>
>>> The only goal for this new time source is for correlating user-level
>>> samples with
>>> kernel level samples, i.e., application level events with a PMU counter
>>> overflow
>>> for instance. Anybody trying anything else would be on their own.
>>>
>>> clock_gettime(CLOCK_PERF): guarantee to return the same time source as
>>> that used by the perf_event subsystem to timestamp samples when
>>> PERF_SAMPLE_TIME is requested in attr->sample_type.
>>
>> I'm not familiar enough with perf's interfaces, but if you are going to make
>> this clockid bound so tightly with perf, could you maybe export a perf
>> timestamp from one of perf's interfaces rather then using the more generic
>> clock_gettime() interface?
>>
> Yeah, I considered that as well. But it is more complicated. The only syscall
> we could extend for perf_events is ioctl(). But that one requires that an
> event be created so we obtain a file descriptor for the ioctl() call.
> So we'd have to pretend to program a dummy event just for the purpose of
> obtaining a timestamp. We could do that but that's not so nice. But more
> amenable to the

Sorry, you trailed off. Did you want to finish that thought? (I do 
that all the time. :)

> Keep in mind that the clock_gettime() would be used by programs which are not
> self-monitoring but may be monitored externally by a tool such as perf. We
> just need them to emit their events with a timestamp that can be correlated
> offline with those of perf_events.

Again, forgive me for not really knowing much about perf here, but could 
you have perf log an event when clock_gettime() was called, possibly 
recording the returned value, so you could correlate that data yourself?


>>>> I'd probably rather perf output timestamps to userland using sane clocks
>>>> (CLOCK_MONOTONIC), rather then trying to introduce a new time domain to
>>>> userland.   But I probably could be convinced I'm wrong.
>>>>
>>> Can you get CLOCK_MONOTONIC efficiently and in ALL circumstances without
>>> grabbing any locks because that would need to run from NMI context?
>> No, of course not; that's why we have sched_clock. But I'm suggesting we
>> consider changing what perf exports (via maybe interpolation/translation)
>> to be CLOCK_MONOTONIC-ish.
>>
> Explain to me the key difference between monotonic and what sched_clock()
> is returning today? Does this have to do with the global monotonic vs.
> the cpu-wide
> monotonic?

So CLOCK_MONOTONIC is the number of NTP-corrected (for accuracy) seconds 
+ nsecs that the machine has been up (so it doesn't include time spent in 
suspend). It's promised to be globally monotonic across cpus.

In my understanding, sched_clock's definition has changed over time. It 
used to be a fast but possibly inaccurate count of nanoseconds since boot; 
with suspend and other events it could reset or overflow, and its users 
(then only the scheduler) were expected to deal with that. It also wasn't 
guaranteed to be consistent across cpus, so it was limited to calculating 
approximate time intervals on a single cpu.

However, with CFS (and Peter or Ingo could probably hop in and clarify 
further) I believe it started to require some cross-cpu consistency, and 
reset events would cause problems with the scheduler, so additional 
layers have been added to try to enforce these additional requirements.

I suspect they aren't that far off, except calibration frequency errors 
go uncorrected with sched_clock. But I was thinking you could get periodic 
timestamps in perf that correlated CLOCK_MONOTONIC with sched_clock, and 
then let the kernel interpolate the sched_clock times out to something 
pretty close to CLOCK_MONOTONIC. That way perf wouldn't leak the 
sched_clock time domain to userland.

Again, sorry for being a pain here. CLOCK_PERF would be an easy 
solution, but I just want to make sure it's really the best one long term.

thanks
-john




Thread overview: 65+ messages
2012-10-16 10:13 [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples Stephane Eranian
2012-10-16 17:23 ` Peter Zijlstra
2012-10-18 19:33   ` Stephane Eranian
2012-11-10  2:04   ` John Stultz
2012-11-11 20:32     ` Stephane Eranian
2012-11-12 18:53       ` John Stultz
2012-11-12 20:54         ` Stephane Eranian
2012-11-12 22:39           ` John Stultz [this message]
2012-11-13 20:58     ` Steven Rostedt
2012-11-14 22:26       ` John Stultz
2012-11-14 23:30         ` Steven Rostedt
2013-02-01 14:18   ` Pawel Moll
2013-02-05 21:18     ` David Ahern
2013-02-05 22:13     ` Stephane Eranian
2013-02-05 22:28       ` John Stultz
2013-02-06  1:19         ` Steven Rostedt
2013-02-06 18:17           ` Pawel Moll
2013-02-13 20:00             ` Stephane Eranian
2013-02-14 10:33               ` Pawel Moll
2013-02-18 15:16                 ` Stephane Eranian
2013-02-18 18:59                   ` David Ahern
2013-02-18 20:35         ` Thomas Gleixner
2013-02-19 18:25           ` John Stultz
2013-02-19 19:55             ` Thomas Gleixner
2013-02-19 20:15               ` Thomas Gleixner
2013-02-19 20:35                 ` John Stultz
2013-02-19 21:50                   ` Thomas Gleixner
2013-02-19 22:20                     ` John Stultz
2013-02-20 10:06                       ` Thomas Gleixner
2013-02-20 10:29             ` Peter Zijlstra
2013-02-23  6:04               ` John Stultz
2013-02-25 14:10                 ` Peter Zijlstra
2013-03-14 15:34                   ` Stephane Eranian
2013-03-14 19:57                     ` Pawel Moll
2013-03-31 16:23                       ` David Ahern
2013-04-01 18:29                         ` John Stultz
2013-04-01 22:29                           ` David Ahern
2013-04-01 23:12                             ` John Stultz
2013-04-03  9:17                             ` Stephane Eranian
2013-04-03 13:55                               ` David Ahern
2013-04-03 14:00                                 ` Stephane Eranian
2013-04-03 14:14                                   ` David Ahern
2013-04-03 14:22                                     ` Stephane Eranian
2013-04-03 17:57                                       ` John Stultz
2013-04-04  8:12                                         ` Stephane Eranian
2013-04-04 22:26                                           ` John Stultz
2013-04-02  7:54                           ` Peter Zijlstra
2013-04-02 16:05                             ` Pawel Moll
2013-04-02 16:19                             ` John Stultz
2013-04-02 16:34                               ` Pawel Moll
2013-04-03 17:19                               ` Pawel Moll
2013-04-03 17:29                                 ` John Stultz
2013-04-03 17:35                                   ` Pawel Moll
2013-04-03 17:50                                     ` John Stultz
2013-04-04  7:37                                       ` Richard Cochran
2013-04-04 16:33                                         ` Pawel Moll
2013-04-04 16:29                                       ` Pawel Moll
2013-04-05 18:16                                         ` Pawel Moll
2013-04-06 11:05                                           ` Richard Cochran
2013-04-08 17:58                                             ` Pawel Moll
2013-04-08 19:05                                               ` John Stultz
2013-04-09  5:02                                                 ` Richard Cochran
2013-02-06 18:17       ` Pawel Moll
2013-06-26 16:49     ` David Ahern
2013-07-15 10:44       ` Pawel Moll
