From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756871Ab3BAOSN (ORCPT ); Fri, 1 Feb 2013 09:18:13 -0500
Received: from service87.mimecast.com ([91.220.42.44]:47194 "EHLO
	service87.mimecast.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1755309Ab3BAOSJ convert rfc822-to-8bit (ORCPT );
	Fri, 1 Feb 2013 09:18:09 -0500
Message-ID: <1359728280.8360.15.camel@hornet>
Subject: Re: [RFC] perf: need to expose sched_clock to correlate user
	samples with kernel samples
From: Pawel Moll
To: Peter Zijlstra
Cc: Stephane Eranian, LKML, "mingo@elte.hu", Paul Mackerras,
	Anton Blanchard, Will Deacon, "ak@linux.intel.com", Pekka Enberg,
	Steven Rostedt, Robert Richter, tglx, John Stultz
Date: Fri, 01 Feb 2013 14:18:00 +0000
In-Reply-To: <1350408232.2336.42.camel@laptop>
References: <1350408232.2336.42.camel@laptop>
X-Mailer: Evolution 3.6.2-0ubuntu0.1
Mime-Version: 1.0
X-OriginalArrivalTime: 01 Feb 2013 14:18:01.0020 (UTC) FILETIME=[F10E7FC0:01CE0086]
X-MC-Unique: 113020114180609801
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hello,

I'd like to revive the topic...

On Tue, 2012-10-16 at 18:23 +0100, Peter Zijlstra wrote:
> On Tue, 2012-10-16 at 12:13 +0200, Stephane Eranian wrote:
> > Hi,
> >
> > There are many situations where we want to correlate events happening at
> > the user level with samples recorded in the perf_event kernel sampling buffer.
> > For instance, we might want to correlate the call to a function or creation of
> > a file with samples. Similarly, when we want to monitor a JVM with jitted code,
> > we need to be able to correlate jitted code mappings with perf event samples
> > for symbolization.
> >
> > Perf_events allows timestamping of samples with PERF_SAMPLE_TIME.
> > That causes each PERF_RECORD_SAMPLE to include a timestamp
> > generated by calling the local_clock() -> sched_clock_cpu() function.
> >
> > To make correlating user vs. kernel samples easy, we would need to
> > access that sched_clock() functionality. However, none of the existing
> > clock calls permit this at this point. They all return timestamps which are
> > not using the same source and/or offset as sched_clock.
> >
> > I believe a similar issue exists with the ftrace subsystem.
> >
> > The problem needs to be addressed in a portable manner. Solutions
> > based on reading TSC for the user level to reconstruct sched_clock()
> > don't seem appropriate to me.
> >
> > One possibility to address this limitation would be to extend clock_gettime()
> > with a new clock time, e.g., CLOCK_PERF.
> >
> > However, I understand that sched_clock_cpu() provides ordering guarantees only
> > when invoked on the same CPU repeatedly, i.e., it's not globally synchronized.
> > But we already have to deal with this problem when merging samples obtained
> > from different CPU sampling buffers in per-thread mode. So this is not
> > necessarily a showstopper.
> >
> > Alternatives could be to use uprobes but that's less practical to set up.
> >
> > Anyone with better ideas?
>
> You forgot to CC the time people ;-)
>
> I've no problem with adding CLOCK_PERF (or another/better name).
>
> Thomas, John?

I've just faced the same issue - correlating an event in userspace with
data from the perf stream, but to my mind what I want to get is a value
returned by perf_clock() _in the current "session" context_.

Stephane didn't like the idea of opening a "fake" perf descriptor in
order to get the timestamp, but surely one must have the "session"
already running to be interested in such data in the first place? So I
think the ioctl() idea is not out of place here... How about the simple
change below?

Regards

Pawel

8<---
From 2ad51a27fbf64bf98cee190efc3fbd7002819692 Mon Sep 17 00:00:00 2001
From: Pawel Moll
Date: Fri, 1 Feb 2013 14:03:56 +0000
Subject: [PATCH] perf: Add ioctl to return current time value

To correlate user space events with the perf events stream
a current (as in: "what time(stamp) is it now?") time value
must be made available.

This patch adds a perf ioctl that makes this possible.

Signed-off-by: Pawel Moll
---
 include/uapi/linux/perf_event.h |    1 +
 kernel/events/core.c            |    8 ++++++++
 2 files changed, 9 insertions(+)

diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 4f63c05..b745fb0 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -316,6 +316,7 @@ struct perf_event_attr {
 #define PERF_EVENT_IOC_PERIOD		_IOW('$', 4, __u64)
 #define PERF_EVENT_IOC_SET_OUTPUT	_IO ('$', 5)
 #define PERF_EVENT_IOC_SET_FILTER	_IOW('$', 6, char *)
+#define PERF_EVENT_IOC_GET_TIME		_IOR('$', 7, __u64)
 
 enum perf_event_ioc_flags {
 	PERF_IOC_FLAG_GROUP		= 1U << 0,
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 301079d..4202b1c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3298,6 +3298,14 @@ static long perf_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
 	case PERF_EVENT_IOC_SET_FILTER:
 		return perf_event_set_filter(event, (void __user *)arg);
 
+	case PERF_EVENT_IOC_GET_TIME:
+	{
+		u64 time = perf_clock();
+		if (copy_to_user((void __user *)arg, &time, sizeof(time)))
+			return -EFAULT;
+		return 0;
+	}
+
 	default:
 		return -ENOTTY;
 	}
-- 
1.7.10.4