From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1755329Ab2JPRYd (ORCPT ); Tue, 16 Oct 2012 13:24:33 -0400
Received: from casper.infradead.org ([85.118.1.10]:58041 "EHLO casper.infradead.org" rhost-flags-OK-OK-OK-OK)
	by vger.kernel.org with ESMTP id S1755135Ab2JPRYc (ORCPT ); Tue, 16 Oct 2012 13:24:32 -0400
Subject: Re: [RFC] perf: need to expose sched_clock to correlate user samples with kernel samples
From: Peter Zijlstra
To: Stephane Eranian
Cc: LKML, "mingo@elte.hu", Paul Mackerras, Anton Blanchard, Will Deacon, "ak@linux.intel.com", Pekka Enberg, Steven Rostedt, Robert Richter, tglx, John Stultz
In-Reply-To:
References:
Content-Type: text/plain; charset="UTF-8"
Date: Tue, 16 Oct 2012 19:23:52 +0200
Message-ID: <1350408232.2336.42.camel@laptop>
Mime-Version: 1.0
X-Mailer: Evolution 2.32.2
Content-Transfer-Encoding: 7bit
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, 2012-10-16 at 12:13 +0200, Stephane Eranian wrote:
> Hi,
>
> There are many situations where we want to correlate events happening at
> the user level with samples recorded in the perf_event kernel sampling buffer.
> For instance, we might want to correlate the call to a function or creation of
> a file with samples. Similarly, when we want to monitor a JVM with jitted code,
> we need to be able to correlate jitted code mappings with perf event samples
> for symbolization.
>
> Perf_events allows timestamping of samples with PERF_SAMPLE_TIME.
> That causes each PERF_RECORD_SAMPLE to include a timestamp
> generated by calling the local_clock() -> sched_clock_cpu() function.
>
> To make correlating user vs. kernel samples easy, we would need to
> access that sched_clock() functionality. However, none of the existing
> clock calls permit this at this point. They all return timestamps which are
> not using the same source and/or offset as sched_clock.
>
> I believe a similar issue exists with the ftrace subsystem.
>
> The problem needs to be addressed in a portable manner. Solutions
> based on reading TSC for the user level to reconstruct sched_clock()
> don't seem appropriate to me.
>
> One possibility to address this limitation would be to extend clock_gettime()
> with a new clock time, e.g., CLOCK_PERF.
>
> However, I understand that sched_clock_cpu() provides ordering guarantees only
> when invoked on the same CPU repeatedly, i.e., it's not globally synchronized.
> But we already have to deal with this problem when merging samples obtained
> from different CPU sampling buffers in per-thread mode. So this is not
> necessarily a showstopper.
>
> Alternatives could be to use uprobes but that's less practical to set up.
>
> Anyone with better ideas?

You forgot to CC the time people ;-)

I've no problem with adding CLOCK_PERF (or another/better name).

Thomas, John?
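
For illustration only, here is a rough sketch of what the user side of the
proposal might look like. It assumes a clock id named CLOCK_PERF, which does
not exist at this point; the sketch aliases it to CLOCK_MONOTONIC purely so
that it compiles, and that stand-in does NOT share a source or offset with
sched_clock(). The struct user_event layout and both helper names are
hypothetical as well.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * ASSUMPTION: CLOCK_PERF is the clock id proposed in this thread and is
 * not a real clock. CLOCK_MONOTONIC is only a placeholder so the sketch
 * builds; it is not comparable with PERF_SAMPLE_TIME values.
 */
#ifndef CLOCK_PERF
#define CLOCK_PERF CLOCK_MONOTONIC
#endif

/*
 * Read a clock and return nanoseconds in a u64, i.e. the same unit and
 * width as the timestamp carried by PERF_SAMPLE_TIME.
 * Returns 0 if clock_gettime() fails.
 */
static uint64_t user_timestamp_ns(clockid_t clk)
{
	struct timespec ts;

	if (clock_gettime(clk, &ts) != 0)
		return 0;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Hypothetical record for a user-level event, e.g. a JIT code mapping. */
struct user_event {
	uint64_t time_ns;	/* timestamp taken at the user level */
	const char *what;	/* free-form label for the event */
};

/* Stamp a user-level event so it can later be ordered against samples. */
static struct user_event mark_event(const char *what)
{
	struct user_event ev;

	assert(what != NULL);
	ev.time_ns = user_timestamp_ns(CLOCK_PERF);
	ev.what = what;
	return ev;
}
```

A tool would call mark_event() at the point of interest (say, when a JIT
emits a new code mapping) and later sort those records together with the
PERF_RECORD_SAMPLE timestamps. As noted above, ordering across CPUs would
still be subject to the same caveats as merging per-CPU sampling buffers.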