From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1752519AbbCaXzB (ORCPT ); Tue, 31 Mar 2015 19:55:01 -0400
Received: from mail-ob0-f182.google.com ([209.85.214.182]:34057 "EHLO
	mail-ob0-f182.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1751551AbbCaXzA (ORCPT ); Tue, 31 Mar 2015 19:55:00 -0400
MIME-Version: 1.0
Date: Tue, 31 Mar 2015 16:54:59 -0700
Message-ID:
Subject: Re: [PATCH v6 0/4] perf: add support for profiling jitted code
From: Stephane Eranian
To: Brendan Gregg
Cc: LKML, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Andi Kleen, Jiri Olsa, Namhyung Kim, Rose Belcher,
	Sukadev Bhattiprolu, Sonny Rao, John Mccutchan, David Ahern,
	Adrian Hunter, Pawel Moll
Content-Type: text/plain; charset=UTF-8
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

Hi Gregg,

On Tue, Mar 31, 2015 at 2:31 PM, Brendan Gregg wrote:
>
> On Tue, Mar 31, 2015 at 12:33 AM, Brendan Gregg
> wrote:
> > G'Day Stephane,
> >
> > On Mon, Mar 30, 2015 at 3:19 PM, Stephane Eranian wrote:
> > [...]
> >> The current support only works when the runtime is monitored from
> >> start to finish: perf record java --agentpath:libpfmjvmti.so my_class.
> >>
> >> Once the run is completed, the jitdump file needs to be injected into
> >> the perf.data file. This is accomplished by using the perf inject command.
> >> This will also generate an ELF image for each jitted function. The
> >> inject MMAP records will point to those ELF images. The reasoning
> >> behind using ELF images is that it makes processing for perf report
> >> and annotate automatic and transparent. It also makes it easier to
> >> package and analyze on a remote machine.
> > [...]
> >
> > This is really impressive work. Do we have an idea of the overhead for
> > running the java agent?
> >
Thanks Gregg. Happy to see you find these patches useful.
I think with PeterZ's latest clock changes, things are easier to run now.

> > Today, I'm using perf-map-agent, loaded dynamically, to dump a
> > /tmp/perf*.map file as needed. My company has tens of thousands of
> > Linux instances running Java, but very few need profiling, and we
> > don't know which beforehand. So a snapshot-on-demand approach is
> > ideal. An always-on approach, well, we'd have to know the overhead (I
> > can build the agent and test...).
>
> I built the agent and tested with an application framework
> micro-benchmark, and saw the performance overhead drop after start
> from about 13% initially (measured as a reduction in maximum req/sec
> given fixed CPU capacity), to 1.1% after a minute, and then 0.13%
> (which is really just noise) after several minutes of high load.
>
If your JIT runtime does not keep recompiling, then yes, I expect the
overhead to be concentrated on startup and on each first execution of a
new function. After that, no callback is really needed. And this is what
you observed.

> So the overhead is basically zero after (minutes of) warmup, at least
> for my test. My jit.dump file reached 8 Mbytes, and was growing by a
> tiny amount every 30 seconds or so (hence the near-zero overhead). I'm
> much less concerned about overheads now.
>
> I'll test with a production workload if I can... But I'm still curious
> about why we're even doing this, instead of the previous method of
> taking symbol snapshots. Is there a backstory? If it involves a case
> of high symbol churn, then this should also mean non-zero overhead to
> constantly log.
>
Yes, so either you have the JIT runtime activate that agent from startup
or we need to have a mechanism to kick the agent when perf is running.

As for the fsync() question, yes, there is a race between JIT runtime
startup and dumping into the jitdump and perf inject. One thing I will
add is locking on the inject side to make sure inject reads a sane
file (without truncated records).
The layout of the jitdump is such that it does not hold the number of
records in the file. Inject just reads until EOF, so that should be okay
with locks. If you run perf inject, then you are done with the collection.

Pipe mode is still not operational; I will look at it next. Hopefully we
can also make it work with the jitdump file.
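
To make the read-until-EOF point concrete, here is a minimal sketch in
Python. It is not the perf inject code. The record prefix used here
(u32 id, u32 total size, u64 timestamp) is a simplified layout assumed
for illustration only, and read_records() and make_record() are
hypothetical helper names. The reader consumes records until EOF and
silently drops a tail that the writer had not finished:

```python
import io
import struct

# Assumed, simplified record prefix (illustration only, not the exact
# jitdump layout): u32 record id, u32 total size (prefix + payload),
# u64 timestamp, all little-endian.
PREFIX = struct.Struct("<IIQ")


def read_records(stream):
    """Yield (rec_id, timestamp, payload) tuples until EOF.

    The file carries no record count, so the only stop conditions are a
    clean EOF or a tail that is shorter than the record it announces.
    """
    while True:
        head = stream.read(PREFIX.size)
        if len(head) < PREFIX.size:
            return  # clean EOF, or a partially written prefix
        rec_id, total_size, timestamp = PREFIX.unpack(head)
        if total_size < PREFIX.size:
            raise ValueError("corrupt record: total size smaller than prefix")
        want = total_size - PREFIX.size
        payload = stream.read(want)
        if len(payload) < want:
            return  # writer stopped mid-record; ignore the truncated tail
        yield rec_id, timestamp, payload


def make_record(rec_id, timestamp, payload):
    """Build one record in the assumed layout (hypothetical helper)."""
    return PREFIX.pack(rec_id, PREFIX.size + len(payload), timestamp) + payload


if __name__ == "__main__":
    data = make_record(0, 100, b"foo") + make_record(1, 200, b"barbaz")
    # Drop the last two bytes to simulate a record cut off mid-write.
    for rec in read_records(io.BytesIO(data[:-2])):
        print(rec)
```

Skipping a short tail only hides the race. With locking on the inject
side, the reader should not see a half-written record in the first
place; that part is left out of the sketch.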