linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH v6 0/4] perf: add support for profiling jitted code
@ 2015-03-31 23:54 Stephane Eranian
  0 siblings, 0 replies; 4+ messages in thread
From: Stephane Eranian @ 2015-03-31 23:54 UTC (permalink / raw)
  To: Brendan Gregg
  Cc: LKML, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Andi Kleen, Jiri Olsa, Namhyung Kim, Rose Belcher,
	Sukadev Bhattiprolu, Sonny Rao, John Mccutchan, David Ahern,
	Adrian Hunter, Pawel Moll

Hi Gregg,

On Tue, Mar 31, 2015 at 2:31 PM, Brendan Gregg
<brendan.d.gregg@gmail.com> wrote:
>
> On Tue, Mar 31, 2015 at 12:33 AM, Brendan Gregg
> <brendan.d.gregg@gmail.com> wrote:
> > G'Day Stephane,
> >
> > On Mon, Mar 30, 2015 at 3:19 PM, Stephane Eranian <eranian@google.com> wrote:
> > [...]
> >> The current support only works when the runtime is monitored from
> >> start to finish: perf record java --agentpath:libpfmjvmti.so my_class.
> >>
> >> Once the run is completed, the jitdump file needs to be injected into
> >> the perf.data file. This is accomplished by using the perf inject command.
> >> This will also generate an ELF image for each jitted function. The
> >> inject MMAP records will point to those ELF images. The reasoning
> >> behind using ELF images is that it makes processing for perf report
> >> and annotate automatic and transparent. It also makes it easier to
> >> package and analyze on a remote machine.
> > [...]
> >
> > This is really impressive work. Do we have an idea of the overhead for
> > running the java agent?
> >


Thanks Gregg. Happy to see you find these patches useful. I think
with PeterZ's latest clock changes, things are easier to run now.

> > Today, I'm using perf-map-agent, loaded dynamically, to dump a
> > /tmp/perf*.map file as needed. My company has tens of thousands of
> > Linux instances running Java, but very few need profiling, and we
> > don't know which beforehand. So a snapshot-on-demand approach is
> > ideal. An always-on approach, well, we'd have to know the overhead (I
> > can build the agent and test...).
>
> I built the agent and tested with an application framework
> micro-benchmark, and saw the performance overhead drop after start
> from about 13% initially (measured as a reduction in maximum req/sec
> given fixed CPU capacity), to 1.1% after a minute, and then 0.13%
> (which is really just noise) after several minutes of high load.
>

If you're JIT runtime does not keep recompiling, then yes, I expect the
overhead to be concentrated on startup and each time a new function
is executed. Then after no callback is really needed. And this is what you
observed.

>
> So the overhead is basically zero after (minutes of) warmup, at least
> for my test. My jit.dump file reached 8 Mbytes, and was growing by a
> tiny amount every 30 seconds or so (hence the near-zero overhead). I'm
> much less concerned about overheads now.
>
> I'll test with a production workload if I can... But I'm still curious
> about why we're even doing this, instead of the previous method of
> taking symbol snapshots. Is there a backstory? If it involves a case
> of high symbol churn, then this should also mean non-zero overhead to
> constantly log.
>
Yes, so either you have the JIT runtime activate that agent from startup
or we need to have a mechanism to kick the agent when perf is running.

As for the fsync() question, yes, there is a race between JIT runtime
startup and dumping into the jitdump and perf inject. One thing I will
add in the locking on the inject side to make sure inject reads a sane
file (without truncated records). The layout of the jitdump is such that
it does not hold the number of records in the file. Inject just reads
until EOF, so that should be okay with locks. If you run
perf inject, then you are done with the collection. Pipe mode is still
not operational, will look at it next. Hopefully we can also make it work
with the jitdump file.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v6 0/4] perf: add support for profiling jitted code
  2015-03-31  7:33 ` Brendan Gregg
@ 2015-03-31 21:31   ` Brendan Gregg
  0 siblings, 0 replies; 4+ messages in thread
From: Brendan Gregg @ 2015-03-31 21:31 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, acme, Peter Zijlstra, Ingo Molnar, Andi Kleen, Jiri Olsa,
	Namhyung Kim, Rose Belcher, sukadev, sonnyrao, johnmccutchan,
	David Ahern, adrian.hunter, Pawel Moll

On Tue, Mar 31, 2015 at 12:33 AM, Brendan Gregg
<brendan.d.gregg@gmail.com> wrote:
> G'Day Stephane,
>
> On Mon, Mar 30, 2015 at 3:19 PM, Stephane Eranian <eranian@google.com> wrote:
> [...]
>> The current support only works when the runtime is monitored from
>> start to finish: perf record java --agentpath:libpfmjvmti.so my_class.
>>
>> Once the run is completed, the jitdump file needs to be injected into
>> the perf.data file. This is accomplished by using the perf inject command.
>> This will also generate an ELF image for each jitted function. The
>> inject MMAP records will point to those ELF images. The reasoning
>> behind using ELF images is that it makes processing for perf report
>> and annotate automatic and transparent. It also makes it easier to
>> package and analyze on a remote machine.
> [...]
>
> This is really impressive work. Do we have an idea of the overhead for
> running the java agent?
>
> Today, I'm using perf-map-agent, loaded dynamically, to dump a
> /tmp/perf*.map file as needed. My company has tens of thousands of
> Linux instances running Java, but very few need profiling, and we
> don't know which beforehand. So a snapshot-on-demand approach is
> ideal. An always-on approach, well, we'd have to know the overhead (I
> can build the agent and test...).

I built the agent and tested with an application framework
micro-benchmark, and saw the performance overhead drop after start
from about 13% initially (measured as a reduction in maximum req/sec
given fixed CPU capacity), to 1.1% after a minute, and then 0.13%
(which is really just noise) after several minutes of high load.

So the overhead is basically zero after (minutes of) warmup, at least
for my test. My jit.dump file reached 8 Mbytes, and was growing by a
tiny amount every 30 seconds or so (hence the near-zero overhead). I'm
much less concerned about overheads now.

I'll test with a production workload if I can... But I'm still curious
about why we're even doing this, instead of the previous method of
taking symbol snapshots. Is there a backstory? If it involves a case
of high symbol churn, then this should also mean non-zero overhead to
constantly log.

Brendan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH v6 0/4] perf: add support for profiling jitted code
  2015-03-30 22:19 Stephane Eranian
@ 2015-03-31  7:33 ` Brendan Gregg
  2015-03-31 21:31   ` Brendan Gregg
  0 siblings, 1 reply; 4+ messages in thread
From: Brendan Gregg @ 2015-03-31  7:33 UTC (permalink / raw)
  To: Stephane Eranian
  Cc: LKML, acme, Peter Zijlstra, Ingo Molnar, Andi Kleen, Jiri Olsa,
	Namhyung Kim, Rose Belcher, sukadev, sonnyrao, johnmccutchan,
	David Ahern, adrian.hunter, Pawel Moll

G'Day Stephane,

On Mon, Mar 30, 2015 at 3:19 PM, Stephane Eranian <eranian@google.com> wrote:
[...]
> The current support only works when the runtime is monitored from
> start to finish: perf record java --agentpath:libpfmjvmti.so my_class.
>
> Once the run is completed, the jitdump file needs to be injected into
> the perf.data file. This is accomplished by using the perf inject command.
> This will also generate an ELF image for each jitted function. The
> inject MMAP records will point to those ELF images. The reasoning
> behind using ELF images is that it makes processing for perf report
> and annotate automatic and transparent. It also makes it easier to
> package and analyze on a remote machine.
[...]

This is really impressive work. Do we have an idea of the overhead for
running the java agent?

Today, I'm using perf-map-agent, loaded dynamically, to dump a
/tmp/perf*.map file as needed. My company has tens of thousands of
Linux instances running Java, but very few need profiling, and we
don't know which beforehand. So a snapshot-on-demand approach is
ideal. An always-on approach, well, we'd have to know the overhead (I
can build the agent and test...).

During a profile, our symbols can change, but it's not been a serious
problem. I can take a before and after snapshot of symbols
(/tmp/perf.map files), then highlight which have moved and are
therefore suspicious (not perfect, but ok). Yes, having a timestamped
symbol log would be most accurate, but I wouldn't pay a lot of
overhead for it. Is there a (horror) story behind why this was so
necessary?

Minor comment: I didn't notice an fsync() in the jvmti code. Can
symbol updates be buffered and not written out by the time perf is
reading them?

Again, this is impressive work. Just wanted to share comments from a
heavy user of perf with Java. You may already have other, better,
ideas for doing on-demand symbol dumps anyway.

Brendan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* [PATCH v6 0/4] perf: add support for profiling jitted code
@ 2015-03-30 22:19 Stephane Eranian
  2015-03-31  7:33 ` Brendan Gregg
  0 siblings, 1 reply; 4+ messages in thread
From: Stephane Eranian @ 2015-03-30 22:19 UTC (permalink / raw)
  To: linux-kernel
  Cc: acme, peterz, mingo, ak, jolsa, namhyung, cel, sukadev, sonnyrao,
	johnmccutchan, dsahern, adrian.hunter, pawel.moll

This patch series extends perf record/report/annotate to enable
profiling of jitted (just-in-time compiled) code. The current
perf tool provides very limited support for profiling jitted
code for some runtime environments. But the support is experimental
and cannot be used in complex environments. It relies on files
in /tmp, for instance. It does not support annotate mode or
rejitted code.

This patch series adds a better way of profiling jitted code
with the following advantages:
   - support any jitted code environment (some with modifications)
   - support Java runtime with JVMTI interface with no modifications
   - provides a portable JVMTI agent library
   - known to support V8 runtime
   - known to support DART runtime
   - supports code rejitting and movements
   - no files in /tmp
   - meta-data file is unique to each run
   - no changes to perf report/annotate
   - support per-thread and system-wide profiling
   - support monitoring of multiple simultaneous Jit runtimes

The support is based on cooperation with the runtime. For Java runtimes,
supporting the JVMTI interface, there is no change necessary. For other
runtimes, modifications are necessary to emit the meta-data necessary
to support symbolization and annotation of the samples. Those modifications
are fairly straighforward and already tested on V8 and DART.

The jit environment emits a binary dump file which contains the jitted
code (in raw format) and meta-data describing the mapping of functions.
The binary format is documented in the jitdump.h header file. It is
adapted from the OProfile jitdump format.

To enable synchronization of the runtime MMAPs with those recorded by
the kernel on behalf of the perf tool, the runtime needs to timestamp
any record in the dump file using the same time source. This third
version is using Pawel Moll's CLOCK_MONOTONIC patch posted
	https://lkml.org/lkml/2015/1/21/711

The patch is packaged in this series for convenience. Without this patch
installed, records emitted by the runtime cannot be properly synchronized
(inserted in the flow of MMAPS) and symbolization is not possible.

The current support only works when the runtime is monitored from
start to finish: perf record java --agentpath:libpfmjvmti.so my_class.

Once the run is completed, the jitdump file needs to be injected into
the perf.data file. This is accomplished by using the perf inject command.
This will also generate an ELF image for each jitted function. The
inject MMAP records will point to those ELF images. The reasoning
behind using ELF images is that it makes processing for perf report
and annotate automatic and transparent. It also makes it easier to
package and analyze on a remote machine. 

The reporting is unchanged, simply invoke perf report or perf annotate
on the modified perf.data file. The jitted code will appear symbolized
and the assembly view will display the instruction level profile!

As an added bonus, the series includes support for demangling function
signature from OpenJDK.

The current patch series does not include support for source line
information. Therefore, the source code cannot yet be displayed
in perf annotate. This will come in a later update.

Furthermore, we believe there is a way to skip the perf inject phase
and have perf report/annotate directly inject the MMAP records 
on the fly during processing of the perf.data file. Perf report would
also generate the ELF files if necessary. Such optimization, would
make using this extension seamless in system-wide mode and larger
environments. This will be added in a later update as well.

In V2, we have switched to Pawell Moll and David Ahern posix
clock kernel module instead. We have dropped the patch which
modified the arguments to map_init() because the change was
not used. We are not printing the return type of Java methods
anymore and have made the Java demangler a separate module.
We also rebased to 3.19.0+ from tip.git.

In V3, we switched to Pawel Moll's CLOCK_MONOTONIC perf
clock patches. This patch switch perf_events from sched_clock
to CLOCK_MONOTONIC, a clock source which is available to users.

In V4, we rebased to 4.0-rc5. We also simplified the process by
getting rid of the requirement to pass the jitdump file name to
perf inject. Now, perf injects automtically detects if jitdumps
were generated and it merges the relevant meta-data. This is
accomplished by having the jit runtime mmap the jitdump file
for the purpose of creating a MMAP record in the perf.data file.
That MMAP contains all the info to locate the jitdump file and
generate the ELF images for jitted functions.

In V5, we rebase to acme's perf/core branch (instead of tip.git).
We fixed some bswap issues, switched to using scnprintf() and fixed
formatting issues. Also made sure all the files were included in the
patches. We also fix one error message in the JVMTI agent.


In V6, we switched back to using tip.git to leverage PeterZ's clockid
patch for perf_events in 4.0.0-rc6. Clock source can now be specified
per event and they are connected with the MONOTONIC Posix clock. We
leverage this extension to timestamp samples in the jit runtime and
correlate them with perf samples. Notice the -k mono option in perf
record example below.

To use the new feature:
  - need to run with 4.0.0-rc6
  - compile perf
  - cd tools/perf/jvmti; make; install wherever is appropriate

  Example using openJDK:
   $ perf record -k mono java -agentpath:libjvmti.so my_class
   $ perf inject -i perf.data -jit -o perf.data.jitted
   $ perf report -i perf.data.jitted

Thanks to all the contributors and testers. Special thanks
to PeterZ for adding the clock source to perf_events and solving
the problem of common timesource for user and kernel level samples.

Enjoy,


Peter Zijlstra (1):
  perf,record: Add clockid parameter

Stephane Eranian (3):
  perf tools: add Java demangling support
  perf inject: add jitdump mmap injection support
  perf tools: add JVMTI agent library

 tools/build/Makefile.feature             |   2 +
 tools/build/feature/Makefile             |   4 +
 tools/build/feature/test-libcrypto.c     |   9 +
 tools/perf/Documentation/perf-inject.txt |   6 +
 tools/perf/Documentation/perf-record.txt |   7 +
 tools/perf/builtin-inject.c              |  60 +++-
 tools/perf/builtin-record.c              |  78 ++++
 tools/perf/config/Makefile               |  11 +
 tools/perf/jvmti/Makefile                |  70 ++++
 tools/perf/jvmti/jvmti_agent.c           | 367 +++++++++++++++++++
 tools/perf/jvmti/jvmti_agent.h           |  23 ++
 tools/perf/jvmti/libjvmti.c              | 149 ++++++++
 tools/perf/perf.h                        |   1 +
 tools/perf/util/Build                    |   3 +
 tools/perf/util/demangle-java.c          | 199 +++++++++++
 tools/perf/util/demangle-java.h          |  10 +
 tools/perf/util/evsel.c                  |  31 +-
 tools/perf/util/genelf.c                 | 479 +++++++++++++++++++++++++
 tools/perf/util/genelf.h                 |   6 +
 tools/perf/util/header.c                 |   2 +
 tools/perf/util/jit.h                    |  15 +
 tools/perf/util/jitdump.c                | 588 +++++++++++++++++++++++++++++++
 tools/perf/util/jitdump.h                |  92 +++++
 tools/perf/util/symbol-elf.c             |   3 +
 24 files changed, 2201 insertions(+), 14 deletions(-)
 create mode 100644 tools/build/feature/test-libcrypto.c
 create mode 100644 tools/perf/jvmti/Makefile
 create mode 100644 tools/perf/jvmti/jvmti_agent.c
 create mode 100644 tools/perf/jvmti/jvmti_agent.h
 create mode 100644 tools/perf/jvmti/libjvmti.c
 create mode 100644 tools/perf/util/demangle-java.c
 create mode 100644 tools/perf/util/demangle-java.h
 create mode 100644 tools/perf/util/genelf.c
 create mode 100644 tools/perf/util/genelf.h
 create mode 100644 tools/perf/util/jit.h
 create mode 100644 tools/perf/util/jitdump.c
 create mode 100644 tools/perf/util/jitdump.h

-- 
1.9.1


^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-03-31 23:55 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-03-31 23:54 [PATCH v6 0/4] perf: add support for profiling jitted code Stephane Eranian
  -- strict thread matches above, loose matches on Subject: below --
2015-03-30 22:19 Stephane Eranian
2015-03-31  7:33 ` Brendan Gregg
2015-03-31 21:31   ` Brendan Gregg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).