linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area  and Instruction Tracing
@ 2015-07-17 16:33 Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 01/25] perf auxtrace: Add Intel PT as an AUX area tracing type Adrian Hunter
                   ` (24 more replies)
  0 siblings, 25 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Hi

Here is V8 patches for the introduction of an abstraction for
using the AUX area and Instruction tracing. The patches for
AUX area support have been applied, just leaving patches for
Intel PT and Intel BTS.  These have now been updated to reflect
new Intel PT features, plus a couple of fixes.

The patches can also be found here:

	http://git.infradead.org/users/ahunter/linux-perf.git

An example (unchanged from V3) perf.data file and build id archive
can be found here:

	http://git.infradead.org/~ahunter/tfr/

There is also a tar of the 3 most relevant files with debugging
symbols. These need to be placed in under the correct paths in
/usr/lib/debug to get symbols.

Changes in V8:

   New patches:
      perf auxtrace: Fix period type 'i' not working
      perf tools: Fix perf-with-kcore handling of arguments containing spaces
      perf tools: Fix Intel PT 'instructions' sample period
      perf tools: Add perf_pmu__format_bits()
      perf tools: Validate config term maximum value
      perf tools: Extend the event parser maximum error index
      perf tools: Add Intel PT support for PSB periods
      perf tools: Add new Intel PT packet definitions
      perf tools: Pass Intel PT information for decoding MTC and CYC
      perf tools: Add Intel PT support for decoding MTC packets
      perf tools: Add Intel PT support for using MTC packets
      perf tools: Add Intel PT support for decoding CYC packets
      perf tools: Add Intel PT support for using CYC packets
      perf tools: Add Intel PT support for decoding TRACESTOP packets
      perf tools: Update Intel PT documentation

Changes in V7:

   Patches already applied:
      perf db-export: Fix thread ref-counting
      perf tools: Ensure thread-stack is flushed
      perf tools: Allow auxtrace data alignment

   perf tools: Add Intel PT instruction decoder
      Copy the x86 instruction decoder into perf tools source

   perf tools: Add Intel PT decoder
      Fix Intel PT getting stuck in a loop

        Check for being stuck in a loop.  That can happen if a
        decoder error results in the decoder erroneously setting
        the ip to an address that is itself in an infinite loop
        that consumes no packets.  The only way to be in a loop
        that consumes no packets is if it consists of unconditional
        branches.  So the check for being stuck is if we see
        a repeating cycle of consecutive unconditional branches.

   perf tools: Add Intel PT support
      Add missing err check in intel-pt.
      Fix missing thread__puts
      Improve Intel PT sync to sideband events

        To help synchronize trace data with sideband events
        the timestamp when returning to userspace is estimated.

        That was not always being done if switch information
        was not available, but it is still useful for sync'ing
        to mmap changes, so simplify by doing it always when
        TSC is available.  Also add log prints to help debug
        synchronization to sideband.

      Improve Intel PT timestamp estimation

        Intel PT uses timestamps to synchronize side-band information
        to trace data.  However timestamps may not be frequent enough.
        To improve accuracy, an estimated timestamp is calculated based
        on the number of instructions executed since the last known
        timestamp.

        This patch improves that estimate by taking into account the CPU
        frequency as represented by the Intel PT CBR (core-to-bus ratio)
        packet.


   perf tools: Add Intel BTS support
      Fix missing thread__puts
      Add a fix for an infinite loop in intel_bts_process_buffer
      misplaced in a followup patch in the original patchkit

   perf tools: Output sample flags and insn_len from intel_pt
      Folded into: perf tools: Add Intel PT support

   perf tools: Output sample flags and insn_len from intel_bts
      Folded into: perf tools: Add Intel BTS support

   perf tools: Intel PT to always update thread stack trace number
      Folded into: perf tools: Add Intel PT support

   perf tools: Intel BTS to always update thread stack trace number
      Folded into: perf tools: Add Intel BTS support

Changes in V6:

   Some minor expansion of commit messages.

   Patches already applied:
      perf tools: Disallow PMU events intel_pt and intel_bts until there is support

   perf db-export: Fix thread ref-counting
      New patch

   perf tools: Ensure thread-stack is flushed
      New patch

   perf tools: Add Intel PT support
      Support thread ref-counting

   perf tools: Add Intel PT decoder
      Fix a bug: FUP packet in PSB to update last IP

   perf tools: Take Intel PT into use
      Add Overview and Quickstart sections to intel_pt.txt

   perf tools: Add Intel BTS support
      Add Overview to intel_bts.txt
      Support thread ref-counting

   perf tools: Add example call-graph script
      Add documentation comments to scripts

Changes in V5:

   Patches already applied:
      perf report: Fix placement of itrace option in documentation
      perf tools: Add AUX area tracing index
      perf tools: Hit all build ids when AUX area tracing
      perf tools: Add build option NO_AUXTRACE to exclude AUX area tracing
      perf auxtrace: Add option to synthesize events for transactions
      perf tools: Add support for PERF_RECORD_AUX
      perf tools: Add support for PERF_RECORD_ITRACE_START
      perf tools: Add AUX area tracing Snapshot Mode
      perf record: Add AUX area tracing Snapshot Mode support

   perf tools: Disallow PMU events intel_pt and intel_bts until there is support
      New patch

   perf tools: Add Intel PT decoder
      Style improvements pointed out by Acme: aligning '=', single line initializing
      Make use of zalloc() not malloc / memset
      Make use of zfree
      Map internal error codes to fixed constants for output
      Change intel_pt_error_message() to intel_pt__strerror()

   perf tools: Add Intel PT support
      Make use of zfree

   perf tools: Take Intel PT into use
      Allow "intel_pt" PMU to be selected as an event

   perf tools: Add Intel BTS support
      Allow "intel_bts" PMU to be selected as an event
      Make use of zfree
      Map internal error codes to fixed constants for output
      Let "intel_bts" show up in 'perf list'

   perf tools: Output sample flags and insn_len from intel_bts
      Map internal error codes to fixed constants for output

Changes on V4:

   perf tools: Amend mmap ref counting for the AUX area mmap
      Dropped because already applied

   perf script: Always allow fields 'addr' and 'cpu' for auxtrace
      Dropped because already applied

   perf report: Add Instruction Tracing support
      Dropped because already applied

   perf report: Fix placement of itrace option in documentation
      New patch

   perf tools: Add AUX area tracing index
      Change size checks for more flexibility i.e.
      - don't mind if an indexed auxtrace_event is bigger than
      struct auxtrace_event
      - don't mind if the auxtrace index does not fill the whole
      file section
      Rename 'index' variable to 'ent' to avoid build errors on
      older gcc

   perf tools: Add build option NO_AUXTRACE to exclude AUX area tracing
      Fix whitespace alignment of NO_AUXTRACE=1
      Add NO_AUXTRACE=1 to make_minimal

   perf tools: Add support for PERF_RECORD_AUX
      Expand commit message

   perf tools: Add AUX area tracing Snapshot Mode
      Whitespace fixups

   perf record: Add AUX area tracing Snapshot Mode support
      Whitespace fixups
      Don't init static variables to 0 or NULL

   perf tools: Add Intel PT packet decoder
      Whitespace fixups

   perf tools: Add Intel PT instruction decoder
      Avoid build error on older (broken) gcc by adding -Wno-override-init
      Avoid build errors due to funny collate sequences i.e. use LC_COLLATE=C etc

   perf tools: Add Intel PT decoder
      Avoid build errors initializing structures to 0

   perf tools: Add Intel PT support
      Avoid build errors initializing structures to 0
      Allow for perf_pmu__config_terms() having an extra parameter now
      Allow for parse_events() having an extra parameter now
      Rename 'div' variable to 'd' to avoid build errors
      Whitespace fixup
      Remove a couple of unused enums

   perf tools: Add Intel BTS support
      Avoid build errors initializing structures to 0
      Allow for parse_events() having an extra parameter now

   perf tools: Put itrace options into an asciidoc include
      New patch

Changes in V3:

   New patch:
      perf tools: Amend mmap ref counting for the AUX area mmap

   Move some code under arch:
      perf tools: Add Intel PT support
      perf tools: Add Intel BTS support

   Updated documentation:
      perf report: Add Instruction Tracing support
      perf auxtrace: Add option to synthesize events for transactions
      perf tools: Take Intel PT into use
      perf tools: Add Intel BTS support

   Patches already applied:
      perf header: Add AUX area tracing feature
      perf evlist: Add support for mmapping an AUX area buffer
      perf tools: Add user events for AUX area tracing
      perf tools: Add support for AUX area recording
      perf record: Add basic AUX area tracing support
      perf record: Extend -m option for AUX area tracing mmap pages
      perf tools: Add a user event for AUX area tracing errors
      perf session: Add hooks to allow transparent decoding of AUX area tracing data
      perf session: Add instruction tracing options
      perf auxtrace: Add helpers for AUX area tracing errors
      perf auxtrace: Add helpers for queuing AUX area tracing data
      perf auxtrace: Add a heap for sorting AUX area tracing queues
      perf auxtrace: Add processing for AUX area tracing events
      perf auxtrace: Add a hashtable for caching
      perf tools: Add member to struct dso for an instruction cache
      perf script: Add Instruction Tracing support
      perf inject: Re-pipe AUX area tracing events
      perf inject: Add Instruction Tracing support
      perf script: Add field option 'flags' to print sample flags
      perf tools: Add aux_watermark member of struct perf_event_attr

Changes in V2:

   Get rid of MIN()
      perf auxtrace: Add helpers for AUX area tracing errors
      perf inject: Re-pipe AUX area tracing events
      perf tools: Add build option NO_AUXTRACE to exclude AUX area tracing


Intel BTS can be used on most recent Intel CPUs. Intel PT
is available on Broadwell.

Examples:

	Trace 'ls' with Intel BTS userspace only

	perf record --per-thread -e intel_bts//u ls
	perf report
	perf script

	Trace 'ls' with Intel BTS kernel and userspace

	~/libexec/perf-core/perf-with-kcore record bts-ls --per-thread -e intel_bts// -- ls
	~/libexec/perf-core/perf-with-kcore report bts-ls
	~/libexec/perf-core/perf-with-kcore script bts-ls

	Trace 'ls' with Intel PT userspace only

	perf record -e intel_pt//u ls
	perf report
	perf script

	Trace 'ls' with Intel PT kernel and userspace

	~/libexec/perf-core/perf-with-kcore record pt-ls -e intel_pt// -- ls
	~/libexec/perf-core/perf-with-kcore report pt-ls
	~/libexec/perf-core/perf-with-kcore script pt-ls


The abstraction has two separate aspects:
	1. recording AUX area data
	2. processing AUX area data

Recording consists of mmapping a separate buffer and copying
the data into the perf.data file.  The buffer is an AUX area
buffer.  The data is written preceded by a new user event
PERF_RECORD_AUXTRACE.  The data is too big to fit in the event
but follows immediately afterward. Session processing has to
skip to get to the next event header in a similar fashion to
the existing PERF_RECORD_HEADER_TRACING_DATA
event.  The main recording patches are:

      perf evlist: Add support for mmapping an AUX area buffer
      perf tools: Add user events for AUX area tracing
      perf tools: Add support for AUX area recording
      perf record: Add basic AUX area tracing support

Processing consists of providing hooks in session processing
to enable a decoder to see all the events and deliver synthesized
events transparently into the event stream.  The main processing
patch is:

      perf session: Add hooks to allow transparent decoding of AUX area tracing data


Adrian Hunter (25):
      perf auxtrace: Add Intel PT as an AUX area tracing type
      perf tools: Add Intel PT packet decoder
      perf tools: Add Intel PT instruction decoder
      perf tools: Add Intel PT log
      perf tools: Add Intel PT decoder
      perf tools: Add Intel PT support
      perf tools: Take Intel PT into use
      perf tools: Add Intel BTS support
      perf tools: Put itrace options into an asciidoc include
      perf tools: Add example call-graph script
      perf auxtrace: Fix period type 'i' not working
      perf tools: Fix perf-with-kcore handling of arguments containing spaces
      perf tools: Fix Intel PT 'instructions' sample period
      perf tools: Add perf_pmu__format_bits()
      perf tools: Validate config term maximum value
      perf tools: Extend the event parser maximum error index
      perf tools: Add Intel PT support for PSB periods
      perf tools: Add new Intel PT packet definitions
      perf tools: Pass Intel PT information for decoding MTC and CYC
      perf tools: Add Intel PT support for decoding MTC packets
      perf tools: Add Intel PT support for using MTC packets
      perf tools: Add Intel PT support for decoding CYC packets
      perf tools: Add Intel PT support for using CYC packets
      perf tools: Add Intel PT support for decoding TRACESTOP packets
      perf tools: Update Intel PT documentation

 tools/build/Makefile.build                         |    2 +
 tools/perf/.gitignore                              |    1 +
 tools/perf/Documentation/intel-bts.txt             |   86 +
 tools/perf/Documentation/intel-pt.txt              |  766 +++++++
 tools/perf/Documentation/itrace.txt                |   22 +
 tools/perf/Documentation/perf-inject.txt           |   23 +-
 tools/perf/Documentation/perf-report.txt           |   23 +-
 tools/perf/Documentation/perf-script.txt           |   23 +-
 tools/perf/Makefile.perf                           |   12 +-
 tools/perf/arch/x86/util/Build                     |    5 +
 tools/perf/arch/x86/util/auxtrace.c                |   83 +
 tools/perf/arch/x86/util/intel-bts.c               |  458 ++++
 tools/perf/arch/x86/util/intel-pt.c                | 1007 +++++++++
 tools/perf/arch/x86/util/pmu.c                     |   18 +
 tools/perf/perf-with-kcore.sh                      |   28 +-
 .../scripts/python/call-graph-from-postgresql.py   |  327 +++
 tools/perf/scripts/python/export-to-postgresql.py  |   47 +
 tools/perf/util/Build                              |    3 +
 tools/perf/util/auxtrace.c                         |   15 +-
 tools/perf/util/auxtrace.h                         |    2 +
 tools/perf/util/intel-bts.c                        |  933 ++++++++
 tools/perf/util/intel-bts.h                        |   43 +
 tools/perf/util/intel-pt-decoder/Build             |   11 +
 .../util/intel-pt-decoder/gen-insn-attr-x86.awk    |  386 ++++
 tools/perf/util/intel-pt-decoder/inat.c            |   96 +
 tools/perf/util/intel-pt-decoder/inat.h            |  221 ++
 tools/perf/util/intel-pt-decoder/inat_types.h      |   29 +
 tools/perf/util/intel-pt-decoder/insn.c            |  594 +++++
 tools/perf/util/intel-pt-decoder/insn.h            |  201 ++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 2345 ++++++++++++++++++++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  109 +
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  |  246 ++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |   65 +
 tools/perf/util/intel-pt-decoder/intel-pt-log.c    |  155 ++
 tools/perf/util/intel-pt-decoder/intel-pt-log.h    |   52 +
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   |  518 +++++
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |   70 +
 .../perf/util/intel-pt-decoder/x86-opcode-map.txt  |  970 ++++++++
 tools/perf/util/intel-pt.c                         | 1956 ++++++++++++++++
 tools/perf/util/intel-pt.h                         |   56 +
 tools/perf/util/parse-events.c                     |    2 +-
 tools/perf/util/pmu.c                              |   51 +-
 tools/perf/util/pmu.h                              |    1 +
 43 files changed, 11970 insertions(+), 91 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-bts.txt
 create mode 100644 tools/perf/Documentation/intel-pt.txt
 create mode 100644 tools/perf/Documentation/itrace.txt
 create mode 100644 tools/perf/arch/x86/util/auxtrace.c
 create mode 100644 tools/perf/arch/x86/util/intel-bts.c
 create mode 100644 tools/perf/arch/x86/util/intel-pt.c
 create mode 100644 tools/perf/arch/x86/util/pmu.c
 create mode 100644 tools/perf/scripts/python/call-graph-from-postgresql.py
 create mode 100644 tools/perf/util/intel-bts.c
 create mode 100644 tools/perf/util/intel-bts.h
 create mode 100644 tools/perf/util/intel-pt-decoder/Build
 create mode 100644 tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
 create mode 100644 tools/perf/util/intel-pt-decoder/inat.c
 create mode 100644 tools/perf/util/intel-pt-decoder/inat.h
 create mode 100644 tools/perf/util/intel-pt-decoder/inat_types.h
 create mode 100644 tools/perf/util/intel-pt-decoder/insn.c
 create mode 100644 tools/perf/util/intel-pt-decoder/insn.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
 create mode 100644 tools/perf/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.h


Regards
Adrian


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH V8 01/25] perf auxtrace: Add Intel PT as an AUX area tracing type
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-20  9:57   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 02/25] perf tools: Add Intel PT packet decoder Adrian Hunter
                   ` (23 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add the Intel Processor Trace type constant PERF_AUXTRACE_INTEL_PT.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/auxtrace.c | 1 +
 tools/perf/util/auxtrace.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 7e7405c9b936..ea29022a4f52 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -884,6 +884,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 		fprintf(stdout, " type: %u\n", type);
 
 	switch (type) {
+	case PERF_AUXTRACE_INTEL_PT:
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 471aecbc4d68..7d12f33a3a06 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -39,6 +39,7 @@ struct events_stats;
 
 enum auxtrace_type {
 	PERF_AUXTRACE_UNKNOWN,
+	PERF_AUXTRACE_INTEL_PT,
 };
 
 enum itrace_period_type {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 02/25] perf tools: Add Intel PT packet decoder
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 01/25] perf auxtrace: Add Intel PT as an AUX area tracing type Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-20  9:57   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 03/25] perf tools: Add Intel PT instruction decoder Adrian Hunter
                   ` (22 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add support for decoding Intel Processor Trace packets.

This essentially provides intel_pt_get_packet() which takes a buffer of
binary data and returns the decoded packet.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/Build                              |   1 +
 tools/perf/util/intel-pt-decoder/Build             |   1 +
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   | 400 +++++++++++++++++++++
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |  64 ++++
 4 files changed, 466 insertions(+)
 create mode 100644 tools/perf/util/intel-pt-decoder/Build
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 601d11440596..1b67598d4c87 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -76,6 +76,7 @@ libperf-$(CONFIG_X86) += tsc.o
 libperf-y += cloexec.o
 libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
new file mode 100644
index 000000000000..9d67381a9bd3
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -0,0 +1 @@
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
new file mode 100644
index 000000000000..988c82c6652d
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
@@ -0,0 +1,400 @@
+/*
+ * intel_pt_pkt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "intel-pt-pkt-decoder.h"
+
+#define BIT(n)		(1 << (n))
+
+#define BIT63		((uint64_t)1 << 63)
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le16_to_cpu bswap_16
+#define le32_to_cpu bswap_32
+#define le64_to_cpu bswap_64
+#define memcpy_le64(d, s, n) do { \
+	memcpy((d), (s), (n));    \
+	*(d) = le64_to_cpu(*(d)); \
+} while (0)
+#else
+#define le16_to_cpu
+#define le32_to_cpu
+#define le64_to_cpu
+#define memcpy_le64 memcpy
+#endif
+
+static const char * const packet_name[] = {
+	[INTEL_PT_BAD]		= "Bad Packet!",
+	[INTEL_PT_PAD]		= "PAD",
+	[INTEL_PT_TNT]		= "TNT",
+	[INTEL_PT_TIP_PGD]	= "TIP.PGD",
+	[INTEL_PT_TIP_PGE]	= "TIP.PGE",
+	[INTEL_PT_TSC]		= "TSC",
+	[INTEL_PT_MODE_EXEC]	= "MODE.Exec",
+	[INTEL_PT_MODE_TSX]	= "MODE.TSX",
+	[INTEL_PT_TIP]		= "TIP",
+	[INTEL_PT_FUP]		= "FUP",
+	[INTEL_PT_PSB]		= "PSB",
+	[INTEL_PT_PSBEND]	= "PSBEND",
+	[INTEL_PT_CBR]		= "CBR",
+	[INTEL_PT_PIP]		= "PIP",
+	[INTEL_PT_OVF]		= "OVF",
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type type)
+{
+	return packet_name[type];
+}
+
+static int intel_pt_get_long_tnt(const unsigned char *buf, size_t len,
+				 struct intel_pt_pkt *packet)
+{
+	uint64_t payload;
+	int count;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	payload = le64_to_cpu(*(uint64_t *)buf);
+
+	for (count = 47; count; count--) {
+		if (payload & BIT63)
+			break;
+		payload <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = payload << 1;
+	return 8;
+}
+
+static int intel_pt_get_pip(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	uint64_t payload = 0;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	packet->type = INTEL_PT_PIP;
+	memcpy_le64(&payload, buf + 2, 6);
+	packet->payload = payload >> 1;
+
+	return 8;
+}
+
+static int intel_pt_get_cbr(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 4)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_CBR;
+	packet->payload = buf[2];
+	return 4;
+}
+
+static int intel_pt_get_ovf(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_OVF;
+	return 2;
+}
+
+static int intel_pt_get_psb(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	int i;
+
+	if (len < 16)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	for (i = 2; i < 16; i += 2) {
+		if (buf[i] != 2 || buf[i + 1] != 0x82)
+			return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = INTEL_PT_PSB;
+	return 16;
+}
+
+static int intel_pt_get_psbend(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PSBEND;
+	return 2;
+}
+
+static int intel_pt_get_pad(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PAD;
+	return 1;
+}
+
+static int intel_pt_get_ext(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1]) {
+	case 0xa3: /* Long TNT */
+		return intel_pt_get_long_tnt(buf, len, packet);
+	case 0x43: /* PIP */
+		return intel_pt_get_pip(buf, len, packet);
+	case 0x03: /* CBR */
+		return intel_pt_get_cbr(buf, len, packet);
+	case 0xf3: /* OVF */
+		return intel_pt_get_ovf(packet);
+	case 0x82: /* PSB */
+		return intel_pt_get_psb(buf, len, packet);
+	case 0x23: /* PSBEND */
+		return intel_pt_get_psbend(packet);
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+static int intel_pt_get_short_tnt(unsigned int byte,
+				  struct intel_pt_pkt *packet)
+{
+	int count;
+
+	for (count = 6; count; count--) {
+		if (byte & BIT(7))
+			break;
+		byte <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = (uint64_t)byte << 57;
+
+	return 1;
+}
+
+static int intel_pt_get_ip(enum intel_pt_pkt_type type, unsigned int byte,
+			   const unsigned char *buf, size_t len,
+			   struct intel_pt_pkt *packet)
+{
+	switch (byte >> 5) {
+	case 0:
+		packet->count = 0;
+		break;
+	case 1:
+		if (len < 3)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 2;
+		packet->payload = le16_to_cpu(*(uint16_t *)(buf + 1));
+		break;
+	case 2:
+		if (len < 5)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 4;
+		packet->payload = le32_to_cpu(*(uint32_t *)(buf + 1));
+		break;
+	case 3:
+	case 6:
+		if (len < 7)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 6;
+		memcpy_le64(&packet->payload, buf + 1, 6);
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = type;
+
+	return packet->count + 1;
+}
+
+static int intel_pt_get_mode(const unsigned char *buf, size_t len,
+			     struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1] >> 5) {
+	case 0:
+		packet->type = INTEL_PT_MODE_EXEC;
+		switch (buf[1] & 3) {
+		case 0:
+			packet->payload = 16;
+			break;
+		case 1:
+			packet->payload = 64;
+			break;
+		case 2:
+			packet->payload = 32;
+			break;
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+		break;
+	case 1:
+		packet->type = INTEL_PT_MODE_TSX;
+		if ((buf[1] & 3) == 3)
+			return INTEL_PT_BAD_PACKET;
+		packet->payload = buf[1] & 3;
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	return 2;
+}
+
+static int intel_pt_get_tsc(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_TSC;
+	memcpy_le64(&packet->payload, buf + 1, 7);
+	return 8;
+}
+
+static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
+				  struct intel_pt_pkt *packet)
+{
+	unsigned int byte;
+
+	memset(packet, 0, sizeof(struct intel_pt_pkt));
+
+	if (!len)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	byte = buf[0];
+	if (!(byte & BIT(0))) {
+		if (byte == 0)
+			return intel_pt_get_pad(packet);
+		if (byte == 2)
+			return intel_pt_get_ext(buf, len, packet);
+		return intel_pt_get_short_tnt(byte, packet);
+	}
+
+	switch (byte & 0x1f) {
+	case 0x0D:
+		return intel_pt_get_ip(INTEL_PT_TIP, byte, buf, len, packet);
+	case 0x11:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGE, byte, buf, len,
+				       packet);
+	case 0x01:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGD, byte, buf, len,
+				       packet);
+	case 0x1D:
+		return intel_pt_get_ip(INTEL_PT_FUP, byte, buf, len, packet);
+	case 0x19:
+		switch (byte) {
+		case 0x99:
+			return intel_pt_get_mode(buf, len, packet);
+		case 0x19:
+			return intel_pt_get_tsc(buf, len, packet);
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet)
+{
+	int ret;
+
+	ret = intel_pt_do_get_packet(buf, len, packet);
+	if (ret > 0) {
+		while (ret < 8 && len > (size_t)ret && !buf[ret])
+			ret += 1;
+	}
+	return ret;
+}
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
+		      size_t buf_len)
+{
+	int ret, i;
+	unsigned long long payload = packet->payload;
+	const char *name = intel_pt_pkt_name(packet->type);
+
+	switch (packet->type) {
+	case INTEL_PT_BAD:
+	case INTEL_PT_PAD:
+	case INTEL_PT_PSB:
+	case INTEL_PT_PSBEND:
+	case INTEL_PT_OVF:
+		return snprintf(buf, buf_len, "%s", name);
+	case INTEL_PT_TNT: {
+		size_t blen = buf_len;
+
+		ret = snprintf(buf, blen, "%s ", name);
+		if (ret < 0)
+			return ret;
+		buf += ret;
+		blen -= ret;
+		for (i = 0; i < packet->count; i++) {
+			if (payload & BIT63)
+				ret = snprintf(buf, blen, "T");
+			else
+				ret = snprintf(buf, blen, "N");
+			if (ret < 0)
+				return ret;
+			buf += ret;
+			blen -= ret;
+			payload <<= 1;
+		}
+		ret = snprintf(buf, blen, " (%d)", packet->count);
+		if (ret < 0)
+			return ret;
+		blen -= ret;
+		return buf_len - blen;
+	}
+	case INTEL_PT_TIP_PGD:
+	case INTEL_PT_TIP_PGE:
+	case INTEL_PT_TIP:
+	case INTEL_PT_FUP:
+		if (!(packet->count))
+			return snprintf(buf, buf_len, "%s no ip", name);
+	case INTEL_PT_CBR:
+		return snprintf(buf, buf_len, "%s 0x%llx", name, payload);
+	case INTEL_PT_TSC:
+		if (packet->count)
+			return snprintf(buf, buf_len,
+					"%s 0x%llx CTC 0x%x FC 0x%x",
+					name, payload, packet->count & 0xffff,
+					(packet->count >> 16) & 0x1ff);
+		else
+			return snprintf(buf, buf_len, "%s 0x%llx",
+					name, payload);
+	case INTEL_PT_MODE_EXEC:
+		return snprintf(buf, buf_len, "%s %lld", name, payload);
+	case INTEL_PT_MODE_TSX:
+		return snprintf(buf, buf_len, "%s TXAbort:%u InTX:%u",
+				name, (unsigned)(payload >> 1) & 1,
+				(unsigned)payload & 1);
+	case INTEL_PT_PIP:
+		ret = snprintf(buf, buf_len, "%s 0x%llx",
+			       name, payload);
+		return ret;
+	default:
+		break;
+	}
+	return snprintf(buf, buf_len, "%s 0x%llx (%d)",
+			name, payload, packet->count);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
new file mode 100644
index 000000000000..53404fa942b3
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
@@ -0,0 +1,64 @@
+/*
+ * intel_pt_pkt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_PKT_DECODER_H__
+#define INCLUDE__INTEL_PT_PKT_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_PKT_DESC_MAX	256
+
+#define INTEL_PT_NEED_MORE_BYTES	-1
+#define INTEL_PT_BAD_PACKET		-2
+
+#define INTEL_PT_PSB_STR		"\002\202\002\202\002\202\002\202" \
+					"\002\202\002\202\002\202\002\202"
+#define INTEL_PT_PSB_LEN		16
+
+#define INTEL_PT_PKT_MAX_SZ		16
+
+enum intel_pt_pkt_type {
+	INTEL_PT_BAD,
+	INTEL_PT_PAD,
+	INTEL_PT_TNT,
+	INTEL_PT_TIP_PGD,
+	INTEL_PT_TIP_PGE,
+	INTEL_PT_TSC,
+	INTEL_PT_MODE_EXEC,
+	INTEL_PT_MODE_TSX,
+	INTEL_PT_TIP,
+	INTEL_PT_FUP,
+	INTEL_PT_PSB,
+	INTEL_PT_PSBEND,
+	INTEL_PT_CBR,
+	INTEL_PT_PIP,
+	INTEL_PT_OVF,
+};
+
+struct intel_pt_pkt {
+	enum intel_pt_pkt_type	type;
+	int			count;
+	uint64_t		payload;
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type);
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet);
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf, size_t len);
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 03/25] perf tools: Add Intel PT instruction decoder
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 01/25] perf auxtrace: Add Intel PT as an AUX area tracing type Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 02/25] perf tools: Add Intel PT packet decoder Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-12 20:55   ` Arnaldo Carvalho de Melo
  2015-07-17 16:33 ` [PATCH V8 04/25] perf tools: Add Intel PT log Adrian Hunter
                   ` (21 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add support for decoding instructions for Intel Processor Trace.  The
kernel x86 instruction decoder is copied for this.

This essentially provides intel_pt_get_insn() which takes a binary
buffer, uses the kernel's x86 instruction decoder to get details of the
instruction and then categorizes it for consumption by an Intel PT
decoder.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/build/Makefile.build                         |   2 +
 tools/perf/.gitignore                              |   1 +
 tools/perf/Makefile.perf                           |  12 +-
 tools/perf/util/intel-pt-decoder/Build             |  12 +-
 .../util/intel-pt-decoder/gen-insn-attr-x86.awk    | 386 ++++++++
 tools/perf/util/intel-pt-decoder/inat.c            |  96 ++
 tools/perf/util/intel-pt-decoder/inat.h            | 221 +++++
 tools/perf/util/intel-pt-decoder/inat_types.h      |  29 +
 tools/perf/util/intel-pt-decoder/insn.c            | 594 +++++++++++++
 tools/perf/util/intel-pt-decoder/insn.h            | 201 +++++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  | 246 ++++++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |  65 ++
 .../perf/util/intel-pt-decoder/x86-opcode-map.txt  | 970 +++++++++++++++++++++
 13 files changed, 2832 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
 create mode 100644 tools/perf/util/intel-pt-decoder/inat.c
 create mode 100644 tools/perf/util/intel-pt-decoder/inat.h
 create mode 100644 tools/perf/util/intel-pt-decoder/inat_types.h
 create mode 100644 tools/perf/util/intel-pt-decoder/insn.c
 create mode 100644 tools/perf/util/intel-pt-decoder/insn.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/x86-opcode-map.txt

diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index faca2bf6a430..8120af9c0341 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -57,6 +57,8 @@ quiet_cmd_cc_i_c = CPP      $@
 quiet_cmd_cc_s_c = AS       $@
       cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
 
+quiet_cmd_gen = GEN      $@
+
 # Link agregate command
 # If there's nothing to link, create empty $@ object.
 quiet_cmd_ld_multi = LD       $@
diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore
index 09db62ba5786..3d1bb802dbf4 100644
--- a/tools/perf/.gitignore
+++ b/tools/perf/.gitignore
@@ -29,3 +29,4 @@ config.mak.autogen
 *.pyc
 *.pyo
 .config-detected
+util/intel-pt-decoder/inat-tables.c
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 7a4b549214e3..0dff7da45618 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -76,6 +76,12 @@ include config/utilities.mak
 #
 # Define NO_AUXTRACE if you do not want AUX area tracing support
 
+# As per kernel Makefile, avoid funny character set dependencies
+unexport LC_ALL
+LC_COLLATE=C
+LC_NUMERIC=C
+export LC_COLLATE LC_NUMERIC
+
 ifeq ($(srctree),)
 srctree := $(patsubst %/,%,$(dir $(shell pwd)))
 srctree := $(patsubst %/,%,$(dir $(srctree)))
@@ -122,6 +128,7 @@ INSTALL = install
 FLEX    = flex
 BISON   = bison
 STRIP   = strip
+AWK     = awk
 
 LIB_DIR          = $(srctree)/tools/lib/api/
 TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
@@ -276,7 +283,7 @@ strip: $(PROGRAMS) $(OUTPUT)perf
 
 PERF_IN := $(OUTPUT)perf-in.o
 
-export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX
+export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
 build := -f $(srctree)/tools/build/Makefile.build dir=. obj
 
 $(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
@@ -547,7 +554,8 @@ clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
 	$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
 	$(Q)$(RM) $(OUTPUT).config-detected
 	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
-	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
+	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
+		$(OUTPUT)util/intel-pt-decoder/inat-tables.c
 	$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
 	$(python-clean)
 
diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 9d67381a9bd3..5a46ce13c1f8 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1 +1,11 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
+
+inat_tables_script = util/intel-pt-decoder/gen-insn-attr-x86.awk
+inat_tables_maps = util/intel-pt-decoder/x86-opcode-map.txt
+
+$(OUTPUT)util/intel-pt-decoder/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
+	@$(call echo-cmd,gen)$(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
+
+$(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o: util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
+
+CFLAGS_intel-pt-insn-decoder.o += -I$(OUTPUT)util/intel-pt-decoder -Wno-override-init
diff --git a/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
new file mode 100644
index 000000000000..517567347aac
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
@@ -0,0 +1,386 @@
+#!/bin/awk -f
+# gen-insn-attr-x86.awk: Instruction attribute table generator
+# Written by Masami Hiramatsu <mhiramat@redhat.com>
+#
+# Usage: awk -f gen-insn-attr-x86.awk x86-opcode-map.txt > inat-tables.c
+
+# Awk implementation sanity check
+function check_awk_implement() {
+	if (sprintf("%x", 0) != "0")
+		return "Your awk has a printf-format problem."
+	return ""
+}
+
+# Clear working vars
+function clear_vars() {
+	delete table
+	delete lptable2
+	delete lptable1
+	delete lptable3
+	eid = -1 # escape id
+	gid = -1 # group id
+	aid = -1 # AVX id
+	tname = ""
+}
+
+BEGIN {
+	# Implementation error checking
+	awkchecked = check_awk_implement()
+	if (awkchecked != "") {
+		print "Error: " awkchecked > "/dev/stderr"
+		print "Please try to use gawk." > "/dev/stderr"
+		exit 1
+	}
+
+	# Setup generating tables
+	print "/* x86 opcode map generated from x86-opcode-map.txt */"
+	print "/* Do not change this code. */\n"
+	ggid = 1
+	geid = 1
+	gaid = 0
+	delete etable
+	delete gtable
+	delete atable
+
+	opnd_expr = "^[A-Za-z/]"
+	ext_expr = "^\\("
+	sep_expr = "^\\|$"
+	group_expr = "^Grp[0-9A-Za-z]+"
+
+	imm_expr = "^[IJAOL][a-z]"
+	imm_flag["Ib"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
+	imm_flag["Jb"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
+	imm_flag["Iw"] = "INAT_MAKE_IMM(INAT_IMM_WORD)"
+	imm_flag["Id"] = "INAT_MAKE_IMM(INAT_IMM_DWORD)"
+	imm_flag["Iq"] = "INAT_MAKE_IMM(INAT_IMM_QWORD)"
+	imm_flag["Ap"] = "INAT_MAKE_IMM(INAT_IMM_PTR)"
+	imm_flag["Iz"] = "INAT_MAKE_IMM(INAT_IMM_VWORD32)"
+	imm_flag["Jz"] = "INAT_MAKE_IMM(INAT_IMM_VWORD32)"
+	imm_flag["Iv"] = "INAT_MAKE_IMM(INAT_IMM_VWORD)"
+	imm_flag["Ob"] = "INAT_MOFFSET"
+	imm_flag["Ov"] = "INAT_MOFFSET"
+	imm_flag["Lx"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
+
+	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
+	force64_expr = "\\([df]64\\)"
+	rex_expr = "^REX(\\.[XRWB]+)*"
+	fpu_expr = "^ESC" # TODO
+
+	lprefix1_expr = "\\((66|!F3)\\)"
+	lprefix2_expr = "\\(F3\\)"
+	lprefix3_expr = "\\((F2|!F3|66\\&F2)\\)"
+	lprefix_expr = "\\((66|F2|F3)\\)"
+	max_lprefix = 4
+
+	# All opcodes starting with lower-case 'v' or with (v1) superscript
+	# accepts VEX prefix
+	vexok_opcode_expr = "^v.*"
+	vexok_expr = "\\(v1\\)"
+	# All opcodes with (v) superscript supports *only* VEX prefix
+	vexonly_expr = "\\(v\\)"
+
+	prefix_expr = "\\(Prefix\\)"
+	prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
+	prefix_num["REPNE"] = "INAT_PFX_REPNE"
+	prefix_num["REP/REPE"] = "INAT_PFX_REPE"
+	prefix_num["XACQUIRE"] = "INAT_PFX_REPNE"
+	prefix_num["XRELEASE"] = "INAT_PFX_REPE"
+	prefix_num["LOCK"] = "INAT_PFX_LOCK"
+	prefix_num["SEG=CS"] = "INAT_PFX_CS"
+	prefix_num["SEG=DS"] = "INAT_PFX_DS"
+	prefix_num["SEG=ES"] = "INAT_PFX_ES"
+	prefix_num["SEG=FS"] = "INAT_PFX_FS"
+	prefix_num["SEG=GS"] = "INAT_PFX_GS"
+	prefix_num["SEG=SS"] = "INAT_PFX_SS"
+	prefix_num["Address-Size"] = "INAT_PFX_ADDRSZ"
+	prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
+	prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
+
+	clear_vars()
+}
+
+function semantic_error(msg) {
+	print "Semantic error at " NR ": " msg > "/dev/stderr"
+	exit 1
+}
+
+function debug(msg) {
+	print "DEBUG: " msg
+}
+
+function array_size(arr,   i,c) {
+	c = 0
+	for (i in arr)
+		c++
+	return c
+}
+
+/^Table:/ {
+	print "/* " $0 " */"
+	if (tname != "")
+		semantic_error("Hit Table: before EndTable:.");
+}
+
+/^Referrer:/ {
+	if (NF != 1) {
+		# escape opcode table
+		ref = ""
+		for (i = 2; i <= NF; i++)
+			ref = ref $i
+		eid = escape[ref]
+		tname = sprintf("inat_escape_table_%d", eid)
+	}
+}
+
+/^AVXcode:/ {
+	if (NF != 1) {
+		# AVX/escape opcode table
+		aid = $2
+		if (gaid <= aid)
+			gaid = aid + 1
+		if (tname == "")	# AVX only opcode table
+			tname = sprintf("inat_avx_table_%d", $2)
+	}
+	if (aid == -1 && eid == -1)	# primary opcode table
+		tname = "inat_primary_table"
+}
+
+/^GrpTable:/ {
+	print "/* " $0 " */"
+	if (!($2 in group))
+		semantic_error("No group: " $2 )
+	gid = group[$2]
+	tname = "inat_group_table_" gid
+}
+
+function print_table(tbl,name,fmt,n)
+{
+	print "const insn_attr_t " name " = {"
+	for (i = 0; i < n; i++) {
+		id = sprintf(fmt, i)
+		if (tbl[id])
+			print "	[" id "] = " tbl[id] ","
+	}
+	print "};"
+}
+
+/^EndTable/ {
+	if (gid != -1) {
+		# print group tables
+		if (array_size(table) != 0) {
+			print_table(table, tname "[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,0] = tname
+		}
+		if (array_size(lptable1) != 0) {
+			print_table(lptable1, tname "_1[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,1] = tname "_1"
+		}
+		if (array_size(lptable2) != 0) {
+			print_table(lptable2, tname "_2[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,2] = tname "_2"
+		}
+		if (array_size(lptable3) != 0) {
+			print_table(lptable3, tname "_3[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,3] = tname "_3"
+		}
+	} else {
+		# print primary/escaped tables
+		if (array_size(table) != 0) {
+			print_table(table, tname "[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,0] = tname
+			if (aid >= 0)
+				atable[aid,0] = tname
+		}
+		if (array_size(lptable1) != 0) {
+			print_table(lptable1,tname "_1[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,1] = tname "_1"
+			if (aid >= 0)
+				atable[aid,1] = tname "_1"
+		}
+		if (array_size(lptable2) != 0) {
+			print_table(lptable2,tname "_2[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,2] = tname "_2"
+			if (aid >= 0)
+				atable[aid,2] = tname "_2"
+		}
+		if (array_size(lptable3) != 0) {
+			print_table(lptable3,tname "_3[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,3] = tname "_3"
+			if (aid >= 0)
+				atable[aid,3] = tname "_3"
+		}
+	}
+	print ""
+	clear_vars()
+}
+
+function add_flags(old,new) {
+	if (old && new)
+		return old " | " new
+	else if (old)
+		return old
+	else
+		return new
+}
+
+# convert operands to flags.
+function convert_operands(count,opnd,       i,j,imm,mod)
+{
+	imm = null
+	mod = null
+	for (j = 1; j <= count; j++) {
+		i = opnd[j]
+		if (match(i, imm_expr) == 1) {
+			if (!imm_flag[i])
+				semantic_error("Unknown imm opnd: " i)
+			if (imm) {
+				if (i != "Ib")
+					semantic_error("Second IMM error")
+				imm = add_flags(imm, "INAT_SCNDIMM")
+			} else
+				imm = imm_flag[i]
+		} else if (match(i, modrm_expr))
+			mod = "INAT_MODRM"
+	}
+	return add_flags(imm, mod)
+}
+
+/^[0-9a-f]+\:/ {
+	if (NR == 1)
+		next
+	# get index
+	idx = "0x" substr($1, 1, index($1,":") - 1)
+	if (idx in table)
+		semantic_error("Redefine " idx " in " tname)
+
+	# check if escaped opcode
+	if ("escape" == $2) {
+		if ($3 != "#")
+			semantic_error("No escaped name")
+		ref = ""
+		for (i = 4; i <= NF; i++)
+			ref = ref $i
+		if (ref in escape)
+			semantic_error("Redefine escape (" ref ")")
+		escape[ref] = geid
+		geid++
+		table[idx] = "INAT_MAKE_ESCAPE(" escape[ref] ")"
+		next
+	}
+
+	variant = null
+	# converts
+	i = 2
+	while (i <= NF) {
+		opcode = $(i++)
+		delete opnds
+		ext = null
+		flags = null
+		opnd = null
+		# parse one opcode
+		if (match($i, opnd_expr)) {
+			opnd = $i
+			count = split($(i++), opnds, ",")
+			flags = convert_operands(count, opnds)
+		}
+		if (match($i, ext_expr))
+			ext = $(i++)
+		if (match($i, sep_expr))
+			i++
+		else if (i < NF)
+			semantic_error($i " is not a separator")
+
+		# check if group opcode
+		if (match(opcode, group_expr)) {
+			if (!(opcode in group)) {
+				group[opcode] = ggid
+				ggid++
+			}
+			flags = add_flags(flags, "INAT_MAKE_GROUP(" group[opcode] ")")
+		}
+		# check force(or default) 64bit
+		if (match(ext, force64_expr))
+			flags = add_flags(flags, "INAT_FORCE64")
+
+		# check REX prefix
+		if (match(opcode, rex_expr))
+			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
+
+		# check coprocessor escape : TODO
+		if (match(opcode, fpu_expr))
+			flags = add_flags(flags, "INAT_MODRM")
+
+		# check VEX codes
+		if (match(ext, vexonly_expr))
+			flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
+		else if (match(ext, vexok_expr) || match(opcode, vexok_opcode_expr))
+			flags = add_flags(flags, "INAT_VEXOK")
+
+		# check prefixes
+		if (match(ext, prefix_expr)) {
+			if (!prefix_num[opcode])
+				semantic_error("Unknown prefix: " opcode)
+			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
+		}
+		if (length(flags) == 0)
+			continue
+		# check if last prefix
+		if (match(ext, lprefix1_expr)) {
+			lptable1[idx] = add_flags(lptable1[idx],flags)
+			variant = "INAT_VARIANT"
+		}
+		if (match(ext, lprefix2_expr)) {
+			lptable2[idx] = add_flags(lptable2[idx],flags)
+			variant = "INAT_VARIANT"
+		}
+		if (match(ext, lprefix3_expr)) {
+			lptable3[idx] = add_flags(lptable3[idx],flags)
+			variant = "INAT_VARIANT"
+		}
+		if (!match(ext, lprefix_expr)){
+			table[idx] = add_flags(table[idx],flags)
+		}
+	}
+	if (variant)
+		table[idx] = add_flags(table[idx],variant)
+}
+
+END {
+	if (awkchecked != "")
+		exit 1
+	# print escape opcode map's array
+	print "/* Escape opcode map array */"
+	print "const insn_attr_t * const inat_escape_tables[INAT_ESC_MAX + 1]" \
+	      "[INAT_LSTPFX_MAX + 1] = {"
+	for (i = 0; i < geid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (etable[i,j])
+				print "	["i"]["j"] = "etable[i,j]","
+	print "};\n"
+	# print group opcode map's array
+	print "/* Group opcode map array */"
+	print "const insn_attr_t * const inat_group_tables[INAT_GRP_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1] = {"
+	for (i = 0; i < ggid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (gtable[i,j])
+				print "	["i"]["j"] = "gtable[i,j]","
+	print "};\n"
+	# print AVX opcode map's array
+	print "/* AVX opcode map array */"
+	print "const insn_attr_t * const inat_avx_tables[X86_VEX_M_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1] = {"
+	for (i = 0; i < gaid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (atable[i,j])
+				print "	["i"]["j"] = "atable[i,j]","
+	print "};"
+}
diff --git a/tools/perf/util/intel-pt-decoder/inat.c b/tools/perf/util/intel-pt-decoder/inat.c
new file mode 100644
index 000000000000..feeaa509dfe4
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/inat.c
@@ -0,0 +1,96 @@
+/*
+ * x86 instruction attribute tables
+ *
+ * Written by Masami Hiramatsu <mhiramat@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <asm/insn.h>
+
+/* Attribute tables are generated from opcode map */
+#include "inat-tables.c"
+
+/* Attribute search APIs */
+insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode)
+{
+	return inat_primary_table[opcode];
+}
+
+int inat_get_last_prefix_id(insn_byte_t last_pfx)
+{
+	insn_attr_t lpfx_attr;
+
+	lpfx_attr = inat_get_opcode_attribute(last_pfx);
+	return inat_last_prefix_id(lpfx_attr);
+}
+
+insn_attr_t inat_get_escape_attribute(insn_byte_t opcode, int lpfx_id,
+				      insn_attr_t esc_attr)
+{
+	const insn_attr_t *table;
+	int n;
+
+	n = inat_escape_id(esc_attr);
+
+	table = inat_escape_tables[n][0];
+	if (!table)
+		return 0;
+	if (inat_has_variant(table[opcode]) && lpfx_id) {
+		table = inat_escape_tables[n][lpfx_id];
+		if (!table)
+			return 0;
+	}
+	return table[opcode];
+}
+
+insn_attr_t inat_get_group_attribute(insn_byte_t modrm, int lpfx_id,
+				     insn_attr_t grp_attr)
+{
+	const insn_attr_t *table;
+	int n;
+
+	n = inat_group_id(grp_attr);
+
+	table = inat_group_tables[n][0];
+	if (!table)
+		return inat_group_common_attribute(grp_attr);
+	if (inat_has_variant(table[X86_MODRM_REG(modrm)]) && lpfx_id) {
+		table = inat_group_tables[n][lpfx_id];
+		if (!table)
+			return inat_group_common_attribute(grp_attr);
+	}
+	return table[X86_MODRM_REG(modrm)] |
+	       inat_group_common_attribute(grp_attr);
+}
+
+insn_attr_t inat_get_avx_attribute(insn_byte_t opcode, insn_byte_t vex_m,
+				   insn_byte_t vex_p)
+{
+	const insn_attr_t *table;
+	if (vex_m > X86_VEX_M_MAX || vex_p > INAT_LSTPFX_MAX)
+		return 0;
+	/* At first, this checks the master table */
+	table = inat_avx_tables[vex_m][0];
+	if (!table)
+		return 0;
+	if (!inat_is_group(table[opcode]) && vex_p) {
+		/* If this is not a group, get attribute directly */
+		table = inat_avx_tables[vex_m][vex_p];
+		if (!table)
+			return 0;
+	}
+	return table[opcode];
+}
diff --git a/tools/perf/util/intel-pt-decoder/inat.h b/tools/perf/util/intel-pt-decoder/inat.h
new file mode 100644
index 000000000000..74a2e312e8a2
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/inat.h
@@ -0,0 +1,221 @@
+#ifndef _ASM_X86_INAT_H
+#define _ASM_X86_INAT_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu <mhiramat@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <asm/inat_types.h>
+
+/*
+ * Internal bits. Don't use bitmasks directly, because these bits are
+ * unstable. You should use checking functions.
+ */
+
+#define INAT_OPCODE_TABLE_SIZE 256
+#define INAT_GROUP_TABLE_SIZE 8
+
+/* Legacy last prefixes */
+#define INAT_PFX_OPNDSZ	1	/* 0x66 */ /* LPFX1 */
+#define INAT_PFX_REPE	2	/* 0xF3 */ /* LPFX2 */
+#define INAT_PFX_REPNE	3	/* 0xF2 */ /* LPFX3 */
+/* Other Legacy prefixes */
+#define INAT_PFX_LOCK	4	/* 0xF0 */
+#define INAT_PFX_CS	5	/* 0x2E */
+#define INAT_PFX_DS	6	/* 0x3E */
+#define INAT_PFX_ES	7	/* 0x26 */
+#define INAT_PFX_FS	8	/* 0x64 */
+#define INAT_PFX_GS	9	/* 0x65 */
+#define INAT_PFX_SS	10	/* 0x36 */
+#define INAT_PFX_ADDRSZ	11	/* 0x67 */
+/* x86-64 REX prefix */
+#define INAT_PFX_REX	12	/* 0x4X */
+/* AVX VEX prefixes */
+#define INAT_PFX_VEX2	13	/* 2-bytes VEX prefix */
+#define INAT_PFX_VEX3	14	/* 3-bytes VEX prefix */
+
+#define INAT_LSTPFX_MAX	3
+#define INAT_LGCPFX_MAX	11
+
+/* Immediate size */
+#define INAT_IMM_BYTE		1
+#define INAT_IMM_WORD		2
+#define INAT_IMM_DWORD		3
+#define INAT_IMM_QWORD		4
+#define INAT_IMM_PTR		5
+#define INAT_IMM_VWORD32	6
+#define INAT_IMM_VWORD		7
+
+/* Legacy prefix */
+#define INAT_PFX_OFFS	0
+#define INAT_PFX_BITS	4
+#define INAT_PFX_MAX    ((1 << INAT_PFX_BITS) - 1)
+#define INAT_PFX_MASK	(INAT_PFX_MAX << INAT_PFX_OFFS)
+/* Escape opcodes */
+#define INAT_ESC_OFFS	(INAT_PFX_OFFS + INAT_PFX_BITS)
+#define INAT_ESC_BITS	2
+#define INAT_ESC_MAX	((1 << INAT_ESC_BITS) - 1)
+#define INAT_ESC_MASK	(INAT_ESC_MAX << INAT_ESC_OFFS)
+/* Group opcodes (1-16) */
+#define INAT_GRP_OFFS	(INAT_ESC_OFFS + INAT_ESC_BITS)
+#define INAT_GRP_BITS	5
+#define INAT_GRP_MAX	((1 << INAT_GRP_BITS) - 1)
+#define INAT_GRP_MASK	(INAT_GRP_MAX << INAT_GRP_OFFS)
+/* Immediates */
+#define INAT_IMM_OFFS	(INAT_GRP_OFFS + INAT_GRP_BITS)
+#define INAT_IMM_BITS	3
+#define INAT_IMM_MASK	(((1 << INAT_IMM_BITS) - 1) << INAT_IMM_OFFS)
+/* Flags */
+#define INAT_FLAG_OFFS	(INAT_IMM_OFFS + INAT_IMM_BITS)
+#define INAT_MODRM	(1 << (INAT_FLAG_OFFS))
+#define INAT_FORCE64	(1 << (INAT_FLAG_OFFS + 1))
+#define INAT_SCNDIMM	(1 << (INAT_FLAG_OFFS + 2))
+#define INAT_MOFFSET	(1 << (INAT_FLAG_OFFS + 3))
+#define INAT_VARIANT	(1 << (INAT_FLAG_OFFS + 4))
+#define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
+#define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
+/* Attribute making macros for attribute tables */
+#define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
+#define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
+#define INAT_MAKE_GROUP(grp)	((grp << INAT_GRP_OFFS) | INAT_MODRM)
+#define INAT_MAKE_IMM(imm)	(imm << INAT_IMM_OFFS)
+
+/* Attribute search APIs */
+extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode);
+extern int inat_get_last_prefix_id(insn_byte_t last_pfx);
+extern insn_attr_t inat_get_escape_attribute(insn_byte_t opcode,
+					     int lpfx_id,
+					     insn_attr_t esc_attr);
+extern insn_attr_t inat_get_group_attribute(insn_byte_t modrm,
+					    int lpfx_id,
+					    insn_attr_t esc_attr);
+extern insn_attr_t inat_get_avx_attribute(insn_byte_t opcode,
+					  insn_byte_t vex_m,
+					  insn_byte_t vex_pp);
+
+/* Attribute checking functions */
+static inline int inat_is_legacy_prefix(insn_attr_t attr)
+{
+	attr &= INAT_PFX_MASK;
+	return attr && attr <= INAT_LGCPFX_MAX;
+}
+
+static inline int inat_is_address_size_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_ADDRSZ;
+}
+
+static inline int inat_is_operand_size_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_OPNDSZ;
+}
+
+static inline int inat_is_rex_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
+}
+
+static inline int inat_last_prefix_id(insn_attr_t attr)
+{
+	if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
+		return 0;
+	else
+		return attr & INAT_PFX_MASK;
+}
+
+static inline int inat_is_vex_prefix(insn_attr_t attr)
+{
+	attr &= INAT_PFX_MASK;
+	return attr == INAT_PFX_VEX2 || attr == INAT_PFX_VEX3;
+}
+
+static inline int inat_is_vex3_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_VEX3;
+}
+
+static inline int inat_is_escape(insn_attr_t attr)
+{
+	return attr & INAT_ESC_MASK;
+}
+
+static inline int inat_escape_id(insn_attr_t attr)
+{
+	return (attr & INAT_ESC_MASK) >> INAT_ESC_OFFS;
+}
+
+static inline int inat_is_group(insn_attr_t attr)
+{
+	return attr & INAT_GRP_MASK;
+}
+
+static inline int inat_group_id(insn_attr_t attr)
+{
+	return (attr & INAT_GRP_MASK) >> INAT_GRP_OFFS;
+}
+
+static inline int inat_group_common_attribute(insn_attr_t attr)
+{
+	return attr & ~INAT_GRP_MASK;
+}
+
+static inline int inat_has_immediate(insn_attr_t attr)
+{
+	return attr & INAT_IMM_MASK;
+}
+
+static inline int inat_immediate_size(insn_attr_t attr)
+{
+	return (attr & INAT_IMM_MASK) >> INAT_IMM_OFFS;
+}
+
+static inline int inat_has_modrm(insn_attr_t attr)
+{
+	return attr & INAT_MODRM;
+}
+
+static inline int inat_is_force64(insn_attr_t attr)
+{
+	return attr & INAT_FORCE64;
+}
+
+static inline int inat_has_second_immediate(insn_attr_t attr)
+{
+	return attr & INAT_SCNDIMM;
+}
+
+static inline int inat_has_moffset(insn_attr_t attr)
+{
+	return attr & INAT_MOFFSET;
+}
+
+static inline int inat_has_variant(insn_attr_t attr)
+{
+	return attr & INAT_VARIANT;
+}
+
+static inline int inat_accept_vex(insn_attr_t attr)
+{
+	return attr & INAT_VEXOK;
+}
+
+static inline int inat_must_vex(insn_attr_t attr)
+{
+	return attr & INAT_VEXONLY;
+}
+#endif
diff --git a/tools/perf/util/intel-pt-decoder/inat_types.h b/tools/perf/util/intel-pt-decoder/inat_types.h
new file mode 100644
index 000000000000..cb3c20ce39cf
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/inat_types.h
@@ -0,0 +1,29 @@
+#ifndef _ASM_X86_INAT_TYPES_H
+#define _ASM_X86_INAT_TYPES_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu <mhiramat@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+
+/* Instruction attributes */
+typedef unsigned int insn_attr_t;
+typedef unsigned char insn_byte_t;
+typedef signed int insn_value_t;
+
+#endif
diff --git a/tools/perf/util/intel-pt-decoder/insn.c b/tools/perf/util/intel-pt-decoder/insn.c
new file mode 100644
index 000000000000..8f72b334aea0
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/insn.c
@@ -0,0 +1,594 @@
+/*
+ * x86 instruction analysis
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004, 2009
+ */
+
+#ifdef __KERNEL__
+#include <linux/string.h>
+#else
+#include <string.h>
+#endif
+#include <asm/inat.h>
+#include <asm/insn.h>
+
+/* Verify next sizeof(t) bytes can be on the same instruction */
+#define validate_next(t, insn, n)	\
+	((insn)->next_byte + sizeof(t) + n <= (insn)->end_kaddr)
+
+#define __get_next(t, insn)	\
+	({ t r = *(t*)insn->next_byte; insn->next_byte += sizeof(t); r; })
+
+#define __peek_nbyte_next(t, insn, n)	\
+	({ t r = *(t*)((insn)->next_byte + n); r; })
+
+#define get_next(t, insn)	\
+	({ if (unlikely(!validate_next(t, insn, 0))) goto err_out; __get_next(t, insn); })
+
+#define peek_nbyte_next(t, insn, n)	\
+	({ if (unlikely(!validate_next(t, insn, n))) goto err_out; __peek_nbyte_next(t, insn, n); })
+
+#define peek_next(t, insn)	peek_nbyte_next(t, insn, 0)
+
+/**
+ * insn_init() - initialize struct insn
+ * @insn:	&struct insn to be initialized
+ * @kaddr:	address (in kernel memory) of instruction (or copy thereof)
+ * @x86_64:	!0 for 64-bit kernel or 64-bit app
+ */
+void insn_init(struct insn *insn, const void *kaddr, int buf_len, int x86_64)
+{
+	/*
+	 * Instructions longer than MAX_INSN_SIZE (15 bytes) are invalid
+	 * even if the input buffer is long enough to hold them.
+	 */
+	if (buf_len > MAX_INSN_SIZE)
+		buf_len = MAX_INSN_SIZE;
+
+	memset(insn, 0, sizeof(*insn));
+	insn->kaddr = kaddr;
+	insn->end_kaddr = kaddr + buf_len;
+	insn->next_byte = kaddr;
+	insn->x86_64 = x86_64 ? 1 : 0;
+	insn->opnd_bytes = 4;
+	if (x86_64)
+		insn->addr_bytes = 8;
+	else
+		insn->addr_bytes = 4;
+}
+
+/**
+ * insn_get_prefixes - scan x86 instruction prefix bytes
+ * @insn:	&struct insn containing instruction
+ *
+ * Populates the @insn->prefixes bitmap, and updates @insn->next_byte
+ * to point to the (first) opcode.  No effect if @insn->prefixes.got
+ * is already set.
+ */
+void insn_get_prefixes(struct insn *insn)
+{
+	struct insn_field *prefixes = &insn->prefixes;
+	insn_attr_t attr;
+	insn_byte_t b, lb;
+	int i, nb;
+
+	if (prefixes->got)
+		return;
+
+	nb = 0;
+	lb = 0;
+	b = peek_next(insn_byte_t, insn);
+	attr = inat_get_opcode_attribute(b);
+	while (inat_is_legacy_prefix(attr)) {
+		/* Skip if same prefix */
+		for (i = 0; i < nb; i++)
+			if (prefixes->bytes[i] == b)
+				goto found;
+		if (nb == 4)
+			/* Invalid instruction */
+			break;
+		prefixes->bytes[nb++] = b;
+		if (inat_is_address_size_prefix(attr)) {
+			/* address size switches 2/4 or 4/8 */
+			if (insn->x86_64)
+				insn->addr_bytes ^= 12;
+			else
+				insn->addr_bytes ^= 6;
+		} else if (inat_is_operand_size_prefix(attr)) {
+			/* oprand size switches 2/4 */
+			insn->opnd_bytes ^= 6;
+		}
+found:
+		prefixes->nbytes++;
+		insn->next_byte++;
+		lb = b;
+		b = peek_next(insn_byte_t, insn);
+		attr = inat_get_opcode_attribute(b);
+	}
+	/* Set the last prefix */
+	if (lb && lb != insn->prefixes.bytes[3]) {
+		if (unlikely(insn->prefixes.bytes[3])) {
+			/* Swap the last prefix */
+			b = insn->prefixes.bytes[3];
+			for (i = 0; i < nb; i++)
+				if (prefixes->bytes[i] == lb)
+					prefixes->bytes[i] = b;
+		}
+		insn->prefixes.bytes[3] = lb;
+	}
+
+	/* Decode REX prefix */
+	if (insn->x86_64) {
+		b = peek_next(insn_byte_t, insn);
+		attr = inat_get_opcode_attribute(b);
+		if (inat_is_rex_prefix(attr)) {
+			insn->rex_prefix.value = b;
+			insn->rex_prefix.nbytes = 1;
+			insn->next_byte++;
+			if (X86_REX_W(b))
+				/* REX.W overrides opnd_size */
+				insn->opnd_bytes = 8;
+		}
+	}
+	insn->rex_prefix.got = 1;
+
+	/* Decode VEX prefix */
+	b = peek_next(insn_byte_t, insn);
+	attr = inat_get_opcode_attribute(b);
+	if (inat_is_vex_prefix(attr)) {
+		insn_byte_t b2 = peek_nbyte_next(insn_byte_t, insn, 1);
+		if (!insn->x86_64) {
+			/*
+			 * In 32-bits mode, if the [7:6] bits (mod bits of
+			 * ModRM) on the second byte are not 11b, it is
+			 * LDS or LES.
+			 */
+			if (X86_MODRM_MOD(b2) != 3)
+				goto vex_end;
+		}
+		insn->vex_prefix.bytes[0] = b;
+		insn->vex_prefix.bytes[1] = b2;
+		if (inat_is_vex3_prefix(attr)) {
+			b2 = peek_nbyte_next(insn_byte_t, insn, 2);
+			insn->vex_prefix.bytes[2] = b2;
+			insn->vex_prefix.nbytes = 3;
+			insn->next_byte += 3;
+			if (insn->x86_64 && X86_VEX_W(b2))
+				/* VEX.W overrides opnd_size */
+				insn->opnd_bytes = 8;
+		} else {
+			/*
+			 * For VEX2, fake VEX3-like byte#2.
+			 * Makes it easier to decode vex.W, vex.vvvv,
+			 * vex.L and vex.pp. Masking with 0x7f sets vex.W == 0.
+			 */
+			insn->vex_prefix.bytes[2] = b2 & 0x7f;
+			insn->vex_prefix.nbytes = 2;
+			insn->next_byte += 2;
+		}
+	}
+vex_end:
+	insn->vex_prefix.got = 1;
+
+	prefixes->got = 1;
+
+err_out:
+	return;
+}
+
+/**
+ * insn_get_opcode - collect opcode(s)
+ * @insn:	&struct insn containing instruction
+ *
+ * Populates @insn->opcode, updates @insn->next_byte to point past the
+ * opcode byte(s), and set @insn->attr (except for groups).
+ * If necessary, first collects any preceding (prefix) bytes.
+ * Sets @insn->opcode.value = opcode1.  No effect if @insn->opcode.got
+ * is already 1.
+ */
+void insn_get_opcode(struct insn *insn)
+{
+	struct insn_field *opcode = &insn->opcode;
+	insn_byte_t op;
+	int pfx_id;
+	if (opcode->got)
+		return;
+	if (!insn->prefixes.got)
+		insn_get_prefixes(insn);
+
+	/* Get first opcode */
+	op = get_next(insn_byte_t, insn);
+	opcode->bytes[0] = op;
+	opcode->nbytes = 1;
+
+	/* Check if there is VEX prefix or not */
+	if (insn_is_avx(insn)) {
+		insn_byte_t m, p;
+		m = insn_vex_m_bits(insn);
+		p = insn_vex_p_bits(insn);
+		insn->attr = inat_get_avx_attribute(op, m, p);
+		if (!inat_accept_vex(insn->attr) && !inat_is_group(insn->attr))
+			insn->attr = 0;	/* This instruction is bad */
+		goto end;	/* VEX has only 1 byte for opcode */
+	}
+
+	insn->attr = inat_get_opcode_attribute(op);
+	while (inat_is_escape(insn->attr)) {
+		/* Get escaped opcode */
+		op = get_next(insn_byte_t, insn);
+		opcode->bytes[opcode->nbytes++] = op;
+		pfx_id = insn_last_prefix_id(insn);
+		insn->attr = inat_get_escape_attribute(op, pfx_id, insn->attr);
+	}
+	if (inat_must_vex(insn->attr))
+		insn->attr = 0;	/* This instruction is bad */
+end:
+	opcode->got = 1;
+
+err_out:
+	return;
+}
+
+/**
+ * insn_get_modrm - collect ModRM byte, if any
+ * @insn:	&struct insn containing instruction
+ *
+ * Populates @insn->modrm and updates @insn->next_byte to point past the
+ * ModRM byte, if any.  If necessary, first collects the preceding bytes
+ * (prefixes and opcode(s)).  No effect if @insn->modrm.got is already 1.
+ */
+void insn_get_modrm(struct insn *insn)
+{
+	struct insn_field *modrm = &insn->modrm;
+	insn_byte_t pfx_id, mod;
+	if (modrm->got)
+		return;
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+
+	if (inat_has_modrm(insn->attr)) {
+		mod = get_next(insn_byte_t, insn);
+		modrm->value = mod;
+		modrm->nbytes = 1;
+		if (inat_is_group(insn->attr)) {
+			pfx_id = insn_last_prefix_id(insn);
+			insn->attr = inat_get_group_attribute(mod, pfx_id,
+							      insn->attr);
+			if (insn_is_avx(insn) && !inat_accept_vex(insn->attr))
+				insn->attr = 0;	/* This is bad */
+		}
+	}
+
+	if (insn->x86_64 && inat_is_force64(insn->attr))
+		insn->opnd_bytes = 8;
+	modrm->got = 1;
+
+err_out:
+	return;
+}
+
+
+/**
+ * insn_rip_relative() - Does instruction use RIP-relative addressing mode?
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * ModRM byte.  No effect if @insn->x86_64 is 0.
+ */
+int insn_rip_relative(struct insn *insn)
+{
+	struct insn_field *modrm = &insn->modrm;
+
+	if (!insn->x86_64)
+		return 0;
+	if (!modrm->got)
+		insn_get_modrm(insn);
+	/*
+	 * For rip-relative instructions, the mod field (top 2 bits)
+	 * is zero and the r/m field (bottom 3 bits) is 0x5.
+	 */
+	return (modrm->nbytes && (modrm->value & 0xc7) == 0x5);
+}
+
+/**
+ * insn_get_sib() - Get the SIB byte of instruction
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * ModRM byte.
+ */
+void insn_get_sib(struct insn *insn)
+{
+	insn_byte_t modrm;
+
+	if (insn->sib.got)
+		return;
+	if (!insn->modrm.got)
+		insn_get_modrm(insn);
+	if (insn->modrm.nbytes) {
+		modrm = (insn_byte_t)insn->modrm.value;
+		if (insn->addr_bytes != 2 &&
+		    X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) {
+			insn->sib.value = get_next(insn_byte_t, insn);
+			insn->sib.nbytes = 1;
+		}
+	}
+	insn->sib.got = 1;
+
+err_out:
+	return;
+}
+
+
+/**
+ * insn_get_displacement() - Get the displacement of instruction
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * SIB byte.
+ * Displacement value is sign-expanded.
+ */
+void insn_get_displacement(struct insn *insn)
+{
+	insn_byte_t mod, rm, base;
+
+	if (insn->displacement.got)
+		return;
+	if (!insn->sib.got)
+		insn_get_sib(insn);
+	if (insn->modrm.nbytes) {
+		/*
+		 * Interpreting the modrm byte:
+		 * mod = 00 - no displacement fields (exceptions below)
+		 * mod = 01 - 1-byte displacement field
+		 * mod = 10 - displacement field is 4 bytes, or 2 bytes if
+		 * 	address size = 2 (0x67 prefix in 32-bit mode)
+		 * mod = 11 - no memory operand
+		 *
+		 * If address size = 2...
+		 * mod = 00, r/m = 110 - displacement field is 2 bytes
+		 *
+		 * If address size != 2...
+		 * mod != 11, r/m = 100 - SIB byte exists
+		 * mod = 00, SIB base = 101 - displacement field is 4 bytes
+		 * mod = 00, r/m = 101 - rip-relative addressing, displacement
+		 * 	field is 4 bytes
+		 */
+		mod = X86_MODRM_MOD(insn->modrm.value);
+		rm = X86_MODRM_RM(insn->modrm.value);
+		base = X86_SIB_BASE(insn->sib.value);
+		if (mod == 3)
+			goto out;
+		if (mod == 1) {
+			insn->displacement.value = get_next(char, insn);
+			insn->displacement.nbytes = 1;
+		} else if (insn->addr_bytes == 2) {
+			if ((mod == 0 && rm == 6) || mod == 2) {
+				insn->displacement.value =
+					 get_next(short, insn);
+				insn->displacement.nbytes = 2;
+			}
+		} else {
+			if ((mod == 0 && rm == 5) || mod == 2 ||
+			    (mod == 0 && base == 5)) {
+				insn->displacement.value = get_next(int, insn);
+				insn->displacement.nbytes = 4;
+			}
+		}
+	}
+out:
+	insn->displacement.got = 1;
+
+err_out:
+	return;
+}
+
+/* Decode moffset16/32/64. Return 0 if failed */
+static int __get_moffset(struct insn *insn)
+{
+	switch (insn->addr_bytes) {
+	case 2:
+		insn->moffset1.value = get_next(short, insn);
+		insn->moffset1.nbytes = 2;
+		break;
+	case 4:
+		insn->moffset1.value = get_next(int, insn);
+		insn->moffset1.nbytes = 4;
+		break;
+	case 8:
+		insn->moffset1.value = get_next(int, insn);
+		insn->moffset1.nbytes = 4;
+		insn->moffset2.value = get_next(int, insn);
+		insn->moffset2.nbytes = 4;
+		break;
+	default:	/* opnd_bytes must be modified manually */
+		goto err_out;
+	}
+	insn->moffset1.got = insn->moffset2.got = 1;
+
+	return 1;
+
+err_out:
+	return 0;
+}
+
+/* Decode imm v32(Iz). Return 0 if failed */
+static int __get_immv32(struct insn *insn)
+{
+	switch (insn->opnd_bytes) {
+	case 2:
+		insn->immediate.value = get_next(short, insn);
+		insn->immediate.nbytes = 2;
+		break;
+	case 4:
+	case 8:
+		insn->immediate.value = get_next(int, insn);
+		insn->immediate.nbytes = 4;
+		break;
+	default:	/* opnd_bytes must be modified manually */
+		goto err_out;
+	}
+
+	return 1;
+
+err_out:
+	return 0;
+}
+
+/* Decode imm v64(Iv/Ov), Return 0 if failed */
+static int __get_immv(struct insn *insn)
+{
+	switch (insn->opnd_bytes) {
+	case 2:
+		insn->immediate1.value = get_next(short, insn);
+		insn->immediate1.nbytes = 2;
+		break;
+	case 4:
+		insn->immediate1.value = get_next(int, insn);
+		insn->immediate1.nbytes = 4;
+		break;
+	case 8:
+		insn->immediate1.value = get_next(int, insn);
+		insn->immediate1.nbytes = 4;
+		insn->immediate2.value = get_next(int, insn);
+		insn->immediate2.nbytes = 4;
+		break;
+	default:	/* opnd_bytes must be modified manually */
+		goto err_out;
+	}
+	insn->immediate1.got = insn->immediate2.got = 1;
+
+	return 1;
+err_out:
+	return 0;
+}
+
+/* Decode ptr16:16/32(Ap) */
+static int __get_immptr(struct insn *insn)
+{
+	switch (insn->opnd_bytes) {
+	case 2:
+		insn->immediate1.value = get_next(short, insn);
+		insn->immediate1.nbytes = 2;
+		break;
+	case 4:
+		insn->immediate1.value = get_next(int, insn);
+		insn->immediate1.nbytes = 4;
+		break;
+	case 8:
+		/* ptr16:64 is not exist (no segment) */
+		return 0;
+	default:	/* opnd_bytes must be modified manually */
+		goto err_out;
+	}
+	insn->immediate2.value = get_next(unsigned short, insn);
+	insn->immediate2.nbytes = 2;
+	insn->immediate1.got = insn->immediate2.got = 1;
+
+	return 1;
+err_out:
+	return 0;
+}
+
+/**
+ * insn_get_immediate() - Get the immediates of instruction
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * displacement bytes.
+ * Basically, most of immediates are sign-expanded. Unsigned-value can be
+ * get by bit masking with ((1 << (nbytes * 8)) - 1)
+ */
+void insn_get_immediate(struct insn *insn)
+{
+	if (insn->immediate.got)
+		return;
+	if (!insn->displacement.got)
+		insn_get_displacement(insn);
+
+	if (inat_has_moffset(insn->attr)) {
+		if (!__get_moffset(insn))
+			goto err_out;
+		goto done;
+	}
+
+	if (!inat_has_immediate(insn->attr))
+		/* no immediates */
+		goto done;
+
+	switch (inat_immediate_size(insn->attr)) {
+	case INAT_IMM_BYTE:
+		insn->immediate.value = get_next(char, insn);
+		insn->immediate.nbytes = 1;
+		break;
+	case INAT_IMM_WORD:
+		insn->immediate.value = get_next(short, insn);
+		insn->immediate.nbytes = 2;
+		break;
+	case INAT_IMM_DWORD:
+		insn->immediate.value = get_next(int, insn);
+		insn->immediate.nbytes = 4;
+		break;
+	case INAT_IMM_QWORD:
+		insn->immediate1.value = get_next(int, insn);
+		insn->immediate1.nbytes = 4;
+		insn->immediate2.value = get_next(int, insn);
+		insn->immediate2.nbytes = 4;
+		break;
+	case INAT_IMM_PTR:
+		if (!__get_immptr(insn))
+			goto err_out;
+		break;
+	case INAT_IMM_VWORD32:
+		if (!__get_immv32(insn))
+			goto err_out;
+		break;
+	case INAT_IMM_VWORD:
+		if (!__get_immv(insn))
+			goto err_out;
+		break;
+	default:
+		/* Here, insn must have an immediate, but failed */
+		goto err_out;
+	}
+	if (inat_has_second_immediate(insn->attr)) {
+		insn->immediate2.value = get_next(char, insn);
+		insn->immediate2.nbytes = 1;
+	}
+done:
+	insn->immediate.got = 1;
+
+err_out:
+	return;
+}
+
+/**
+ * insn_get_length() - Get the length of instruction
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * immediates bytes.
+ */
+void insn_get_length(struct insn *insn)
+{
+	if (insn->length)
+		return;
+	if (!insn->immediate.got)
+		insn_get_immediate(insn);
+	insn->length = (unsigned char)((unsigned long)insn->next_byte
+				     - (unsigned long)insn->kaddr);
+}
diff --git a/tools/perf/util/intel-pt-decoder/insn.h b/tools/perf/util/intel-pt-decoder/insn.h
new file mode 100644
index 000000000000..e7814b74caf8
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/insn.h
@@ -0,0 +1,201 @@
+#ifndef _ASM_X86_INSN_H
+#define _ASM_X86_INSN_H
+/*
+ * x86 instruction analysis
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2009
+ */
+
+/* insn_attr_t is defined in inat.h */
+#include <asm/inat.h>
+
+struct insn_field {
+	union {
+		insn_value_t value;
+		insn_byte_t bytes[4];
+	};
+	/* !0 if we've run insn_get_xxx() for this field */
+	unsigned char got;
+	unsigned char nbytes;
+};
+
+struct insn {
+	struct insn_field prefixes;	/*
+					 * Prefixes
+					 * prefixes.bytes[3]: last prefix
+					 */
+	struct insn_field rex_prefix;	/* REX prefix */
+	struct insn_field vex_prefix;	/* VEX prefix */
+	struct insn_field opcode;	/*
+					 * opcode.bytes[0]: opcode1
+					 * opcode.bytes[1]: opcode2
+					 * opcode.bytes[2]: opcode3
+					 */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+	union {
+		struct insn_field immediate;
+		struct insn_field moffset1;	/* for 64bit MOV */
+		struct insn_field immediate1;	/* for 64bit imm or off16/32 */
+	};
+	union {
+		struct insn_field moffset2;	/* for 64bit MOV */
+		struct insn_field immediate2;	/* for 64bit imm or seg16 */
+	};
+
+	insn_attr_t attr;
+	unsigned char opnd_bytes;
+	unsigned char addr_bytes;
+	unsigned char length;
+	unsigned char x86_64;
+
+	const insn_byte_t *kaddr;	/* kernel address of insn to analyze */
+	const insn_byte_t *end_kaddr;	/* kernel address of last insn in buffer */
+	const insn_byte_t *next_byte;
+};
+
+#define MAX_INSN_SIZE	15
+
+#define X86_MODRM_MOD(modrm) (((modrm) & 0xc0) >> 6)
+#define X86_MODRM_REG(modrm) (((modrm) & 0x38) >> 3)
+#define X86_MODRM_RM(modrm) ((modrm) & 0x07)
+
+#define X86_SIB_SCALE(sib) (((sib) & 0xc0) >> 6)
+#define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
+#define X86_SIB_BASE(sib) ((sib) & 0x07)
+
+#define X86_REX_W(rex) ((rex) & 8)
+#define X86_REX_R(rex) ((rex) & 4)
+#define X86_REX_X(rex) ((rex) & 2)
+#define X86_REX_B(rex) ((rex) & 1)
+
+/* VEX bit flags  */
+#define X86_VEX_W(vex)	((vex) & 0x80)	/* VEX3 Byte2 */
+#define X86_VEX_R(vex)	((vex) & 0x80)	/* VEX2/3 Byte1 */
+#define X86_VEX_X(vex)	((vex) & 0x40)	/* VEX3 Byte1 */
+#define X86_VEX_B(vex)	((vex) & 0x20)	/* VEX3 Byte1 */
+#define X86_VEX_L(vex)	((vex) & 0x04)	/* VEX3 Byte2, VEX2 Byte1 */
+/* VEX bit fields */
+#define X86_VEX3_M(vex)	((vex) & 0x1f)		/* VEX3 Byte1 */
+#define X86_VEX2_M	1			/* VEX2.M always 1 */
+#define X86_VEX_V(vex)	(((vex) & 0x78) >> 3)	/* VEX3 Byte2, VEX2 Byte1 */
+#define X86_VEX_P(vex)	((vex) & 0x03)		/* VEX3 Byte2, VEX2 Byte1 */
+#define X86_VEX_M_MAX	0x1f			/* VEX3.M Maximum value */
+
+extern void insn_init(struct insn *insn, const void *kaddr, int buf_len, int x86_64);
+extern void insn_get_prefixes(struct insn *insn);
+extern void insn_get_opcode(struct insn *insn);
+extern void insn_get_modrm(struct insn *insn);
+extern void insn_get_sib(struct insn *insn);
+extern void insn_get_displacement(struct insn *insn);
+extern void insn_get_immediate(struct insn *insn);
+extern void insn_get_length(struct insn *insn);
+
+/* Attribute will be determined after getting ModRM (for opcode groups) */
+static inline void insn_get_attribute(struct insn *insn)
+{
+	insn_get_modrm(insn);
+}
+
+/* Instruction uses RIP-relative addressing */
+extern int insn_rip_relative(struct insn *insn);
+
+/* Init insn for kernel text */
+static inline void kernel_insn_init(struct insn *insn,
+				    const void *kaddr, int buf_len)
+{
+#ifdef CONFIG_X86_64
+	insn_init(insn, kaddr, buf_len, 1);
+#else /* CONFIG_X86_32 */
+	insn_init(insn, kaddr, buf_len, 0);
+#endif
+}
+
+static inline int insn_is_avx(struct insn *insn)
+{
+	if (!insn->prefixes.got)
+		insn_get_prefixes(insn);
+	return (insn->vex_prefix.value != 0);
+}
+
+/* Ensure this instruction is decoded completely */
+static inline int insn_complete(struct insn *insn)
+{
+	return insn->opcode.got && insn->modrm.got && insn->sib.got &&
+		insn->displacement.got && insn->immediate.got;
+}
+
+static inline insn_byte_t insn_vex_m_bits(struct insn *insn)
+{
+	if (insn->vex_prefix.nbytes == 2)	/* 2 bytes VEX */
+		return X86_VEX2_M;
+	else
+		return X86_VEX3_M(insn->vex_prefix.bytes[1]);
+}
+
+static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
+{
+	if (insn->vex_prefix.nbytes == 2)	/* 2 bytes VEX */
+		return X86_VEX_P(insn->vex_prefix.bytes[1]);
+	else
+		return X86_VEX_P(insn->vex_prefix.bytes[2]);
+}
+
+/* Get the last prefix id from last prefix or VEX prefix */
+static inline int insn_last_prefix_id(struct insn *insn)
+{
+	if (insn_is_avx(insn))
+		return insn_vex_p_bits(insn);	/* VEX_p is a SIMD prefix id */
+
+	if (insn->prefixes.bytes[3])
+		return inat_get_last_prefix_id(insn->prefixes.bytes[3]);
+
+	return 0;
+}
+
+/* Offset of each field from kaddr */
+static inline int insn_offset_rex_prefix(struct insn *insn)
+{
+	return insn->prefixes.nbytes;
+}
+static inline int insn_offset_vex_prefix(struct insn *insn)
+{
+	return insn_offset_rex_prefix(insn) + insn->rex_prefix.nbytes;
+}
+static inline int insn_offset_opcode(struct insn *insn)
+{
+	return insn_offset_vex_prefix(insn) + insn->vex_prefix.nbytes;
+}
+static inline int insn_offset_modrm(struct insn *insn)
+{
+	return insn_offset_opcode(insn) + insn->opcode.nbytes;
+}
+static inline int insn_offset_sib(struct insn *insn)
+{
+	return insn_offset_modrm(insn) + insn->modrm.nbytes;
+}
+static inline int insn_offset_displacement(struct insn *insn)
+{
+	return insn_offset_sib(insn) + insn->sib.nbytes;
+}
+static inline int insn_offset_immediate(struct insn *insn)
+{
+	return insn_offset_displacement(insn) + insn->displacement.nbytes;
+}
+
+#endif /* _ASM_X86_INSN_H */
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
new file mode 100644
index 000000000000..2fa82c567446
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
@@ -0,0 +1,246 @@
+/*
+ * intel_pt_insn_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "event.h"
+
+#include <asm/insn.h>
+
+#include "inat.c"
+#include <insn.c>
+
+#include "intel-pt-insn-decoder.h"
+
+/* Based on branch_type() from perf_event_intel_lbr.c */
+static void intel_pt_insn_decoder(struct insn *insn,
+				  struct intel_pt_insn *intel_pt_insn)
+{
+	enum intel_pt_insn_op op = INTEL_PT_OP_OTHER;
+	enum intel_pt_insn_branch branch = INTEL_PT_BR_NO_BRANCH;
+	int ext;
+
+	if (insn_is_avx(insn)) {
+		intel_pt_insn->op = INTEL_PT_OP_OTHER;
+		intel_pt_insn->branch = INTEL_PT_BR_NO_BRANCH;
+		intel_pt_insn->length = insn->length;
+		return;
+	}
+
+	switch (insn->opcode.bytes[0]) {
+	case 0xf:
+		switch (insn->opcode.bytes[1]) {
+		case 0x05: /* syscall */
+		case 0x34: /* sysenter */
+			op = INTEL_PT_OP_SYSCALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x07: /* sysret */
+		case 0x35: /* sysexit */
+			op = INTEL_PT_OP_SYSRET;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x80 ... 0x8f: /* jcc */
+			op = INTEL_PT_OP_JCC;
+			branch = INTEL_PT_BR_CONDITIONAL;
+			break;
+		default:
+			break;
+		}
+		break;
+	case 0x70 ... 0x7f: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xc2: /* near ret */
+	case 0xc3: /* near ret */
+	case 0xca: /* far ret */
+	case 0xcb: /* far ret */
+		op = INTEL_PT_OP_RET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcf: /* iret */
+		op = INTEL_PT_OP_IRET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcc ... 0xce: /* int */
+		op = INTEL_PT_OP_INT;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe8: /* call near rel */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0x9a: /* call far absolute */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe0 ... 0xe2: /* loop */
+		op = INTEL_PT_OP_LOOP;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe3: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe9: /* jmp */
+	case 0xeb: /* jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0xea: /* far jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xff: /* call near absolute, call far absolute ind */
+		ext = (insn->modrm.bytes[0] >> 3) & 0x7;
+		switch (ext) {
+		case 2: /* near ind call */
+		case 3: /* far ind call */
+			op = INTEL_PT_OP_CALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 4:
+		case 5:
+			op = INTEL_PT_OP_JMP;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+
+	intel_pt_insn->op = op;
+	intel_pt_insn->branch = branch;
+	intel_pt_insn->length = insn->length;
+
+	if (branch == INTEL_PT_BR_CONDITIONAL ||
+	    branch == INTEL_PT_BR_UNCONDITIONAL) {
+#if __BYTE_ORDER == __BIG_ENDIAN
+		switch (insn->immediate.nbytes) {
+		case 1:
+			intel_pt_insn->rel = insn->immediate.value;
+			break;
+		case 2:
+			intel_pt_insn->rel =
+					bswap_16((short)insn->immediate.value);
+			break;
+		case 4:
+			intel_pt_insn->rel = bswap_32(insn->immediate.value);
+			break;
+		}
+#else
+		intel_pt_insn->rel = insn->immediate.value;
+#endif
+	}
+}
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn)
+{
+	struct insn insn;
+
+	insn_init(&insn, buf, len, x86_64);
+	insn_get_length(&insn);
+	if (!insn_complete(&insn) || insn.length > len)
+		return -1;
+	intel_pt_insn_decoder(&insn, intel_pt_insn);
+	if (insn.length < INTEL_PT_INSN_DBG_BUF_SZ)
+		memcpy(intel_pt_insn->buf, buf, insn.length);
+	else
+		memcpy(intel_pt_insn->buf, buf, INTEL_PT_INSN_DBG_BUF_SZ);
+	return 0;
+}
+
+const char *branch_name[] = {
+	[INTEL_PT_OP_OTHER]	= "Other",
+	[INTEL_PT_OP_CALL]	= "Call",
+	[INTEL_PT_OP_RET]	= "Ret",
+	[INTEL_PT_OP_JCC]	= "Jcc",
+	[INTEL_PT_OP_JMP]	= "Jmp",
+	[INTEL_PT_OP_LOOP]	= "Loop",
+	[INTEL_PT_OP_IRET]	= "IRet",
+	[INTEL_PT_OP_INT]	= "Int",
+	[INTEL_PT_OP_SYSCALL]	= "Syscall",
+	[INTEL_PT_OP_SYSRET]	= "Sysret",
+};
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op)
+{
+	return branch_name[op];
+}
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len)
+{
+	switch (intel_pt_insn->branch) {
+	case INTEL_PT_BR_CONDITIONAL:
+	case INTEL_PT_BR_UNCONDITIONAL:
+		return snprintf(buf, buf_len, "%s %s%d",
+				intel_pt_insn_name(intel_pt_insn->op),
+				intel_pt_insn->rel > 0 ? "+" : "",
+				intel_pt_insn->rel);
+	case INTEL_PT_BR_NO_BRANCH:
+	case INTEL_PT_BR_INDIRECT:
+		return snprintf(buf, buf_len, "%s",
+				intel_pt_insn_name(intel_pt_insn->op));
+	default:
+		break;
+	}
+	return 0;
+}
+
+size_t intel_pt_insn_max_size(void)
+{
+	return MAX_INSN_SIZE;
+}
+
+int intel_pt_insn_type(enum intel_pt_insn_op op)
+{
+	switch (op) {
+	case INTEL_PT_OP_OTHER:
+		return 0;
+	case INTEL_PT_OP_CALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL;
+	case INTEL_PT_OP_RET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN;
+	case INTEL_PT_OP_JCC:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_JMP:
+		return PERF_IP_FLAG_BRANCH;
+	case INTEL_PT_OP_LOOP:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_IRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_INT:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_SYSCALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_SYSCALLRET;
+	case INTEL_PT_OP_SYSRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_SYSCALLRET;
+	default:
+		return 0;
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
new file mode 100644
index 000000000000..b0adbf37323e
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
@@ -0,0 +1,65 @@
+/*
+ * intel_pt_insn_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_INSN_DECODER_H__
+#define INCLUDE__INTEL_PT_INSN_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_INSN_DESC_MAX		32
+#define INTEL_PT_INSN_DBG_BUF_SZ	16
+
+enum intel_pt_insn_op {
+	INTEL_PT_OP_OTHER,
+	INTEL_PT_OP_CALL,
+	INTEL_PT_OP_RET,
+	INTEL_PT_OP_JCC,
+	INTEL_PT_OP_JMP,
+	INTEL_PT_OP_LOOP,
+	INTEL_PT_OP_IRET,
+	INTEL_PT_OP_INT,
+	INTEL_PT_OP_SYSCALL,
+	INTEL_PT_OP_SYSRET,
+};
+
+enum intel_pt_insn_branch {
+	INTEL_PT_BR_NO_BRANCH,
+	INTEL_PT_BR_INDIRECT,
+	INTEL_PT_BR_CONDITIONAL,
+	INTEL_PT_BR_UNCONDITIONAL,
+};
+
+struct intel_pt_insn {
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+	unsigned char			buf[INTEL_PT_INSN_DBG_BUF_SZ];
+};
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn);
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op);
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len);
+
+size_t intel_pt_insn_max_size(void);
+
+int intel_pt_insn_type(enum intel_pt_insn_op op);
+
+#endif
diff --git a/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
new file mode 100644
index 000000000000..816488c0b97e
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
@@ -0,0 +1,970 @@
+# x86 Opcode Maps
+#
+# This is (mostly) based on following documentations.
+# - Intel(R) 64 and IA-32 Architectures Software Developer's Manual Vol.2C
+#   (#326018-047US, June 2013)
+#
+#<Opcode maps>
+# Table: table-name
+# Referrer: escaped-name
+# AVXcode: avx-code
+# opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
+# (or)
+# opcode: escape # escaped-name
+# EndTable
+#
+#<group maps>
+# GrpTable: GrpXXX
+# reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
+# EndTable
+#
+# AVX Superscripts
+#  (v): this opcode requires VEX prefix.
+#  (v1): this opcode only supports 128bit VEX.
+#
+# Last Prefix Superscripts
+#  - (66): the last prefix is 0x66
+#  - (F3): the last prefix is 0xF3
+#  - (F2): the last prefix is 0xF2
+#  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
+#  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+
+Table: one byte opcode
+Referrer:
+AVXcode:
+# 0x00 - 0x0f
+00: ADD Eb,Gb
+01: ADD Ev,Gv
+02: ADD Gb,Eb
+03: ADD Gv,Ev
+04: ADD AL,Ib
+05: ADD rAX,Iz
+06: PUSH ES (i64)
+07: POP ES (i64)
+08: OR Eb,Gb
+09: OR Ev,Gv
+0a: OR Gb,Eb
+0b: OR Gv,Ev
+0c: OR AL,Ib
+0d: OR rAX,Iz
+0e: PUSH CS (i64)
+0f: escape # 2-byte escape
+# 0x10 - 0x1f
+10: ADC Eb,Gb
+11: ADC Ev,Gv
+12: ADC Gb,Eb
+13: ADC Gv,Ev
+14: ADC AL,Ib
+15: ADC rAX,Iz
+16: PUSH SS (i64)
+17: POP SS (i64)
+18: SBB Eb,Gb
+19: SBB Ev,Gv
+1a: SBB Gb,Eb
+1b: SBB Gv,Ev
+1c: SBB AL,Ib
+1d: SBB rAX,Iz
+1e: PUSH DS (i64)
+1f: POP DS (i64)
+# 0x20 - 0x2f
+20: AND Eb,Gb
+21: AND Ev,Gv
+22: AND Gb,Eb
+23: AND Gv,Ev
+24: AND AL,Ib
+25: AND rAx,Iz
+26: SEG=ES (Prefix)
+27: DAA (i64)
+28: SUB Eb,Gb
+29: SUB Ev,Gv
+2a: SUB Gb,Eb
+2b: SUB Gv,Ev
+2c: SUB AL,Ib
+2d: SUB rAX,Iz
+2e: SEG=CS (Prefix)
+2f: DAS (i64)
+# 0x30 - 0x3f
+30: XOR Eb,Gb
+31: XOR Ev,Gv
+32: XOR Gb,Eb
+33: XOR Gv,Ev
+34: XOR AL,Ib
+35: XOR rAX,Iz
+36: SEG=SS (Prefix)
+37: AAA (i64)
+38: CMP Eb,Gb
+39: CMP Ev,Gv
+3a: CMP Gb,Eb
+3b: CMP Gv,Ev
+3c: CMP AL,Ib
+3d: CMP rAX,Iz
+3e: SEG=DS (Prefix)
+3f: AAS (i64)
+# 0x40 - 0x4f
+40: INC eAX (i64) | REX (o64)
+41: INC eCX (i64) | REX.B (o64)
+42: INC eDX (i64) | REX.X (o64)
+43: INC eBX (i64) | REX.XB (o64)
+44: INC eSP (i64) | REX.R (o64)
+45: INC eBP (i64) | REX.RB (o64)
+46: INC eSI (i64) | REX.RX (o64)
+47: INC eDI (i64) | REX.RXB (o64)
+48: DEC eAX (i64) | REX.W (o64)
+49: DEC eCX (i64) | REX.WB (o64)
+4a: DEC eDX (i64) | REX.WX (o64)
+4b: DEC eBX (i64) | REX.WXB (o64)
+4c: DEC eSP (i64) | REX.WR (o64)
+4d: DEC eBP (i64) | REX.WRB (o64)
+4e: DEC eSI (i64) | REX.WRX (o64)
+4f: DEC eDI (i64) | REX.WRXB (o64)
+# 0x50 - 0x5f
+50: PUSH rAX/r8 (d64)
+51: PUSH rCX/r9 (d64)
+52: PUSH rDX/r10 (d64)
+53: PUSH rBX/r11 (d64)
+54: PUSH rSP/r12 (d64)
+55: PUSH rBP/r13 (d64)
+56: PUSH rSI/r14 (d64)
+57: PUSH rDI/r15 (d64)
+58: POP rAX/r8 (d64)
+59: POP rCX/r9 (d64)
+5a: POP rDX/r10 (d64)
+5b: POP rBX/r11 (d64)
+5c: POP rSP/r12 (d64)
+5d: POP rBP/r13 (d64)
+5e: POP rSI/r14 (d64)
+5f: POP rDI/r15 (d64)
+# 0x60 - 0x6f
+60: PUSHA/PUSHAD (i64)
+61: POPA/POPAD (i64)
+62: BOUND Gv,Ma (i64)
+63: ARPL Ew,Gw (i64) | MOVSXD Gv,Ev (o64)
+64: SEG=FS (Prefix)
+65: SEG=GS (Prefix)
+66: Operand-Size (Prefix)
+67: Address-Size (Prefix)
+68: PUSH Iz (d64)
+69: IMUL Gv,Ev,Iz
+6a: PUSH Ib (d64)
+6b: IMUL Gv,Ev,Ib
+6c: INS/INSB Yb,DX
+6d: INS/INSW/INSD Yz,DX
+6e: OUTS/OUTSB DX,Xb
+6f: OUTS/OUTSW/OUTSD DX,Xz
+# 0x70 - 0x7f
+70: JO Jb
+71: JNO Jb
+72: JB/JNAE/JC Jb
+73: JNB/JAE/JNC Jb
+74: JZ/JE Jb
+75: JNZ/JNE Jb
+76: JBE/JNA Jb
+77: JNBE/JA Jb
+78: JS Jb
+79: JNS Jb
+7a: JP/JPE Jb
+7b: JNP/JPO Jb
+7c: JL/JNGE Jb
+7d: JNL/JGE Jb
+7e: JLE/JNG Jb
+7f: JNLE/JG Jb
+# 0x80 - 0x8f
+80: Grp1 Eb,Ib (1A)
+81: Grp1 Ev,Iz (1A)
+82: Grp1 Eb,Ib (1A),(i64)
+83: Grp1 Ev,Ib (1A)
+84: TEST Eb,Gb
+85: TEST Ev,Gv
+86: XCHG Eb,Gb
+87: XCHG Ev,Gv
+88: MOV Eb,Gb
+89: MOV Ev,Gv
+8a: MOV Gb,Eb
+8b: MOV Gv,Ev
+8c: MOV Ev,Sw
+8d: LEA Gv,M
+8e: MOV Sw,Ew
+8f: Grp1A (1A) | POP Ev (d64)
+# 0x90 - 0x9f
+90: NOP | PAUSE (F3) | XCHG r8,rAX
+91: XCHG rCX/r9,rAX
+92: XCHG rDX/r10,rAX
+93: XCHG rBX/r11,rAX
+94: XCHG rSP/r12,rAX
+95: XCHG rBP/r13,rAX
+96: XCHG rSI/r14,rAX
+97: XCHG rDI/r15,rAX
+98: CBW/CWDE/CDQE
+99: CWD/CDQ/CQO
+9a: CALLF Ap (i64)
+9b: FWAIT/WAIT
+9c: PUSHF/D/Q Fv (d64)
+9d: POPF/D/Q Fv (d64)
+9e: SAHF
+9f: LAHF
+# 0xa0 - 0xaf
+a0: MOV AL,Ob
+a1: MOV rAX,Ov
+a2: MOV Ob,AL
+a3: MOV Ov,rAX
+a4: MOVS/B Yb,Xb
+a5: MOVS/W/D/Q Yv,Xv
+a6: CMPS/B Xb,Yb
+a7: CMPS/W/D Xv,Yv
+a8: TEST AL,Ib
+a9: TEST rAX,Iz
+aa: STOS/B Yb,AL
+ab: STOS/W/D/Q Yv,rAX
+ac: LODS/B AL,Xb
+ad: LODS/W/D/Q rAX,Xv
+ae: SCAS/B AL,Yb
+# Note: The May 2011 Intel manual shows Xv for the second parameter of the
+# next instruction but Yv is correct
+af: SCAS/W/D/Q rAX,Yv
+# 0xb0 - 0xbf
+b0: MOV AL/R8L,Ib
+b1: MOV CL/R9L,Ib
+b2: MOV DL/R10L,Ib
+b3: MOV BL/R11L,Ib
+b4: MOV AH/R12L,Ib
+b5: MOV CH/R13L,Ib
+b6: MOV DH/R14L,Ib
+b7: MOV BH/R15L,Ib
+b8: MOV rAX/r8,Iv
+b9: MOV rCX/r9,Iv
+ba: MOV rDX/r10,Iv
+bb: MOV rBX/r11,Iv
+bc: MOV rSP/r12,Iv
+bd: MOV rBP/r13,Iv
+be: MOV rSI/r14,Iv
+bf: MOV rDI/r15,Iv
+# 0xc0 - 0xcf
+c0: Grp2 Eb,Ib (1A)
+c1: Grp2 Ev,Ib (1A)
+c2: RETN Iw (f64)
+c3: RETN
+c4: LES Gz,Mp (i64) | VEX+2byte (Prefix)
+c5: LDS Gz,Mp (i64) | VEX+1byte (Prefix)
+c6: Grp11A Eb,Ib (1A)
+c7: Grp11B Ev,Iz (1A)
+c8: ENTER Iw,Ib
+c9: LEAVE (d64)
+ca: RETF Iw
+cb: RETF
+cc: INT3
+cd: INT Ib
+ce: INTO (i64)
+cf: IRET/D/Q
+# 0xd0 - 0xdf
+d0: Grp2 Eb,1 (1A)
+d1: Grp2 Ev,1 (1A)
+d2: Grp2 Eb,CL (1A)
+d3: Grp2 Ev,CL (1A)
+d4: AAM Ib (i64)
+d5: AAD Ib (i64)
+d6:
+d7: XLAT/XLATB
+d8: ESC
+d9: ESC
+da: ESC
+db: ESC
+dc: ESC
+dd: ESC
+de: ESC
+df: ESC
+# 0xe0 - 0xef
+# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
+# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
+# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
+e0: LOOPNE/LOOPNZ Jb (f64)
+e1: LOOPE/LOOPZ Jb (f64)
+e2: LOOP Jb (f64)
+e3: JrCXZ Jb (f64)
+e4: IN AL,Ib
+e5: IN eAX,Ib
+e6: OUT Ib,AL
+e7: OUT Ib,eAX
+# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
+# in "near" jumps and calls is 16-bit. For CALL,
+# push of return address is 16-bit wide, RSP is decremented by 2
+# but is not truncated to 16 bits, unlike RIP.
+e8: CALL Jz (f64)
+e9: JMP-near Jz (f64)
+ea: JMP-far Ap (i64)
+eb: JMP-short Jb (f64)
+ec: IN AL,DX
+ed: IN eAX,DX
+ee: OUT DX,AL
+ef: OUT DX,eAX
+# 0xf0 - 0xff
+f0: LOCK (Prefix)
+f1:
+f2: REPNE (Prefix) | XACQUIRE (Prefix)
+f3: REP/REPE (Prefix) | XRELEASE (Prefix)
+f4: HLT
+f5: CMC
+f6: Grp3_1 Eb (1A)
+f7: Grp3_2 Ev (1A)
+f8: CLC
+f9: STC
+fa: CLI
+fb: STI
+fc: CLD
+fd: STD
+fe: Grp4 (1A)
+ff: Grp5 (1A)
+EndTable
+
+Table: 2-byte opcode (0x0f)
+Referrer: 2-byte escape
+AVXcode: 1
+# 0x0f 0x00-0x0f
+00: Grp6 (1A)
+01: Grp7 (1A)
+02: LAR Gv,Ew
+03: LSL Gv,Ew
+04:
+05: SYSCALL (o64)
+06: CLTS
+07: SYSRET (o64)
+08: INVD
+09: WBINVD
+0a:
+0b: UD2 (1B)
+0c:
+# AMD's prefetch group. Intel supports prefetchw(/1) only.
+0d: GrpP
+0e: FEMMS
+# 3DNow! uses the last imm byte as opcode extension.
+0f: 3DNow! Pq,Qq,Ib
+# 0x0f 0x10-0x1f
+# NOTE: According to Intel SDM opcode map, vmovups and vmovupd has no operands
+# but it actually has operands. And also, vmovss and vmovsd only accept 128bit.
+# MOVSS/MOVSD has too many forms(3) on SDM. This map just shows a typical form.
+# Many AVX instructions lack v1 superscript, according to Intel AVX-Prgramming
+# Reference A.1
+10: vmovups Vps,Wps | vmovupd Vpd,Wpd (66) | vmovss Vx,Hx,Wss (F3),(v1) | vmovsd Vx,Hx,Wsd (F2),(v1)
+11: vmovups Wps,Vps | vmovupd Wpd,Vpd (66) | vmovss Wss,Hx,Vss (F3),(v1) | vmovsd Wsd,Hx,Vsd (F2),(v1)
+12: vmovlps Vq,Hq,Mq (v1) | vmovhlps Vq,Hq,Uq (v1) | vmovlpd Vq,Hq,Mq (66),(v1) | vmovsldup Vx,Wx (F3) | vmovddup Vx,Wx (F2)
+13: vmovlps Mq,Vq (v1) | vmovlpd Mq,Vq (66),(v1)
+14: vunpcklps Vx,Hx,Wx | vunpcklpd Vx,Hx,Wx (66)
+15: vunpckhps Vx,Hx,Wx | vunpckhpd Vx,Hx,Wx (66)
+16: vmovhps Vdq,Hq,Mq (v1) | vmovlhps Vdq,Hq,Uq (v1) | vmovhpd Vdq,Hq,Mq (66),(v1) | vmovshdup Vx,Wx (F3)
+17: vmovhps Mq,Vq (v1) | vmovhpd Mq,Vq (66),(v1)
+18: Grp16 (1A)
+19:
+1a: BNDCL Ev,Gv | BNDCU Ev,Gv | BNDMOV Gv,Ev | BNDLDX Gv,Ev,Gv
+1b: BNDCN Ev,Gv | BNDMOV Ev,Gv | BNDMK Gv,Ev | BNDSTX Ev,GV,Gv
+1c:
+1d:
+1e:
+1f: NOP Ev
+# 0x0f 0x20-0x2f
+20: MOV Rd,Cd
+21: MOV Rd,Dd
+22: MOV Cd,Rd
+23: MOV Dd,Rd
+24:
+25:
+26:
+27:
+28: vmovaps Vps,Wps | vmovapd Vpd,Wpd (66)
+29: vmovaps Wps,Vps | vmovapd Wpd,Vpd (66)
+2a: cvtpi2ps Vps,Qpi | cvtpi2pd Vpd,Qpi (66) | vcvtsi2ss Vss,Hss,Ey (F3),(v1) | vcvtsi2sd Vsd,Hsd,Ey (F2),(v1)
+2b: vmovntps Mps,Vps | vmovntpd Mpd,Vpd (66)
+2c: cvttps2pi Ppi,Wps | cvttpd2pi Ppi,Wpd (66) | vcvttss2si Gy,Wss (F3),(v1) | vcvttsd2si Gy,Wsd (F2),(v1)
+2d: cvtps2pi Ppi,Wps | cvtpd2pi Qpi,Wpd (66) | vcvtss2si Gy,Wss (F3),(v1) | vcvtsd2si Gy,Wsd (F2),(v1)
+2e: vucomiss Vss,Wss (v1) | vucomisd  Vsd,Wsd (66),(v1)
+2f: vcomiss Vss,Wss (v1) | vcomisd  Vsd,Wsd (66),(v1)
+# 0x0f 0x30-0x3f
+30: WRMSR
+31: RDTSC
+32: RDMSR
+33: RDPMC
+34: SYSENTER
+35: SYSEXIT
+36:
+37: GETSEC
+38: escape # 3-byte escape 1
+39:
+3a: escape # 3-byte escape 2
+3b:
+3c:
+3d:
+3e:
+3f:
+# 0x0f 0x40-0x4f
+40: CMOVO Gv,Ev
+41: CMOVNO Gv,Ev
+42: CMOVB/C/NAE Gv,Ev
+43: CMOVAE/NB/NC Gv,Ev
+44: CMOVE/Z Gv,Ev
+45: CMOVNE/NZ Gv,Ev
+46: CMOVBE/NA Gv,Ev
+47: CMOVA/NBE Gv,Ev
+48: CMOVS Gv,Ev
+49: CMOVNS Gv,Ev
+4a: CMOVP/PE Gv,Ev
+4b: CMOVNP/PO Gv,Ev
+4c: CMOVL/NGE Gv,Ev
+4d: CMOVNL/GE Gv,Ev
+4e: CMOVLE/NG Gv,Ev
+4f: CMOVNLE/G Gv,Ev
+# 0x0f 0x50-0x5f
+50: vmovmskps Gy,Ups | vmovmskpd Gy,Upd (66)
+51: vsqrtps Vps,Wps | vsqrtpd Vpd,Wpd (66) | vsqrtss Vss,Hss,Wss (F3),(v1) | vsqrtsd Vsd,Hsd,Wsd (F2),(v1)
+52: vrsqrtps Vps,Wps | vrsqrtss Vss,Hss,Wss (F3),(v1)
+53: vrcpps Vps,Wps | vrcpss Vss,Hss,Wss (F3),(v1)
+54: vandps Vps,Hps,Wps | vandpd Vpd,Hpd,Wpd (66)
+55: vandnps Vps,Hps,Wps | vandnpd Vpd,Hpd,Wpd (66)
+56: vorps Vps,Hps,Wps | vorpd Vpd,Hpd,Wpd (66)
+57: vxorps Vps,Hps,Wps | vxorpd Vpd,Hpd,Wpd (66)
+58: vaddps Vps,Hps,Wps | vaddpd Vpd,Hpd,Wpd (66) | vaddss Vss,Hss,Wss (F3),(v1) | vaddsd Vsd,Hsd,Wsd (F2),(v1)
+59: vmulps Vps,Hps,Wps | vmulpd Vpd,Hpd,Wpd (66) | vmulss Vss,Hss,Wss (F3),(v1) | vmulsd Vsd,Hsd,Wsd (F2),(v1)
+5a: vcvtps2pd Vpd,Wps | vcvtpd2ps Vps,Wpd (66) | vcvtss2sd Vsd,Hx,Wss (F3),(v1) | vcvtsd2ss Vss,Hx,Wsd (F2),(v1)
+5b: vcvtdq2ps Vps,Wdq | vcvtps2dq Vdq,Wps (66) | vcvttps2dq Vdq,Wps (F3)
+5c: vsubps Vps,Hps,Wps | vsubpd Vpd,Hpd,Wpd (66) | vsubss Vss,Hss,Wss (F3),(v1) | vsubsd Vsd,Hsd,Wsd (F2),(v1)
+5d: vminps Vps,Hps,Wps | vminpd Vpd,Hpd,Wpd (66) | vminss Vss,Hss,Wss (F3),(v1) | vminsd Vsd,Hsd,Wsd (F2),(v1)
+5e: vdivps Vps,Hps,Wps | vdivpd Vpd,Hpd,Wpd (66) | vdivss Vss,Hss,Wss (F3),(v1) | vdivsd Vsd,Hsd,Wsd (F2),(v1)
+5f: vmaxps Vps,Hps,Wps | vmaxpd Vpd,Hpd,Wpd (66) | vmaxss Vss,Hss,Wss (F3),(v1) | vmaxsd Vsd,Hsd,Wsd (F2),(v1)
+# 0x0f 0x60-0x6f
+60: punpcklbw Pq,Qd | vpunpcklbw Vx,Hx,Wx (66),(v1)
+61: punpcklwd Pq,Qd | vpunpcklwd Vx,Hx,Wx (66),(v1)
+62: punpckldq Pq,Qd | vpunpckldq Vx,Hx,Wx (66),(v1)
+63: packsswb Pq,Qq | vpacksswb Vx,Hx,Wx (66),(v1)
+64: pcmpgtb Pq,Qq | vpcmpgtb Vx,Hx,Wx (66),(v1)
+65: pcmpgtw Pq,Qq | vpcmpgtw Vx,Hx,Wx (66),(v1)
+66: pcmpgtd Pq,Qq | vpcmpgtd Vx,Hx,Wx (66),(v1)
+67: packuswb Pq,Qq | vpackuswb Vx,Hx,Wx (66),(v1)
+68: punpckhbw Pq,Qd | vpunpckhbw Vx,Hx,Wx (66),(v1)
+69: punpckhwd Pq,Qd | vpunpckhwd Vx,Hx,Wx (66),(v1)
+6a: punpckhdq Pq,Qd | vpunpckhdq Vx,Hx,Wx (66),(v1)
+6b: packssdw Pq,Qd | vpackssdw Vx,Hx,Wx (66),(v1)
+6c: vpunpcklqdq Vx,Hx,Wx (66),(v1)
+6d: vpunpckhqdq Vx,Hx,Wx (66),(v1)
+6e: movd/q Pd,Ey | vmovd/q Vy,Ey (66),(v1)
+6f: movq Pq,Qq | vmovdqa Vx,Wx (66) | vmovdqu Vx,Wx (F3)
+# 0x0f 0x70-0x7f
+70: pshufw Pq,Qq,Ib | vpshufd Vx,Wx,Ib (66),(v1) | vpshufhw Vx,Wx,Ib (F3),(v1) | vpshuflw Vx,Wx,Ib (F2),(v1)
+71: Grp12 (1A)
+72: Grp13 (1A)
+73: Grp14 (1A)
+74: pcmpeqb Pq,Qq | vpcmpeqb Vx,Hx,Wx (66),(v1)
+75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
+76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
+# Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
+77: emms | vzeroupper | vzeroall
+78: VMREAD Ey,Gy
+79: VMWRITE Gy,Ey
+7a:
+7b:
+7c: vhaddpd Vpd,Hpd,Wpd (66) | vhaddps Vps,Hps,Wps (F2)
+7d: vhsubpd Vpd,Hpd,Wpd (66) | vhsubps Vps,Hps,Wps (F2)
+7e: movd/q Ey,Pd | vmovd/q Ey,Vy (66),(v1) | vmovq Vq,Wq (F3),(v1)
+7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqu Wx,Vx (F3)
+# 0x0f 0x80-0x8f
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
+80: JO Jz (f64)
+81: JNO Jz (f64)
+82: JB/JC/JNAE Jz (f64)
+83: JAE/JNB/JNC Jz (f64)
+84: JE/JZ Jz (f64)
+85: JNE/JNZ Jz (f64)
+86: JBE/JNA Jz (f64)
+87: JA/JNBE Jz (f64)
+88: JS Jz (f64)
+89: JNS Jz (f64)
+8a: JP/JPE Jz (f64)
+8b: JNP/JPO Jz (f64)
+8c: JL/JNGE Jz (f64)
+8d: JNL/JGE Jz (f64)
+8e: JLE/JNG Jz (f64)
+8f: JNLE/JG Jz (f64)
+# 0x0f 0x90-0x9f
+90: SETO Eb
+91: SETNO Eb
+92: SETB/C/NAE Eb
+93: SETAE/NB/NC Eb
+94: SETE/Z Eb
+95: SETNE/NZ Eb
+96: SETBE/NA Eb
+97: SETA/NBE Eb
+98: SETS Eb
+99: SETNS Eb
+9a: SETP/PE Eb
+9b: SETNP/PO Eb
+9c: SETL/NGE Eb
+9d: SETNL/GE Eb
+9e: SETLE/NG Eb
+9f: SETNLE/G Eb
+# 0x0f 0xa0-0xaf
+a0: PUSH FS (d64)
+a1: POP FS (d64)
+a2: CPUID
+a3: BT Ev,Gv
+a4: SHLD Ev,Gv,Ib
+a5: SHLD Ev,Gv,CL
+a6: GrpPDLK
+a7: GrpRNG
+a8: PUSH GS (d64)
+a9: POP GS (d64)
+aa: RSM
+ab: BTS Ev,Gv
+ac: SHRD Ev,Gv,Ib
+ad: SHRD Ev,Gv,CL
+ae: Grp15 (1A),(1C)
+af: IMUL Gv,Ev
+# 0x0f 0xb0-0xbf
+b0: CMPXCHG Eb,Gb
+b1: CMPXCHG Ev,Gv
+b2: LSS Gv,Mp
+b3: BTR Ev,Gv
+b4: LFS Gv,Mp
+b5: LGS Gv,Mp
+b6: MOVZX Gv,Eb
+b7: MOVZX Gv,Ew
+b8: JMPE (!F3) | POPCNT Gv,Ev (F3)
+b9: Grp10 (1A)
+ba: Grp8 Ev,Ib (1A)
+bb: BTC Ev,Gv
+bc: BSF Gv,Ev (!F3) | TZCNT Gv,Ev (F3)
+bd: BSR Gv,Ev (!F3) | LZCNT Gv,Ev (F3)
+be: MOVSX Gv,Eb
+bf: MOVSX Gv,Ew
+# 0x0f 0xc0-0xcf
+c0: XADD Eb,Gb
+c1: XADD Ev,Gv
+c2: vcmpps Vps,Hps,Wps,Ib | vcmppd Vpd,Hpd,Wpd,Ib (66) | vcmpss Vss,Hss,Wss,Ib (F3),(v1) | vcmpsd Vsd,Hsd,Wsd,Ib (F2),(v1)
+c3: movnti My,Gy
+c4: pinsrw Pq,Ry/Mw,Ib | vpinsrw Vdq,Hdq,Ry/Mw,Ib (66),(v1)
+c5: pextrw Gd,Nq,Ib | vpextrw Gd,Udq,Ib (66),(v1)
+c6: vshufps Vps,Hps,Wps,Ib | vshufpd Vpd,Hpd,Wpd,Ib (66)
+c7: Grp9 (1A)
+c8: BSWAP RAX/EAX/R8/R8D
+c9: BSWAP RCX/ECX/R9/R9D
+ca: BSWAP RDX/EDX/R10/R10D
+cb: BSWAP RBX/EBX/R11/R11D
+cc: BSWAP RSP/ESP/R12/R12D
+cd: BSWAP RBP/EBP/R13/R13D
+ce: BSWAP RSI/ESI/R14/R14D
+cf: BSWAP RDI/EDI/R15/R15D
+# 0x0f 0xd0-0xdf
+d0: vaddsubpd Vpd,Hpd,Wpd (66) | vaddsubps Vps,Hps,Wps (F2)
+d1: psrlw Pq,Qq | vpsrlw Vx,Hx,Wx (66),(v1)
+d2: psrld Pq,Qq | vpsrld Vx,Hx,Wx (66),(v1)
+d3: psrlq Pq,Qq | vpsrlq Vx,Hx,Wx (66),(v1)
+d4: paddq Pq,Qq | vpaddq Vx,Hx,Wx (66),(v1)
+d5: pmullw Pq,Qq | vpmullw Vx,Hx,Wx (66),(v1)
+d6: vmovq Wq,Vq (66),(v1) | movq2dq Vdq,Nq (F3) | movdq2q Pq,Uq (F2)
+d7: pmovmskb Gd,Nq | vpmovmskb Gd,Ux (66),(v1)
+d8: psubusb Pq,Qq | vpsubusb Vx,Hx,Wx (66),(v1)
+d9: psubusw Pq,Qq | vpsubusw Vx,Hx,Wx (66),(v1)
+da: pminub Pq,Qq | vpminub Vx,Hx,Wx (66),(v1)
+db: pand Pq,Qq | vpand Vx,Hx,Wx (66),(v1)
+dc: paddusb Pq,Qq | vpaddusb Vx,Hx,Wx (66),(v1)
+dd: paddusw Pq,Qq | vpaddusw Vx,Hx,Wx (66),(v1)
+de: pmaxub Pq,Qq | vpmaxub Vx,Hx,Wx (66),(v1)
+df: pandn Pq,Qq | vpandn Vx,Hx,Wx (66),(v1)
+# 0x0f 0xe0-0xef
+e0: pavgb Pq,Qq | vpavgb Vx,Hx,Wx (66),(v1)
+e1: psraw Pq,Qq | vpsraw Vx,Hx,Wx (66),(v1)
+e2: psrad Pq,Qq | vpsrad Vx,Hx,Wx (66),(v1)
+e3: pavgw Pq,Qq | vpavgw Vx,Hx,Wx (66),(v1)
+e4: pmulhuw Pq,Qq | vpmulhuw Vx,Hx,Wx (66),(v1)
+e5: pmulhw Pq,Qq | vpmulhw Vx,Hx,Wx (66),(v1)
+e6: vcvttpd2dq Vx,Wpd (66) | vcvtdq2pd Vx,Wdq (F3) | vcvtpd2dq Vx,Wpd (F2)
+e7: movntq Mq,Pq | vmovntdq Mx,Vx (66)
+e8: psubsb Pq,Qq | vpsubsb Vx,Hx,Wx (66),(v1)
+e9: psubsw Pq,Qq | vpsubsw Vx,Hx,Wx (66),(v1)
+ea: pminsw Pq,Qq | vpminsw Vx,Hx,Wx (66),(v1)
+eb: por Pq,Qq | vpor Vx,Hx,Wx (66),(v1)
+ec: paddsb Pq,Qq | vpaddsb Vx,Hx,Wx (66),(v1)
+ed: paddsw Pq,Qq | vpaddsw Vx,Hx,Wx (66),(v1)
+ee: pmaxsw Pq,Qq | vpmaxsw Vx,Hx,Wx (66),(v1)
+ef: pxor Pq,Qq | vpxor Vx,Hx,Wx (66),(v1)
+# 0x0f 0xf0-0xff
+f0: vlddqu Vx,Mx (F2)
+f1: psllw Pq,Qq | vpsllw Vx,Hx,Wx (66),(v1)
+f2: pslld Pq,Qq | vpslld Vx,Hx,Wx (66),(v1)
+f3: psllq Pq,Qq | vpsllq Vx,Hx,Wx (66),(v1)
+f4: pmuludq Pq,Qq | vpmuludq Vx,Hx,Wx (66),(v1)
+f5: pmaddwd Pq,Qq | vpmaddwd Vx,Hx,Wx (66),(v1)
+f6: psadbw Pq,Qq | vpsadbw Vx,Hx,Wx (66),(v1)
+f7: maskmovq Pq,Nq | vmaskmovdqu Vx,Ux (66),(v1)
+f8: psubb Pq,Qq | vpsubb Vx,Hx,Wx (66),(v1)
+f9: psubw Pq,Qq | vpsubw Vx,Hx,Wx (66),(v1)
+fa: psubd Pq,Qq | vpsubd Vx,Hx,Wx (66),(v1)
+fb: psubq Pq,Qq | vpsubq Vx,Hx,Wx (66),(v1)
+fc: paddb Pq,Qq | vpaddb Vx,Hx,Wx (66),(v1)
+fd: paddw Pq,Qq | vpaddw Vx,Hx,Wx (66),(v1)
+fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
+ff:
+EndTable
+
+Table: 3-byte opcode 1 (0x0f 0x38)
+Referrer: 3-byte escape 1
+AVXcode: 2
+# 0x0f 0x38 0x00-0x0f
+00: pshufb Pq,Qq | vpshufb Vx,Hx,Wx (66),(v1)
+01: phaddw Pq,Qq | vphaddw Vx,Hx,Wx (66),(v1)
+02: phaddd Pq,Qq | vphaddd Vx,Hx,Wx (66),(v1)
+03: phaddsw Pq,Qq | vphaddsw Vx,Hx,Wx (66),(v1)
+04: pmaddubsw Pq,Qq | vpmaddubsw Vx,Hx,Wx (66),(v1)
+05: phsubw Pq,Qq | vphsubw Vx,Hx,Wx (66),(v1)
+06: phsubd Pq,Qq | vphsubd Vx,Hx,Wx (66),(v1)
+07: phsubsw Pq,Qq | vphsubsw Vx,Hx,Wx (66),(v1)
+08: psignb Pq,Qq | vpsignb Vx,Hx,Wx (66),(v1)
+09: psignw Pq,Qq | vpsignw Vx,Hx,Wx (66),(v1)
+0a: psignd Pq,Qq | vpsignd Vx,Hx,Wx (66),(v1)
+0b: pmulhrsw Pq,Qq | vpmulhrsw Vx,Hx,Wx (66),(v1)
+0c: vpermilps Vx,Hx,Wx (66),(v)
+0d: vpermilpd Vx,Hx,Wx (66),(v)
+0e: vtestps Vx,Wx (66),(v)
+0f: vtestpd Vx,Wx (66),(v)
+# 0x0f 0x38 0x10-0x1f
+10: pblendvb Vdq,Wdq (66)
+11:
+12:
+13: vcvtph2ps Vx,Wx,Ib (66),(v)
+14: blendvps Vdq,Wdq (66)
+15: blendvpd Vdq,Wdq (66)
+16: vpermps Vqq,Hqq,Wqq (66),(v)
+17: vptest Vx,Wx (66)
+18: vbroadcastss Vx,Wd (66),(v)
+19: vbroadcastsd Vqq,Wq (66),(v)
+1a: vbroadcastf128 Vqq,Mdq (66),(v)
+1b:
+1c: pabsb Pq,Qq | vpabsb Vx,Wx (66),(v1)
+1d: pabsw Pq,Qq | vpabsw Vx,Wx (66),(v1)
+1e: pabsd Pq,Qq | vpabsd Vx,Wx (66),(v1)
+1f:
+# 0x0f 0x38 0x20-0x2f
+20: vpmovsxbw Vx,Ux/Mq (66),(v1)
+21: vpmovsxbd Vx,Ux/Md (66),(v1)
+22: vpmovsxbq Vx,Ux/Mw (66),(v1)
+23: vpmovsxwd Vx,Ux/Mq (66),(v1)
+24: vpmovsxwq Vx,Ux/Md (66),(v1)
+25: vpmovsxdq Vx,Ux/Mq (66),(v1)
+26:
+27:
+28: vpmuldq Vx,Hx,Wx (66),(v1)
+29: vpcmpeqq Vx,Hx,Wx (66),(v1)
+2a: vmovntdqa Vx,Mx (66),(v1)
+2b: vpackusdw Vx,Hx,Wx (66),(v1)
+2c: vmaskmovps Vx,Hx,Mx (66),(v)
+2d: vmaskmovpd Vx,Hx,Mx (66),(v)
+2e: vmaskmovps Mx,Hx,Vx (66),(v)
+2f: vmaskmovpd Mx,Hx,Vx (66),(v)
+# 0x0f 0x38 0x30-0x3f
+30: vpmovzxbw Vx,Ux/Mq (66),(v1)
+31: vpmovzxbd Vx,Ux/Md (66),(v1)
+32: vpmovzxbq Vx,Ux/Mw (66),(v1)
+33: vpmovzxwd Vx,Ux/Mq (66),(v1)
+34: vpmovzxwq Vx,Ux/Md (66),(v1)
+35: vpmovzxdq Vx,Ux/Mq (66),(v1)
+36: vpermd Vqq,Hqq,Wqq (66),(v)
+37: vpcmpgtq Vx,Hx,Wx (66),(v1)
+38: vpminsb Vx,Hx,Wx (66),(v1)
+39: vpminsd Vx,Hx,Wx (66),(v1)
+3a: vpminuw Vx,Hx,Wx (66),(v1)
+3b: vpminud Vx,Hx,Wx (66),(v1)
+3c: vpmaxsb Vx,Hx,Wx (66),(v1)
+3d: vpmaxsd Vx,Hx,Wx (66),(v1)
+3e: vpmaxuw Vx,Hx,Wx (66),(v1)
+3f: vpmaxud Vx,Hx,Wx (66),(v1)
+# 0x0f 0x38 0x40-0x8f
+40: vpmulld Vx,Hx,Wx (66),(v1)
+41: vphminposuw Vdq,Wdq (66),(v1)
+42:
+43:
+44:
+45: vpsrlvd/q Vx,Hx,Wx (66),(v)
+46: vpsravd Vx,Hx,Wx (66),(v)
+47: vpsllvd/q Vx,Hx,Wx (66),(v)
+# Skip 0x48-0x57
+58: vpbroadcastd Vx,Wx (66),(v)
+59: vpbroadcastq Vx,Wx (66),(v)
+5a: vbroadcasti128 Vqq,Mdq (66),(v)
+# Skip 0x5b-0x77
+78: vpbroadcastb Vx,Wx (66),(v)
+79: vpbroadcastw Vx,Wx (66),(v)
+# Skip 0x7a-0x7f
+80: INVEPT Gy,Mdq (66)
+81: INVPID Gy,Mdq (66)
+82: INVPCID Gy,Mdq (66)
+8c: vpmaskmovd/q Vx,Hx,Mx (66),(v)
+8e: vpmaskmovd/q Mx,Vx,Hx (66),(v)
+# 0x0f 0x38 0x90-0xbf (FMA)
+90: vgatherdd/q Vx,Hx,Wx (66),(v)
+91: vgatherqd/q Vx,Hx,Wx (66),(v)
+92: vgatherdps/d Vx,Hx,Wx (66),(v)
+93: vgatherqps/d Vx,Hx,Wx (66),(v)
+94:
+95:
+96: vfmaddsub132ps/d Vx,Hx,Wx (66),(v)
+97: vfmsubadd132ps/d Vx,Hx,Wx (66),(v)
+98: vfmadd132ps/d Vx,Hx,Wx (66),(v)
+99: vfmadd132ss/d Vx,Hx,Wx (66),(v),(v1)
+9a: vfmsub132ps/d Vx,Hx,Wx (66),(v)
+9b: vfmsub132ss/d Vx,Hx,Wx (66),(v),(v1)
+9c: vfnmadd132ps/d Vx,Hx,Wx (66),(v)
+9d: vfnmadd132ss/d Vx,Hx,Wx (66),(v),(v1)
+9e: vfnmsub132ps/d Vx,Hx,Wx (66),(v)
+9f: vfnmsub132ss/d Vx,Hx,Wx (66),(v),(v1)
+a6: vfmaddsub213ps/d Vx,Hx,Wx (66),(v)
+a7: vfmsubadd213ps/d Vx,Hx,Wx (66),(v)
+a8: vfmadd213ps/d Vx,Hx,Wx (66),(v)
+a9: vfmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
+aa: vfmsub213ps/d Vx,Hx,Wx (66),(v)
+ab: vfmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
+ac: vfnmadd213ps/d Vx,Hx,Wx (66),(v)
+ad: vfnmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
+ae: vfnmsub213ps/d Vx,Hx,Wx (66),(v)
+af: vfnmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
+b6: vfmaddsub231ps/d Vx,Hx,Wx (66),(v)
+b7: vfmsubadd231ps/d Vx,Hx,Wx (66),(v)
+b8: vfmadd231ps/d Vx,Hx,Wx (66),(v)
+b9: vfmadd231ss/d Vx,Hx,Wx (66),(v),(v1)
+ba: vfmsub231ps/d Vx,Hx,Wx (66),(v)
+bb: vfmsub231ss/d Vx,Hx,Wx (66),(v),(v1)
+bc: vfnmadd231ps/d Vx,Hx,Wx (66),(v)
+bd: vfnmadd231ss/d Vx,Hx,Wx (66),(v),(v1)
+be: vfnmsub231ps/d Vx,Hx,Wx (66),(v)
+bf: vfnmsub231ss/d Vx,Hx,Wx (66),(v),(v1)
+# 0x0f 0x38 0xc0-0xff
+db: VAESIMC Vdq,Wdq (66),(v1)
+dc: VAESENC Vdq,Hdq,Wdq (66),(v1)
+dd: VAESENCLAST Vdq,Hdq,Wdq (66),(v1)
+de: VAESDEC Vdq,Hdq,Wdq (66),(v1)
+df: VAESDECLAST Vdq,Hdq,Wdq (66),(v1)
+f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
+f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
+f2: ANDN Gy,By,Ey (v)
+f3: Grp17 (1A)
+f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
+f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
+EndTable
+
+Table: 3-byte opcode 2 (0x0f 0x3a)
+Referrer: 3-byte escape 2
+AVXcode: 3
+# 0x0f 0x3a 0x00-0xff
+00: vpermq Vqq,Wqq,Ib (66),(v)
+01: vpermpd Vqq,Wqq,Ib (66),(v)
+02: vpblendd Vx,Hx,Wx,Ib (66),(v)
+03:
+04: vpermilps Vx,Wx,Ib (66),(v)
+05: vpermilpd Vx,Wx,Ib (66),(v)
+06: vperm2f128 Vqq,Hqq,Wqq,Ib (66),(v)
+07:
+08: vroundps Vx,Wx,Ib (66)
+09: vroundpd Vx,Wx,Ib (66)
+0a: vroundss Vss,Wss,Ib (66),(v1)
+0b: vroundsd Vsd,Wsd,Ib (66),(v1)
+0c: vblendps Vx,Hx,Wx,Ib (66)
+0d: vblendpd Vx,Hx,Wx,Ib (66)
+0e: vpblendw Vx,Hx,Wx,Ib (66),(v1)
+0f: palignr Pq,Qq,Ib | vpalignr Vx,Hx,Wx,Ib (66),(v1)
+14: vpextrb Rd/Mb,Vdq,Ib (66),(v1)
+15: vpextrw Rd/Mw,Vdq,Ib (66),(v1)
+16: vpextrd/q Ey,Vdq,Ib (66),(v1)
+17: vextractps Ed,Vdq,Ib (66),(v1)
+18: vinsertf128 Vqq,Hqq,Wqq,Ib (66),(v)
+19: vextractf128 Wdq,Vqq,Ib (66),(v)
+1d: vcvtps2ph Wx,Vx,Ib (66),(v)
+20: vpinsrb Vdq,Hdq,Ry/Mb,Ib (66),(v1)
+21: vinsertps Vdq,Hdq,Udq/Md,Ib (66),(v1)
+22: vpinsrd/q Vdq,Hdq,Ey,Ib (66),(v1)
+38: vinserti128 Vqq,Hqq,Wqq,Ib (66),(v)
+39: vextracti128 Wdq,Vqq,Ib (66),(v)
+40: vdpps Vx,Hx,Wx,Ib (66)
+41: vdppd Vdq,Hdq,Wdq,Ib (66),(v1)
+42: vmpsadbw Vx,Hx,Wx,Ib (66),(v1)
+44: vpclmulqdq Vdq,Hdq,Wdq,Ib (66),(v1)
+46: vperm2i128 Vqq,Hqq,Wqq,Ib (66),(v)
+4a: vblendvps Vx,Hx,Wx,Lx (66),(v)
+4b: vblendvpd Vx,Hx,Wx,Lx (66),(v)
+4c: vpblendvb Vx,Hx,Wx,Lx (66),(v1)
+60: vpcmpestrm Vdq,Wdq,Ib (66),(v1)
+61: vpcmpestri Vdq,Wdq,Ib (66),(v1)
+62: vpcmpistrm Vdq,Wdq,Ib (66),(v1)
+63: vpcmpistri Vdq,Wdq,Ib (66),(v1)
+df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
+f0: RORX Gy,Ey,Ib (F2),(v)
+EndTable
+
+GrpTable: Grp1
+0: ADD
+1: OR
+2: ADC
+3: SBB
+4: AND
+5: SUB
+6: XOR
+7: CMP
+EndTable
+
+GrpTable: Grp1A
+0: POP
+EndTable
+
+GrpTable: Grp2
+0: ROL
+1: ROR
+2: RCL
+3: RCR
+4: SHL/SAL
+5: SHR
+6:
+7: SAR
+EndTable
+
+GrpTable: Grp3_1
+0: TEST Eb,Ib
+1:
+2: NOT Eb
+3: NEG Eb
+4: MUL AL,Eb
+5: IMUL AL,Eb
+6: DIV AL,Eb
+7: IDIV AL,Eb
+EndTable
+
+GrpTable: Grp3_2
+0: TEST Ev,Iz
+1:
+2: NOT Ev
+3: NEG Ev
+4: MUL rAX,Ev
+5: IMUL rAX,Ev
+6: DIV rAX,Ev
+7: IDIV rAX,Ev
+EndTable
+
+GrpTable: Grp4
+0: INC Eb
+1: DEC Eb
+EndTable
+
+GrpTable: Grp5
+0: INC Ev
+1: DEC Ev
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
+2: CALLN Ev (f64)
+3: CALLF Ep
+4: JMPN Ev (f64)
+5: JMPF Mp
+6: PUSH Ev (d64)
+7:
+EndTable
+
+GrpTable: Grp6
+0: SLDT Rv/Mw
+1: STR Rv/Mw
+2: LLDT Ew
+3: LTR Ew
+4: VERR Ew
+5: VERW Ew
+EndTable
+
+GrpTable: Grp7
+0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B)
+1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B)
+2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B)
+3: LIDT Ms
+4: SMSW Mw/Rv
+5:
+6: LMSW Ew
+7: INVLPG Mb | SWAPGS (o64),(000),(11B) | RDTSCP (001),(11B)
+EndTable
+
+GrpTable: Grp8
+4: BT
+5: BTS
+6: BTR
+7: BTC
+EndTable
+
+GrpTable: Grp9
+1: CMPXCHG8B/16B Mq/Mdq
+6: VMPTRLD Mq | VMCLEAR Mq (66) | VMXON Mq (F3) | RDRAND Rv (11B)
+7: VMPTRST Mq | VMPTRST Mq (F3) | RDSEED Rv (11B)
+EndTable
+
+GrpTable: Grp10
+EndTable
+
+# Grp11A and Grp11B are expressed as Grp11 in Intel SDM
+GrpTable: Grp11A
+0: MOV Eb,Ib
+7: XABORT Ib (000),(11B)
+EndTable
+
+GrpTable: Grp11B
+0: MOV Eb,Iz
+7: XBEGIN Jz (000),(11B)
+EndTable
+
+GrpTable: Grp12
+2: psrlw Nq,Ib (11B) | vpsrlw Hx,Ux,Ib (66),(11B),(v1)
+4: psraw Nq,Ib (11B) | vpsraw Hx,Ux,Ib (66),(11B),(v1)
+6: psllw Nq,Ib (11B) | vpsllw Hx,Ux,Ib (66),(11B),(v1)
+EndTable
+
+GrpTable: Grp13
+2: psrld Nq,Ib (11B) | vpsrld Hx,Ux,Ib (66),(11B),(v1)
+4: psrad Nq,Ib (11B) | vpsrad Hx,Ux,Ib (66),(11B),(v1)
+6: pslld Nq,Ib (11B) | vpslld Hx,Ux,Ib (66),(11B),(v1)
+EndTable
+
+GrpTable: Grp14
+2: psrlq Nq,Ib (11B) | vpsrlq Hx,Ux,Ib (66),(11B),(v1)
+3: vpsrldq Hx,Ux,Ib (66),(11B),(v1)
+6: psllq Nq,Ib (11B) | vpsllq Hx,Ux,Ib (66),(11B),(v1)
+7: vpslldq Hx,Ux,Ib (66),(11B),(v1)
+EndTable
+
+GrpTable: Grp15
+0: fxsave | RDFSBASE Ry (F3),(11B)
+1: fxstor | RDGSBASE Ry (F3),(11B)
+2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+4: XSAVE
+5: XRSTOR | lfence (11B)
+6: XSAVEOPT | mfence (11B)
+7: clflush | sfence (11B)
+EndTable
+
+GrpTable: Grp16
+0: prefetch NTA
+1: prefetch T0
+2: prefetch T1
+3: prefetch T2
+EndTable
+
+GrpTable: Grp17
+1: BLSR By,Ey (v)
+2: BLSMSK By,Ey (v)
+3: BLSI By,Ey (v)
+EndTable
+
+# AMD's Prefetch Group
+GrpTable: GrpP
+0: PREFETCH
+1: PREFETCHW
+EndTable
+
+GrpTable: GrpPDLK
+0: MONTMUL
+1: XSHA1
+2: XSHA2
+EndTable
+
+GrpTable: GrpRNG
+0: xstore-rng
+1: xcrypt-ecb
+2: xcrypt-cbc
+4: xcrypt-cfb
+5: xcrypt-ofb
+EndTable
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 04/25] perf tools: Add Intel PT log
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (2 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 03/25] perf tools: Add Intel PT instruction decoder Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-20  9:58   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 05/25] perf tools: Add Intel PT decoder Adrian Hunter
                   ` (20 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add a facility to log Intel Processor Trace decoding.  The log is
intended for debugging purposes only.

The log file name is "intel_pt.log" and is opened in the current
directory.  The log contains a record of all packets and instructions
decoded and can get very large (10 MB would be a small one).

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt-decoder/Build          |   2 +-
 tools/perf/util/intel-pt-decoder/intel-pt-log.c | 155 ++++++++++++++++++++++++
 tools/perf/util/intel-pt-decoder/intel-pt-log.h |  52 ++++++++
 3 files changed, 208 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h

diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 5a46ce13c1f8..3c717b420c89 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1,4 +1,4 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o
 
 inat_tables_script = util/intel-pt-decoder/gen-insn-attr-x86.awk
 inat_tables_maps = util/intel-pt-decoder/x86-opcode-map.txt
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.c b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
new file mode 100644
index 000000000000..d09c7d9f9050
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
@@ -0,0 +1,155 @@
+/*
+ * intel_pt_log.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <string.h>
+
+#include "intel-pt-log.h"
+#include "intel-pt-insn-decoder.h"
+
+#include "intel-pt-pkt-decoder.h"
+
+#define MAX_LOG_NAME 256
+
+static FILE *f;
+static char log_name[MAX_LOG_NAME];
+static bool enable_logging;
+
+void intel_pt_log_enable(void)
+{
+	enable_logging = true;
+}
+
+void intel_pt_log_disable(void)
+{
+	if (f)
+		fflush(f);
+	enable_logging = false;
+}
+
+void intel_pt_log_set_name(const char *name)
+{
+	strncpy(log_name, name, MAX_LOG_NAME - 5);
+	strcat(log_name, ".log");
+}
+
+static void intel_pt_print_data(const unsigned char *buf, int len, uint64_t pos,
+				int indent)
+{
+	int i;
+
+	for (i = 0; i < indent; i++)
+		fprintf(f, " ");
+
+	fprintf(f, "  %08" PRIx64 ": ", pos);
+	for (i = 0; i < len; i++)
+		fprintf(f, " %02x", buf[i]);
+	for (; i < 16; i++)
+		fprintf(f, "   ");
+	fprintf(f, " ");
+}
+
+static void intel_pt_print_no_data(uint64_t pos, int indent)
+{
+	int i;
+
+	for (i = 0; i < indent; i++)
+		fprintf(f, " ");
+
+	fprintf(f, "  %08" PRIx64 ": ", pos);
+	for (i = 0; i < 16; i++)
+		fprintf(f, "   ");
+	fprintf(f, " ");
+}
+
+static int intel_pt_log_open(void)
+{
+	if (!enable_logging)
+		return -1;
+
+	if (f)
+		return 0;
+
+	if (!log_name[0])
+		return -1;
+
+	f = fopen(log_name, "w+");
+	if (!f) {
+		enable_logging = false;
+		return -1;
+	}
+
+	return 0;
+}
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf)
+{
+	char desc[INTEL_PT_PKT_DESC_MAX];
+
+	if (intel_pt_log_open())
+		return;
+
+	intel_pt_print_data(buf, pkt_len, pos, 0);
+	intel_pt_pkt_desc(packet, desc, INTEL_PT_PKT_DESC_MAX);
+	fprintf(f, "%s\n", desc);
+}
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	char desc[INTEL_PT_INSN_DESC_MAX];
+	size_t len = intel_pt_insn->length;
+
+	if (intel_pt_log_open())
+		return;
+
+	if (len > INTEL_PT_INSN_DBG_BUF_SZ)
+		len = INTEL_PT_INSN_DBG_BUF_SZ;
+	intel_pt_print_data(intel_pt_insn->buf, len, ip, 8);
+	if (intel_pt_insn_desc(intel_pt_insn, desc, INTEL_PT_INSN_DESC_MAX) > 0)
+		fprintf(f, "%s\n", desc);
+	else
+		fprintf(f, "Bad instruction!\n");
+}
+
+void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	char desc[INTEL_PT_INSN_DESC_MAX];
+
+	if (intel_pt_log_open())
+		return;
+
+	intel_pt_print_no_data(ip, 8);
+	if (intel_pt_insn_desc(intel_pt_insn, desc, INTEL_PT_INSN_DESC_MAX) > 0)
+		fprintf(f, "%s\n", desc);
+	else
+		fprintf(f, "Bad instruction!\n");
+}
+
+void intel_pt_log(const char *fmt, ...)
+{
+	va_list args;
+
+	if (intel_pt_log_open())
+		return;
+
+	va_start(args, fmt);
+	vfprintf(f, fmt, args);
+	va_end(args);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.h b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
new file mode 100644
index 000000000000..db3942f83677
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
@@ -0,0 +1,52 @@
+/*
+ * intel_pt_log.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_LOG_H__
+#define INCLUDE__INTEL_PT_LOG_H__
+
+#include <stdint.h>
+#include <inttypes.h>
+
+struct intel_pt_pkt;
+
+void intel_pt_log_enable(void);
+void intel_pt_log_disable(void);
+void intel_pt_log_set_name(const char *name);
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf);
+
+struct intel_pt_insn;
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip);
+void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
+			       uint64_t ip);
+
+__attribute__((format(printf, 1, 2)))
+void intel_pt_log(const char *fmt, ...);
+
+#define x64_fmt "0x%" PRIx64
+
+static inline void intel_pt_log_at(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s at " x64_fmt "\n", msg, u);
+}
+
+static inline void intel_pt_log_to(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s to " x64_fmt "\n", msg, u);
+}
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 05/25] perf tools: Add Intel PT decoder
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (3 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 04/25] perf tools: Add Intel PT log Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-20  9:58   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 06/25] perf tools: Add Intel PT support Adrian Hunter
                   ` (19 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add support for decoding an Intel Processor Trace.

Intel PT trace data must be 'decoded' which involves walking the object
code and matching the trace data packets.

The decoder requests a buffer of binary data via a get_trace()
call-back, which it decodes using instruction information which it gets
via another call-back walk_insn().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt-decoder/Build             |    2 +-
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1816 ++++++++++++++++++++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  104 ++
 3 files changed, 1921 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h

diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 3c717b420c89..240730d682c1 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1,4 +1,4 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o intel-pt-decoder.o
 
 inat_tables_script = util/intel-pt-decoder/gen-insn-attr-x86.awk
 inat_tables_maps = util/intel-pt-decoder/x86-opcode-map.txt
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
new file mode 100644
index 000000000000..f8ac462fec1a
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -0,0 +1,1816 @@
+/*
+ * intel_pt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include "../cache.h"
+#include "../util.h"
+
+#include "intel-pt-insn-decoder.h"
+#include "intel-pt-pkt-decoder.h"
+#include "intel-pt-decoder.h"
+#include "intel-pt-log.h"
+
+#define INTEL_PT_BLK_SIZE 1024
+
+#define BIT63 (((uint64_t)1 << 63))
+
+#define INTEL_PT_RETURN 1
+
+/* Maximum number of loops with no packets consumed i.e. stuck in a loop */
+#define INTEL_PT_MAX_LOOPS 10000
+
+struct intel_pt_blk {
+	struct intel_pt_blk *prev;
+	uint64_t ip[INTEL_PT_BLK_SIZE];
+};
+
+struct intel_pt_stack {
+	struct intel_pt_blk *blk;
+	struct intel_pt_blk *spare;
+	int pos;
+};
+
+enum intel_pt_pkt_state {
+	INTEL_PT_STATE_NO_PSB,
+	INTEL_PT_STATE_NO_IP,
+	INTEL_PT_STATE_ERR_RESYNC,
+	INTEL_PT_STATE_IN_SYNC,
+	INTEL_PT_STATE_TNT,
+	INTEL_PT_STATE_TIP,
+	INTEL_PT_STATE_TIP_PGD,
+	INTEL_PT_STATE_FUP,
+	INTEL_PT_STATE_FUP_NO_TIP,
+};
+
+#ifdef INTEL_PT_STRICT
+#define INTEL_PT_STATE_ERR1	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_NO_PSB
+#else
+#define INTEL_PT_STATE_ERR1	(decoder->pkt_state)
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_IP
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_ERR_RESYNC
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_IN_SYNC
+#endif
+
+struct intel_pt_decoder {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*walk_insn)(struct intel_pt_insn *intel_pt_insn,
+			 uint64_t *insn_cnt_ptr, uint64_t *ip, uint64_t to_ip,
+			 uint64_t max_insn_cnt, void *data);
+	void *data;
+	struct intel_pt_state state;
+	const unsigned char *buf;
+	size_t len;
+	bool return_compression;
+	bool pge;
+	uint64_t pos;
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint64_t tsc_timestamp;
+	uint64_t ref_timestamp;
+	uint64_t ret_addr;
+	struct intel_pt_stack stack;
+	enum intel_pt_pkt_state pkt_state;
+	struct intel_pt_pkt packet;
+	struct intel_pt_pkt tnt;
+	int pkt_step;
+	int pkt_len;
+	unsigned int cbr;
+	unsigned int max_non_turbo_ratio;
+	int exec_mode;
+	unsigned int insn_bytes;
+	uint64_t sign_bit;
+	uint64_t sign_bits;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+	uint64_t period_insn_cnt;
+	uint64_t period_mask;
+	uint64_t period_ticks;
+	uint64_t last_masked_timestamp;
+	bool continuous_period;
+	bool overflow;
+	bool set_fup_tx_flags;
+	unsigned int fup_tx_flags;
+	unsigned int tx_flags;
+	uint64_t timestamp_insn_cnt;
+	uint64_t stuck_ip;
+	int no_progress;
+	int stuck_ip_prd;
+	int stuck_ip_cnt;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[INTEL_PT_PKT_MAX_SZ];
+};
+
+static uint64_t intel_pt_lower_power_of_2(uint64_t x)
+{
+	int i;
+
+	for (i = 0; x != 1; i++)
+		x >>= 1;
+
+	return x << i;
+}
+
+static void intel_pt_setup_period(struct intel_pt_decoder *decoder)
+{
+	if (decoder->period_type == INTEL_PT_PERIOD_TICKS) {
+		uint64_t period;
+
+		period = intel_pt_lower_power_of_2(decoder->period);
+		decoder->period_mask  = ~(period - 1);
+		decoder->period_ticks = period;
+	}
+}
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
+{
+	struct intel_pt_decoder *decoder;
+
+	if (!params->get_trace || !params->walk_insn)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct intel_pt_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->walk_insn          = params->walk_insn;
+	decoder->data               = params->data;
+	decoder->return_compression = params->return_compression;
+
+	decoder->sign_bit           = (uint64_t)1 << 47;
+	decoder->sign_bits          = ~(((uint64_t)1 << 48) - 1);
+
+	decoder->period             = params->period;
+	decoder->period_type        = params->period_type;
+
+	decoder->max_non_turbo_ratio = params->max_non_turbo_ratio;
+
+	intel_pt_setup_period(decoder);
+
+	return decoder;
+}
+
+static void intel_pt_pop_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk = stack->blk;
+
+	stack->blk = blk->prev;
+	if (!stack->spare)
+		stack->spare = blk;
+	else
+		free(blk);
+}
+
+static uint64_t intel_pt_pop(struct intel_pt_stack *stack)
+{
+	if (!stack->pos) {
+		if (!stack->blk)
+			return 0;
+		intel_pt_pop_blk(stack);
+		if (!stack->blk)
+			return 0;
+		stack->pos = INTEL_PT_BLK_SIZE;
+	}
+	return stack->blk->ip[--stack->pos];
+}
+
+static int intel_pt_alloc_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk;
+
+	if (stack->spare) {
+		blk = stack->spare;
+		stack->spare = NULL;
+	} else {
+		blk = malloc(sizeof(struct intel_pt_blk));
+		if (!blk)
+			return -ENOMEM;
+	}
+
+	blk->prev = stack->blk;
+	stack->blk = blk;
+	stack->pos = 0;
+	return 0;
+}
+
+static int intel_pt_push(struct intel_pt_stack *stack, uint64_t ip)
+{
+	int err;
+
+	if (!stack->blk || stack->pos == INTEL_PT_BLK_SIZE) {
+		err = intel_pt_alloc_blk(stack);
+		if (err)
+			return err;
+	}
+
+	stack->blk->ip[stack->pos++] = ip;
+	return 0;
+}
+
+static void intel_pt_clear_stack(struct intel_pt_stack *stack)
+{
+	while (stack->blk)
+		intel_pt_pop_blk(stack);
+	stack->pos = 0;
+}
+
+static void intel_pt_free_stack(struct intel_pt_stack *stack)
+{
+	intel_pt_clear_stack(stack);
+	zfree(&stack->blk);
+	zfree(&stack->spare);
+}
+
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder)
+{
+	intel_pt_free_stack(&decoder->stack);
+	free(decoder);
+}
+
+static int intel_pt_ext_err(int code)
+{
+	switch (code) {
+	case -ENOMEM:
+		return INTEL_PT_ERR_NOMEM;
+	case -ENOSYS:
+		return INTEL_PT_ERR_INTERN;
+	case -EBADMSG:
+		return INTEL_PT_ERR_BADPKT;
+	case -ENODATA:
+		return INTEL_PT_ERR_NODATA;
+	case -EILSEQ:
+		return INTEL_PT_ERR_NOINSN;
+	case -ENOENT:
+		return INTEL_PT_ERR_MISMAT;
+	case -EOVERFLOW:
+		return INTEL_PT_ERR_OVR;
+	case -ENOSPC:
+		return INTEL_PT_ERR_LOST;
+	case -ELOOP:
+		return INTEL_PT_ERR_NELOOP;
+	default:
+		return INTEL_PT_ERR_UNK;
+	}
+}
+
+static const char *intel_pt_err_msgs[] = {
+	[INTEL_PT_ERR_NOMEM]  = "Memory allocation failed",
+	[INTEL_PT_ERR_INTERN] = "Internal error",
+	[INTEL_PT_ERR_BADPKT] = "Bad packet",
+	[INTEL_PT_ERR_NODATA] = "No more data",
+	[INTEL_PT_ERR_NOINSN] = "Failed to get instruction",
+	[INTEL_PT_ERR_MISMAT] = "Trace doesn't match instruction",
+	[INTEL_PT_ERR_OVR]    = "Overflow packet",
+	[INTEL_PT_ERR_LOST]   = "Lost trace data",
+	[INTEL_PT_ERR_UNK]    = "Unknown error!",
+	[INTEL_PT_ERR_NELOOP] = "Never-ending loop",
+};
+
+int intel_pt__strerror(int code, char *buf, size_t buflen)
+{
+	if (code < 1 || code > INTEL_PT_ERR_MAX)
+		code = INTEL_PT_ERR_UNK;
+	strlcpy(buf, intel_pt_err_msgs[code], buflen);
+	return 0;
+}
+
+static uint64_t intel_pt_calc_ip(struct intel_pt_decoder *decoder,
+				 const struct intel_pt_pkt *packet,
+				 uint64_t last_ip)
+{
+	uint64_t ip;
+
+	switch (packet->count) {
+	case 2:
+		ip = (last_ip & (uint64_t)0xffffffffffff0000ULL) |
+		     packet->payload;
+		break;
+	case 4:
+		ip = (last_ip & (uint64_t)0xffffffff00000000ULL) |
+		     packet->payload;
+		break;
+	case 6:
+		ip = packet->payload;
+		break;
+	default:
+		return 0;
+	}
+
+	if (ip & decoder->sign_bit)
+		return ip | decoder->sign_bits;
+
+	return ip;
+}
+
+static inline void intel_pt_set_last_ip(struct intel_pt_decoder *decoder)
+{
+	decoder->last_ip = intel_pt_calc_ip(decoder, &decoder->packet,
+					    decoder->last_ip);
+}
+
+static inline void intel_pt_set_ip(struct intel_pt_decoder *decoder)
+{
+	intel_pt_set_last_ip(decoder);
+	decoder->ip = decoder->last_ip;
+}
+
+static void intel_pt_decoder_log_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log_packet(&decoder->packet, decoder->pkt_len, decoder->pos,
+			    decoder->buf);
+}
+
+static int intel_pt_bug(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Internal error\n");
+	decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+	return -ENOSYS;
+}
+
+static inline void intel_pt_clear_tx_flags(struct intel_pt_decoder *decoder)
+{
+	decoder->tx_flags = 0;
+}
+
+static inline void intel_pt_update_in_tx(struct intel_pt_decoder *decoder)
+{
+	decoder->tx_flags = decoder->packet.payload & INTEL_PT_IN_TX;
+}
+
+static int intel_pt_bad_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	intel_pt_decoder_log_packet(decoder);
+	if (decoder->pkt_state != INTEL_PT_STATE_NO_PSB) {
+		intel_pt_log("ERROR: Bad packet\n");
+		decoder->pkt_state = INTEL_PT_STATE_ERR1;
+	}
+	return -EBADMSG;
+}
+
+static int intel_pt_get_data(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	intel_pt_log("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		intel_pt_log("No more data\n");
+		return -ENODATA;
+	}
+	if (!buffer.consecutive) {
+		decoder->ip = 0;
+		decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+		decoder->ref_timestamp = buffer.ref_timestamp;
+		decoder->timestamp = 0;
+		decoder->state.trace_nr = buffer.trace_nr;
+		intel_pt_log("Reference timestamp 0x%" PRIx64 "\n",
+			     decoder->ref_timestamp);
+		return -ENOLINK;
+	}
+
+	return 0;
+}
+
+static int intel_pt_get_next_data(struct intel_pt_decoder *decoder)
+{
+	if (!decoder->next_buf)
+		return intel_pt_get_data(decoder);
+
+	decoder->buf = decoder->next_buf;
+	decoder->len = decoder->next_len;
+	decoder->next_buf = 0;
+	decoder->next_len = 0;
+	return 0;
+}
+
+static int intel_pt_get_split_packet(struct intel_pt_decoder *decoder)
+{
+	unsigned char *buf = decoder->temp_buf;
+	size_t old_len, len, n;
+	int ret;
+
+	old_len = decoder->len;
+	len = decoder->len;
+	memcpy(buf, decoder->buf, len);
+
+	ret = intel_pt_get_data(decoder);
+	if (ret) {
+		decoder->pos += old_len;
+		return ret < 0 ? ret : -EINVAL;
+	}
+
+	n = INTEL_PT_PKT_MAX_SZ - len;
+	if (n > decoder->len)
+		n = decoder->len;
+	memcpy(buf + len, decoder->buf, n);
+	len += n;
+
+	ret = intel_pt_get_packet(buf, len, &decoder->packet);
+	if (ret < (int)old_len) {
+		decoder->next_buf = decoder->buf;
+		decoder->next_len = decoder->len;
+		decoder->buf = buf;
+		decoder->len = old_len;
+		return intel_pt_bad_packet(decoder);
+	}
+
+	decoder->next_buf = decoder->buf + (ret - old_len);
+	decoder->next_len = decoder->len - (ret - old_len);
+
+	decoder->buf = buf;
+	decoder->len = ret;
+
+	return ret;
+}
+
+static int intel_pt_get_next_packet(struct intel_pt_decoder *decoder)
+{
+	int ret;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = intel_pt_get_packet(decoder->buf, decoder->len,
+					  &decoder->packet);
+		if (ret == INTEL_PT_NEED_MORE_BYTES &&
+		    decoder->len < INTEL_PT_PKT_MAX_SZ && !decoder->next_buf) {
+			ret = intel_pt_get_split_packet(decoder);
+			if (ret < 0)
+				return ret;
+		}
+		if (ret <= 0)
+			return intel_pt_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+		intel_pt_decoder_log_packet(decoder);
+	} while (decoder->packet.type == INTEL_PT_PAD);
+
+	return 0;
+}
+
+static uint64_t intel_pt_next_period(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp, masked_timestamp;
+
+	timestamp = decoder->timestamp + decoder->timestamp_insn_cnt;
+	masked_timestamp = timestamp & decoder->period_mask;
+	if (decoder->continuous_period) {
+		if (masked_timestamp != decoder->last_masked_timestamp)
+			return 1;
+	} else {
+		timestamp += 1;
+		masked_timestamp = timestamp & decoder->period_mask;
+		if (masked_timestamp != decoder->last_masked_timestamp) {
+			decoder->last_masked_timestamp = masked_timestamp;
+			decoder->continuous_period = true;
+		}
+	}
+	return decoder->period_ticks - (timestamp - masked_timestamp);
+}
+
+static uint64_t intel_pt_next_sample(struct intel_pt_decoder *decoder)
+{
+	switch (decoder->period_type) {
+	case INTEL_PT_PERIOD_INSTRUCTIONS:
+		return decoder->period - decoder->period_insn_cnt;
+	case INTEL_PT_PERIOD_TICKS:
+		return intel_pt_next_period(decoder);
+	case INTEL_PT_PERIOD_NONE:
+	default:
+		return 0;
+	}
+}
+
+static void intel_pt_sample_insn(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp, masked_timestamp;
+
+	switch (decoder->period_type) {
+	case INTEL_PT_PERIOD_INSTRUCTIONS:
+		decoder->period_insn_cnt = 0;
+		break;
+	case INTEL_PT_PERIOD_TICKS:
+		timestamp = decoder->timestamp + decoder->timestamp_insn_cnt;
+		masked_timestamp = timestamp & decoder->period_mask;
+		decoder->last_masked_timestamp = masked_timestamp;
+		break;
+	case INTEL_PT_PERIOD_NONE:
+	default:
+		break;
+	}
+
+	decoder->state.type |= INTEL_PT_INSTRUCTION;
+}
+
+static int intel_pt_walk_insn(struct intel_pt_decoder *decoder,
+			      struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	uint64_t max_insn_cnt, insn_cnt = 0;
+	int err;
+
+	max_insn_cnt = intel_pt_next_sample(decoder);
+
+	err = decoder->walk_insn(intel_pt_insn, &insn_cnt, &decoder->ip, ip,
+				 max_insn_cnt, decoder->data);
+
+	decoder->timestamp_insn_cnt += insn_cnt;
+	decoder->period_insn_cnt += insn_cnt;
+
+	if (err) {
+		decoder->no_progress = 0;
+		decoder->pkt_state = INTEL_PT_STATE_ERR2;
+		intel_pt_log_at("ERROR: Failed to get instruction",
+				decoder->ip);
+		if (err == -ENOENT)
+			return -ENOLINK;
+		return -EILSEQ;
+	}
+
+	if (ip && decoder->ip == ip) {
+		err = -EAGAIN;
+		goto out;
+	}
+
+	if (max_insn_cnt && insn_cnt >= max_insn_cnt)
+		intel_pt_sample_insn(decoder);
+
+	if (intel_pt_insn->branch == INTEL_PT_BR_NO_BRANCH) {
+		decoder->state.type = INTEL_PT_INSTRUCTION;
+		decoder->state.from_ip = decoder->ip;
+		decoder->state.to_ip = 0;
+		decoder->ip += intel_pt_insn->length;
+		err = INTEL_PT_RETURN;
+		goto out;
+	}
+
+	if (intel_pt_insn->op == INTEL_PT_OP_CALL) {
+		/* Zero-length calls are excluded */
+		if (intel_pt_insn->branch != INTEL_PT_BR_UNCONDITIONAL ||
+		    intel_pt_insn->rel) {
+			err = intel_pt_push(&decoder->stack, decoder->ip +
+					    intel_pt_insn->length);
+			if (err)
+				goto out;
+		}
+	} else if (intel_pt_insn->op == INTEL_PT_OP_RET) {
+		decoder->ret_addr = intel_pt_pop(&decoder->stack);
+	}
+
+	if (intel_pt_insn->branch == INTEL_PT_BR_UNCONDITIONAL) {
+		int cnt = decoder->no_progress++;
+
+		decoder->state.from_ip = decoder->ip;
+		decoder->ip += intel_pt_insn->length +
+				intel_pt_insn->rel;
+		decoder->state.to_ip = decoder->ip;
+		err = INTEL_PT_RETURN;
+
+		/*
+		 * Check for being stuck in a loop.  This can happen if a
+		 * decoder error results in the decoder erroneously setting the
+		 * ip to an address that is itself in an infinite loop that
+		 * consumes no packets.  When that happens, there must be an
+		 * unconditional branch.
+		 */
+		if (cnt) {
+			if (cnt == 1) {
+				decoder->stuck_ip = decoder->state.to_ip;
+				decoder->stuck_ip_prd = 1;
+				decoder->stuck_ip_cnt = 1;
+			} else if (cnt > INTEL_PT_MAX_LOOPS ||
+				   decoder->state.to_ip == decoder->stuck_ip) {
+				intel_pt_log_at("ERROR: Never-ending loop",
+						decoder->state.to_ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+				err = -ELOOP;
+				goto out;
+			} else if (!--decoder->stuck_ip_cnt) {
+				decoder->stuck_ip_prd += 1;
+				decoder->stuck_ip_cnt = decoder->stuck_ip_prd;
+				decoder->stuck_ip = decoder->state.to_ip;
+			}
+		}
+		goto out_no_progress;
+	}
+out:
+	decoder->no_progress = 0;
+out_no_progress:
+	decoder->state.insn_op = intel_pt_insn->op;
+	decoder->state.insn_len = intel_pt_insn->length;
+
+	if (decoder->tx_flags & INTEL_PT_IN_TX)
+		decoder->state.flags |= INTEL_PT_IN_TX;
+
+	return err;
+}
+
+static int intel_pt_walk_fup(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	uint64_t ip;
+	int err;
+
+	ip = decoder->last_ip;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, ip);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err == -EAGAIN) {
+			if (decoder->set_fup_tx_flags) {
+				decoder->set_fup_tx_flags = false;
+				decoder->tx_flags = decoder->fup_tx_flags;
+				decoder->state.type = INTEL_PT_TRANSACTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->state.flags = decoder->fup_tx_flags;
+				return 0;
+			}
+			return err;
+		}
+		decoder->set_fup_tx_flags = false;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			intel_pt_log_at("ERROR: Unexpected indirect branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			intel_pt_log_at("ERROR: Unexpected conditional branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		intel_pt_bug(decoder);
+	}
+}
+
+static int intel_pt_walk_tip(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+	if (err == INTEL_PT_RETURN)
+		return 0;
+	if (err)
+		return err;
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+		if (decoder->pkt_state == INTEL_PT_STATE_TIP_PGD) {
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0)
+				decoder->ip = decoder->last_ip;
+		} else {
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				decoder->state.to_ip = decoder->last_ip;
+				decoder->ip = decoder->last_ip;
+			}
+		}
+		return 0;
+	}
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+		intel_pt_log_at("ERROR: Conditional branch when expecting indirect branch",
+				decoder->ip);
+		decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+		return -ENOENT;
+	}
+
+	return intel_pt_bug(decoder);
+}
+
+static int intel_pt_walk_tnt(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.op == INTEL_PT_OP_RET) {
+			if (!decoder->return_compression) {
+				intel_pt_log_at("ERROR: RET when expecting conditional branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!decoder->ret_addr) {
+				intel_pt_log_at("ERROR: Bad RET compression (stack empty)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!(decoder->tnt.payload & BIT63)) {
+				intel_pt_log_at("ERROR: Bad RET compression (TNT=N)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->tnt.payload <<= 1;
+			decoder->state.from_ip = decoder->ip;
+			decoder->ip = decoder->ret_addr;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			/* Handle deferred TIPs */
+			err = intel_pt_get_next_packet(decoder);
+			if (err)
+				return err;
+			if (decoder->packet.type != INTEL_PT_TIP ||
+			    decoder->packet.count == 0) {
+				intel_pt_log_at("ERROR: Missing deferred TIP for indirect branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				decoder->pkt_step = 0;
+				return -ENOENT;
+			}
+			intel_pt_set_last_ip(decoder);
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = decoder->last_ip;
+			decoder->ip = decoder->last_ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			if (decoder->tnt.payload & BIT63) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.from_ip = decoder->ip;
+				decoder->ip += intel_pt_insn.length +
+					       intel_pt_insn.rel;
+				decoder->state.to_ip = decoder->ip;
+				return 0;
+			}
+			/* Instruction sample for a non-taken branch */
+			if (decoder->state.type & INTEL_PT_INSTRUCTION) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.type = INTEL_PT_INSTRUCTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->ip += intel_pt_insn.length;
+				return 0;
+			}
+			decoder->ip += intel_pt_insn.length;
+			if (!decoder->tnt.count)
+				return -EAGAIN;
+			decoder->tnt.payload <<= 1;
+			continue;
+		}
+
+		return intel_pt_bug(decoder);
+	}
+}
+
+static int intel_pt_mode_tsx(struct intel_pt_decoder *decoder, bool *no_tip)
+{
+	unsigned int fup_tx_flags;
+	int err;
+
+	fup_tx_flags = decoder->packet.payload &
+		       (INTEL_PT_IN_TX | INTEL_PT_ABORT_TX);
+	err = intel_pt_get_next_packet(decoder);
+	if (err)
+		return err;
+	if (decoder->packet.type == INTEL_PT_FUP) {
+		decoder->fup_tx_flags = fup_tx_flags;
+		decoder->set_fup_tx_flags = true;
+		if (!(decoder->fup_tx_flags & INTEL_PT_ABORT_TX))
+			*no_tip = true;
+	} else {
+		intel_pt_log_at("ERROR: Missing FUP after MODE.TSX",
+				decoder->pos);
+		intel_pt_update_in_tx(decoder);
+	}
+	return 0;
+}
+
+static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp;
+
+	if (decoder->ref_timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->ref_timestamp & (0xffULL << 56));
+		if (timestamp < decoder->ref_timestamp) {
+			if (decoder->ref_timestamp - timestamp > (1ULL << 55))
+				timestamp += (1ULL << 56);
+		} else {
+			if (timestamp - decoder->ref_timestamp > (1ULL << 55))
+				timestamp -= (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->ref_timestamp = 0;
+		decoder->timestamp_insn_cnt = 0;
+	} else if (decoder->timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->timestamp & (0xffULL << 56));
+		if (timestamp < decoder->timestamp &&
+		    decoder->timestamp - timestamp < 0x100) {
+			intel_pt_log_to("ERROR: Suppressing backwards timestamp",
+					timestamp);
+			timestamp = decoder->timestamp;
+		}
+		while (timestamp < decoder->timestamp) {
+			intel_pt_log_to("Wraparound timestamp", timestamp);
+			timestamp += (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->timestamp_insn_cnt = 0;
+	}
+
+	intel_pt_log_to("Setting timestamp", decoder->timestamp);
+}
+
+static int intel_pt_overflow(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Buffer overflow\n");
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+	decoder->overflow = true;
+	return -EOVERFLOW;
+}
+
+/* Walk PSB+ packets when already in sync. */
+static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_TIP_PGD:
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+		case INTEL_PT_TNT:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSB:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -EAGAIN;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			intel_pt_set_last_ip(decoder);
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	if (decoder->tx_flags & INTEL_PT_ABORT_TX) {
+		decoder->tx_flags = 0;
+		decoder->state.flags &= ~INTEL_PT_IN_TX;
+		decoder->state.flags |= INTEL_PT_ABORT_TX;
+	} else {
+		decoder->state.flags |= INTEL_PT_ASYNC;
+	}
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+		case INTEL_PT_FUP:
+		case INTEL_PT_PSB:
+		case INTEL_PT_TSC:
+		case INTEL_PT_CBR:
+		case INTEL_PT_MODE_TSX:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSBEND:
+			intel_pt_log("ERROR: Missing TIP after FUP\n");
+			decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP_PGD:
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0) {
+				intel_pt_set_ip(decoder);
+				intel_pt_log("Omitting PGD ip " x64_fmt "\n",
+					     decoder->ip);
+			}
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			return 0;
+
+		case INTEL_PT_TIP_PGE:
+			decoder->pge = true;
+			intel_pt_log("Omitting PGE ip " x64_fmt "\n",
+				     decoder->ip);
+			decoder->state.from_ip = 0;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_TIP:
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
+{
+	bool no_tip = false;
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+next:
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+			if (!decoder->packet.count)
+				break;
+			decoder->tnt = decoder->packet;
+			decoder->pkt_state = INTEL_PT_STATE_TNT;
+			err = intel_pt_walk_tnt(decoder);
+			if (err == -EAGAIN)
+				break;
+			return err;
+
+		case INTEL_PT_TIP_PGD:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP_PGD;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_TIP_PGE: {
+			decoder->pge = true;
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero TIP.PGE",
+						decoder->pos);
+				break;
+			}
+			intel_pt_set_ip(decoder);
+			decoder->state.from_ip = 0;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_FUP:
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero FUP",
+						decoder->pos);
+				no_tip = false;
+				break;
+			}
+			intel_pt_set_last_ip(decoder);
+			err = intel_pt_walk_fup(decoder);
+			if (err != -EAGAIN) {
+				if (err)
+					return err;
+				if (no_tip)
+					decoder->pkt_state =
+						INTEL_PT_STATE_FUP_NO_TIP;
+				else
+					decoder->pkt_state = INTEL_PT_STATE_FUP;
+				return 0;
+			}
+			if (no_tip) {
+				no_tip = false;
+				break;
+			}
+			return intel_pt_walk_fup_tip(decoder);
+
+		case INTEL_PT_PSB:
+			intel_pt_clear_stack(&decoder->stack);
+			err = intel_pt_walk_psbend(decoder);
+			if (err == -EAGAIN)
+				goto next;
+			if (err)
+				return err;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			/* MODE_TSX need not be followed by FUP */
+			if (!decoder->pge) {
+				intel_pt_update_in_tx(decoder);
+				break;
+			}
+			err = intel_pt_mode_tsx(decoder, &no_tip);
+			if (err)
+				return err;
+			goto next;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+/* Walk PSB+ packets to get in sync. */
+static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -ENOENT;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0) {
+				uint64_t current_ip = decoder->ip;
+
+				intel_pt_set_ip(decoder);
+				if (current_ip)
+					intel_pt_log_to("Setting IP",
+							decoder->ip);
+			}
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_TNT:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			if (decoder->ip)
+				decoder->pkt_state = INTEL_PT_STATE_ERR4;
+			else
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_PSB:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			decoder->pge = decoder->packet.type != INTEL_PT_TIP_PGD;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0)
+				intel_pt_set_ip(decoder);
+			if (decoder->ip)
+				return 0;
+			break;
+
+		case INTEL_PT_FUP:
+			if (decoder->overflow) {
+				if (decoder->last_ip ||
+				    decoder->packet.count == 6 ||
+				    decoder->packet.count == 0)
+					intel_pt_set_ip(decoder);
+				if (decoder->ip)
+					return 0;
+			}
+			if (decoder->packet.count)
+				intel_pt_set_last_ip(decoder);
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSB:
+			err = intel_pt_walk_psb(decoder);
+			if (err)
+				return err;
+			if (decoder->ip) {
+				/* Do not have a sample */
+				decoder->state.type = 0;
+				return 0;
+			}
+			break;
+
+		case INTEL_PT_TNT:
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_sync_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	intel_pt_log("Scanning for full IP\n");
+	err = intel_pt_walk_to_ip(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	decoder->overflow = false;
+
+	decoder->state.from_ip = 0;
+	decoder->state.to_ip = decoder->ip;
+	intel_pt_log_to("Setting IP", decoder->ip);
+
+	return 0;
+}
+
+static int intel_pt_part_psb(struct intel_pt_decoder *decoder)
+{
+	const unsigned char *end = decoder->buf + decoder->len;
+	size_t i;
+
+	for (i = INTEL_PT_PSB_LEN - 1; i; i--) {
+		if (i > decoder->len)
+			continue;
+		if (!memcmp(end - i, INTEL_PT_PSB_STR, i))
+			return i;
+	}
+	return 0;
+}
+
+static int intel_pt_rest_psb(struct intel_pt_decoder *decoder, int part_psb)
+{
+	size_t rest_psb = INTEL_PT_PSB_LEN - part_psb;
+	const char *psb = INTEL_PT_PSB_STR;
+
+	if (rest_psb > decoder->len ||
+	    memcmp(decoder->buf, psb + part_psb, rest_psb))
+		return 0;
+
+	return rest_psb;
+}
+
+static int intel_pt_get_split_psb(struct intel_pt_decoder *decoder,
+				  int part_psb)
+{
+	int rest_psb, ret;
+
+	decoder->pos += decoder->len;
+	decoder->len = 0;
+
+	ret = intel_pt_get_next_data(decoder);
+	if (ret)
+		return ret;
+
+	rest_psb = intel_pt_rest_psb(decoder, part_psb);
+	if (!rest_psb)
+		return 0;
+
+	decoder->pos -= part_psb;
+	decoder->next_buf = decoder->buf + rest_psb;
+	decoder->next_len = decoder->len - rest_psb;
+	memcpy(decoder->temp_buf, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	decoder->buf = decoder->temp_buf;
+	decoder->len = INTEL_PT_PSB_LEN;
+
+	return 0;
+}
+
+static int intel_pt_scan_for_psb(struct intel_pt_decoder *decoder)
+{
+	unsigned char *next;
+	int ret;
+
+	intel_pt_log("Scanning for PSB\n");
+	while (1) {
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		next = memmem(decoder->buf, decoder->len, INTEL_PT_PSB_STR,
+			      INTEL_PT_PSB_LEN);
+		if (!next) {
+			int part_psb;
+
+			part_psb = intel_pt_part_psb(decoder);
+			if (part_psb) {
+				ret = intel_pt_get_split_psb(decoder, part_psb);
+				if (ret)
+					return ret;
+			} else {
+				decoder->pos += decoder->len;
+				decoder->len = 0;
+			}
+			continue;
+		}
+
+		decoder->pkt_step = next - decoder->buf;
+		return intel_pt_get_next_packet(decoder);
+	}
+}
+
+static int intel_pt_sync(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	decoder->pge = false;
+	decoder->continuous_period = false;
+	decoder->last_ip = 0;
+	decoder->ip = 0;
+	intel_pt_clear_stack(&decoder->stack);
+
+	err = intel_pt_scan_for_psb(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_NO_IP;
+
+	err = intel_pt_walk_psb(decoder);
+	if (err)
+		return err;
+
+	if (decoder->ip) {
+		decoder->state.type = 0; /* Do not have a sample */
+		decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	} else {
+		return intel_pt_sync_ip(decoder);
+	}
+
+	return 0;
+}
+
+static uint64_t intel_pt_est_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t est = decoder->timestamp_insn_cnt << 1;
+
+	if (!decoder->cbr || !decoder->max_non_turbo_ratio)
+		goto out;
+
+	est *= decoder->max_non_turbo_ratio;
+	est /= decoder->cbr;
+out:
+	return decoder->timestamp + est;
+}
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	do {
+		decoder->state.type = INTEL_PT_BRANCH;
+		decoder->state.flags = 0;
+
+		switch (decoder->pkt_state) {
+		case INTEL_PT_STATE_NO_PSB:
+			err = intel_pt_sync(decoder);
+			break;
+		case INTEL_PT_STATE_NO_IP:
+			decoder->last_ip = 0;
+			/* Fall through */
+		case INTEL_PT_STATE_ERR_RESYNC:
+			err = intel_pt_sync_ip(decoder);
+			break;
+		case INTEL_PT_STATE_IN_SYNC:
+			err = intel_pt_walk_trace(decoder);
+			break;
+		case INTEL_PT_STATE_TNT:
+			err = intel_pt_walk_tnt(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_trace(decoder);
+			break;
+		case INTEL_PT_STATE_TIP:
+		case INTEL_PT_STATE_TIP_PGD:
+			err = intel_pt_walk_tip(decoder);
+			break;
+		case INTEL_PT_STATE_FUP:
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			err = intel_pt_walk_fup(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_fup_tip(decoder);
+			else if (!err)
+				decoder->pkt_state = INTEL_PT_STATE_FUP;
+			break;
+		case INTEL_PT_STATE_FUP_NO_TIP:
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			err = intel_pt_walk_fup(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_trace(decoder);
+			break;
+		default:
+			err = intel_pt_bug(decoder);
+			break;
+		}
+	} while (err == -ENOLINK);
+
+	decoder->state.err = err ? intel_pt_ext_err(err) : 0;
+	decoder->state.timestamp = decoder->timestamp;
+	decoder->state.est_timestamp = intel_pt_est_timestamp(decoder);
+	decoder->state.cr3 = decoder->cr3;
+
+	if (err)
+		decoder->state.from_ip = decoder->ip;
+
+	return &decoder->state;
+}
+
+static bool intel_pt_at_psb(unsigned char *buf, size_t len)
+{
+	if (len < INTEL_PT_PSB_LEN)
+		return false;
+	return memmem(buf, INTEL_PT_PSB_LEN, INTEL_PT_PSB_STR,
+		      INTEL_PT_PSB_LEN);
+}
+
+/**
+ * intel_pt_next_psb - move buffer pointer to the start of the next PSB packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the next PSB packet if
+ * there is one, otherwise the buffer pointer is unchanged.  If @buf is updated,
+ * @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_next_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	next = memmem(*buf, *len, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_step_psb - move buffer pointer to the start of the following PSB
+ *                     packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the following PSB packet
+ * (skipping the PSB at @buf itself) if there is one, otherwise the buffer
+ * pointer is unchanged.  If @buf is updated, @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_step_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	if (!*len)
+		return false;
+
+	next = memmem(*buf + 1, *len - 1, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_last_psb - find the last PSB packet in a buffer.
+ * @buf: buffer
+ * @len: size of buffer
+ *
+ * This function finds the last PSB in a buffer.
+ *
+ * Return: A pointer to the last PSB in @buf if found, %NULL otherwise.
+ */
+static unsigned char *intel_pt_last_psb(unsigned char *buf, size_t len)
+{
+	const char *n = INTEL_PT_PSB_STR;
+	unsigned char *p;
+	size_t k;
+
+	if (len < INTEL_PT_PSB_LEN)
+		return NULL;
+
+	k = len - INTEL_PT_PSB_LEN + 1;
+	while (1) {
+		p = memrchr(buf, n[0], k);
+		if (!p)
+			return NULL;
+		if (!memcmp(p + 1, n + 1, INTEL_PT_PSB_LEN - 1))
+			return p;
+		k = p - buf;
+		if (!k)
+			return NULL;
+	}
+}
+
+/**
+ * intel_pt_next_tsc - find and return next TSC.
+ * @buf: buffer
+ * @len: size of buffer
+ * @tsc: TSC value returned
+ *
+ * Find a TSC packet in @buf and return the TSC value.  This function assumes
+ * that @buf starts at a PSB and that PSB+ will contain TSC and so stops if a
+ * PSBEND packet is found.
+ *
+ * Return: %true if TSC is found, false otherwise.
+ */
+static bool intel_pt_next_tsc(unsigned char *buf, size_t len, uint64_t *tsc)
+{
+	struct intel_pt_pkt packet;
+	int ret;
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret <= 0)
+			return false;
+		if (packet.type == INTEL_PT_TSC) {
+			*tsc = packet.payload;
+			return true;
+		}
+		if (packet.type == INTEL_PT_PSBEND)
+			return false;
+		buf += ret;
+		len -= ret;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_tsc_cmp - compare 7-byte TSCs.
+ * @tsc1: first TSC to compare
+ * @tsc2: second TSC to compare
+ *
+ * This function compares 7-byte TSC values allowing for the possibility that
+ * TSC wrapped around.  Generally it is not possible to know if TSC has wrapped
+ * around so for that purpose this function assumes the absolute difference is
+ * less than half the maximum difference.
+ *
+ * Return: %-1 if @tsc1 is before @tsc2, %0 if @tsc1 == @tsc2, %1 if @tsc1 is
+ * after @tsc2.
+ */
+static int intel_pt_tsc_cmp(uint64_t tsc1, uint64_t tsc2)
+{
+	const uint64_t halfway = (1ULL << 55);
+
+	if (tsc1 == tsc2)
+		return 0;
+
+	if (tsc1 < tsc2) {
+		if (tsc2 - tsc1 < halfway)
+			return -1;
+		else
+			return 1;
+	} else {
+		if (tsc1 - tsc2 < halfway)
+			return 1;
+		else
+			return -1;
+	}
+}
+
+/**
+ * intel_pt_find_overlap_tsc - determine start of non-overlapped trace data
+ *                             using TSC.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ *
+ * If the trace contains TSC we can look at the last TSC of @buf_a and the
+ * first TSC of @buf_b in order to determine if the buffers overlap, and then
+ * walk forward in @buf_b until a later TSC is found.  A precondition is that
+ * @buf_a and @buf_b are positioned at a PSB.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+static unsigned char *intel_pt_find_overlap_tsc(unsigned char *buf_a,
+						size_t len_a,
+						unsigned char *buf_b,
+						size_t len_b)
+{
+	uint64_t tsc_a, tsc_b;
+	unsigned char *p;
+	size_t len;
+
+	p = intel_pt_last_psb(buf_a, len_a);
+	if (!p)
+		return buf_b; /* No PSB in buf_a => no overlap */
+
+	len = len_a - (p - buf_a);
+	if (!intel_pt_next_tsc(p, len, &tsc_a)) {
+		/* The last PSB+ in buf_a is incomplete, so go back one more */
+		len_a -= len;
+		p = intel_pt_last_psb(buf_a, len_a);
+		if (!p)
+			return buf_b; /* No full PSB+ => assume no overlap */
+		len = len_a - (p - buf_a);
+		if (!intel_pt_next_tsc(p, len, &tsc_a))
+			return buf_b; /* No TSC in buf_a => assume no overlap */
+	}
+
+	while (1) {
+		/* Ignore PSB+ with no TSC */
+		if (intel_pt_next_tsc(buf_b, len_b, &tsc_b) &&
+		    intel_pt_tsc_cmp(tsc_a, tsc_b) < 0)
+			return buf_b; /* tsc_a < tsc_b => no overlap */
+
+		if (!intel_pt_step_psb(&buf_b, &len_b))
+			return buf_b + len_b; /* No PSB in buf_b => no data */
+	}
+}
+
+/**
+ * intel_pt_find_overlap - determine start of non-overlapped trace data.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ * @have_tsc: can use TSC packets to detect overlap
+ *
+ * When trace samples or snapshots are recorded there is the possibility that
+ * the data overlaps.  Note that, for the purposes of decoding, data is only
+ * useful if it begins with a PSB packet.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc)
+{
+	unsigned char *found;
+
+	/* Buffer 'b' must start at PSB so throw away everything before that */
+	if (!intel_pt_next_psb(&buf_b, &len_b))
+		return buf_b + len_b; /* No PSB */
+
+	if (!intel_pt_next_psb(&buf_a, &len_a))
+		return buf_b; /* No overlap */
+
+	if (have_tsc) {
+		found = intel_pt_find_overlap_tsc(buf_a, len_a, buf_b, len_b);
+		if (found)
+			return found;
+	}
+
+	/*
+	 * Buffer 'b' cannot end within buffer 'a' so, for comparison purposes,
+	 * we can ignore the first part of buffer 'a'.
+	 */
+	while (len_b < len_a) {
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+	}
+
+	/* Now len_b >= len_a */
+	if (len_b > len_a) {
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+
+	while (1) {
+		/* Potential overlap so check the bytes */
+		found = memmem(buf_a, len_a, buf_b, len_a);
+		if (found)
+			return buf_b + len_a;
+
+		/* Try again at next PSB in buffer 'a' */
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
new file mode 100644
index 000000000000..4c4880230cc9
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -0,0 +1,104 @@
+/*
+ * intel_pt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_DECODER_H__
+#define INCLUDE__INTEL_PT_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+#include "intel-pt-insn-decoder.h"
+
+#define INTEL_PT_IN_TX		(1 << 0)
+#define INTEL_PT_ABORT_TX	(1 << 1)
+#define INTEL_PT_ASYNC		(1 << 2)
+
+enum intel_pt_sample_type {
+	INTEL_PT_BRANCH		= 1 << 0,
+	INTEL_PT_INSTRUCTION	= 1 << 1,
+	INTEL_PT_TRANSACTION	= 1 << 2,
+};
+
+enum intel_pt_period_type {
+	INTEL_PT_PERIOD_NONE,
+	INTEL_PT_PERIOD_INSTRUCTIONS,
+	INTEL_PT_PERIOD_TICKS,
+};
+
+enum {
+	INTEL_PT_ERR_NOMEM = 1,
+	INTEL_PT_ERR_INTERN,
+	INTEL_PT_ERR_BADPKT,
+	INTEL_PT_ERR_NODATA,
+	INTEL_PT_ERR_NOINSN,
+	INTEL_PT_ERR_MISMAT,
+	INTEL_PT_ERR_OVR,
+	INTEL_PT_ERR_LOST,
+	INTEL_PT_ERR_UNK,
+	INTEL_PT_ERR_NELOOP,
+	INTEL_PT_ERR_MAX,
+};
+
+struct intel_pt_state {
+	enum intel_pt_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint64_t est_timestamp;
+	uint64_t trace_nr;
+	uint32_t flags;
+	enum intel_pt_insn_op insn_op;
+	int insn_len;
+};
+
+struct intel_pt_insn;
+
+struct intel_pt_buffer {
+	const unsigned char *buf;
+	size_t len;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct intel_pt_params {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*walk_insn)(struct intel_pt_insn *intel_pt_insn,
+			 uint64_t *insn_cnt_ptr, uint64_t *ip, uint64_t to_ip,
+			 uint64_t max_insn_cnt, void *data);
+	void *data;
+	bool return_compression;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+	unsigned max_non_turbo_ratio;
+};
+
+struct intel_pt_decoder;
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params);
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder);
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder);
+
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc);
+
+int intel_pt__strerror(int code, char *buf, size_t buflen);
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 06/25] perf tools: Add Intel PT support
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (4 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 05/25] perf tools: Add Intel PT decoder Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-20  9:59   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 07/25] perf tools: Take Intel PT into use Adrian Hunter
                   ` (18 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add support for Intel Processor Trace.

Intel PT support fits within the new auxtrace infrastructure.  Recording
is supporting by identifying the Intel PT PMU, parsing options and
setting up events.

Decoding is supported by queuing up trace data by cpu or thread and then
decoding synchronously delivering synthesized event samples into the
session processing for tools to consume.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/Build      |    2 +
 tools/perf/arch/x86/util/intel-pt.c |  752 ++++++++++++++
 tools/perf/util/Build               |    1 +
 tools/perf/util/intel-pt.c          | 1911 +++++++++++++++++++++++++++++++++++
 tools/perf/util/intel-pt.h          |   51 +
 5 files changed, 2717 insertions(+)
 create mode 100644 tools/perf/arch/x86/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.h

diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index cfbccc4e3187..139608878888 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -6,3 +6,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
 
 libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
 libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
+
+libperf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
new file mode 100644
index 000000000000..da7d2c15e611
--- /dev/null
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -0,0 +1,752 @@
+/*
+ * intel_pt.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdbool.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "../../perf.h"
+#include "../../util/session.h"
+#include "../../util/event.h"
+#include "../../util/evlist.h"
+#include "../../util/evsel.h"
+#include "../../util/cpumap.h"
+#include "../../util/parse-options.h"
+#include "../../util/parse-events.h"
+#include "../../util/pmu.h"
+#include "../../util/debug.h"
+#include "../../util/auxtrace.h"
+#include "../../util/tsc.h"
+#include "../../util/intel-pt.h"
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define KiB_MASK(x) (KiB(x) - 1)
+#define MiB_MASK(x) (MiB(x) - 1)
+
+#define INTEL_PT_DEFAULT_SAMPLE_SIZE	KiB(4)
+
+#define INTEL_PT_MAX_SAMPLE_SIZE	KiB(60)
+
+#define INTEL_PT_PSB_PERIOD_NEAR	256
+
+struct intel_pt_snapshot_ref {
+	void *ref_buf;
+	size_t ref_offset;
+	bool wrapped;
+};
+
+struct intel_pt_recording {
+	struct auxtrace_record		itr;
+	struct perf_pmu			*intel_pt_pmu;
+	int				have_sched_switch;
+	struct perf_evlist		*evlist;
+	bool				snapshot_mode;
+	bool				snapshot_init_done;
+	size_t				snapshot_size;
+	size_t				snapshot_ref_buf_size;
+	int				snapshot_ref_cnt;
+	struct intel_pt_snapshot_ref	*snapshot_refs;
+};
+
+static int intel_pt_parse_terms_with_default(struct list_head *formats,
+					     const char *str,
+					     u64 *config)
+{
+	struct list_head *terms;
+	struct perf_event_attr attr = { .size = 0, };
+	int err;
+
+	terms = malloc(sizeof(struct list_head));
+	if (!terms)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(terms);
+
+	err = parse_events_terms(terms, str);
+	if (err)
+		goto out_free;
+
+	attr.config = *config;
+	err = perf_pmu__config_terms(formats, &attr, terms, true, NULL);
+	if (err)
+		goto out_free;
+
+	*config = attr.config;
+out_free:
+	parse_events__free_terms(terms);
+	return err;
+}
+
+static int intel_pt_parse_terms(struct list_head *formats, const char *str,
+				u64 *config)
+{
+	*config = 0;
+	return intel_pt_parse_terms_with_default(formats, str, config);
+}
+
+static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu __maybe_unused,
+				  struct perf_evlist *evlist __maybe_unused)
+{
+	return 256;
+}
+
+static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	u64 config;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &config);
+	return config;
+}
+
+static int intel_pt_parse_snapshot_options(struct auxtrace_record *itr,
+					   struct record_opts *opts,
+					   const char *str)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->auxtrace_snapshot_mode = true;
+	opts->auxtrace_snapshot_size = snapshot_size;
+
+	ptr->snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+struct perf_event_attr *
+intel_pt_pmu_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	struct perf_event_attr *attr;
+
+	attr = zalloc(sizeof(struct perf_event_attr));
+	if (!attr)
+		return NULL;
+
+	attr->config = intel_pt_default_config(intel_pt_pmu);
+
+	intel_pt_pmu->selectable = true;
+
+	return attr;
+}
+
+static size_t intel_pt_info_priv_size(struct auxtrace_record *itr __maybe_unused)
+{
+	return INTEL_PT_AUXTRACE_PRIV_SIZE;
+}
+
+static int intel_pt_info_fill(struct auxtrace_record *itr,
+			      struct perf_session *session,
+			      struct auxtrace_info_event *auxtrace_info,
+			      size_t priv_size)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	struct perf_event_mmap_page *pc;
+	struct perf_tsc_conversion tc = { .time_mult = 0, };
+	bool cap_user_time_zero = false, per_cpu_mmaps;
+	u64 tsc_bit, noretcomp_bit;
+	int err;
+
+	if (priv_size != INTEL_PT_AUXTRACE_PRIV_SIZE)
+		return -EINVAL;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+	intel_pt_parse_terms(&intel_pt_pmu->format, "noretcomp",
+			     &noretcomp_bit);
+
+	if (!session->evlist->nr_mmaps)
+		return -EINVAL;
+
+	pc = session->evlist->mmap[0].base;
+	if (pc) {
+		err = perf_read_tsc_conversion(pc, &tc);
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+		} else {
+			cap_user_time_zero = tc.time_mult != 0;
+		}
+		if (!cap_user_time_zero)
+			ui__warning("Intel Processor Trace: TSC not available\n");
+	}
+
+	per_cpu_mmaps = !cpu_map__empty(session->evlist->cpus);
+
+	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
+	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
+	auxtrace_info->priv[INTEL_PT_TIME_SHIFT] = tc.time_shift;
+	auxtrace_info->priv[INTEL_PT_TIME_MULT] = tc.time_mult;
+	auxtrace_info->priv[INTEL_PT_TIME_ZERO] = tc.time_zero;
+	auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO] = cap_user_time_zero;
+	auxtrace_info->priv[INTEL_PT_TSC_BIT] = tsc_bit;
+	auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT] = noretcomp_bit;
+	auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH] = ptr->have_sched_switch;
+	auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE] = ptr->snapshot_mode;
+	auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS] = per_cpu_mmaps;
+
+	return 0;
+}
+
+static int intel_pt_track_switches(struct perf_evlist *evlist)
+{
+	const char *sched_switch = "sched:sched_switch";
+	struct perf_evsel *evsel;
+	int err;
+
+	if (!perf_evlist__can_select_event(evlist, sched_switch))
+		return -EPERM;
+
+	err = parse_events(evlist, sched_switch, NULL);
+	if (err) {
+		pr_debug2("%s: failed to parse %s, error %d\n",
+			  __func__, sched_switch, err);
+		return err;
+	}
+
+	evsel = perf_evlist__last(evlist);
+
+	perf_evsel__set_sample_bit(evsel, CPU);
+	perf_evsel__set_sample_bit(evsel, TIME);
+
+	evsel->system_wide = true;
+	evsel->no_aux_samples = true;
+	evsel->immediate = true;
+
+	return 0;
+}
+
+static int intel_pt_recording_options(struct auxtrace_record *itr,
+				      struct perf_evlist *evlist,
+				      struct record_opts *opts)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	bool have_timing_info;
+	struct perf_evsel *evsel, *intel_pt_evsel = NULL;
+	const struct cpu_map *cpus = evlist->cpus;
+	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+	u64 tsc_bit;
+
+	ptr->evlist = evlist;
+	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_pt_pmu->type) {
+			if (intel_pt_evsel) {
+				pr_err("There may be only one " INTEL_PT_PMU_NAME " event\n");
+				return -EINVAL;
+			}
+			evsel->attr.freq = 0;
+			evsel->attr.sample_period = 1;
+			intel_pt_evsel = evsel;
+			opts->full_auxtrace = true;
+		}
+	}
+
+	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
+		pr_err("Snapshot mode (-S option) requires " INTEL_PT_PMU_NAME " PMU event (-e " INTEL_PT_PMU_NAME ")\n");
+		return -EINVAL;
+	}
+
+	if (opts->use_clockid) {
+		pr_err("Cannot use clockid (-k option) with " INTEL_PT_PMU_NAME "\n");
+		return -EINVAL;
+	}
+
+	if (!opts->full_auxtrace)
+		return 0;
+
+	/* Set default sizes for snapshot mode */
+	if (opts->auxtrace_snapshot_mode) {
+		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
+
+		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
+			if (privileged) {
+				opts->auxtrace_mmap_pages = MiB(4) / page_size;
+			} else {
+				opts->auxtrace_mmap_pages = KiB(128) / page_size;
+				if (opts->mmap_pages == UINT_MAX)
+					opts->mmap_pages = KiB(256) / page_size;
+			}
+		} else if (!opts->auxtrace_mmap_pages && !privileged &&
+			   opts->mmap_pages == UINT_MAX) {
+			opts->mmap_pages = KiB(256) / page_size;
+		}
+		if (!opts->auxtrace_snapshot_size)
+			opts->auxtrace_snapshot_size =
+				opts->auxtrace_mmap_pages * (size_t)page_size;
+		if (!opts->auxtrace_mmap_pages) {
+			size_t sz = opts->auxtrace_snapshot_size;
+
+			sz = round_up(sz, page_size) / page_size;
+			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
+		}
+		if (opts->auxtrace_snapshot_size >
+				opts->auxtrace_mmap_pages * (size_t)page_size) {
+			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
+			       opts->auxtrace_snapshot_size,
+			       opts->auxtrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
+			return -EINVAL;
+		}
+		pr_debug2("Intel PT snapshot size: %zu\n",
+			  opts->auxtrace_snapshot_size);
+		if (psb_period &&
+		    opts->auxtrace_snapshot_size <= psb_period +
+						  INTEL_PT_PSB_PERIOD_NEAR)
+			ui__warning("Intel PT snapshot size (%zu) may be too small for PSB period (%zu)\n",
+				    opts->auxtrace_snapshot_size, psb_period);
+	}
+
+	/* Set default sizes for full trace mode */
+	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
+		if (privileged) {
+			opts->auxtrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->auxtrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	}
+
+	/* Validate auxtrace_mmap_pages */
+	if (opts->auxtrace_mmap_pages) {
+		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
+		size_t min_sz;
+
+		if (opts->auxtrace_snapshot_mode)
+			min_sz = KiB(4);
+		else
+			min_sz = KiB(8);
+
+		if (sz < min_sz || !is_power_of_2(sz)) {
+			pr_err("Invalid mmap size for Intel Processor Trace: must be at least %zuKiB and a power of 2\n",
+			       min_sz / 1024);
+			return -EINVAL;
+		}
+	}
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+
+	if (opts->full_auxtrace && (intel_pt_evsel->attr.config & tsc_bit))
+		have_timing_info = true;
+	else
+		have_timing_info = false;
+
+	/*
+	 * Per-cpu recording needs sched_switch events to distinguish different
+	 * threads.
+	 */
+	if (have_timing_info && !cpu_map__empty(cpus)) {
+		int err;
+
+		err = intel_pt_track_switches(evlist);
+		if (err == -EPERM)
+			pr_debug2("Unable to select sched:sched_switch\n");
+		else if (err)
+			return err;
+		else
+			ptr->have_sched_switch = 1;
+	}
+
+	if (intel_pt_evsel) {
+		/*
+		 * To obtain the auxtrace buffer file descriptor, the auxtrace
+		 * event must come first.
+		 */
+		perf_evlist__to_front(evlist, intel_pt_evsel);
+		/*
+		 * In the case of per-cpu mmaps, we need the CPU on the
+		 * AUX event.
+		 */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(intel_pt_evsel, CPU);
+	}
+
+	/* Add dummy event to keep tracking */
+	if (opts->full_auxtrace) {
+		struct perf_evsel *tracking_evsel;
+		int err;
+
+		err = parse_events(evlist, "dummy:u", NULL);
+		if (err)
+			return err;
+
+		tracking_evsel = perf_evlist__last(evlist);
+
+		perf_evlist__set_tracking_event(evlist, tracking_evsel);
+
+		tracking_evsel->attr.freq = 0;
+		tracking_evsel->attr.sample_period = 1;
+
+		/* In per-cpu case, always need the time of mmap events etc */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(tracking_evsel, TIME);
+	}
+
+	/*
+	 * Warn the user when we do not have enough information to decode i.e.
+	 * per-cpu with no sched_switch (except workload-only).
+	 */
+	if (!ptr->have_sched_switch && !cpu_map__empty(cpus) &&
+	    !target__none(&opts->target))
+		ui__warning("Intel Processor Trace decoding will not be possible except for kernel tracing!\n");
+
+	return 0;
+}
+
+static int intel_pt_snapshot_start(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__disable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_snapshot_finish(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_alloc_snapshot_refs(struct intel_pt_recording *ptr, int idx)
+{
+	const size_t sz = sizeof(struct intel_pt_snapshot_ref);
+	int cnt = ptr->snapshot_ref_cnt, new_cnt = cnt * 2;
+	struct intel_pt_snapshot_ref *refs;
+
+	if (!new_cnt)
+		new_cnt = 16;
+
+	while (new_cnt <= idx)
+		new_cnt *= 2;
+
+	refs = calloc(new_cnt, sz);
+	if (!refs)
+		return -ENOMEM;
+
+	memcpy(refs, ptr->snapshot_refs, cnt * sz);
+
+	ptr->snapshot_refs = refs;
+	ptr->snapshot_ref_cnt = new_cnt;
+
+	return 0;
+}
+
+static void intel_pt_free_snapshot_refs(struct intel_pt_recording *ptr)
+{
+	int i;
+
+	for (i = 0; i < ptr->snapshot_ref_cnt; i++)
+		zfree(&ptr->snapshot_refs[i].ref_buf);
+	zfree(&ptr->snapshot_refs);
+}
+
+static void intel_pt_recording_free(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+
+	intel_pt_free_snapshot_refs(ptr);
+	free(ptr);
+}
+
+static int intel_pt_alloc_snapshot_ref(struct intel_pt_recording *ptr, int idx,
+				       size_t snapshot_buf_size)
+{
+	size_t ref_buf_size = ptr->snapshot_ref_buf_size;
+	void *ref_buf;
+
+	ref_buf = zalloc(ref_buf_size);
+	if (!ref_buf)
+		return -ENOMEM;
+
+	ptr->snapshot_refs[idx].ref_buf = ref_buf;
+	ptr->snapshot_refs[idx].ref_offset = snapshot_buf_size - ref_buf_size;
+
+	return 0;
+}
+
+static size_t intel_pt_snapshot_ref_buf_size(struct intel_pt_recording *ptr,
+					     size_t snapshot_buf_size)
+{
+	const size_t max_size = 256 * 1024;
+	size_t buf_size = 0, psb_period;
+
+	if (ptr->snapshot_size <= 64 * 1024)
+		return 0;
+
+	psb_period = intel_pt_psb_period(ptr->intel_pt_pmu, ptr->evlist);
+	if (psb_period)
+		buf_size = psb_period * 2;
+
+	if (!buf_size || buf_size > max_size)
+		buf_size = max_size;
+
+	if (buf_size >= snapshot_buf_size)
+		return 0;
+
+	if (buf_size >= ptr->snapshot_size / 2)
+		return 0;
+
+	return buf_size;
+}
+
+static int intel_pt_snapshot_init(struct intel_pt_recording *ptr,
+				  size_t snapshot_buf_size)
+{
+	if (ptr->snapshot_init_done)
+		return 0;
+
+	ptr->snapshot_init_done = true;
+
+	ptr->snapshot_ref_buf_size = intel_pt_snapshot_ref_buf_size(ptr,
+							snapshot_buf_size);
+
+	return 0;
+}
+
+/**
+ * intel_pt_compare_buffers - compare bytes in a buffer to a circular buffer.
+ * @buf1: first buffer
+ * @compare_size: number of bytes to compare
+ * @buf2: second buffer (a circular buffer)
+ * @offs2: offset in second buffer
+ * @buf2_size: size of second buffer
+ *
+ * The comparison allows for the possibility that the bytes to compare in the
+ * circular buffer are not contiguous.  It is assumed that @compare_size <=
+ * @buf2_size.  This function returns %false if the bytes are identical, %true
+ * otherwise.
+ */
+static bool intel_pt_compare_buffers(void *buf1, size_t compare_size,
+				     void *buf2, size_t offs2, size_t buf2_size)
+{
+	size_t end2 = offs2 + compare_size, part_size;
+
+	if (end2 <= buf2_size)
+		return memcmp(buf1, buf2 + offs2, compare_size);
+
+	part_size = end2 - buf2_size;
+	if (memcmp(buf1, buf2 + offs2, part_size))
+		return true;
+
+	compare_size -= part_size;
+
+	return memcmp(buf1 + part_size, buf2, compare_size);
+}
+
+static bool intel_pt_compare_ref(void *ref_buf, size_t ref_offset,
+				 size_t ref_size, size_t buf_size,
+				 void *data, size_t head)
+{
+	size_t ref_end = ref_offset + ref_size;
+
+	if (ref_end > buf_size) {
+		if (head > ref_offset || head < ref_end - buf_size)
+			return true;
+	} else if (head > ref_offset && head < ref_end) {
+		return true;
+	}
+
+	return intel_pt_compare_buffers(ref_buf, ref_size, data, ref_offset,
+					buf_size);
+}
+
+static void intel_pt_copy_ref(void *ref_buf, size_t ref_size, size_t buf_size,
+			      void *data, size_t head)
+{
+	if (head >= ref_size) {
+		memcpy(ref_buf, data + head - ref_size, ref_size);
+	} else {
+		memcpy(ref_buf, data, head);
+		ref_size -= head;
+		memcpy(ref_buf + head, data + buf_size - ref_size, ref_size);
+	}
+}
+
+static bool intel_pt_wrapped(struct intel_pt_recording *ptr, int idx,
+			     struct auxtrace_mmap *mm, unsigned char *data,
+			     u64 head)
+{
+	struct intel_pt_snapshot_ref *ref = &ptr->snapshot_refs[idx];
+	bool wrapped;
+
+	wrapped = intel_pt_compare_ref(ref->ref_buf, ref->ref_offset,
+				       ptr->snapshot_ref_buf_size, mm->len,
+				       data, head);
+
+	intel_pt_copy_ref(ref->ref_buf, ptr->snapshot_ref_buf_size, mm->len,
+			  data, head);
+
+	return wrapped;
+}
+
+static bool intel_pt_first_wrap(u64 *data, size_t buf_size)
+{
+	int i, a, b;
+
+	b = buf_size >> 3;
+	a = b - 512;
+	if (a < 0)
+		a = 0;
+
+	for (i = a; i < b; i++) {
+		if (data[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int intel_pt_find_snapshot(struct auxtrace_record *itr, int idx,
+				  struct auxtrace_mmap *mm, unsigned char *data,
+				  u64 *head, u64 *old)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	bool wrapped;
+	int err;
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head);
+
+	err = intel_pt_snapshot_init(ptr, mm->len);
+	if (err)
+		goto out_err;
+
+	if (idx >= ptr->snapshot_ref_cnt) {
+		err = intel_pt_alloc_snapshot_refs(ptr, idx);
+		if (err)
+			goto out_err;
+	}
+
+	if (ptr->snapshot_ref_buf_size) {
+		if (!ptr->snapshot_refs[idx].ref_buf) {
+			err = intel_pt_alloc_snapshot_ref(ptr, idx, mm->len);
+			if (err)
+				goto out_err;
+		}
+		wrapped = intel_pt_wrapped(ptr, idx, mm, data, *head);
+	} else {
+		wrapped = ptr->snapshot_refs[idx].wrapped;
+		if (!wrapped && intel_pt_first_wrap((u64 *)data, mm->len)) {
+			ptr->snapshot_refs[idx].wrapped = true;
+			wrapped = true;
+		}
+	}
+
+	/*
+	 * In full trace mode 'head' continually increases.  However in snapshot
+	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
+	 * are adjusted to match the full trace case which expects that 'old' is
+	 * always less than 'head'.
+	 */
+	if (wrapped) {
+		*old = *head;
+		*head += mm->len;
+	} else {
+		if (mm->mask)
+			*old &= mm->mask;
+		else
+			*old %= mm->len;
+		if (*old > *head)
+			*head += mm->len;
+	}
+
+	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
+		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
+
+	return 0;
+
+out_err:
+	pr_err("%s: failed, error %d\n", __func__, err);
+	return err;
+}
+
+static u64 intel_pt_reference(struct auxtrace_record *itr __maybe_unused)
+{
+	return rdtsc();
+}
+
+static int intel_pt_read_finish(struct auxtrace_record *itr, int idx)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event_idx(ptr->evlist, evsel,
+							     idx);
+	}
+	return -EINVAL;
+}
+
+struct auxtrace_record *intel_pt_recording_init(int *err)
+{
+	struct perf_pmu *intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+	struct intel_pt_recording *ptr;
+
+	if (!intel_pt_pmu)
+		return NULL;
+
+	ptr = zalloc(sizeof(struct intel_pt_recording));
+	if (!ptr) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	ptr->intel_pt_pmu = intel_pt_pmu;
+	ptr->itr.recording_options = intel_pt_recording_options;
+	ptr->itr.info_priv_size = intel_pt_info_priv_size;
+	ptr->itr.info_fill = intel_pt_info_fill;
+	ptr->itr.free = intel_pt_recording_free;
+	ptr->itr.snapshot_start = intel_pt_snapshot_start;
+	ptr->itr.snapshot_finish = intel_pt_snapshot_finish;
+	ptr->itr.find_snapshot = intel_pt_find_snapshot;
+	ptr->itr.parse_snapshot_options = intel_pt_parse_snapshot_options;
+	ptr->itr.reference = intel_pt_reference;
+	ptr->itr.read_finish = intel_pt_read_finish;
+	return &ptr->itr;
+}
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 1b67598d4c87..7f5f2b6aff19 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -77,6 +77,7 @@ libperf-y += cloexec.o
 libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
+libperf-$(CONFIG_AUXTRACE) += intel-pt.o
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
new file mode 100644
index 000000000000..2a4a4120473b
--- /dev/null
+++ b/tools/perf/util/intel-pt.c
@@ -0,0 +1,1911 @@
+/*
+ * intel_pt.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+#include "../perf.h"
+#include "session.h"
+#include "machine.h"
+#include "tool.h"
+#include "event.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "map.h"
+#include "color.h"
+#include "util.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "symbol.h"
+#include "callchain.h"
+#include "dso.h"
+#include "debug.h"
+#include "auxtrace.h"
+#include "tsc.h"
+#include "intel-pt.h"
+
+#include "intel-pt-decoder/intel-pt-log.h"
+#include "intel-pt-decoder/intel-pt-decoder.h"
+#include "intel-pt-decoder/intel-pt-insn-decoder.h"
+#include "intel-pt-decoder/intel-pt-pkt-decoder.h"
+
+#define MAX_TIMESTAMP (~0ULL)
+
+struct intel_pt {
+	struct auxtrace auxtrace;
+	struct auxtrace_queues queues;
+	struct auxtrace_heap heap;
+	u32 auxtrace_type;
+	struct perf_session *session;
+	struct machine *machine;
+	struct perf_evsel *switch_evsel;
+	struct thread *unknown_thread;
+	bool timeless_decoding;
+	bool sampling_mode;
+	bool snapshot_mode;
+	bool per_cpu_mmaps;
+	bool have_tsc;
+	bool data_queued;
+	bool est_tsc;
+	bool sync_switch;
+	int have_sched_switch;
+	u32 pmu_type;
+	u64 kernel_start;
+	u64 switch_ip;
+	u64 ptss_ip;
+
+	struct perf_tsc_conversion tc;
+	bool cap_user_time_zero;
+
+	struct itrace_synth_opts synth_opts;
+
+	bool sample_instructions;
+	u64 instructions_sample_type;
+	u64 instructions_sample_period;
+	u64 instructions_id;
+
+	bool sample_branches;
+	u32 branches_filter;
+	u64 branches_sample_type;
+	u64 branches_id;
+
+	bool sample_transactions;
+	u64 transactions_sample_type;
+	u64 transactions_id;
+
+	bool synth_needs_swap;
+
+	u64 tsc_bit;
+	u64 noretcomp_bit;
+	unsigned max_non_turbo_ratio;
+};
+
+enum switch_state {
+	INTEL_PT_SS_NOT_TRACING,
+	INTEL_PT_SS_UNKNOWN,
+	INTEL_PT_SS_TRACING,
+	INTEL_PT_SS_EXPECTING_SWITCH_EVENT,
+	INTEL_PT_SS_EXPECTING_SWITCH_IP,
+};
+
+struct intel_pt_queue {
+	struct intel_pt *pt;
+	unsigned int queue_nr;
+	struct auxtrace_buffer *buffer;
+	void *decoder;
+	const struct intel_pt_state *state;
+	struct ip_callchain *chain;
+	union perf_event *event_buf;
+	bool on_heap;
+	bool stop;
+	bool step_through_buffers;
+	bool use_buffer_pid_tid;
+	pid_t pid, tid;
+	int cpu;
+	int switch_state;
+	pid_t next_tid;
+	struct thread *thread;
+	bool exclude_kernel;
+	bool have_sample;
+	u64 time;
+	u64 timestamp;
+	u32 flags;
+	u16 insn_len;
+};
+
+static void intel_pt_dump(struct intel_pt *pt __maybe_unused,
+			  unsigned char *buf, size_t len)
+{
+	struct intel_pt_pkt packet;
+	size_t pos = 0;
+	int ret, pkt_len, i;
+	char desc[INTEL_PT_PKT_DESC_MAX];
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color,
+		      ". ... Intel Processor Trace data: size %zu bytes\n",
+		      len);
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret > 0)
+			pkt_len = ret;
+		else
+			pkt_len = 1;
+		printf(".");
+		color_fprintf(stdout, color, "  %08x: ", pos);
+		for (i = 0; i < pkt_len; i++)
+			color_fprintf(stdout, color, " %02x", buf[i]);
+		for (; i < 16; i++)
+			color_fprintf(stdout, color, "   ");
+		if (ret > 0) {
+			ret = intel_pt_pkt_desc(&packet, desc,
+						INTEL_PT_PKT_DESC_MAX);
+			if (ret > 0)
+				color_fprintf(stdout, color, " %s\n", desc);
+		} else {
+			color_fprintf(stdout, color, " Bad packet!\n");
+		}
+		pos += pkt_len;
+		buf += pkt_len;
+		len -= pkt_len;
+	}
+}
+
+static void intel_pt_dump_event(struct intel_pt *pt, unsigned char *buf,
+				size_t len)
+{
+	printf(".\n");
+	intel_pt_dump(pt, buf, len);
+}
+
+static int intel_pt_do_fix_overlap(struct intel_pt *pt, struct auxtrace_buffer *a,
+				   struct auxtrace_buffer *b)
+{
+	void *start;
+
+	start = intel_pt_find_overlap(a->data, a->size, b->data, b->size,
+				      pt->have_tsc);
+	if (!start)
+		return -EINVAL;
+	b->use_size = b->data + b->size - start;
+	b->use_data = start;
+	return 0;
+}
+
+static void intel_pt_use_buffer_pid_tid(struct intel_pt_queue *ptq,
+					struct auxtrace_queue *queue,
+					struct auxtrace_buffer *buffer)
+{
+	if (queue->cpu == -1 && buffer->cpu != -1)
+		ptq->cpu = buffer->cpu;
+
+	ptq->pid = buffer->pid;
+	ptq->tid = buffer->tid;
+
+	intel_pt_log("queue %u cpu %d pid %d tid %d\n",
+		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+
+	thread__zput(ptq->thread);
+
+	if (ptq->tid != -1) {
+		if (ptq->pid != -1)
+			ptq->thread = machine__findnew_thread(ptq->pt->machine,
+							      ptq->pid,
+							      ptq->tid);
+		else
+			ptq->thread = machine__find_thread(ptq->pt->machine, -1,
+							   ptq->tid);
+	}
+}
+
+/* This function assumes data is processed sequentially only */
+static int intel_pt_get_trace(struct intel_pt_buffer *b, void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct auxtrace_buffer *buffer = ptq->buffer, *old_buffer = buffer;
+	struct auxtrace_queue *queue;
+
+	if (ptq->stop) {
+		b->len = 0;
+		return 0;
+	}
+
+	queue = &ptq->pt->queues.queue_array[ptq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	ptq->buffer = buffer;
+
+	if (!buffer->data) {
+		int fd = perf_data_file__fd(ptq->pt->session->file);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (ptq->pt->snapshot_mode && !buffer->consecutive && old_buffer &&
+	    intel_pt_do_fix_overlap(ptq->pt, old_buffer, buffer))
+		return -ENOMEM;
+
+	if (old_buffer)
+		auxtrace_buffer__drop_data(old_buffer);
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+	b->ref_timestamp = buffer->reference;
+
+	if (!old_buffer || ptq->pt->sampling_mode || (ptq->pt->snapshot_mode &&
+						      !buffer->consecutive)) {
+		b->consecutive = false;
+		b->trace_nr = buffer->buffer_nr + 1;
+	} else {
+		b->consecutive = true;
+	}
+
+	if (ptq->use_buffer_pid_tid && (ptq->pid != buffer->pid ||
+					ptq->tid != buffer->tid))
+		intel_pt_use_buffer_pid_tid(ptq, queue, buffer);
+
+	if (ptq->step_through_buffers)
+		ptq->stop = true;
+
+	if (!b->len)
+		return intel_pt_get_trace(b, data);
+
+	return 0;
+}
+
+struct intel_pt_cache_entry {
+	struct auxtrace_cache_entry	entry;
+	u64				insn_cnt;
+	u64				byte_cnt;
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+};
+
+static int intel_pt_config_div(const char *var, const char *value, void *data)
+{
+	int *d = data;
+	long val;
+
+	if (!strcmp(var, "intel-pt.cache-divisor")) {
+		val = strtol(value, NULL, 0);
+		if (val > 0 && val <= INT_MAX)
+			*d = val;
+	}
+
+	return 0;
+}
+
+static int intel_pt_cache_divisor(void)
+{
+	static int d;
+
+	if (d)
+		return d;
+
+	perf_config(intel_pt_config_div, &d);
+
+	if (!d)
+		d = 64;
+
+	return d;
+}
+
+static unsigned int intel_pt_cache_size(struct dso *dso,
+					struct machine *machine)
+{
+	off_t size;
+
+	size = dso__data_size(dso, machine);
+	size /= intel_pt_cache_divisor();
+	if (size < 1000)
+		return 10;
+	if (size > (1 << 21))
+		return 21;
+	return 32 - __builtin_clz(size);
+}
+
+static struct auxtrace_cache *intel_pt_cache(struct dso *dso,
+					     struct machine *machine)
+{
+	struct auxtrace_cache *c;
+	unsigned int bits;
+
+	if (dso->auxtrace_cache)
+		return dso->auxtrace_cache;
+
+	bits = intel_pt_cache_size(dso, machine);
+
+	/* Ignoring cache creation failure */
+	c = auxtrace_cache__new(bits, sizeof(struct intel_pt_cache_entry), 200);
+
+	dso->auxtrace_cache = c;
+
+	return c;
+}
+
+static int intel_pt_cache_add(struct dso *dso, struct machine *machine,
+			      u64 offset, u64 insn_cnt, u64 byte_cnt,
+			      struct intel_pt_insn *intel_pt_insn)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+	struct intel_pt_cache_entry *e;
+	int err;
+
+	if (!c)
+		return -ENOMEM;
+
+	e = auxtrace_cache__alloc_entry(c);
+	if (!e)
+		return -ENOMEM;
+
+	e->insn_cnt = insn_cnt;
+	e->byte_cnt = byte_cnt;
+	e->op = intel_pt_insn->op;
+	e->branch = intel_pt_insn->branch;
+	e->length = intel_pt_insn->length;
+	e->rel = intel_pt_insn->rel;
+
+	err = auxtrace_cache__add(c, offset, &e->entry);
+	if (err)
+		auxtrace_cache__free_entry(c, e);
+
+	return err;
+}
+
+static struct intel_pt_cache_entry *
+intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+
+	if (!c)
+		return NULL;
+
+	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
+}
+
+static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
+				   uint64_t *insn_cnt_ptr, uint64_t *ip,
+				   uint64_t to_ip, uint64_t max_insn_cnt,
+				   void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct machine *machine = ptq->pt->machine;
+	struct thread *thread;
+	struct addr_location al;
+	unsigned char buf[1024];
+	size_t bufsz;
+	ssize_t len;
+	int x86_64;
+	u8 cpumode;
+	u64 offset, start_offset, start_ip;
+	u64 insn_cnt = 0;
+	bool one_map = true;
+
+	if (to_ip && *ip == to_ip)
+		goto out_no_cache;
+
+	bufsz = intel_pt_insn_max_size();
+
+	if (*ip >= ptq->pt->kernel_start)
+		cpumode = PERF_RECORD_MISC_KERNEL;
+	else
+		cpumode = PERF_RECORD_MISC_USER;
+
+	thread = ptq->thread;
+	if (!thread) {
+		if (cpumode != PERF_RECORD_MISC_KERNEL)
+			return -EINVAL;
+		thread = ptq->pt->unknown_thread;
+	}
+
+	while (1) {
+		thread__find_addr_map(thread, cpumode, MAP__FUNCTION, *ip, &al);
+		if (!al.map || !al.map->dso)
+			return -EINVAL;
+
+		if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
+		    dso__data_status_seen(al.map->dso,
+					  DSO_DATA_STATUS_SEEN_ITRACE))
+			return -ENOENT;
+
+		offset = al.map->map_ip(al.map, *ip);
+
+		if (!to_ip && one_map) {
+			struct intel_pt_cache_entry *e;
+
+			e = intel_pt_cache_lookup(al.map->dso, machine, offset);
+			if (e &&
+			    (!max_insn_cnt || e->insn_cnt <= max_insn_cnt)) {
+				*insn_cnt_ptr = e->insn_cnt;
+				*ip += e->byte_cnt;
+				intel_pt_insn->op = e->op;
+				intel_pt_insn->branch = e->branch;
+				intel_pt_insn->length = e->length;
+				intel_pt_insn->rel = e->rel;
+				intel_pt_log_insn_no_data(intel_pt_insn, *ip);
+				return 0;
+			}
+		}
+
+		start_offset = offset;
+		start_ip = *ip;
+
+		/* Load maps to ensure dso->is_64_bit has been updated */
+		map__load(al.map, machine->symbol_filter);
+
+		x86_64 = al.map->dso->is_64_bit;
+
+		while (1) {
+			len = dso__data_read_offset(al.map->dso, machine,
+						    offset, buf, bufsz);
+			if (len <= 0)
+				return -EINVAL;
+
+			if (intel_pt_get_insn(buf, len, x86_64, intel_pt_insn))
+				return -EINVAL;
+
+			intel_pt_log_insn(intel_pt_insn, *ip);
+
+			insn_cnt += 1;
+
+			if (intel_pt_insn->branch != INTEL_PT_BR_NO_BRANCH)
+				goto out;
+
+			if (max_insn_cnt && insn_cnt >= max_insn_cnt)
+				goto out_no_cache;
+
+			*ip += intel_pt_insn->length;
+
+			if (to_ip && *ip == to_ip)
+				goto out_no_cache;
+
+			if (*ip >= al.map->end)
+				break;
+
+			offset += intel_pt_insn->length;
+		}
+		one_map = false;
+	}
+out:
+	*insn_cnt_ptr = insn_cnt;
+
+	if (!one_map)
+		goto out_no_cache;
+
+	/*
+	 * Didn't lookup in the 'to_ip' case, so do it now to prevent duplicate
+	 * entries.
+	 */
+	if (to_ip) {
+		struct intel_pt_cache_entry *e;
+
+		e = intel_pt_cache_lookup(al.map->dso, machine, start_offset);
+		if (e)
+			return 0;
+	}
+
+	/* Ignore cache errors */
+	intel_pt_cache_add(al.map->dso, machine, start_offset, insn_cnt,
+			   *ip - start_ip, intel_pt_insn);
+
+	return 0;
+
+out_no_cache:
+	*insn_cnt_ptr = insn_cnt;
+	return 0;
+}
+
+static bool intel_pt_get_config(struct intel_pt *pt,
+				struct perf_event_attr *attr, u64 *config)
+{
+	if (attr->type == pt->pmu_type) {
+		if (config)
+			*config = attr->config;
+		return true;
+	}
+
+	return false;
+}
+
+static bool intel_pt_exclude_kernel(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
+		    !evsel->attr.exclude_kernel)
+			return false;
+	}
+	return true;
+}
+
+static bool intel_pt_return_compression(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	u64 config;
+
+	if (!pt->noretcomp_bit)
+		return true;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config) &&
+		    (config & pt->noretcomp_bit))
+			return false;
+	}
+	return true;
+}
+
+static bool intel_pt_timeless_decoding(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	bool timeless_decoding = true;
+	u64 config;
+
+	if (!pt->tsc_bit || !pt->cap_user_time_zero)
+		return true;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (!(evsel->attr.sample_type & PERF_SAMPLE_TIME))
+			return true;
+		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
+			if (config & pt->tsc_bit)
+				timeless_decoding = false;
+			else
+				return true;
+		}
+	}
+	return timeless_decoding;
+}
+
+static bool intel_pt_tracing_kernel(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
+		    !evsel->attr.exclude_kernel)
+			return true;
+	}
+	return false;
+}
+
+static bool intel_pt_have_tsc(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	bool have_tsc = false;
+	u64 config;
+
+	if (!pt->tsc_bit)
+		return false;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
+			if (config & pt->tsc_bit)
+				have_tsc = true;
+			else
+				return false;
+		}
+	}
+	return have_tsc;
+}
+
+static u64 intel_pt_ns_to_ticks(const struct intel_pt *pt, u64 ns)
+{
+	u64 quot, rem;
+
+	quot = ns / pt->tc.time_mult;
+	rem  = ns % pt->tc.time_mult;
+	return (quot << pt->tc.time_shift) + (rem << pt->tc.time_shift) /
+		pt->tc.time_mult;
+}
+
+static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
+						   unsigned int queue_nr)
+{
+	struct intel_pt_params params = { .get_trace = 0, };
+	struct intel_pt_queue *ptq;
+
+	ptq = zalloc(sizeof(struct intel_pt_queue));
+	if (!ptq)
+		return NULL;
+
+	if (pt->synth_opts.callchain) {
+		size_t sz = sizeof(struct ip_callchain);
+
+		sz += pt->synth_opts.callchain_sz * sizeof(u64);
+		ptq->chain = zalloc(sz);
+		if (!ptq->chain)
+			goto out_free;
+	}
+
+	ptq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!ptq->event_buf)
+		goto out_free;
+
+	ptq->pt = pt;
+	ptq->queue_nr = queue_nr;
+	ptq->exclude_kernel = intel_pt_exclude_kernel(pt);
+	ptq->pid = -1;
+	ptq->tid = -1;
+	ptq->cpu = -1;
+	ptq->next_tid = -1;
+
+	params.get_trace = intel_pt_get_trace;
+	params.walk_insn = intel_pt_walk_next_insn;
+	params.data = ptq;
+	params.return_compression = intel_pt_return_compression(pt);
+	params.max_non_turbo_ratio = pt->max_non_turbo_ratio;
+
+	if (pt->synth_opts.instructions) {
+		if (pt->synth_opts.period) {
+			switch (pt->synth_opts.period_type) {
+			case PERF_ITRACE_PERIOD_INSTRUCTIONS:
+				params.period_type =
+						INTEL_PT_PERIOD_INSTRUCTIONS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_TICKS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_NANOSECS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = intel_pt_ns_to_ticks(pt,
+							pt->synth_opts.period);
+				break;
+			default:
+				break;
+			}
+		}
+
+		if (!params.period) {
+			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
+			params.period = 1000;
+		}
+	}
+
+	ptq->decoder = intel_pt_decoder_new(&params);
+	if (!ptq->decoder)
+		goto out_free;
+
+	return ptq;
+
+out_free:
+	zfree(&ptq->event_buf);
+	zfree(&ptq->chain);
+	free(ptq);
+	return NULL;
+}
+
+static void intel_pt_free_queue(void *priv)
+{
+	struct intel_pt_queue *ptq = priv;
+
+	if (!ptq)
+		return;
+	thread__zput(ptq->thread);
+	intel_pt_decoder_free(ptq->decoder);
+	zfree(&ptq->event_buf);
+	zfree(&ptq->chain);
+	free(ptq);
+}
+
+static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
+				     struct auxtrace_queue *queue)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (queue->tid == -1 || pt->have_sched_switch) {
+		ptq->tid = machine__get_current_tid(pt->machine, ptq->cpu);
+		thread__zput(ptq->thread);
+	}
+
+	if (!ptq->thread && ptq->tid != -1)
+		ptq->thread = machine__find_thread(pt->machine, -1, ptq->tid);
+
+	if (ptq->thread) {
+		ptq->pid = ptq->thread->pid_;
+		if (queue->cpu == -1)
+			ptq->cpu = ptq->thread->cpu;
+	}
+}
+
+static void intel_pt_sample_flags(struct intel_pt_queue *ptq)
+{
+	if (ptq->state->flags & INTEL_PT_ABORT_TX) {
+		ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT;
+	} else if (ptq->state->flags & INTEL_PT_ASYNC) {
+		if (ptq->state->to_ip)
+			ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+				     PERF_IP_FLAG_ASYNC |
+				     PERF_IP_FLAG_INTERRUPT;
+		else
+			ptq->flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_END;
+		ptq->insn_len = 0;
+	} else {
+		if (ptq->state->from_ip)
+			ptq->flags = intel_pt_insn_type(ptq->state->insn_op);
+		else
+			ptq->flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_BEGIN;
+		if (ptq->state->flags & INTEL_PT_IN_TX)
+			ptq->flags |= PERF_IP_FLAG_IN_TX;
+		ptq->insn_len = ptq->state->insn_len;
+	}
+}
+
+static int intel_pt_setup_queue(struct intel_pt *pt,
+				struct auxtrace_queue *queue,
+				unsigned int queue_nr)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (list_empty(&queue->head))
+		return 0;
+
+	if (!ptq) {
+		ptq = intel_pt_alloc_queue(pt, queue_nr);
+		if (!ptq)
+			return -ENOMEM;
+		queue->priv = ptq;
+
+		if (queue->cpu != -1)
+			ptq->cpu = queue->cpu;
+		ptq->tid = queue->tid;
+
+		if (pt->sampling_mode) {
+			if (pt->timeless_decoding)
+				ptq->step_through_buffers = true;
+			if (pt->timeless_decoding || !pt->have_sched_switch)
+				ptq->use_buffer_pid_tid = true;
+		}
+	}
+
+	if (!ptq->on_heap &&
+	    (!pt->sync_switch ||
+	     ptq->switch_state != INTEL_PT_SS_EXPECTING_SWITCH_EVENT)) {
+		const struct intel_pt_state *state;
+		int ret;
+
+		if (pt->timeless_decoding)
+			return 0;
+
+		intel_pt_log("queue %u getting timestamp\n", queue_nr);
+		intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+			     queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+		while (1) {
+			state = intel_pt_decode(ptq->decoder);
+			if (state->err) {
+				if (state->err == INTEL_PT_ERR_NODATA) {
+					intel_pt_log("queue %u has no timestamp\n",
+						     queue_nr);
+					return 0;
+				}
+				continue;
+			}
+			if (state->timestamp)
+				break;
+		}
+
+		ptq->timestamp = state->timestamp;
+		intel_pt_log("queue %u timestamp 0x%" PRIx64 "\n",
+			     queue_nr, ptq->timestamp);
+		ptq->state = state;
+		ptq->have_sample = true;
+		intel_pt_sample_flags(ptq);
+		ret = auxtrace_heap__add(&pt->heap, queue_nr, ptq->timestamp);
+		if (ret)
+			return ret;
+		ptq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int intel_pt_setup_queues(struct intel_pt *pt)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < pt->queues.nr_queues; i++) {
+		ret = intel_pt_setup_queue(pt, &pt->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int intel_pt_inject_event(union perf_event *event,
+				 struct perf_sample *sample, u64 type,
+				 bool swapped)
+{
+	event->header.size = perf_event__sample_event_size(sample, type, 0);
+	return perf_event__synthesize_sample(event, type, 0, sample, swapped);
+}
+
+static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->branches_id;
+	sample.stream_id = ptq->pt->branches_id;
+	sample.period = 1;
+	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
+
+	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
+		return 0;
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->branches_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver branch event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->instructions_id;
+	sample.stream_id = ptq->pt->instructions_id;
+	sample.period = ptq->pt->instructions_sample_period;
+	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
+
+	if (pt->synth_opts.callchain) {
+		thread_stack__sample(ptq->thread, ptq->chain,
+				     pt->synth_opts.callchain_sz, sample.ip);
+		sample.callchain = ptq->chain;
+	}
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->instructions_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver instruction event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->transactions_id;
+	sample.stream_id = ptq->pt->transactions_id;
+	sample.period = 1;
+	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
+
+	if (pt->synth_opts.callchain) {
+		thread_stack__sample(ptq->thread, ptq->chain,
+				     pt->synth_opts.callchain_sz, sample.ip);
+		sample.callchain = ptq->chain;
+	}
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->transactions_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver transaction event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
+				pid_t pid, pid_t tid, u64 ip)
+{
+	union perf_event event;
+	char msg[MAX_AUXTRACE_ERROR_MSG];
+	int err;
+
+	intel_pt__strerror(code, msg, MAX_AUXTRACE_ERROR_MSG);
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     code, cpu, pid, tid, ip, msg);
+
+	err = perf_session__deliver_synth_event(pt->session, &event, NULL);
+	if (err)
+		pr_err("Intel Processor Trace: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static int intel_pt_next_tid(struct intel_pt *pt, struct intel_pt_queue *ptq)
+{
+	struct auxtrace_queue *queue;
+	pid_t tid = ptq->next_tid;
+	int err;
+
+	if (tid == -1)
+		return 0;
+
+	intel_pt_log("switch: cpu %d tid %d\n", ptq->cpu, tid);
+
+	err = machine__set_current_tid(pt->machine, ptq->cpu, -1, tid);
+
+	queue = &pt->queues.queue_array[ptq->queue_nr];
+	intel_pt_set_pid_tid_cpu(pt, queue);
+
+	ptq->next_tid = -1;
+
+	return err;
+}
+
+static inline bool intel_pt_is_switch_ip(struct intel_pt_queue *ptq, u64 ip)
+{
+	struct intel_pt *pt = ptq->pt;
+
+	return ip == pt->switch_ip &&
+	       (ptq->flags & PERF_IP_FLAG_BRANCH) &&
+	       !(ptq->flags & (PERF_IP_FLAG_CONDITIONAL | PERF_IP_FLAG_ASYNC |
+			       PERF_IP_FLAG_INTERRUPT | PERF_IP_FLAG_TX_ABORT));
+}
+
+static int intel_pt_sample(struct intel_pt_queue *ptq)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct intel_pt *pt = ptq->pt;
+	int err;
+
+	if (!ptq->have_sample)
+		return 0;
+
+	ptq->have_sample = false;
+
+	if (pt->sample_instructions &&
+	    (state->type & INTEL_PT_INSTRUCTION)) {
+		err = intel_pt_synth_instruction_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (pt->sample_transactions &&
+	    (state->type & INTEL_PT_TRANSACTION)) {
+		err = intel_pt_synth_transaction_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (!(state->type & INTEL_PT_BRANCH))
+		return 0;
+
+	if (pt->synth_opts.callchain)
+		thread_stack__event(ptq->thread, ptq->flags, state->from_ip,
+				    state->to_ip, ptq->insn_len,
+				    state->trace_nr);
+	else
+		thread_stack__set_trace_nr(ptq->thread, state->trace_nr);
+
+	if (pt->sample_branches) {
+		err = intel_pt_synth_branch_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (!pt->sync_switch)
+		return 0;
+
+	if (intel_pt_is_switch_ip(ptq, state->to_ip)) {
+		switch (ptq->switch_state) {
+		case INTEL_PT_SS_UNKNOWN:
+		case INTEL_PT_SS_EXPECTING_SWITCH_IP:
+			err = intel_pt_next_tid(pt, ptq);
+			if (err)
+				return err;
+			ptq->switch_state = INTEL_PT_SS_TRACING;
+			break;
+		default:
+			ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_EVENT;
+			return 1;
+		}
+	} else if (!state->to_ip) {
+		ptq->switch_state = INTEL_PT_SS_NOT_TRACING;
+	} else if (ptq->switch_state == INTEL_PT_SS_NOT_TRACING) {
+		ptq->switch_state = INTEL_PT_SS_UNKNOWN;
+	} else if (ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
+		   state->to_ip == pt->ptss_ip &&
+		   (ptq->flags & PERF_IP_FLAG_CALL)) {
+		ptq->switch_state = INTEL_PT_SS_TRACING;
+	}
+
+	return 0;
+}
+
+static u64 intel_pt_switch_ip(struct machine *machine, u64 *ptss_ip)
+{
+	struct map *map;
+	struct symbol *sym, *start;
+	u64 ip, switch_ip = 0;
+
+	if (ptss_ip)
+		*ptss_ip = 0;
+
+	map = machine__kernel_map(machine, MAP__FUNCTION);
+	if (!map)
+		return 0;
+
+	if (map__load(map, machine->symbol_filter))
+		return 0;
+
+	start = dso__first_symbol(map->dso, MAP__FUNCTION);
+
+	for (sym = start; sym; sym = dso__next_symbol(sym)) {
+		if (sym->binding == STB_GLOBAL &&
+		    !strcmp(sym->name, "__switch_to")) {
+			ip = map->unmap_ip(map, sym->start);
+			if (ip >= map->start && ip < map->end) {
+				switch_ip = ip;
+				break;
+			}
+		}
+	}
+
+	if (!switch_ip || !ptss_ip)
+		return 0;
+
+	for (sym = start; sym; sym = dso__next_symbol(sym)) {
+		if (!strcmp(sym->name, "perf_trace_sched_switch")) {
+			ip = map->unmap_ip(map, sym->start);
+			if (ip >= map->start && ip < map->end) {
+				*ptss_ip = ip;
+				break;
+			}
+		}
+	}
+
+	return switch_ip;
+}
+
+static int intel_pt_run_decoder(struct intel_pt_queue *ptq, u64 *timestamp)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct intel_pt *pt = ptq->pt;
+	int err;
+
+	if (!pt->kernel_start) {
+		pt->kernel_start = machine__kernel_start(pt->machine);
+		if (pt->per_cpu_mmaps && pt->have_sched_switch &&
+		    !pt->timeless_decoding && intel_pt_tracing_kernel(pt) &&
+		    !pt->sampling_mode) {
+			pt->switch_ip = intel_pt_switch_ip(pt->machine,
+							   &pt->ptss_ip);
+			if (pt->switch_ip) {
+				intel_pt_log("switch_ip: %"PRIx64" ptss_ip: %"PRIx64"\n",
+					     pt->switch_ip, pt->ptss_ip);
+				pt->sync_switch = true;
+			}
+		}
+	}
+
+	intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+	while (1) {
+		err = intel_pt_sample(ptq);
+		if (err)
+			return err;
+
+		state = intel_pt_decode(ptq->decoder);
+		if (state->err) {
+			if (state->err == INTEL_PT_ERR_NODATA)
+				return 1;
+			if (pt->sync_switch &&
+			    state->from_ip >= pt->kernel_start) {
+				pt->sync_switch = false;
+				intel_pt_next_tid(pt, ptq);
+			}
+			if (pt->synth_opts.errors) {
+				err = intel_pt_synth_error(pt, state->err,
+							   ptq->cpu, ptq->pid,
+							   ptq->tid,
+							   state->from_ip);
+				if (err)
+					return err;
+			}
+			continue;
+		}
+
+		ptq->state = state;
+		ptq->have_sample = true;
+		intel_pt_sample_flags(ptq);
+
+		/* Use estimated TSC upon return to user space */
+		if (pt->est_tsc &&
+		    (state->from_ip >= pt->kernel_start || !state->from_ip) &&
+		    state->to_ip && state->to_ip < pt->kernel_start) {
+			intel_pt_log("TSC %"PRIx64" est. TSC %"PRIx64"\n",
+				     state->timestamp, state->est_timestamp);
+			ptq->timestamp = state->est_timestamp;
+		/* Use estimated TSC in unknown switch state */
+		} else if (pt->sync_switch &&
+			   ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
+			   intel_pt_is_switch_ip(ptq, state->to_ip) &&
+			   ptq->next_tid == -1) {
+			intel_pt_log("TSC %"PRIx64" est. TSC %"PRIx64"\n",
+				     state->timestamp, state->est_timestamp);
+			ptq->timestamp = state->est_timestamp;
+		} else if (state->timestamp > ptq->timestamp) {
+			ptq->timestamp = state->timestamp;
+		}
+
+		if (!pt->timeless_decoding && ptq->timestamp >= *timestamp) {
+			*timestamp = ptq->timestamp;
+			return 0;
+		}
+	}
+	return 0;
+}
+
+static inline int intel_pt_update_queues(struct intel_pt *pt)
+{
+	if (pt->queues.new_data) {
+		pt->queues.new_data = false;
+		return intel_pt_setup_queues(pt);
+	}
+	return 0;
+}
+
+static int intel_pt_process_queues(struct intel_pt *pt, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct intel_pt_queue *ptq;
+
+		if (!pt->heap.heap_cnt)
+			return 0;
+
+		if (pt->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = pt->heap.heap_array[0].queue_nr;
+		queue = &pt->queues.queue_array[queue_nr];
+		ptq = queue->priv;
+
+		intel_pt_log("queue %u processing 0x%" PRIx64 " to 0x%" PRIx64 "\n",
+			     queue_nr, pt->heap.heap_array[0].ordinal,
+			     timestamp);
+
+		auxtrace_heap__pop(&pt->heap);
+
+		if (pt->heap.heap_cnt) {
+			ts = pt->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		intel_pt_set_pid_tid_cpu(pt, queue);
+
+		ret = intel_pt_run_decoder(ptq, &ts);
+
+		if (ret < 0) {
+			auxtrace_heap__add(&pt->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&pt->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			ptq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_pt_process_timeless_queues(struct intel_pt *pt, pid_t tid,
+					    u64 time_)
+{
+	struct auxtrace_queues *queues = &pt->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &pt->queues.queue_array[i];
+		struct intel_pt_queue *ptq = queue->priv;
+
+		if (ptq && (tid == -1 || ptq->tid == tid)) {
+			ptq->time = time_;
+			intel_pt_set_pid_tid_cpu(pt, queue);
+			intel_pt_run_decoder(ptq, &ts);
+		}
+	}
+	return 0;
+}
+
+static int intel_pt_lost(struct intel_pt *pt, struct perf_sample *sample)
+{
+	return intel_pt_synth_error(pt, INTEL_PT_ERR_LOST, sample->cpu,
+				    sample->pid, sample->tid, 0);
+}
+
+static struct intel_pt_queue *intel_pt_cpu_to_ptq(struct intel_pt *pt, int cpu)
+{
+	unsigned i, j;
+
+	if (cpu < 0 || !pt->queues.nr_queues)
+		return NULL;
+
+	if ((unsigned)cpu >= pt->queues.nr_queues)
+		i = pt->queues.nr_queues - 1;
+	else
+		i = cpu;
+
+	if (pt->queues.queue_array[i].cpu == cpu)
+		return pt->queues.queue_array[i].priv;
+
+	for (j = 0; i > 0; j++) {
+		if (pt->queues.queue_array[--i].cpu == cpu)
+			return pt->queues.queue_array[i].priv;
+	}
+
+	for (; j < pt->queues.nr_queues; j++) {
+		if (pt->queues.queue_array[j].cpu == cpu)
+			return pt->queues.queue_array[j].priv;
+	}
+
+	return NULL;
+}
+
+static int intel_pt_process_switch(struct intel_pt *pt,
+				   struct perf_sample *sample)
+{
+	struct intel_pt_queue *ptq;
+	struct perf_evsel *evsel;
+	pid_t tid;
+	int cpu, err;
+
+	evsel = perf_evlist__id2evsel(pt->session->evlist, sample->id);
+	if (evsel != pt->switch_evsel)
+		return 0;
+
+	tid = perf_evsel__intval(evsel, sample, "next_pid");
+	cpu = sample->cpu;
+
+	intel_pt_log("sched_switch: cpu %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     cpu, tid, sample->time, perf_time_to_tsc(sample->time,
+		     &pt->tc));
+
+	if (!pt->sync_switch)
+		goto out;
+
+	ptq = intel_pt_cpu_to_ptq(pt, cpu);
+	if (!ptq)
+		goto out;
+
+	switch (ptq->switch_state) {
+	case INTEL_PT_SS_NOT_TRACING:
+		ptq->next_tid = -1;
+		break;
+	case INTEL_PT_SS_UNKNOWN:
+	case INTEL_PT_SS_TRACING:
+		ptq->next_tid = tid;
+		ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_IP;
+		return 0;
+	case INTEL_PT_SS_EXPECTING_SWITCH_EVENT:
+		if (!ptq->on_heap) {
+			ptq->timestamp = perf_time_to_tsc(sample->time,
+							  &pt->tc);
+			err = auxtrace_heap__add(&pt->heap, ptq->queue_nr,
+						 ptq->timestamp);
+			if (err)
+				return err;
+			ptq->on_heap = true;
+		}
+		ptq->switch_state = INTEL_PT_SS_TRACING;
+		break;
+	case INTEL_PT_SS_EXPECTING_SWITCH_IP:
+		ptq->next_tid = tid;
+		intel_pt_log("ERROR: cpu %d expecting switch ip\n", cpu);
+		break;
+	default:
+		break;
+	}
+out:
+	return machine__set_current_tid(pt->machine, cpu, -1, tid);
+}
+
+static int intel_pt_process_itrace_start(struct intel_pt *pt,
+					 union perf_event *event,
+					 struct perf_sample *sample)
+{
+	if (!pt->per_cpu_mmaps)
+		return 0;
+
+	intel_pt_log("itrace_start: cpu %d pid %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     sample->cpu, event->itrace_start.pid,
+		     event->itrace_start.tid, sample->time,
+		     perf_time_to_tsc(sample->time, &pt->tc));
+
+	return machine__set_current_tid(pt->machine, sample->cpu,
+					event->itrace_start.pid,
+					event->itrace_start.tid);
+}
+
+static int intel_pt_process_event(struct perf_session *session,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	u64 timestamp;
+	int err = 0;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("Intel Processor Trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time)
+		timestamp = perf_time_to_tsc(sample->time, &pt->tc);
+	else
+		timestamp = 0;
+
+	if (timestamp || pt->timeless_decoding) {
+		err = intel_pt_update_queues(pt);
+		if (err)
+			return err;
+	}
+
+	if (pt->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = intel_pt_process_timeless_queues(pt,
+							       event->comm.tid,
+							       sample->time);
+		}
+	} else if (timestamp) {
+		err = intel_pt_process_queues(pt, timestamp);
+	}
+	if (err)
+		return err;
+
+	if (event->header.type == PERF_RECORD_AUX &&
+	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
+	    pt->synth_opts.errors) {
+		err = intel_pt_lost(pt, sample);
+		if (err)
+			return err;
+	}
+
+	if (pt->switch_evsel && event->header.type == PERF_RECORD_SAMPLE)
+		err = intel_pt_process_switch(pt, sample);
+	else if (event->header.type == PERF_RECORD_ITRACE_START)
+		err = intel_pt_process_itrace_start(pt, event, sample);
+
+	intel_pt_log("event %s (%u): cpu %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     perf_event__name(event->header.type), event->header.type,
+		     sample->cpu, sample->time, timestamp);
+
+	return err;
+}
+
+static int intel_pt_flush(struct perf_session *session, struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = intel_pt_update_queues(pt);
+	if (ret < 0)
+		return ret;
+
+	if (pt->timeless_decoding)
+		return intel_pt_process_timeless_queues(pt, -1,
+							MAX_TIMESTAMP - 1);
+
+	return intel_pt_process_queues(pt, MAX_TIMESTAMP);
+}
+
+static void intel_pt_free_events(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	struct auxtrace_queues *queues = &pt->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		intel_pt_free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = NULL;
+	}
+	intel_pt_log_disable();
+	auxtrace_queues__free(queues);
+}
+
+static void intel_pt_free(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+
+	auxtrace_heap__free(&pt->heap);
+	intel_pt_free_events(session);
+	session->auxtrace = NULL;
+	thread__delete(pt->unknown_thread);
+	free(pt);
+}
+
+static int intel_pt_process_auxtrace_event(struct perf_session *session,
+					   union perf_event *event,
+					   struct perf_tool *tool __maybe_unused)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+
+	if (pt->sampling_mode)
+		return 0;
+
+	if (!pt->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data_file__fd(session->file);
+		int err;
+
+		if (perf_data_file__is_pipe(session->file)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&pt->queues, session, event,
+						 data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				intel_pt_dump_event(pt, buffer->data,
+						    buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
+		}
+	}
+
+	return 0;
+}
+
+struct intel_pt_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int intel_pt_event_synth(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample __maybe_unused,
+				struct machine *machine __maybe_unused)
+{
+	struct intel_pt_synth *intel_pt_synth =
+			container_of(tool, struct intel_pt_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(intel_pt_synth->session, event,
+						 NULL);
+}
+
+static int intel_pt_synth_event(struct perf_session *session,
+				struct perf_event_attr *attr, u64 id)
+{
+	struct intel_pt_synth intel_pt_synth;
+
+	memset(&intel_pt_synth, 0, sizeof(struct intel_pt_synth));
+	intel_pt_synth.session = session;
+
+	return perf_event__synthesize_attr(&intel_pt_synth.dummy_tool, attr, 1,
+					   &id, intel_pt_event_synth);
+}
+
+static int intel_pt_synth_events(struct intel_pt *pt,
+				 struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == pt->pmu_type && evsel->ids) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("There are no selected events with Intel Processor Trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	if (pt->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+	if (!pt->per_cpu_mmaps)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+	if (!id)
+		id = 1;
+
+	if (pt->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		if (pt->synth_opts.period_type == PERF_ITRACE_PERIOD_NANOSECS)
+			attr.sample_period =
+				intel_pt_ns_to_ticks(pt, pt->synth_opts.period);
+		else
+			attr.sample_period = pt->synth_opts.period;
+		pt->instructions_sample_period = attr.sample_period;
+		if (pt->synth_opts.callchain)
+			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'instructions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'instructions' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_instructions = true;
+		pt->instructions_sample_type = attr.sample_type;
+		pt->instructions_id = id;
+		id += 1;
+	}
+
+	if (pt->synth_opts.transactions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = 1;
+		if (pt->synth_opts.callchain)
+			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'transactions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'transactions' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_transactions = true;
+		pt->transactions_id = id;
+		id += 1;
+		evlist__for_each(evlist, evsel) {
+			if (evsel->id && evsel->id[0] == pt->transactions_id) {
+				if (evsel->name)
+					zfree(&evsel->name);
+				evsel->name = strdup("transactions");
+				break;
+			}
+		}
+	}
+
+	if (pt->synth_opts.branches) {
+		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		attr.sample_period = 1;
+		attr.sample_type |= PERF_SAMPLE_ADDR;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'branches' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_branches = true;
+		pt->branches_sample_type = attr.sample_type;
+		pt->branches_id = id;
+	}
+
+	pt->synth_needs_swap = evsel->needs_swap;
+
+	return 0;
+}
+
+static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each_reverse(evlist, evsel) {
+		const char *name = perf_evsel__name(evsel);
+
+		if (!strcmp(name, "sched:sched_switch"))
+			return evsel;
+	}
+
+	return NULL;
+}
+
+static const char * const intel_pt_info_fmts[] = {
+	[INTEL_PT_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
+	[INTEL_PT_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
+	[INTEL_PT_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
+	[INTEL_PT_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
+	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
+	[INTEL_PT_TSC_BIT]		= "  TSC bit            %#"PRIx64"\n",
+	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit      %#"PRIx64"\n",
+	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch  %"PRId64"\n",
+	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
+	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps       %"PRId64"\n",
+};
+
+static void intel_pt_print_info(u64 *arr, int start, int finish)
+{
+	int i;
+
+	if (!dump_trace)
+		return;
+
+	for (i = start; i <= finish; i++)
+		fprintf(stdout, intel_pt_info_fmts[i], arr[i]);
+}
+
+int intel_pt_process_auxtrace_info(union perf_event *event,
+				   struct perf_session *session)
+{
+	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
+	size_t min_sz = sizeof(u64) * INTEL_PT_PER_CPU_MMAPS;
+	struct intel_pt *pt;
+	int err;
+
+	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
+					min_sz)
+		return -EINVAL;
+
+	pt = zalloc(sizeof(struct intel_pt));
+	if (!pt)
+		return -ENOMEM;
+
+	err = auxtrace_queues__init(&pt->queues);
+	if (err)
+		goto err_free;
+
+	intel_pt_log_set_name(INTEL_PT_PMU_NAME);
+
+	pt->session = session;
+	pt->machine = &session->machines.host; /* No kvm support */
+	pt->auxtrace_type = auxtrace_info->type;
+	pt->pmu_type = auxtrace_info->priv[INTEL_PT_PMU_TYPE];
+	pt->tc.time_shift = auxtrace_info->priv[INTEL_PT_TIME_SHIFT];
+	pt->tc.time_mult = auxtrace_info->priv[INTEL_PT_TIME_MULT];
+	pt->tc.time_zero = auxtrace_info->priv[INTEL_PT_TIME_ZERO];
+	pt->cap_user_time_zero = auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO];
+	pt->tsc_bit = auxtrace_info->priv[INTEL_PT_TSC_BIT];
+	pt->noretcomp_bit = auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT];
+	pt->have_sched_switch = auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH];
+	pt->snapshot_mode = auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE];
+	pt->per_cpu_mmaps = auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS];
+	intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_PMU_TYPE,
+			    INTEL_PT_PER_CPU_MMAPS);
+
+	pt->timeless_decoding = intel_pt_timeless_decoding(pt);
+	pt->have_tsc = intel_pt_have_tsc(pt);
+	pt->sampling_mode = false;
+	pt->est_tsc = !pt->timeless_decoding;
+
+	pt->unknown_thread = thread__new(999999999, 999999999);
+	if (!pt->unknown_thread) {
+		err = -ENOMEM;
+		goto err_free_queues;
+	}
+	err = thread__set_comm(pt->unknown_thread, "unknown", 0);
+	if (err)
+		goto err_delete_thread;
+	if (thread__init_map_groups(pt->unknown_thread, pt->machine)) {
+		err = -ENOMEM;
+		goto err_delete_thread;
+	}
+
+	pt->auxtrace.process_event = intel_pt_process_event;
+	pt->auxtrace.process_auxtrace_event = intel_pt_process_auxtrace_event;
+	pt->auxtrace.flush_events = intel_pt_flush;
+	pt->auxtrace.free_events = intel_pt_free_events;
+	pt->auxtrace.free = intel_pt_free;
+	session->auxtrace = &pt->auxtrace;
+
+	if (dump_trace)
+		return 0;
+
+	if (pt->have_sched_switch == 1) {
+		pt->switch_evsel = intel_pt_find_sched_switch(session->evlist);
+		if (!pt->switch_evsel) {
+			pr_err("%s: missing sched_switch event\n", __func__);
+			goto err_delete_thread;
+		}
+	}
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
+		pt->synth_opts = *session->itrace_synth_opts;
+	} else {
+		itrace_synth_opts__set_default(&pt->synth_opts);
+		if (use_browser != -1) {
+			pt->synth_opts.branches = false;
+			pt->synth_opts.callchain = true;
+		}
+	}
+
+	if (pt->synth_opts.log)
+		intel_pt_log_enable();
+
+	/* Maximum non-turbo ratio is TSC freq / 100 MHz */
+	if (pt->tc.time_mult) {
+		u64 tsc_freq = intel_pt_ns_to_ticks(pt, 1000000000);
+
+		pt->max_non_turbo_ratio = (tsc_freq + 50000000) / 100000000;
+		intel_pt_log("TSC frequency %"PRIu64"\n", tsc_freq);
+		intel_pt_log("Maximum non-turbo ratio %u\n",
+			     pt->max_non_turbo_ratio);
+	}
+
+	if (pt->synth_opts.calls)
+		pt->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+				       PERF_IP_FLAG_TRACE_END;
+	if (pt->synth_opts.returns)
+		pt->branches_filter |= PERF_IP_FLAG_RETURN |
+				       PERF_IP_FLAG_TRACE_BEGIN;
+
+	if (pt->synth_opts.callchain && !symbol_conf.use_callchain) {
+		symbol_conf.use_callchain = true;
+		if (callchain_register_param(&callchain_param) < 0) {
+			symbol_conf.use_callchain = false;
+			pt->synth_opts.callchain = false;
+		}
+	}
+
+	err = intel_pt_synth_events(pt, session);
+	if (err)
+		goto err_delete_thread;
+
+	err = auxtrace_queues__process_index(&pt->queues, session);
+	if (err)
+		goto err_delete_thread;
+
+	if (pt->queues.populated)
+		pt->data_queued = true;
+
+	if (pt->timeless_decoding)
+		pr_debug2("Intel PT decoding without timestamps\n");
+
+	return 0;
+
+err_delete_thread:
+	thread__delete(pt->unknown_thread);
+err_free_queues:
+	intel_pt_log_disable();
+	auxtrace_queues__free(&pt->queues);
+	session->auxtrace = NULL;
+err_free:
+	free(pt);
+	return err;
+}
diff --git a/tools/perf/util/intel-pt.h b/tools/perf/util/intel-pt.h
new file mode 100644
index 000000000000..a1bfe93473ba
--- /dev/null
+++ b/tools/perf/util/intel-pt.h
@@ -0,0 +1,51 @@
+/*
+ * intel_pt.h: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__PERF_INTEL_PT_H__
+#define INCLUDE__PERF_INTEL_PT_H__
+
+#define INTEL_PT_PMU_NAME "intel_pt"
+
+enum {
+	INTEL_PT_PMU_TYPE,
+	INTEL_PT_TIME_SHIFT,
+	INTEL_PT_TIME_MULT,
+	INTEL_PT_TIME_ZERO,
+	INTEL_PT_CAP_USER_TIME_ZERO,
+	INTEL_PT_TSC_BIT,
+	INTEL_PT_NORETCOMP_BIT,
+	INTEL_PT_HAVE_SCHED_SWITCH,
+	INTEL_PT_SNAPSHOT_MODE,
+	INTEL_PT_PER_CPU_MMAPS,
+	INTEL_PT_AUXTRACE_PRIV_MAX,
+};
+
+#define INTEL_PT_AUXTRACE_PRIV_SIZE (INTEL_PT_AUXTRACE_PRIV_MAX * sizeof(u64))
+
+struct auxtrace_record;
+struct perf_tool;
+union perf_event;
+struct perf_session;
+struct perf_event_attr;
+struct perf_pmu;
+
+struct auxtrace_record *intel_pt_recording_init(int *err);
+
+int intel_pt_process_auxtrace_info(union perf_event *event,
+				   struct perf_session *session);
+
+struct perf_event_attr *intel_pt_pmu_default_config(struct perf_pmu *pmu);
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 07/25] perf tools: Take Intel PT into use
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (5 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 06/25] perf tools: Add Intel PT support Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-20  9:59   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 08/25] perf tools: Add Intel BTS support Adrian Hunter
                   ` (17 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

To record an AUX area, the weak function auxtrace_record__init() must be
implemented.

Equally to decode an AUX area, the AUX area tracing type must be added
to the perf_event__process_auxtrace_info() function.

This patch makes those two changes plus hooks up default config for the
intel_pt PMU.  Also some brief documentation is provided for using the
tools with intel_pt.

E.g:

  [root@perf4 ~]# dmesg
  451 [0.405807] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
  [root@perf4 ~]# perf --version
  perf version 4.1.g53874a
  [root@perf4 ~]#  perf record -e intel_pt//u -a sleep 10
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.383 MB perf.data ]
  [root@perf4 ~]# perf evlist
  intel_pt//u
  sched:sched_switch
  dummy:u
  [root@perf4 ~]# perf report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 0  of event 'intel_pt//u'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......
  #

  # Samples: 393  of event 'sched:sched_switch'
  # Event count (approx.): 393
  #
  # Overhead  Command         Shared Object     Symbol
  # ........  ..............  ................  ..............
    49.62%  swapper         [kernel.vmlinux]  [k] __schedule
    10.69%  rcu_sched       [kernel.vmlinux]  [k] __schedule
     6.62%  rcuos/0         [kernel.vmlinux]  [k] __schedule
     5.60%  kworker/0:1     [kernel.vmlinux]  [k] __schedule
     3.56%  rcuos/3         [kernel.vmlinux]  [k] __schedule
     3.05%  kworker/u384:2  [kernel.vmlinux]  [k] __schedule
     2.54%  kworker/2:0     [kernel.vmlinux]  [k] __schedule
     2.54%  tuned           [kernel.vmlinux]  [k] __schedule
  <SNIP>
  # Samples: 0  of event 'dummy:u'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......

  # Samples: 28  of event 'instructions:u'
  # Event count (approx.): 5030172
  #
  # Overhead  Command     Shared Object        Symbol
  # ........  ..........  ...................  ................................
  #
    21.43%  tuned       libpython2.7.so.1.0  [.] PyEval_EvalFrameEx
                 |
                 ---PyEval_EvalFrameEx
                    |
                    |--83.33%-- PyEval_EvalCodeEx
                    |          PyEval_EvalFrameEx
                    |          |
                    |          |--60.00%-- PyEval_EvalCodeEx
                    |          |          PyEval_EvalFrameEx
                    |          |          PyEval_EvalFrameEx
                    |          |
                    |           --40.00%-- PyEval_EvalFrameEx
                    |
                     --16.67%-- PyEval_EvalFrameEx
                               PyEval_EvalCodeEx
                               PyEval_EvalFrameEx
                               PyEval_EvalCodeEx
                               PyEval_EvalFrameEx
                               PyEval_EvalFrameEx

    14.29%  tuned       libpython2.7.so.1.0  [.] _PyType_Lookup
                 |
                 ---_PyType_Lookup
                    _PyObject_GenericGetAttrWithDict
                    PyEval_EvalFrameEx
                    PyEval_EvalCodeEx
                    PyEval_EvalFrameEx
                    PyEval_EvalCodeEx
                    PyEval_EvalFrameEx
                    |
                    |--75.00%-- PyEval_EvalFrameEx
                    |
                     --25.00%-- PyEval_EvalCodeEx
                               PyEval_EvalFrameEx
                               PyEval_EvalFrameEx

     3.57%  irqbalance  irqbalance           [.] 0x0000000000004038
            |
            ---0x4038
               0x4761
               0x4761
               0x4761
               0x49f1
               0x2295

     3.57%  irqbalance  libc-2.17.so         [.] __GI_____strtoull_l_internal
            |
            ---__GI_____strtoull_l_internal
               0x6f49
               0x229a

     3.57%  irqbalance  libc-2.17.so         [.] __strchrnul
            |
            ---__strchrnul
               vfprintf
               __vsprintf_chk
               __sprintf_chk
               0x2724
               0x4038
               0x2331

     3.57%  irqbalance  libc-2.17.so         [.] __strstr_sse42
            |
            ---__strstr_sse42
               0x71e0
               0x229f

  # And now to some userspace ftrace on uninstrumented binaries 8-) :
  # Hand edited to make it a bit more compact, replacing /home/acme/bin/perf
  # with /bin/perf:

  [root@perf4 ~]# perf script
     perf 8921 [3] 7.310889: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310889: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310889: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310890: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310890: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310890: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310893: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310893: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       4816a8 perf_evlist__enable (/bin/perf) => 4815f8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       4815fe perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310893: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310956: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310956: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310956: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310961: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310961: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310961: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310968: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310968: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       4816a8 perf_evlist__enable (/bin/perf) => 4815f8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       4815fe perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310968: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.311040: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311040: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311040: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.311046: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311046: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311046: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.311050: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311050: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
:

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/intel-pt.txt | 588 ++++++++++++++++++++++++++++++++++
 tools/perf/arch/x86/util/Build        |   2 +
 tools/perf/arch/x86/util/auxtrace.c   |  38 +++
 tools/perf/arch/x86/util/pmu.c        |  15 +
 tools/perf/util/auxtrace.c            |   5 +-
 tools/perf/util/pmu.c                 |   4 +-
 6 files changed, 649 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-pt.txt
 create mode 100644 tools/perf/arch/x86/util/auxtrace.c
 create mode 100644 tools/perf/arch/x86/util/pmu.c

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
new file mode 100644
index 000000000000..2866b62eb293
--- /dev/null
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -0,0 +1,588 @@
+Intel Processor Trace
+=====================
+
+Overview
+========
+
+Intel Processor Trace (Intel PT) is an extension of Intel Architecture that
+collects information about software execution such as control flow, execution
+modes and timings and formats it into highly compressed binary packets.
+Technical details are documented in the Intel 64 and IA-32 Architectures
+Software Developer Manuals, Chapter 36 Intel Processor Trace.
+
+Intel PT is first supported in Intel Core M and 5th generation Intel Core
+processors that are based on the Intel micro-architecture code name Broadwell.
+
+Trace data is collected by 'perf record' and stored within the perf.data file.
+See below for options to 'perf record'.
+
+Trace data must be 'decoded' which involves walking the object code and matching
+the trace data packets. For example a TNT packet only tells whether a
+conditional branch was taken or not taken, so to make use of that packet the
+decoder must know precisely which instruction was being executed.
+
+Decoding is done on-the-fly.  The decoder outputs samples in the same format as
+samples output by perf hardware events, for example as though the "instructions"
+or "branches" events had been recorded.  Presently 3 tools support this:
+'perf script', 'perf report' and 'perf inject'.  See below for more information
+on using those tools.
+
+The main distinguishing feature of Intel PT is that the decoder can determine
+the exact flow of software execution.  Intel PT can be used to understand why
+and how did software get to a certain point, or behave a certain way.  The
+software does not have to be recompiled, so Intel PT works with debug or release
+builds, however the executed images are needed - which makes use in JIT-compiled
+environments, or with self-modified code, a challenge.  Also symbols need to be
+provided to make sense of addresses.
+
+A limitation of Intel PT is that it produces huge amounts of trace data
+(hundreds of megabytes per second per core) which takes a long time to decode,
+for example two or three orders of magnitude longer than it took to collect.
+Another limitation is the performance impact of tracing, something that will
+vary depending on the use-case and architecture.
+
+
+Quickstart
+==========
+
+It is important to start small.  That is because it is easy to capture vastly
+more data than can possibly be processed.
+
+The simplest thing to do with Intel PT is userspace profiling of small programs.
+Data is captured with 'perf record' e.g. to trace 'ls' userspace-only:
+
+	perf record -e intel_pt//u ls
+
+And profiled with 'perf report' e.g.
+
+	perf report
+
+To also trace kernel space presents a problem, namely kernel self-modifying
+code.  A fairly good kernel image is available in /proc/kcore but to get an
+accurate image a copy of /proc/kcore needs to be made under the same conditions
+as the data capture.  A script perf-with-kcore can do that, but beware that the
+script makes use of 'sudo' to copy /proc/kcore.  If you have perf installed
+locally from the source tree you can do:
+
+	~/libexec/perf-core/perf-with-kcore record pt_ls -e intel_pt// -- ls
+
+which will create a directory named 'pt_ls' and put the perf.data file and
+copies of /proc/kcore, /proc/kallsyms and /proc/modules into it.  Then to use
+'perf report' becomes:
+
+	~/libexec/perf-core/perf-with-kcore report pt_ls
+
+Because samples are synthesized after-the-fact, the sampling period can be
+selected for reporting. e.g. sample every microsecond
+
+	~/libexec/perf-core/perf-with-kcore report pt_ls --itrace=i1usge
+
+See the sections below for more information about the --itrace option.
+
+Beware the smaller the period, the more samples that are produced, and the
+longer it takes to process them.
+
+Also note that the coarseness of Intel PT timing information will start to
+distort the statistical value of the sampling as the sampling period becomes
+smaller.
+
+To represent software control flow, "branches" samples are produced.  By default
+a branch sample is synthesized for every single branch.  To get an idea what
+data is available you can use the 'perf script' tool with no parameters, which
+will list all the samples.
+
+	perf record -e intel_pt//u ls
+	perf script
+
+An interesting field that is not printed by default is 'flags' which can be
+displayed as follows:
+
+	perf script -Fcomm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,flags
+
+The flags are "bcrosyiABEx" which stand for branch, call, return, conditional,
+system, asynchronous, interrupt, transaction abort, trace begin, trace end, and
+in transaction, respectively.
+
+While it is possible to create scripts to analyze the data, an alternative
+approach is available to export the data to a postgresql database.  Refer to
+script export-to-postgresql.py for more details, and to script
+call-graph-from-postgresql.py for an example of using the database.
+
+As mentioned above, it is easy to capture too much data.  One way to limit the
+data captured is to use 'snapshot' mode which is explained further below.
+Refer to 'new snapshot option' and 'Intel PT modes of operation' further below.
+
+Another problem that will be experienced is decoder errors.  They can be caused
+by inability to access the executed image, self-modified or JIT-ed code, or the
+inability to match side-band information (such as context switches and mmaps)
+which results in the decoder not knowing what code was executed.
+
+There is also the problem of perf not being able to copy the data fast enough,
+resulting in data lost because the buffer was full.  See 'Buffer handling' below
+for more details.
+
+
+perf record
+===========
+
+new event
+---------
+
+The Intel PT kernel driver creates a new PMU for Intel PT.  PMU events are
+selected by providing the PMU name followed by the "config" separated by slashes.
+An enhancement has been made to allow default "config" e.g. the option
+
+	-e intel_pt//
+
+will use a default config value.  Currently that is the same as
+
+	-e intel_pt/tsc,noretcomp=0/
+
+which is the same as
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+The config terms are listed in /sys/devices/intel_pt/format.  They are bit
+fields within the config member of the struct perf_event_attr which is
+passed to the kernel by the perf_event_open system call.  They correspond to bit
+fields in the IA32_RTIT_CTL MSR.  Here is a list of them and their definitions:
+
+	$ for f in `ls /sys/devices/intel_pt/format`;do
+	> echo $f
+	> cat /sys/devices/intel_pt/format/$f
+	> done
+	noretcomp
+	config:11
+	tsc
+	config:10
+
+Note that the default config must be overridden for each term i.e.
+
+	-e intel_pt/noretcomp=0/
+
+is the same as:
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+So, to disable TSC packets use:
+
+	-e intel_pt/tsc=0/
+
+It is also possible to specify the config value explicitly:
+
+	-e intel_pt/config=0x400/
+
+Note that, as with all events, the event is suffixed with event modifiers:
+
+	u	userspace
+	k	kernel
+	h	hypervisor
+	G	guest
+	H	host
+	p	precise ip
+
+'h', 'G' and 'H' are for virtualization which is not supported by Intel PT.
+'p' is also not relevant to Intel PT.  So only options 'u' and 'k' are
+meaningful for Intel PT.
+
+perf_event_attr is displayed if the -vv option is used e.g.
+
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             6
+	size                             112
+	config                           0x400
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	enable_on_exec                   1
+	sample_id_all                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+
+
+new snapshot option
+-------------------
+
+To select snapshot mode a new option has been added:
+
+	-S
+
+Optionally it can be followed by the snapshot size e.g.
+
+	-S0x100000
+
+The default snapshot size is the auxtrace mmap size.  If neither auxtrace mmap size
+nor snapshot size is specified, then the default is 4MiB for privileged users
+(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
+If an unprivileged user does not specify mmap pages, the mmap pages will be
+reduced as described in the 'new auxtrace mmap size option' section below.
+
+The snapshot size is displayed if the option -vv is used e.g.
+
+	Intel PT snapshot size: %zu
+
+
+new auxtrace mmap size option
+---------------------------
+
+Intel PT buffer size is specified by an addition to the -m option e.g.
+
+	-m,16
+
+selects a buffer size of 16 pages i.e. 64KiB.
+
+Note that the existing functionality of -m is unchanged.  The auxtrace mmap size
+is specified by the optional addition of a comma and the value.
+
+The default auxtrace mmap size for Intel PT is 4MiB/page_size for privileged users
+(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
+If an unprivileged user does not specify mmap pages, the mmap pages will be
+reduced from the default 512KiB/page_size to 256KiB/page_size, otherwise the
+user is likely to get an error as they exceed their mlock limit (Max locked
+memory as shown in /proc/self/limits).  Note that perf does not count the first
+512KiB (actually /proc/sys/kernel/perf_event_mlock_kb minus 1 page) per cpu
+against the mlock limit so an unprivileged user is allowed 512KiB per cpu plus
+their mlock limit (which defaults to 64KiB but is not multiplied by the number
+of cpus).
+
+In full-trace mode, powers of two are allowed for buffer size, with a minimum
+size of 2 pages.  In snapshot mode, it is the same but the minimum size is
+1 page.
+
+The mmap size and auxtrace mmap size are displayed if the -vv option is used e.g.
+
+	mmap length 528384
+	auxtrace mmap length 4198400
+
+
+Intel PT modes of operation
+---------------------------
+
+Intel PT can be used in 2 modes:
+	full-trace mode
+	snapshot mode
+
+Full-trace mode traces continuously e.g.
+
+	perf record -e intel_pt//u uname
+
+Snapshot mode captures the available data when a signal is sent e.g.
+
+	perf record -v -e intel_pt//u -S ./loopy 1000000000 &
+	[1] 11435
+	kill -USR2 11435
+	Recording AUX area tracing snapshot
+
+Note that the signal sent is SIGUSR2.
+Note that "Recording AUX area tracing snapshot" is displayed because the -v
+option is used.
+
+The 2 modes cannot be used together.
+
+
+Buffer handling
+---------------
+
+There may be buffer limitations (i.e. single ToPa entry) which means that actual
+buffer sizes are limited to powers of 2 up to 4MiB (MAX_ORDER).  In order to
+provide other sizes, and in particular an arbitrarily large size, multiple
+buffers are logically concatenated.  However an interrupt must be used to switch
+between buffers.  That has two potential problems:
+	a) the interrupt may not be handled in time so that the current buffer
+	becomes full and some trace data is lost.
+	b) the interrupts may slow the system and affect the performance
+	results.
+
+If trace data is lost, the driver sets 'truncated' in the PERF_RECORD_AUX event
+which the tools report as an error.
+
+In full-trace mode, the driver waits for data to be copied out before allowing
+the (logical) buffer to wrap-around.  If data is not copied out quickly enough,
+again 'truncated' is set in the PERF_RECORD_AUX event.  If the driver has to
+wait, the intel_pt event gets disabled.  Because it is difficult to know when
+that happens, perf tools always re-enable the intel_pt event after copying out
+data.
+
+
+Intel PT and build ids
+----------------------
+
+By default "perf record" post-processes the event stream to find all build ids
+for executables for all addresses sampled.  Deliberately, Intel PT is not
+decoded for that purpose (it would take too long).  Instead the build ids for
+all executables encountered (due to mmap, comm or task events) are included
+in the perf.data file.
+
+To see buildids included in the perf.data file use the command:
+
+	perf buildid-list
+
+If the perf.data file contains Intel PT data, that is the same as:
+
+	perf buildid-list --with-hits
+
+
+Snapshot mode and event disabling
+---------------------------------
+
+In order to make a snapshot, the intel_pt event is disabled using an IOCTL,
+namely PERF_EVENT_IOC_DISABLE.  However doing that can also disable the
+collection of side-band information.  In order to prevent that,  a dummy
+software event has been introduced that permits tracking events (like mmaps) to
+continue to be recorded while intel_pt is disabled.  That is important to ensure
+there is complete side-band information to allow the decoding of subsequent
+snapshots.
+
+A test has been created for that.  To find the test:
+
+	perf test list
+	...
+	23: Test using a dummy software event to keep tracking
+
+To run the test:
+
+	perf test 23
+	23: Test using a dummy software event to keep tracking     : Ok
+
+
+perf record modes (nothing new here)
+------------------------------------
+
+perf record essentially operates in one of three modes:
+	per thread
+	per cpu
+	workload only
+
+"per thread" mode is selected by -t or by --per-thread (with -p or -u or just a
+workload).
+"per cpu" is selected by -C or -a.
+"workload only" mode is selected by not using the other options but providing a
+command to run (i.e. the workload).
+
+In per-thread mode an exact list of threads is traced.  There is no inheritance.
+Each thread has its own event buffer.
+
+In per-cpu mode all processes (or processes from the selected cgroup i.e. -G
+option, or processes selected with -p or -u) are traced.  Each cpu has its own
+buffer. Inheritance is allowed.
+
+In workload-only mode, the workload is traced but with per-cpu buffers.
+Inheritance is allowed.  Note that you can now trace a workload in per-thread
+mode by using the --per-thread option.
+
+
+Privileged vs non-privileged users
+----------------------------------
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users
+have memory limits imposed upon them.  That affects what buffer sizes they can
+have as outlined above.
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users are
+not permitted to use tracepoints which means there is insufficient side-band
+information to decode Intel PT in per-cpu mode, and potentially workload-only
+mode too if the workload creates new processes.
+
+Note also, that to use tracepoints, read-access to debugfs is required.  So if
+debugfs is not mounted or the user does not have read-access, it will again not
+be possible to decode Intel PT in per-cpu mode.
+
+
+sched_switch tracepoint
+-----------------------
+
+The sched_switch tracepoint is used to provide side-band data for Intel PT
+decoding.  sched_switch events are automatically added. e.g. the second event
+shown below
+
+	$ perf record -vv -e intel_pt//u uname
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             6
+	size                             112
+	config                           0x400
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	enable_on_exec                   1
+	sample_id_all                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             2
+	size                             112
+	config                           0x108
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
+	read_format                      ID
+	inherit                          1
+	sample_id_all                    1
+	exclude_guest                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             1
+	size                             112
+	config                           0x9
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	mmap                             1
+	comm                             1
+	enable_on_exec                   1
+	task                             1
+	sample_id_all                    1
+	mmap2                            1
+	comm_exec                        1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	mmap size 528384B
+	AUX area mmap length 4194304
+	perf event ring buffer mmapped per cpu
+	Synthesizing auxtrace information
+	Linux
+	[ perf record: Woken up 1 times to write data ]
+	[ perf record: Captured and wrote 0.042 MB perf.data ]
+
+Note, the sched_switch event is only added if the user is permitted to use it
+and only in per-cpu mode.
+
+Note also, the sched_switch event is only added if TSC packets are requested.
+That is because, in the absence of timing information, the sched_switch events
+cannot be matched against the Intel PT trace.
+
+
+perf script
+===========
+
+By default, perf script will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace.
+
+
+New --itrace option
+-------------------
+
+Having no option is the same as
+
+	--itrace
+
+which, in turn, is the same as
+
+	--itrace=ibxe
+
+The letters are:
+
+	i	synthesize "instructions" events
+	b	synthesize "branches" events
+	x	synthesize "transactions" events
+	c	synthesize branches events (calls only)
+	r	synthesize branches events (returns only)
+	e	synthesize tracing error events
+	d	create a debug log
+	g	synthesize a call chain (use with i or x)
+
+"Instructions" events look like they were recorded by "perf record -e
+instructions".
+
+"Branches" events look like they were recorded by "perf record -e branches". "c"
+and "r" can be combined to get calls and returns.
+
+"Transactions" events correspond to the start or end of transactions. The
+'flags' field can be used in perf script to determine whether the event is a
+tranasaction start, commit or abort.
+
+Error events are new.  They show where the decoder lost the trace.  Error events
+are quite important.  Users must know if what they are seeing is a complete
+picture or not.
+
+The "d" option will cause the creation of a file "intel_pt.log" containing all
+decoded packets and instructions.  Note that this option slows down the decoder
+and that the resulting file may be very large.
+
+In addition, the period of the "instructions" event can be specified. e.g.
+
+	--itrace=i10us
+
+sets the period to 10us i.e. one  instruction sample is synthesized for each 10
+microseconds of trace.  Alternatives to "us" are "ms" (milliseconds),
+"ns" (nanoseconds), "t" (TSC ticks) or "i" (instructions).
+
+"ms", "us" and "ns" are converted to TSC ticks.
+
+The timing information included with Intel PT does not give the time of every
+instruction.  Consequently, for the purpose of sampling, the decoder estimates
+the time since the last timing packet based on 1 tick per instruction.  The time
+on the sample is *not* adjusted and reflects the last known value of TSC.
+
+For Intel PT, the default period is 100us.
+
+Also the call chain size (default 16, max. 1024) for instructions or
+transactions events can be specified. e.g.
+
+	--itrace=ig32
+	--itrace=xg32
+
+To disable trace decoding entirely, use the option --no-itrace.
+
+
+dump option
+-----------
+
+perf script has an option (-D) to "dump" the events i.e. display the binary
+data.
+
+When -D is used, Intel PT packets are displayed.  The packet decoder does not
+pay attention to PSB packets, but just decodes the bytes - so the packets seen
+by the actual decoder may not be identical in places where the data is corrupt.
+One example of that would be when the buffer-switching interrupt has been too
+slow, and the buffer has been filled completely.  In that case, the last packet
+in the buffer might be truncated and immediately followed by a PSB as the trace
+continues in the next buffer.
+
+To disable the display of Intel PT packets, combine the -D option with
+--no-itrace.
+
+
+perf report
+===========
+
+By default, perf report will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace exactly the same as
+perf script, with the exception that the default is --itrace=igxe.
+
+
+perf inject
+===========
+
+perf inject also accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+	perf inject --itrace -i perf.data -o perf.data.new
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 139608878888..a8be9f9d0462 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -1,5 +1,6 @@
 libperf-y += header.o
 libperf-y += tsc.o
+libperf-y += pmu.o
 libperf-y += kvm-stat.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
@@ -7,4 +8,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
 libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 
+libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
new file mode 100644
index 000000000000..e7654b506312
--- /dev/null
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -0,0 +1,38 @@
+/*
+ * auxtrace.c: AUX area tracing support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include "../../util/header.h"
+#include "../../util/auxtrace.h"
+#include "../../util/intel-pt.h"
+
+struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe_unused,
+					      int *err)
+{
+	char buffer[64];
+	int ret;
+
+	*err = 0;
+
+	ret = get_cpuid(buffer, sizeof(buffer));
+	if (ret) {
+		*err = ret;
+		return NULL;
+	}
+
+	if (!strncmp(buffer, "GenuineIntel,", 13))
+		return intel_pt_recording_init(err);
+
+	return NULL;
+}
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
new file mode 100644
index 000000000000..fd11cc3ce780
--- /dev/null
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -0,0 +1,15 @@
+#include <string.h>
+
+#include <linux/perf_event.h>
+
+#include "../../util/intel-pt.h"
+#include "../../util/pmu.h"
+
+struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
+{
+#ifdef HAVE_AUXTRACE_SUPPORT
+	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
+		return intel_pt_pmu_default_config(pmu);
+#endif
+	return NULL;
+}
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index ea29022a4f52..1b3cc297290d 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -47,6 +47,8 @@
 #include "debug.h"
 #include "parse-options.h"
 
+#include "intel-pt.h"
+
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 			struct auxtrace_mmap_params *mp,
 			void *userpg, int fd)
@@ -876,7 +878,7 @@ static bool auxtrace__dont_decode(struct perf_session *session)
 
 int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 				      union perf_event *event,
-				      struct perf_session *session __maybe_unused)
+				      struct perf_session *session)
 {
 	enum auxtrace_type type = event->auxtrace_info.type;
 
@@ -885,6 +887,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 
 	switch (type) {
 	case PERF_AUXTRACE_INTEL_PT:
+		return intel_pt_process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 7bcb8c315615..b5ea26bc290d 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -462,8 +462,8 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	LIST_HEAD(aliases);
 	__u32 type;
 
-	/* No support for intel_bts or intel_pt so disallow them */
-	if (!strcmp(name, "intel_bts") || !strcmp(name, "intel_pt"))
+	/* No support for intel_bts so disallow it */
+	if (!strcmp(name, "intel_bts"))
 		return NULL;
 
 	/*
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (6 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 07/25] perf tools: Take Intel PT into use Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-17 15:52   ` Arnaldo Carvalho de Melo
  2015-08-22  6:52   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 09/25] perf tools: Put itrace options into an asciidoc include Adrian Hunter
                   ` (16 subsequent siblings)
  24 siblings, 2 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Intel BTS support fits within the new auxtrace infrastructure.
Recording is supporting by identifying the Intel BTS PMU, parsing
options and setting up events.

Decoding is supported by queuing up trace data by thread and then
decoding synchronously delivering synthesized event samples into the
session processing for tools to consume.

E.g:

  [root@zoo ~]# perf record --per-thread -e intel_bts// ls
  anaconda-ks.cfg  b  bin  lib64	libexec  new  old  perf.data
  perf.data.old  stream_test  tg.run
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 4.242 MB perf.data ]
  [root@zoo ~]# perf evlist
  intel_bts//
  dummy:u
  [root@zoo ~]# perf evlist -v
  intel_bts//: type: 7, size: 112, { sample_period, sample_freq }: 1,
  sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
  enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1 dummy:u:
  type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 1,
  sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
  exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, enable_on_exec: 1,
  task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1
  [root@zoo ~]# perf report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 0  of event 'intel_bts//'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......
  #

  # Samples: 0  of event 'dummy:u'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......
  #

  # Samples: 184K of event 'branches'
  # Event count (approx.): 184522
  #
  # Overhead  Command  Shared Object       Symbol
  # ........  .......  ..................  ..............................................
  #
     7.36%  ls       [kernel.kallsyms]   [.] unmap_single_vma
     4.69%  ls       ld-2.20.so          [.] strcmp
     4.66%  ls       libc-2.20.so        [.] _dl_addr
     3.43%  ls       [kernel.kallsyms]   [.] change_protection
     3.01%  ls       ld-2.20.so          [.] do_lookup_x
     2.18%  ls       [kernel.kallsyms]   [.] filemap_map_pages
     2.09%  ls       ld-2.20.so          [.] _dl_name_match_p
     1.93%  ls       ld-2.20.so          [.] _dl_lookup_symbol_x
     1.91%  ls       [kernel.kallsyms]   [.] page_remove_rmap
     1.72%  ls       [kernel.kallsyms]   [.] do_set_pte
     1.48%  ls       ld-2.20.so          [.] _dl_relocate_object
     1.34%  ls       [kernel.kallsyms]   [.] mem_cgroup_begin_page_stat
     1.11%  ls       [kernel.kallsyms]   [.] page_add_file_rmap
     1.00%  ls       [kernel.kallsyms]   [.] mark_page_accessed
     0.97%  ls       [kernel.kallsyms]   [.] perf_event_aux
     0.95%  ls       [kernel.kallsyms]   [.] handle_mm_fault
     0.94%  ls       [kernel.kallsyms]   [.] get_page_from_freelist
  <SNIP>
  [root@zoo ~]# perf record --per-thread -e intel_bts//u ls
  <SNIP>
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.278 MB perf.data ]
  [root@zoo ~]# perf report --stdio
  <SNIP>
  # Samples: 55K of event 'branches:u'
  # Event count (approx.): 55165
  #
  # Overhead  Command  Shared Object       Symbol
  # ........  .......  ..................  ......................................
  #
    15.69%  ls       ld-2.20.so          [.] strcmp
    15.58%  ls       libc-2.20.so        [.] _dl_addr
    10.05%  ls       ld-2.20.so          [.] do_lookup_x
     6.98%  ls       ld-2.20.so          [.] _dl_name_match_p
     6.46%  ls       ld-2.20.so          [.] _dl_lookup_symbol_x
     4.95%  ls       ld-2.20.so          [.] _dl_relocate_object
     2.96%  ls       ls                  [.] quotearg_buffer_restyled
     2.78%  ls       libc-2.20.so        [.] getenv
     1.91%  ls       ld-2.20.so          [.] _dl_cache_libcmp
     1.76%  ls       libc-2.20.so        [.] __memmove_sse2
     1.75%  ls       ld-2.20.so          [.] check_match.isra.0
     1.47%  ls       ld-2.20.so          [.] _dl_map_object_deps
     1.27%  ls       ld-2.20.so          [.] _dl_map_object_from_fd
     1.17%  ls       ls                  [.] quote_name
     1.16%  ls       ld-2.20.so          [.] _dl_check_map_versions
   <SNIP>
     0.19%  ls       [kernel.kallsyms]   [.] entry_SYSCALL_64_fastpath
     0.19%  ls       [kernel.kallsyms]   [.] native_irq_return_iret
     0.19%  ls       libc-2.20.so        [.] _IO_file_xsputn@@GLIBC_2.2.5
   <SNIP>

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/intel-bts.txt |  86 +++
 tools/perf/arch/x86/util/Build         |   1 +
 tools/perf/arch/x86/util/auxtrace.c    |  49 +-
 tools/perf/arch/x86/util/intel-bts.c   | 458 ++++++++++++++++
 tools/perf/arch/x86/util/pmu.c         |   3 +
 tools/perf/util/Build                  |   1 +
 tools/perf/util/auxtrace.c             |   3 +
 tools/perf/util/auxtrace.h             |   1 +
 tools/perf/util/intel-bts.c            | 933 +++++++++++++++++++++++++++++++++
 tools/perf/util/intel-bts.h            |  43 ++
 tools/perf/util/pmu.c                  |   4 -
 11 files changed, 1576 insertions(+), 6 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-bts.txt
 create mode 100644 tools/perf/arch/x86/util/intel-bts.c
 create mode 100644 tools/perf/util/intel-bts.c
 create mode 100644 tools/perf/util/intel-bts.h

diff --git a/tools/perf/Documentation/intel-bts.txt b/tools/perf/Documentation/intel-bts.txt
new file mode 100644
index 000000000000..8bdc93bd7fdb
--- /dev/null
+++ b/tools/perf/Documentation/intel-bts.txt
@@ -0,0 +1,86 @@
+Intel Branch Trace Store
+========================
+
+Overview
+========
+
+Intel BTS could be regarded as a predecessor to Intel PT and has some
+similarities because it can also identify every branch a program takes.  A
+notable difference is that Intel BTS has no timing information and as a
+consequence the present implementation is limited to per-thread recording.
+
+While decoding Intel BTS does not require walking the object code, the object
+code is still needed to pair up calls and returns correctly, consequently much
+of the Intel PT documentation applies also to Intel BTS.  Refer to the Intel PT
+documentation and consider that the PMU 'intel_bts' can usually be used in
+place of 'intel_pt' in the examples provided, with the proviso that per-thread
+recording must also be stipulated i.e. the --per-thread option for
+'perf record'.
+
+
+perf record
+===========
+
+new event
+---------
+
+The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
+option is:
+
+	-e intel_bts//
+
+Currently Intel BTS is limited to per-thread tracing so the --per-thread option
+is also needed.
+
+
+snapshot option
+---------------
+
+The snapshot option is the same as Intel PT (refer Intel PT documentation).
+
+
+auxtrace mmap size option
+-----------------------
+
+The mmap size option is the same as Intel PT (refer Intel PT documentation).
+
+
+perf script
+===========
+
+By default, perf script will decode trace data found in the perf.data file.
+This can be further controlled by option --itrace.  The --itrace option is
+the same as Intel PT (refer Intel PT documentation) except that neither
+"instructions" events nor "transactions" events (and consequently call
+chains) are supported.
+
+To disable trace decoding entirely, use the option --no-itrace.
+
+
+dump option
+-----------
+
+perf script has an option (-D) to "dump" the events i.e. display the binary
+data.
+
+When -D is used, Intel BTS packets are displayed.
+
+To disable the display of Intel BTS packets, combine the -D option with
+--no-itrace.
+
+
+perf report
+===========
+
+By default, perf report will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace exactly the same as
+perf script.
+
+
+perf inject
+===========
+
+perf inject also accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+	perf inject --itrace -i perf.data -o perf.data.new
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index a8be9f9d0462..2c55e1b336c5 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -10,3 +10,4 @@ libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
+libperf-$(CONFIG_AUXTRACE) += intel-bts.o
diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
index e7654b506312..7a7805583e3f 100644
--- a/tools/perf/arch/x86/util/auxtrace.c
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -13,11 +13,56 @@
  *
  */
 
+#include <stdbool.h>
+
 #include "../../util/header.h"
+#include "../../util/debug.h"
+#include "../../util/pmu.h"
 #include "../../util/auxtrace.h"
 #include "../../util/intel-pt.h"
+#include "../../util/intel-bts.h"
+#include "../../util/evlist.h"
+
+static
+struct auxtrace_record *auxtrace_record__init_intel(struct perf_evlist *evlist,
+						    int *err)
+{
+	struct perf_pmu *intel_pt_pmu;
+	struct perf_pmu *intel_bts_pmu;
+	struct perf_evsel *evsel;
+	bool found_pt = false;
+	bool found_bts = false;
+
+	intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+	intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+
+	if (evlist) {
+		evlist__for_each(evlist, evsel) {
+			if (intel_pt_pmu &&
+			    evsel->attr.type == intel_pt_pmu->type)
+				found_pt = true;
+			if (intel_bts_pmu &&
+			    evsel->attr.type == intel_bts_pmu->type)
+				found_bts = true;
+		}
+	}
+
+	if (found_pt && found_bts) {
+		pr_err("intel_pt and intel_bts may not be used together\n");
+		*err = -EINVAL;
+		return NULL;
+	}
+
+	if (found_pt)
+		return intel_pt_recording_init(err);
+
+	if (found_bts)
+		return intel_bts_recording_init(err);
 
-struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe_unused,
+	return NULL;
+}
+
+struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist,
 					      int *err)
 {
 	char buffer[64];
@@ -32,7 +77,7 @@ struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe
 	}
 
 	if (!strncmp(buffer, "GenuineIntel,", 13))
-		return intel_pt_recording_init(err);
+		return auxtrace_record__init_intel(evlist, err);
 
 	return NULL;
 }
diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
new file mode 100644
index 000000000000..9b94ce520917
--- /dev/null
+++ b/tools/perf/arch/x86/util/intel-bts.c
@@ -0,0 +1,458 @@
+/*
+ * intel-bts.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "../../util/cpumap.h"
+#include "../../util/evsel.h"
+#include "../../util/evlist.h"
+#include "../../util/session.h"
+#include "../../util/util.h"
+#include "../../util/pmu.h"
+#include "../../util/debug.h"
+#include "../../util/tsc.h"
+#include "../../util/auxtrace.h"
+#include "../../util/intel-bts.h"
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define KiB_MASK(x) (KiB(x) - 1)
+#define MiB_MASK(x) (MiB(x) - 1)
+
+#define INTEL_BTS_DFLT_SAMPLE_SIZE	KiB(4)
+
+#define INTEL_BTS_MAX_SAMPLE_SIZE	KiB(60)
+
+struct intel_bts_snapshot_ref {
+	void	*ref_buf;
+	size_t	ref_offset;
+	bool	wrapped;
+};
+
+struct intel_bts_recording {
+	struct auxtrace_record		itr;
+	struct perf_pmu			*intel_bts_pmu;
+	struct perf_evlist		*evlist;
+	bool				snapshot_mode;
+	size_t				snapshot_size;
+	int				snapshot_ref_cnt;
+	struct intel_bts_snapshot_ref	*snapshot_refs;
+};
+
+struct branch {
+	u64 from;
+	u64 to;
+	u64 misc;
+};
+
+static size_t intel_bts_info_priv_size(struct auxtrace_record *itr __maybe_unused)
+{
+	return INTEL_BTS_AUXTRACE_PRIV_SIZE;
+}
+
+static int intel_bts_info_fill(struct auxtrace_record *itr,
+			       struct perf_session *session,
+			       struct auxtrace_info_event *auxtrace_info,
+			       size_t priv_size)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
+	struct perf_event_mmap_page *pc;
+	struct perf_tsc_conversion tc = { .time_mult = 0, };
+	bool cap_user_time_zero = false;
+	int err;
+
+	if (priv_size != INTEL_BTS_AUXTRACE_PRIV_SIZE)
+		return -EINVAL;
+
+	if (!session->evlist->nr_mmaps)
+		return -EINVAL;
+
+	pc = session->evlist->mmap[0].base;
+	if (pc) {
+		err = perf_read_tsc_conversion(pc, &tc);
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+		} else {
+			cap_user_time_zero = tc.time_mult != 0;
+		}
+		if (!cap_user_time_zero)
+			ui__warning("Intel BTS: TSC not available\n");
+	}
+
+	auxtrace_info->type = PERF_AUXTRACE_INTEL_BTS;
+	auxtrace_info->priv[INTEL_BTS_PMU_TYPE] = intel_bts_pmu->type;
+	auxtrace_info->priv[INTEL_BTS_TIME_SHIFT] = tc.time_shift;
+	auxtrace_info->priv[INTEL_BTS_TIME_MULT] = tc.time_mult;
+	auxtrace_info->priv[INTEL_BTS_TIME_ZERO] = tc.time_zero;
+	auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO] = cap_user_time_zero;
+	auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE] = btsr->snapshot_mode;
+
+	return 0;
+}
+
+static int intel_bts_recording_options(struct auxtrace_record *itr,
+				       struct perf_evlist *evlist,
+				       struct record_opts *opts)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
+	struct perf_evsel *evsel, *intel_bts_evsel = NULL;
+	const struct cpu_map *cpus = evlist->cpus;
+	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+
+	btsr->evlist = evlist;
+	btsr->snapshot_mode = opts->auxtrace_snapshot_mode;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_bts_pmu->type) {
+			if (intel_bts_evsel) {
+				pr_err("There may be only one " INTEL_BTS_PMU_NAME " event\n");
+				return -EINVAL;
+			}
+			evsel->attr.freq = 0;
+			evsel->attr.sample_period = 1;
+			intel_bts_evsel = evsel;
+			opts->full_auxtrace = true;
+		}
+	}
+
+	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
+		pr_err("Snapshot mode (-S option) requires " INTEL_BTS_PMU_NAME " PMU event (-e " INTEL_BTS_PMU_NAME ")\n");
+		return -EINVAL;
+	}
+
+	if (!opts->full_auxtrace)
+		return 0;
+
+	if (opts->full_auxtrace && !cpu_map__empty(cpus)) {
+		pr_err(INTEL_BTS_PMU_NAME " does not support per-cpu recording\n");
+		return -EINVAL;
+	}
+
+	/* Set default sizes for snapshot mode */
+	if (opts->auxtrace_snapshot_mode) {
+		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
+			if (privileged) {
+				opts->auxtrace_mmap_pages = MiB(4) / page_size;
+			} else {
+				opts->auxtrace_mmap_pages = KiB(128) / page_size;
+				if (opts->mmap_pages == UINT_MAX)
+					opts->mmap_pages = KiB(256) / page_size;
+			}
+		} else if (!opts->auxtrace_mmap_pages && !privileged &&
+			   opts->mmap_pages == UINT_MAX) {
+			opts->mmap_pages = KiB(256) / page_size;
+		}
+		if (!opts->auxtrace_snapshot_size)
+			opts->auxtrace_snapshot_size =
+				opts->auxtrace_mmap_pages * (size_t)page_size;
+		if (!opts->auxtrace_mmap_pages) {
+			size_t sz = opts->auxtrace_snapshot_size;
+
+			sz = round_up(sz, page_size) / page_size;
+			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
+		}
+		if (opts->auxtrace_snapshot_size >
+				opts->auxtrace_mmap_pages * (size_t)page_size) {
+			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
+			       opts->auxtrace_snapshot_size,
+			       opts->auxtrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
+			return -EINVAL;
+		}
+		pr_debug2("Intel BTS snapshot size: %zu\n",
+			  opts->auxtrace_snapshot_size);
+	}
+
+	/* Set default sizes for full trace mode */
+	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
+		if (privileged) {
+			opts->auxtrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->auxtrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	}
+
+	/* Validate auxtrace_mmap_pages */
+	if (opts->auxtrace_mmap_pages) {
+		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
+		size_t min_sz;
+
+		if (opts->auxtrace_snapshot_mode)
+			min_sz = KiB(4);
+		else
+			min_sz = KiB(8);
+
+		if (sz < min_sz || !is_power_of_2(sz)) {
+			pr_err("Invalid mmap size for Intel BTS: must be at least %zuKiB and a power of 2\n",
+			       min_sz / 1024);
+			return -EINVAL;
+		}
+	}
+
+	if (intel_bts_evsel) {
+		/*
+		 * To obtain the auxtrace buffer file descriptor, the auxtrace event
+		 * must come first.
+		 */
+		perf_evlist__to_front(evlist, intel_bts_evsel);
+		/*
+		 * In the case of per-cpu mmaps, we need the CPU on the
+		 * AUX event.
+		 */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(intel_bts_evsel, CPU);
+	}
+
+	/* Add dummy event to keep tracking */
+	if (opts->full_auxtrace) {
+		struct perf_evsel *tracking_evsel;
+		int err;
+
+		err = parse_events(evlist, "dummy:u", NULL);
+		if (err)
+			return err;
+
+		tracking_evsel = perf_evlist__last(evlist);
+
+		perf_evlist__set_tracking_event(evlist, tracking_evsel);
+
+		tracking_evsel->attr.freq = 0;
+		tracking_evsel->attr.sample_period = 1;
+	}
+
+	return 0;
+}
+
+static int intel_bts_parse_snapshot_options(struct auxtrace_record *itr,
+					    struct record_opts *opts,
+					    const char *str)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->auxtrace_snapshot_mode = true;
+	opts->auxtrace_snapshot_size = snapshot_size;
+
+	btsr->snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+static u64 intel_bts_reference(struct auxtrace_record *itr __maybe_unused)
+{
+	return rdtsc();
+}
+
+static int intel_bts_alloc_snapshot_refs(struct intel_bts_recording *btsr,
+					 int idx)
+{
+	const size_t sz = sizeof(struct intel_bts_snapshot_ref);
+	int cnt = btsr->snapshot_ref_cnt, new_cnt = cnt * 2;
+	struct intel_bts_snapshot_ref *refs;
+
+	if (!new_cnt)
+		new_cnt = 16;
+
+	while (new_cnt <= idx)
+		new_cnt *= 2;
+
+	refs = calloc(new_cnt, sz);
+	if (!refs)
+		return -ENOMEM;
+
+	memcpy(refs, btsr->snapshot_refs, cnt * sz);
+
+	btsr->snapshot_refs = refs;
+	btsr->snapshot_ref_cnt = new_cnt;
+
+	return 0;
+}
+
+static void intel_bts_free_snapshot_refs(struct intel_bts_recording *btsr)
+{
+	int i;
+
+	for (i = 0; i < btsr->snapshot_ref_cnt; i++)
+		zfree(&btsr->snapshot_refs[i].ref_buf);
+	zfree(&btsr->snapshot_refs);
+}
+
+static void intel_bts_recording_free(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+
+	intel_bts_free_snapshot_refs(btsr);
+	free(btsr);
+}
+
+static int intel_bts_snapshot_start(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__disable_event(btsr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_bts_snapshot_finish(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__enable_event(btsr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static bool intel_bts_first_wrap(u64 *data, size_t buf_size)
+{
+	int i, a, b;
+
+	b = buf_size >> 3;
+	a = b - 512;
+	if (a < 0)
+		a = 0;
+
+	for (i = a; i < b; i++) {
+		if (data[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int intel_bts_find_snapshot(struct auxtrace_record *itr, int idx,
+				   struct auxtrace_mmap *mm, unsigned char *data,
+				   u64 *head, u64 *old)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	bool wrapped;
+	int err;
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head);
+
+	if (idx >= btsr->snapshot_ref_cnt) {
+		err = intel_bts_alloc_snapshot_refs(btsr, idx);
+		if (err)
+			goto out_err;
+	}
+
+	wrapped = btsr->snapshot_refs[idx].wrapped;
+	if (!wrapped && intel_bts_first_wrap((u64 *)data, mm->len)) {
+		btsr->snapshot_refs[idx].wrapped = true;
+		wrapped = true;
+	}
+
+	/*
+	 * In full trace mode 'head' continually increases.  However in snapshot
+	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
+	 * are adjusted to match the full trace case which expects that 'old' is
+	 * always less than 'head'.
+	 */
+	if (wrapped) {
+		*old = *head;
+		*head += mm->len;
+	} else {
+		if (mm->mask)
+			*old &= mm->mask;
+		else
+			*old %= mm->len;
+		if (*old > *head)
+			*head += mm->len;
+	}
+
+	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
+		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
+
+	return 0;
+
+out_err:
+	pr_err("%s: failed, error %d\n", __func__, err);
+	return err;
+}
+
+static int intel_bts_read_finish(struct auxtrace_record *itr, int idx)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__enable_event_idx(btsr->evlist,
+							     evsel, idx);
+	}
+	return -EINVAL;
+}
+
+struct auxtrace_record *intel_bts_recording_init(int *err)
+{
+	struct perf_pmu *intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+	struct intel_bts_recording *btsr;
+
+	if (!intel_bts_pmu)
+		return NULL;
+
+	btsr = zalloc(sizeof(struct intel_bts_recording));
+	if (!btsr) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	btsr->intel_bts_pmu = intel_bts_pmu;
+	btsr->itr.recording_options = intel_bts_recording_options;
+	btsr->itr.info_priv_size = intel_bts_info_priv_size;
+	btsr->itr.info_fill = intel_bts_info_fill;
+	btsr->itr.free = intel_bts_recording_free;
+	btsr->itr.snapshot_start = intel_bts_snapshot_start;
+	btsr->itr.snapshot_finish = intel_bts_snapshot_finish;
+	btsr->itr.find_snapshot = intel_bts_find_snapshot;
+	btsr->itr.parse_snapshot_options = intel_bts_parse_snapshot_options;
+	btsr->itr.reference = intel_bts_reference;
+	btsr->itr.read_finish = intel_bts_read_finish;
+	btsr->itr.alignment = sizeof(struct branch);
+	return &btsr->itr;
+}
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
index fd11cc3ce780..79fe07158d00 100644
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -3,6 +3,7 @@
 #include <linux/perf_event.h>
 
 #include "../../util/intel-pt.h"
+#include "../../util/intel-bts.h"
 #include "../../util/pmu.h"
 
 struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
@@ -10,6 +11,8 @@ struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __mayb
 #ifdef HAVE_AUXTRACE_SUPPORT
 	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
 		return intel_pt_pmu_default_config(pmu);
+	if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME))
+		pmu->selectable = true;
 #endif
 	return NULL;
 }
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 7f5f2b6aff19..dc29c58cb300 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -78,6 +78,7 @@ libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
+libperf-$(CONFIG_AUXTRACE) += intel-bts.o
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 1b3cc297290d..3f3b40fa87c8 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -48,6 +48,7 @@
 #include "parse-options.h"
 
 #include "intel-pt.h"
+#include "intel-bts.h"
 
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 			struct auxtrace_mmap_params *mp,
@@ -888,6 +889,8 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 	switch (type) {
 	case PERF_AUXTRACE_INTEL_PT:
 		return intel_pt_process_auxtrace_info(event, session);
+	case PERF_AUXTRACE_INTEL_BTS:
+		return intel_bts_process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 7d12f33a3a06..bf72b77a588a 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -40,6 +40,7 @@ struct events_stats;
 enum auxtrace_type {
 	PERF_AUXTRACE_UNKNOWN,
 	PERF_AUXTRACE_INTEL_PT,
+	PERF_AUXTRACE_INTEL_BTS,
 };
 
 enum itrace_period_type {
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
new file mode 100644
index 000000000000..dce99cfb1309
--- /dev/null
+++ b/tools/perf/util/intel-bts.c
@@ -0,0 +1,933 @@
+/*
+ * intel-bts.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <endian.h>
+#include <byteswap.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "cpumap.h"
+#include "color.h"
+#include "evsel.h"
+#include "evlist.h"
+#include "machine.h"
+#include "session.h"
+#include "util.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "debug.h"
+#include "tsc.h"
+#include "auxtrace.h"
+#include "intel-pt-decoder/intel-pt-insn-decoder.h"
+#include "intel-bts.h"
+
+#define MAX_TIMESTAMP (~0ULL)
+
+#define INTEL_BTS_ERR_NOINSN  5
+#define INTEL_BTS_ERR_LOST    9
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le64_to_cpu bswap_64
+#else
+#define le64_to_cpu
+#endif
+
+struct intel_bts {
+	struct auxtrace			auxtrace;
+	struct auxtrace_queues		queues;
+	struct auxtrace_heap		heap;
+	u32				auxtrace_type;
+	struct perf_session		*session;
+	struct machine			*machine;
+	bool				sampling_mode;
+	bool				snapshot_mode;
+	bool				data_queued;
+	u32				pmu_type;
+	struct perf_tsc_conversion	tc;
+	bool				cap_user_time_zero;
+	struct itrace_synth_opts	synth_opts;
+	bool				sample_branches;
+	u32				branches_filter;
+	u64				branches_sample_type;
+	u64				branches_id;
+	size_t				branches_event_size;
+	bool				synth_needs_swap;
+};
+
+struct intel_bts_queue {
+	struct intel_bts	*bts;
+	unsigned int		queue_nr;
+	struct auxtrace_buffer	*buffer;
+	bool			on_heap;
+	bool			done;
+	pid_t			pid;
+	pid_t			tid;
+	int			cpu;
+	u64			time;
+	struct intel_pt_insn	intel_pt_insn;
+	u32			sample_flags;
+};
+
+struct branch {
+	u64 from;
+	u64 to;
+	u64 misc;
+};
+
+static void intel_bts_dump(struct intel_bts *bts __maybe_unused,
+			   unsigned char *buf, size_t len)
+{
+	struct branch *branch;
+	size_t i, pos = 0, br_sz = sizeof(struct branch), sz;
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color,
+		      ". ... Intel BTS data: size %zu bytes\n",
+		      len);
+
+	while (len) {
+		if (len >= br_sz)
+			sz = br_sz;
+		else
+			sz = len;
+		printf(".");
+		color_fprintf(stdout, color, "  %08x: ", pos);
+		for (i = 0; i < sz; i++)
+			color_fprintf(stdout, color, " %02x", buf[i]);
+		for (; i < br_sz; i++)
+			color_fprintf(stdout, color, "   ");
+		if (len >= br_sz) {
+			branch = (struct branch *)buf;
+			color_fprintf(stdout, color, " %"PRIx64" -> %"PRIx64" %s\n",
+				      le64_to_cpu(branch->from),
+				      le64_to_cpu(branch->to),
+				      le64_to_cpu(branch->misc) & 0x10 ?
+							"pred" : "miss");
+		} else {
+			color_fprintf(stdout, color, " Bad record!\n");
+		}
+		pos += sz;
+		buf += sz;
+		len -= sz;
+	}
+}
+
+static void intel_bts_dump_event(struct intel_bts *bts, unsigned char *buf,
+				 size_t len)
+{
+	printf(".\n");
+	intel_bts_dump(bts, buf, len);
+}
+
+static int intel_bts_lost(struct intel_bts *bts, struct perf_sample *sample)
+{
+	union perf_event event;
+	int err;
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     INTEL_BTS_ERR_LOST, sample->cpu, sample->pid,
+			     sample->tid, 0, "Lost trace data");
+
+	err = perf_session__deliver_synth_event(bts->session, &event, NULL);
+	if (err)
+		pr_err("Intel BTS: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static struct intel_bts_queue *intel_bts_alloc_queue(struct intel_bts *bts,
+						     unsigned int queue_nr)
+{
+	struct intel_bts_queue *btsq;
+
+	btsq = zalloc(sizeof(struct intel_bts_queue));
+	if (!btsq)
+		return NULL;
+
+	btsq->bts = bts;
+	btsq->queue_nr = queue_nr;
+	btsq->pid = -1;
+	btsq->tid = -1;
+	btsq->cpu = -1;
+
+	return btsq;
+}
+
+static int intel_bts_setup_queue(struct intel_bts *bts,
+				 struct auxtrace_queue *queue,
+				 unsigned int queue_nr)
+{
+	struct intel_bts_queue *btsq = queue->priv;
+
+	if (list_empty(&queue->head))
+		return 0;
+
+	if (!btsq) {
+		btsq = intel_bts_alloc_queue(bts, queue_nr);
+		if (!btsq)
+			return -ENOMEM;
+		queue->priv = btsq;
+
+		if (queue->cpu != -1)
+			btsq->cpu = queue->cpu;
+		btsq->tid = queue->tid;
+	}
+
+	if (bts->sampling_mode)
+		return 0;
+
+	if (!btsq->on_heap && !btsq->buffer) {
+		int ret;
+
+		btsq->buffer = auxtrace_buffer__next(queue, NULL);
+		if (!btsq->buffer)
+			return 0;
+
+		ret = auxtrace_heap__add(&bts->heap, queue_nr,
+					 btsq->buffer->reference);
+		if (ret)
+			return ret;
+		btsq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int intel_bts_setup_queues(struct intel_bts *bts)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < bts->queues.nr_queues; i++) {
+		ret = intel_bts_setup_queue(bts, &bts->queues.queue_array[i],
+					    i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static inline int intel_bts_update_queues(struct intel_bts *bts)
+{
+	if (bts->queues.new_data) {
+		bts->queues.new_data = false;
+		return intel_bts_setup_queues(bts);
+	}
+	return 0;
+}
+
+static unsigned char *intel_bts_find_overlap(unsigned char *buf_a, size_t len_a,
+					     unsigned char *buf_b, size_t len_b)
+{
+	size_t offs, len;
+
+	if (len_a > len_b)
+		offs = len_a - len_b;
+	else
+		offs = 0;
+
+	for (; offs < len_a; offs += sizeof(struct branch)) {
+		len = len_a - offs;
+		if (!memcmp(buf_a + offs, buf_b, len))
+			return buf_b + len;
+	}
+
+	return buf_b;
+}
+
+static int intel_bts_do_fix_overlap(struct auxtrace_queue *queue,
+				    struct auxtrace_buffer *b)
+{
+	struct auxtrace_buffer *a;
+	void *start;
+
+	if (b->list.prev == &queue->head)
+		return 0;
+	a = list_entry(b->list.prev, struct auxtrace_buffer, list);
+	start = intel_bts_find_overlap(a->data, a->size, b->data, b->size);
+	if (!start)
+		return -EINVAL;
+	b->use_size = b->data + b->size - start;
+	b->use_data = start;
+	return 0;
+}
+
+static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
+					 struct branch *branch)
+{
+	int ret;
+	struct intel_bts *bts = btsq->bts;
+	union perf_event event;
+	struct perf_sample sample = { .ip = 0, };
+
+	event.sample.header.type = PERF_RECORD_SAMPLE;
+	event.sample.header.misc = PERF_RECORD_MISC_USER;
+	event.sample.header.size = sizeof(struct perf_event_header);
+
+	sample.ip = le64_to_cpu(branch->from);
+	sample.pid = btsq->pid;
+	sample.tid = btsq->tid;
+	sample.addr = le64_to_cpu(branch->to);
+	sample.id = btsq->bts->branches_id;
+	sample.stream_id = btsq->bts->branches_id;
+	sample.period = 1;
+	sample.cpu = btsq->cpu;
+	sample.flags = btsq->sample_flags;
+	sample.insn_len = btsq->intel_pt_insn.length;
+
+	if (bts->synth_opts.inject) {
+		event.sample.header.size = bts->branches_event_size;
+		ret = perf_event__synthesize_sample(&event,
+						    bts->branches_sample_type,
+						    0, &sample,
+						    bts->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(bts->session, &event, &sample);
+	if (ret)
+		pr_err("Intel BTS: failed to deliver branch event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_bts_get_next_insn(struct intel_bts_queue *btsq, u64 ip)
+{
+	struct machine *machine = btsq->bts->machine;
+	struct thread *thread;
+	struct addr_location al;
+	unsigned char buf[1024];
+	size_t bufsz;
+	ssize_t len;
+	int x86_64;
+	uint8_t cpumode;
+	int err = -1;
+
+	bufsz = intel_pt_insn_max_size();
+
+	if (machine__kernel_ip(machine, ip))
+		cpumode = PERF_RECORD_MISC_KERNEL;
+	else
+		cpumode = PERF_RECORD_MISC_USER;
+
+	thread = machine__find_thread(machine, -1, btsq->tid);
+	if (!thread)
+		return -1;
+
+	thread__find_addr_map(thread, cpumode, MAP__FUNCTION, ip, &al);
+	if (!al.map || !al.map->dso)
+		goto out_put;
+
+	len = dso__data_read_addr(al.map->dso, al.map, machine, ip, buf, bufsz);
+	if (len <= 0)
+		goto out_put;
+
+	/* Load maps to ensure dso->is_64_bit has been updated */
+	map__load(al.map, machine->symbol_filter);
+
+	x86_64 = al.map->dso->is_64_bit;
+
+	if (intel_pt_get_insn(buf, len, x86_64, &btsq->intel_pt_insn))
+		goto out_put;
+
+	err = 0;
+out_put:
+	thread__put(thread);
+	return err;
+}
+
+static int intel_bts_synth_error(struct intel_bts *bts, int cpu, pid_t pid,
+				 pid_t tid, u64 ip)
+{
+	union perf_event event;
+	int err;
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     INTEL_BTS_ERR_NOINSN, cpu, pid, tid, ip,
+			     "Failed to get instruction");
+
+	err = perf_session__deliver_synth_event(bts->session, &event, NULL);
+	if (err)
+		pr_err("Intel BTS: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static int intel_bts_get_branch_type(struct intel_bts_queue *btsq,
+				     struct branch *branch)
+{
+	int err;
+
+	if (!branch->from) {
+		if (branch->to)
+			btsq->sample_flags = PERF_IP_FLAG_BRANCH |
+					     PERF_IP_FLAG_TRACE_BEGIN;
+		else
+			btsq->sample_flags = 0;
+		btsq->intel_pt_insn.length = 0;
+	} else if (!branch->to) {
+		btsq->sample_flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_END;
+		btsq->intel_pt_insn.length = 0;
+	} else {
+		err = intel_bts_get_next_insn(btsq, branch->from);
+		if (err) {
+			btsq->sample_flags = 0;
+			btsq->intel_pt_insn.length = 0;
+			if (!btsq->bts->synth_opts.errors)
+				return 0;
+			err = intel_bts_synth_error(btsq->bts, btsq->cpu,
+						    btsq->pid, btsq->tid,
+						    branch->from);
+			return err;
+		}
+		btsq->sample_flags = intel_pt_insn_type(btsq->intel_pt_insn.op);
+		/* Check for an async branch into the kernel */
+		if (!machine__kernel_ip(btsq->bts->machine, branch->from) &&
+		    machine__kernel_ip(btsq->bts->machine, branch->to) &&
+		    btsq->sample_flags != (PERF_IP_FLAG_BRANCH |
+					   PERF_IP_FLAG_CALL |
+					   PERF_IP_FLAG_SYSCALLRET))
+			btsq->sample_flags = PERF_IP_FLAG_BRANCH |
+					     PERF_IP_FLAG_CALL |
+					     PERF_IP_FLAG_ASYNC |
+					     PERF_IP_FLAG_INTERRUPT;
+	}
+
+	return 0;
+}
+
+static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
+				    struct auxtrace_buffer *buffer)
+{
+	struct branch *branch;
+	size_t sz, bsz = sizeof(struct branch);
+	u32 filter = btsq->bts->branches_filter;
+	int err = 0;
+
+	if (buffer->use_data) {
+		sz = buffer->use_size;
+		branch = buffer->use_data;
+	} else {
+		sz = buffer->size;
+		branch = buffer->data;
+	}
+
+	if (!btsq->bts->sample_branches)
+		return 0;
+
+	for (; sz > bsz; branch += 1, sz -= bsz) {
+		if (!branch->from && !branch->to)
+			continue;
+		intel_bts_get_branch_type(btsq, branch);
+		if (filter && !(filter & btsq->sample_flags))
+			continue;
+		err = intel_bts_synth_branch_sample(btsq, branch);
+		if (err)
+			break;
+	}
+	return err;
+}
+
+static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
+{
+	struct auxtrace_buffer *buffer = btsq->buffer, *old_buffer = buffer;
+	struct auxtrace_queue *queue;
+	struct thread *thread;
+	int err;
+
+	if (btsq->done)
+		return 1;
+
+	if (btsq->pid == -1) {
+		thread = machine__find_thread(btsq->bts->machine, -1,
+					      btsq->tid);
+		if (thread)
+			btsq->pid = thread->pid_;
+	} else {
+		thread = machine__findnew_thread(btsq->bts->machine, btsq->pid,
+						 btsq->tid);
+	}
+
+	queue = &btsq->bts->queues.queue_array[btsq->queue_nr];
+
+	if (!buffer)
+		buffer = auxtrace_buffer__next(queue, NULL);
+
+	if (!buffer) {
+		if (!btsq->bts->sampling_mode)
+			btsq->done = 1;
+		err = 1;
+		goto out_put;
+	}
+
+	/* Currently there is no support for split buffers */
+	if (buffer->consecutive) {
+		err = -EINVAL;
+		goto out_put;
+	}
+
+	if (!buffer->data) {
+		int fd = perf_data_file__fd(btsq->bts->session->file);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data) {
+			err = -ENOMEM;
+			goto out_put;
+		}
+	}
+
+	if (btsq->bts->snapshot_mode && !buffer->consecutive &&
+	    intel_bts_do_fix_overlap(queue, buffer)) {
+		err = -ENOMEM;
+		goto out_put;
+	}
+
+	if (!btsq->bts->synth_opts.callchain && thread &&
+	    (!old_buffer || btsq->bts->sampling_mode ||
+	     (btsq->bts->snapshot_mode && !buffer->consecutive)))
+		thread_stack__set_trace_nr(thread, buffer->buffer_nr + 1);
+
+	err = intel_bts_process_buffer(btsq, buffer);
+
+	auxtrace_buffer__drop_data(buffer);
+
+	btsq->buffer = auxtrace_buffer__next(queue, buffer);
+	if (btsq->buffer) {
+		if (timestamp)
+			*timestamp = btsq->buffer->reference;
+	} else {
+		if (!btsq->bts->sampling_mode)
+			btsq->done = 1;
+	}
+out_put:
+	thread__put(thread);
+	return err;
+}
+
+static int intel_bts_flush_queue(struct intel_bts_queue *btsq)
+{
+	u64 ts = 0;
+	int ret;
+
+	while (1) {
+		ret = intel_bts_process_queue(btsq, &ts);
+		if (ret < 0)
+			return ret;
+		if (ret)
+			break;
+	}
+	return 0;
+}
+
+static int intel_bts_process_tid_exit(struct intel_bts *bts, pid_t tid)
+{
+	struct auxtrace_queues *queues = &bts->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &bts->queues.queue_array[i];
+		struct intel_bts_queue *btsq = queue->priv;
+
+		if (btsq && btsq->tid == tid)
+			return intel_bts_flush_queue(btsq);
+	}
+	return 0;
+}
+
+static int intel_bts_process_queues(struct intel_bts *bts, u64 timestamp)
+{
+	while (1) {
+		unsigned int queue_nr;
+		struct auxtrace_queue *queue;
+		struct intel_bts_queue *btsq;
+		u64 ts = 0;
+		int ret;
+
+		if (!bts->heap.heap_cnt)
+			return 0;
+
+		if (bts->heap.heap_array[0].ordinal > timestamp)
+			return 0;
+
+		queue_nr = bts->heap.heap_array[0].queue_nr;
+		queue = &bts->queues.queue_array[queue_nr];
+		btsq = queue->priv;
+
+		auxtrace_heap__pop(&bts->heap);
+
+		ret = intel_bts_process_queue(btsq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&bts->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&bts->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			btsq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_bts_process_event(struct perf_session *session,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct perf_tool *tool)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	u64 timestamp;
+	int err;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("Intel BTS requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time)
+		timestamp = perf_time_to_tsc(sample->time, &bts->tc);
+	else
+		timestamp = 0;
+
+	err = intel_bts_update_queues(bts);
+	if (err)
+		return err;
+
+	err = intel_bts_process_queues(bts, timestamp);
+	if (err)
+		return err;
+	if (event->header.type == PERF_RECORD_EXIT) {
+		err = intel_bts_process_tid_exit(bts, event->comm.tid);
+		if (err)
+			return err;
+	}
+
+	if (event->header.type == PERF_RECORD_AUX &&
+	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
+	    bts->synth_opts.errors)
+		err = intel_bts_lost(bts, sample);
+
+	return err;
+}
+
+static int intel_bts_process_auxtrace_event(struct perf_session *session,
+					    union perf_event *event,
+					    struct perf_tool *tool __maybe_unused)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+
+	if (bts->sampling_mode)
+		return 0;
+
+	if (!bts->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data_file__fd(session->file);
+		int err;
+
+		if (perf_data_file__is_pipe(session->file)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&bts->queues, session, event,
+						 data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				intel_bts_dump_event(bts, buffer->data,
+						     buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int intel_bts_flush(struct perf_session *session __maybe_unused,
+			   struct perf_tool *tool __maybe_unused)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	int ret;
+
+	if (dump_trace || bts->sampling_mode)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = intel_bts_update_queues(bts);
+	if (ret < 0)
+		return ret;
+
+	return intel_bts_process_queues(bts, MAX_TIMESTAMP);
+}
+
+static void intel_bts_free_queue(void *priv)
+{
+	struct intel_bts_queue *btsq = priv;
+
+	if (!btsq)
+		return;
+	free(btsq);
+}
+
+static void intel_bts_free_events(struct perf_session *session)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	struct auxtrace_queues *queues = &bts->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		intel_bts_free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = NULL;
+	}
+	auxtrace_queues__free(queues);
+}
+
+static void intel_bts_free(struct perf_session *session)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+
+	auxtrace_heap__free(&bts->heap);
+	intel_bts_free_events(session);
+	session->auxtrace = NULL;
+	free(bts);
+}
+
+struct intel_bts_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int intel_bts_event_synth(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_sample *sample __maybe_unused,
+				 struct machine *machine __maybe_unused)
+{
+	struct intel_bts_synth *intel_bts_synth =
+			container_of(tool, struct intel_bts_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(intel_bts_synth->session,
+						 event, NULL);
+}
+
+static int intel_bts_synth_event(struct perf_session *session,
+				 struct perf_event_attr *attr, u64 id)
+{
+	struct intel_bts_synth intel_bts_synth;
+
+	memset(&intel_bts_synth, 0, sizeof(struct intel_bts_synth));
+	intel_bts_synth.session = session;
+
+	return perf_event__synthesize_attr(&intel_bts_synth.dummy_tool, attr, 1,
+					   &id, intel_bts_event_synth);
+}
+
+static int intel_bts_synth_events(struct intel_bts *bts,
+				  struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == bts->pmu_type && evsel->ids) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("There are no selected events with Intel BTS data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+	if (!id)
+		id = 1;
+
+	if (bts->synth_opts.branches) {
+		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		attr.sample_period = 1;
+		attr.sample_type |= PERF_SAMPLE_ADDR;
+		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_bts_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'branches' event type\n",
+			       __func__);
+			return err;
+		}
+		bts->sample_branches = true;
+		bts->branches_sample_type = attr.sample_type;
+		bts->branches_id = id;
+		/*
+		 * We only use sample types from PERF_SAMPLE_MASK so we can use
+		 * __perf_evsel__sample_size() here.
+		 */
+		bts->branches_event_size = sizeof(struct sample_event) +
+				__perf_evsel__sample_size(attr.sample_type);
+	}
+
+	bts->synth_needs_swap = evsel->needs_swap;
+
+	return 0;
+}
+
+static const char * const intel_bts_info_fmts[] = {
+	[INTEL_BTS_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
+	[INTEL_BTS_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
+	[INTEL_BTS_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
+	[INTEL_BTS_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
+	[INTEL_BTS_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
+	[INTEL_BTS_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
+};
+
+static void intel_bts_print_info(u64 *arr, int start, int finish)
+{
+	int i;
+
+	if (!dump_trace)
+		return;
+
+	for (i = start; i <= finish; i++)
+		fprintf(stdout, intel_bts_info_fmts[i], arr[i]);
+}
+
+u64 intel_bts_auxtrace_info_priv[INTEL_BTS_AUXTRACE_PRIV_SIZE];
+
+int intel_bts_process_auxtrace_info(union perf_event *event,
+				    struct perf_session *session)
+{
+	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
+	size_t min_sz = sizeof(u64) * INTEL_BTS_SNAPSHOT_MODE;
+	struct intel_bts *bts;
+	int err;
+
+	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
+					min_sz)
+		return -EINVAL;
+
+	bts = zalloc(sizeof(struct intel_bts));
+	if (!bts)
+		return -ENOMEM;
+
+	err = auxtrace_queues__init(&bts->queues);
+	if (err)
+		goto err_free;
+
+	bts->session = session;
+	bts->machine = &session->machines.host; /* No kvm support */
+	bts->auxtrace_type = auxtrace_info->type;
+	bts->pmu_type = auxtrace_info->priv[INTEL_BTS_PMU_TYPE];
+	bts->tc.time_shift = auxtrace_info->priv[INTEL_BTS_TIME_SHIFT];
+	bts->tc.time_mult = auxtrace_info->priv[INTEL_BTS_TIME_MULT];
+	bts->tc.time_zero = auxtrace_info->priv[INTEL_BTS_TIME_ZERO];
+	bts->cap_user_time_zero =
+			auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO];
+	bts->snapshot_mode = auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE];
+
+	bts->sampling_mode = false;
+
+	bts->auxtrace.process_event = intel_bts_process_event;
+	bts->auxtrace.process_auxtrace_event = intel_bts_process_auxtrace_event;
+	bts->auxtrace.flush_events = intel_bts_flush;
+	bts->auxtrace.free_events = intel_bts_free_events;
+	bts->auxtrace.free = intel_bts_free;
+	session->auxtrace = &bts->auxtrace;
+
+	intel_bts_print_info(&auxtrace_info->priv[0], INTEL_BTS_PMU_TYPE,
+			     INTEL_BTS_SNAPSHOT_MODE);
+
+	if (dump_trace)
+		return 0;
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		bts->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&bts->synth_opts);
+
+	if (bts->synth_opts.calls)
+		bts->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+					PERF_IP_FLAG_TRACE_END;
+	if (bts->synth_opts.returns)
+		bts->branches_filter |= PERF_IP_FLAG_RETURN |
+					PERF_IP_FLAG_TRACE_BEGIN;
+
+	err = intel_bts_synth_events(bts, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&bts->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (bts->queues.populated)
+		bts->data_queued = true;
+
+	return 0;
+
+err_free_queues:
+	auxtrace_queues__free(&bts->queues);
+	session->auxtrace = NULL;
+err_free:
+	free(bts);
+	return err;
+}
diff --git a/tools/perf/util/intel-bts.h b/tools/perf/util/intel-bts.h
new file mode 100644
index 000000000000..ca65e21b3e83
--- /dev/null
+++ b/tools/perf/util/intel-bts.h
@@ -0,0 +1,43 @@
+/*
+ * intel-bts.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__PERF_INTEL_BTS_H__
+#define INCLUDE__PERF_INTEL_BTS_H__
+
+#define INTEL_BTS_PMU_NAME "intel_bts"
+
+enum {
+	INTEL_BTS_PMU_TYPE,
+	INTEL_BTS_TIME_SHIFT,
+	INTEL_BTS_TIME_MULT,
+	INTEL_BTS_TIME_ZERO,
+	INTEL_BTS_CAP_USER_TIME_ZERO,
+	INTEL_BTS_SNAPSHOT_MODE,
+	INTEL_BTS_AUXTRACE_PRIV_MAX,
+};
+
+#define INTEL_BTS_AUXTRACE_PRIV_SIZE (INTEL_BTS_AUXTRACE_PRIV_MAX * sizeof(u64))
+
+struct auxtrace_record;
+struct perf_tool;
+union perf_event;
+struct perf_session;
+
+struct auxtrace_record *intel_bts_recording_init(int *err);
+
+int intel_bts_process_auxtrace_info(union perf_event *event,
+				    struct perf_session *session);
+
+#endif
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index b5ea26bc290d..52d569cda606 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -462,10 +462,6 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	LIST_HEAD(aliases);
 	__u32 type;
 
-	/* No support for intel_bts so disallow it */
-	if (!strcmp(name, "intel_bts"))
-		return NULL;
-
 	/*
 	 * The pmu data we store & need consists of the pmu
 	 * type value and format definitions. Load both right
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 09/25] perf tools: Put itrace options into an asciidoc include
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (7 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 08/25] perf tools: Add Intel BTS support Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-22  6:53   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 10/25] perf tools: Add example call-graph script Adrian Hunter
                   ` (15 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

perf script, report and inject all have the same itrace options. Put
them into an asciidoc include file.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/itrace.txt      | 22 ++++++++++++++++++++++
 tools/perf/Documentation/perf-inject.txt | 23 +----------------------
 tools/perf/Documentation/perf-report.txt | 23 +----------------------
 tools/perf/Documentation/perf-script.txt | 23 +----------------------
 4 files changed, 25 insertions(+), 66 deletions(-)
 create mode 100644 tools/perf/Documentation/itrace.txt

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
new file mode 100644
index 000000000000..2ff946677e3b
--- /dev/null
+++ b/tools/perf/Documentation/itrace.txt
@@ -0,0 +1,22 @@
+		i	synthesize instructions events
+		b	synthesize branches events
+		c	synthesize branches events (calls only)
+		r	synthesize branches events (returns only)
+		x	synthesize transactions events
+		e	synthesize error events
+		d	create a debug log
+		g	synthesize a call chain (use with i or x)
+
+	The default is all events i.e. the same as --itrace=ibxe
+
+	In addition, the period (default 100000) for instructions events
+	can be specified in units of:
+
+		i	instructions
+		t	ticks
+		ms	milliseconds
+		us	microseconds
+		ns	nanoseconds (default)
+
+	Also the call chain size (default 16, max. 1024) for instructions or
+	transactions events can be specified.
diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
index b876ae312699..0c721c3e37e1 100644
--- a/tools/perf/Documentation/perf-inject.txt
+++ b/tools/perf/Documentation/perf-inject.txt
@@ -48,28 +48,7 @@ OPTIONS
 	Decode Instruction Tracing data, replacing it with synthesized events.
 	Options are:
 
-		i	synthesize instructions events
-		b	synthesize branches events
-		c	synthesize branches events (calls only)
-		r	synthesize branches events (returns only)
-		x	synthesize transactions events
-		e	synthesize error events
-		d	create a debug log
-		g	synthesize a call chain (use with i or x)
-
-	The default is all events i.e. the same as --itrace=ibxe
-
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
-
-		i	instructions
-		t	ticks
-		ms	milliseconds
-		us	microseconds
-		ns	nanoseconds (default)
-
-	Also the call chain size (default 16, max. 1024) for instructions or
-	transactions events can be specified.
+include::itrace.txt[]
 
 SEE ALSO
 --------
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index c33b69f3374f..6c449284040e 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -328,28 +328,7 @@ OPTIONS
 --itrace::
 	Options for decoding instruction tracing data. The options are:
 
-		i	synthesize instructions events
-		b	synthesize branches events
-		c	synthesize branches events (calls only)
-		r	synthesize branches events (returns only)
-		x	synthesize transactions events
-		e	synthesize error events
-		d	create a debug log
-		g	synthesize a call chain (use with i or x)
-
-	The default is all events i.e. the same as --itrace=ibxe
-
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
-
-		i	instructions
-		t	ticks
-		ms	milliseconds
-		us	microseconds
-		ns	nanoseconds (default)
-
-	Also the call chain size (default 16, max. 1024) for instructions or
-	transactions events can be specified.
+include::itrace.txt[]
 
 	To disable decoding entirely, use --no-itrace.
 
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index c82df572fac2..ac9e99add007 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -231,28 +231,7 @@ OPTIONS
 --itrace::
 	Options for decoding instruction tracing data. The options are:
 
-		i	synthesize instructions events
-		b	synthesize branches events
-		c	synthesize branches events (calls only)
-		r	synthesize branches events (returns only)
-		x	synthesize transactions events
-		e	synthesize error events
-		d	create a debug log
-		g	synthesize a call chain (use with i or x)
-
-	The default is all events i.e. the same as --itrace=ibxe
-
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
-
-		i	instructions
-		t	ticks
-		ms	milliseconds
-		us	microseconds
-		ns	nanoseconds (default)
-
-	Also the call chain size (default 16, max. 1024) for instructions or
-	transactions events can be specified.
+include::itrace.txt[]
 
 	To disable decoding entirely, use --no-itrace.
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 10/25] perf tools: Add example call-graph script
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (8 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 09/25] perf tools: Put itrace options into an asciidoc include Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-21 15:00   ` Arnaldo Carvalho de Melo
  2015-08-22  6:53   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 11/25] perf auxtrace: Fix period type 'i' not working Adrian Hunter
                   ` (14 subsequent siblings)
  24 siblings, 2 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add a script to produce a call-graph from data exported
to a postgresql database and derived from a processor trace
event like intel_pt or intel_bts. Refer to comments in the
scripts call-graph-from-postgresql.py and export-to-postgresql.py
for more details.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 .../scripts/python/call-graph-from-postgresql.py   | 327 +++++++++++++++++++++
 tools/perf/scripts/python/export-to-postgresql.py  |  47 +++
 2 files changed, 374 insertions(+)
 create mode 100644 tools/perf/scripts/python/call-graph-from-postgresql.py

diff --git a/tools/perf/scripts/python/call-graph-from-postgresql.py b/tools/perf/scripts/python/call-graph-from-postgresql.py
new file mode 100644
index 000000000000..e78fdc2a5a9d
--- /dev/null
+++ b/tools/perf/scripts/python/call-graph-from-postgresql.py
@@ -0,0 +1,327 @@
+#!/usr/bin/python2
+# call-graph-from-postgresql.py: create call-graph from postgresql database
+# Copyright (c) 2014, Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+
+# To use this script you will need to have exported data using the
+# export-to-postgresql.py script.  Refer to that script for details.
+#
+# Following on from the example in the export-to-postgresql.py script, a
+# call-graph can be displayed for the pt_example database like this:
+#
+#	python tools/perf/scripts/python/call-graph-from-postgresql.py pt_example
+#
+# Note this script supports connecting to remote databases by setting hostname,
+# port, username, password, and dbname e.g.
+#
+#	python tools/perf/scripts/python/call-graph-from-postgresql.py "hostname=myhost username=myuser password=mypassword dbname=pt_example"
+#
+# The result is a GUI window with a tree representing a context-sensitive
+# call-graph.  Expanding a couple of levels of the tree and adjusting column
+# widths to suit will display something like:
+#
+#                                         Call Graph: pt_example
+# Call Path                          Object      Count   Time(ns)  Time(%)  Branch Count   Branch Count(%)
+# v- ls
+#     v- 2638:2638
+#         v- _start                  ld-2.19.so    1     10074071   100.0         211135            100.0
+#           |- unknown               unknown       1        13198     0.1              1              0.0
+#           >- _dl_start             ld-2.19.so    1      1400980    13.9          19637              9.3
+#           >- _d_linit_internal     ld-2.19.so    1       448152     4.4          11094              5.3
+#           v-__libc_start_main@plt  ls            1      8211741    81.5         180397             85.4
+#              >- _dl_fixup          ld-2.19.so    1         7607     0.1            108              0.1
+#              >- __cxa_atexit       libc-2.19.so  1        11737     0.1             10              0.0
+#              >- __libc_csu_init    ls            1        10354     0.1             10              0.0
+#              |- _setjmp            libc-2.19.so  1            0     0.0              4              0.0
+#              v- main               ls            1      8182043    99.6         180254             99.9
+#
+# Points to note:
+#	The top level is a command name (comm)
+#	The next level is a thread (pid:tid)
+#	Subsequent levels are functions
+#	'Count' is the number of calls
+#	'Time' is the elapsed time until the function returns
+#	Percentages are relative to the level above
+#	'Branch Count' is the total number of branches for that function and all
+#       functions that it calls
+
+import sys
+from PySide.QtCore import *
+from PySide.QtGui import *
+from PySide.QtSql import *
+from decimal import *
+
+class TreeItem():
+
+	def __init__(self, db, row, parent_item):
+		self.db = db
+		self.row = row
+		self.parent_item = parent_item
+		self.query_done = False;
+		self.child_count = 0
+		self.child_items = []
+		self.data = ["", "", "", "", "", "", ""]
+		self.comm_id = 0
+		self.thread_id = 0
+		self.call_path_id = 1
+		self.branch_count = 0
+		self.time = 0
+		if not parent_item:
+			self.setUpRoot()
+
+	def setUpRoot(self):
+		self.query_done = True
+		query = QSqlQuery(self.db)
+		ret = query.exec_('SELECT id, comm FROM comms')
+		if not ret:
+			raise Exception("Query failed: " + query.lastError().text())
+		while query.next():
+			if not query.value(0):
+				continue
+			child_item = TreeItem(self.db, self.child_count, self)
+			self.child_items.append(child_item)
+			self.child_count += 1
+			child_item.setUpLevel1(query.value(0), query.value(1))
+
+	def setUpLevel1(self, comm_id, comm):
+		self.query_done = True;
+		self.comm_id = comm_id
+		self.data[0] = comm
+		self.child_items = []
+		self.child_count = 0
+		query = QSqlQuery(self.db)
+		ret = query.exec_('SELECT thread_id, ( SELECT pid FROM threads WHERE id = thread_id ), ( SELECT tid FROM threads WHERE id = thread_id ) FROM comm_threads WHERE comm_id = ' + str(comm_id))
+		if not ret:
+			raise Exception("Query failed: " + query.lastError().text())
+		while query.next():
+			child_item = TreeItem(self.db, self.child_count, self)
+			self.child_items.append(child_item)
+			self.child_count += 1
+			child_item.setUpLevel2(comm_id, query.value(0), query.value(1), query.value(2))
+
+	def setUpLevel2(self, comm_id, thread_id, pid, tid):
+		self.comm_id = comm_id
+		self.thread_id = thread_id
+		self.data[0] = str(pid) + ":" + str(tid)
+
+	def getChildItem(self, row):
+		return self.child_items[row]
+
+	def getParentItem(self):
+		return self.parent_item
+
+	def getRow(self):
+		return self.row
+
+	def timePercent(self, b):
+		if not self.time:
+			return "0.0"
+		x = (b * Decimal(100)) / self.time
+		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
+
+	def branchPercent(self, b):
+		if not self.branch_count:
+			return "0.0"
+		x = (b * Decimal(100)) / self.branch_count
+		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
+
+	def addChild(self, call_path_id, name, dso, count, time, branch_count):
+		child_item = TreeItem(self.db, self.child_count, self)
+		child_item.comm_id = self.comm_id
+		child_item.thread_id = self.thread_id
+		child_item.call_path_id = call_path_id
+		child_item.branch_count = branch_count
+		child_item.time = time
+		child_item.data[0] = name
+		if dso == "[kernel.kallsyms]":
+			dso = "[kernel]"
+		child_item.data[1] = dso
+		child_item.data[2] = str(count)
+		child_item.data[3] = str(time)
+		child_item.data[4] = self.timePercent(time)
+		child_item.data[5] = str(branch_count)
+		child_item.data[6] = self.branchPercent(branch_count)
+		self.child_items.append(child_item)
+		self.child_count += 1
+
+	def selectCalls(self):
+		self.query_done = True;
+		query = QSqlQuery(self.db)
+		ret = query.exec_('SELECT id, call_path_id, branch_count, call_time, return_time, '
+				  '( SELECT name FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ), '
+				  '( SELECT short_name FROM dsos WHERE id = ( SELECT dso_id FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ) ), '
+				  '( SELECT ip FROM call_paths where id = call_path_id ) '
+				  'FROM calls WHERE parent_call_path_id = ' + str(self.call_path_id) + ' AND comm_id = ' + str(self.comm_id) + ' AND thread_id = ' + str(self.thread_id) +
+				  'ORDER BY call_path_id')
+		if not ret:
+			raise Exception("Query failed: " + query.lastError().text())
+		last_call_path_id = 0
+		name = ""
+		dso = ""
+		count = 0
+		branch_count = 0
+		total_branch_count = 0
+		time = 0
+		total_time = 0
+		while query.next():
+			if query.value(1) == last_call_path_id:
+				count += 1
+				branch_count += query.value(2)
+				time += query.value(4) - query.value(3)
+			else:
+				if count:
+					self.addChild(last_call_path_id, name, dso, count, time, branch_count)
+				last_call_path_id = query.value(1)
+				name = query.value(5)
+				dso = query.value(6)
+				count = 1
+				total_branch_count += branch_count
+				total_time += time
+				branch_count = query.value(2)
+				time = query.value(4) - query.value(3)
+		if count:
+			self.addChild(last_call_path_id, name, dso, count, time, branch_count)
+		total_branch_count += branch_count
+		total_time += time
+		# Top level does not have time or branch count, so fix that here
+		if total_branch_count > self.branch_count:
+			self.branch_count = total_branch_count
+			if self.branch_count:
+				for child_item in self.child_items:
+					child_item.data[6] = self.branchPercent(child_item.branch_count)
+		if total_time > self.time:
+			self.time = total_time
+			if self.time:
+				for child_item in self.child_items:
+					child_item.data[4] = self.timePercent(child_item.time)
+
+	def childCount(self):
+		if not self.query_done:
+			self.selectCalls()
+		return self.child_count
+
+	def columnCount(self):
+		return 7
+
+	def columnHeader(self, column):
+		headers = ["Call Path", "Object", "Count ", "Time (ns) ", "Time (%) ", "Branch Count ", "Branch Count (%) "]
+		return headers[column]
+
+	def getData(self, column):
+		return self.data[column]
+
+class TreeModel(QAbstractItemModel):
+
+	def __init__(self, db, parent=None):
+		super(TreeModel, self).__init__(parent)
+		self.db = db
+		self.root = TreeItem(db, 0, None)
+
+	def columnCount(self, parent):
+		return self.root.columnCount()
+
+	def rowCount(self, parent):
+		if parent.isValid():
+			parent_item = parent.internalPointer()
+		else:
+			parent_item = self.root
+		return parent_item.childCount()
+
+	def headerData(self, section, orientation, role):
+		if role == Qt.TextAlignmentRole:
+			if section > 1:
+				return Qt.AlignRight
+		if role != Qt.DisplayRole:
+			return None
+		if orientation != Qt.Horizontal:
+			return None
+		return self.root.columnHeader(section)
+
+	def parent(self, child):
+		child_item = child.internalPointer()
+		if child_item is self.root:
+			return QModelIndex()
+		parent_item = child_item.getParentItem()
+		return self.createIndex(parent_item.getRow(), 0, parent_item)
+
+	def index(self, row, column, parent):
+		if parent.isValid():
+			parent_item = parent.internalPointer()
+		else:
+			parent_item = self.root
+		child_item = parent_item.getChildItem(row)
+		return self.createIndex(row, column, child_item)
+
+	def data(self, index, role):
+		if role == Qt.TextAlignmentRole:
+			if index.column() > 1:
+				return Qt.AlignRight
+		if role != Qt.DisplayRole:
+			return None
+		index_item = index.internalPointer()
+		return index_item.getData(index.column())
+
+class MainWindow(QMainWindow):
+
+	def __init__(self, db, dbname, parent=None):
+		super(MainWindow, self).__init__(parent)
+
+		self.setObjectName("MainWindow")
+		self.setWindowTitle("Call Graph: " + dbname)
+		self.move(100, 100)
+		self.resize(800, 600)
+		style = self.style()
+		icon = style.standardIcon(QStyle.SP_MessageBoxInformation)
+		self.setWindowIcon(icon);
+
+		self.model = TreeModel(db)
+
+		self.view = QTreeView()
+		self.view.setModel(self.model)
+
+		self.setCentralWidget(self.view)
+
+if __name__ == '__main__':
+	if (len(sys.argv) < 2):
+		print >> sys.stderr, "Usage is: call-graph-from-postgresql.py <database name>"
+		raise Exception("Too few arguments")
+
+	dbname = sys.argv[1]
+
+	db = QSqlDatabase.addDatabase('QPSQL')
+
+	opts = dbname.split()
+	for opt in opts:
+		if '=' in opt:
+			opt = opt.split('=')
+			if opt[0] == 'hostname':
+				db.setHostName(opt[1])
+			elif opt[0] == 'port':
+				db.setPort(int(opt[1]))
+			elif opt[0] == 'username':
+				db.setUserName(opt[1])
+			elif opt[0] == 'password':
+				db.setPassword(opt[1])
+			elif opt[0] == 'dbname':
+				dbname = opt[1]
+		else:
+			dbname = opt
+
+	db.setDatabaseName(dbname)
+	if not db.open():
+		raise Exception("Failed to open database " + dbname + " error: " + db.lastError().text())
+
+	app = QApplication(sys.argv)
+	window = MainWindow(db, dbname)
+	window.show()
+	err = app.exec_()
+	db.close()
+	sys.exit(err)
diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
index 4cdafd880074..5e939eadacd2 100644
--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -15,6 +15,53 @@ import sys
 import struct
 import datetime
 
+# To use this script you will need to have installed package python-pyside which
+# provides LGPL-licensed Python bindings for Qt.  You will also need the package
+# libqt4-sql-psql for Qt postgresql support.
+#
+# The script assumes postgresql is running on the local machine and that the
+# user has postgresql permissions to create databases. Examples of installing
+# postgresql and adding such a user are:
+#
+# fedora:
+#
+#	$ sudo yum install postgresql postgresql-server
+#	$ sudo su - postgres -c initdb
+#	$ sudo service postgresql start
+#	$ sudo su - postgres
+#	$ createuser <your user id here>
+#	Shall the new role be a superuser? (y/n) y
+#
+# ubuntu:
+#
+#	$ sudo apt-get install postgresql
+#	$ sudo su - postgres
+#	$ createuser <your user id here>
+#	Shall the new role be a superuser? (y/n) y
+#
+# An example of using this script with Intel PT:
+#
+#	$ perf record -e intel_pt//u ls
+#	$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py pt_example branches calls
+#	2015-05-29 12:49:23.464364 Creating database...
+#	2015-05-29 12:49:26.281717 Writing to intermediate files...
+#	2015-05-29 12:49:27.190383 Copying to database...
+#	2015-05-29 12:49:28.140451 Removing intermediate files...
+#	2015-05-29 12:49:28.147451 Adding primary keys
+#	2015-05-29 12:49:28.655683 Adding foreign keys
+#	2015-05-29 12:49:29.365350 Done
+#
+# To browse the database, psql can be used e.g.
+#
+#	$ psql pt_example
+#	pt_example=# select * from samples_view where id < 100;
+#	pt_example=# \d+
+#	pt_example=# \d+ samples_view
+#	pt_example=# \q
+#
+# An example of using the database is provided by the script
+# call-graph-from-postgresql.py.  Refer to that script for details.
+
 from PySide.QtSql import *
 
 # Need to access PostgreSQL C library directly to use COPY FROM STDIN
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 11/25] perf auxtrace: Fix period type 'i' not working
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (9 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 10/25] perf tools: Add example call-graph script Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:22   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 12/25] perf tools: Fix perf-with-kcore handling of arguments containing spaces Adrian Hunter
                   ` (13 subsequent siblings)
  24 siblings, 2 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

PERF_ITRACE_PERIOD_INSTRUCTIONS is zero so it
got overwritten by the default period type.
Fix by checking if the period type was set
rather than if the value was zero when applying
the default.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/auxtrace.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 3f3b40fa87c8..4ec3acf6fabb 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -949,6 +949,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	struct itrace_synth_opts *synth_opts = opt->value;
 	const char *p;
 	char *endptr;
+	bool period_type_set = false;
 
 	synth_opts->set = true;
 
@@ -977,10 +978,12 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				case 'i':
 					synth_opts->period_type =
 						PERF_ITRACE_PERIOD_INSTRUCTIONS;
+					period_type_set = true;
 					break;
 				case 't':
 					synth_opts->period_type =
 						PERF_ITRACE_PERIOD_TICKS;
+					period_type_set = true;
 					break;
 				case 'm':
 					synth_opts->period *= 1000;
@@ -993,6 +996,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 						goto out_err;
 					synth_opts->period_type =
 						PERF_ITRACE_PERIOD_NANOSECS;
+					period_type_set = true;
 					break;
 				case '\0':
 					goto out;
@@ -1046,7 +1050,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	}
 out:
 	if (synth_opts->instructions) {
-		if (!synth_opts->period_type)
+		if (!period_type_set)
 			synth_opts->period_type =
 					PERF_ITRACE_DEFAULT_PERIOD_TYPE;
 		if (!synth_opts->period)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 12/25] perf tools: Fix perf-with-kcore handling of arguments containing spaces
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (10 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 11/25] perf auxtrace: Fix period type 'i' not working Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:22   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 13/25] perf tools: Fix Intel PT 'instructions' sample period Adrian Hunter
                   ` (12 subsequent siblings)
  24 siblings, 2 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Fix the perf-with-kcore script so that it doesn't
split arguments that contain spaces.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/perf-with-kcore.sh | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/tools/perf/perf-with-kcore.sh b/tools/perf/perf-with-kcore.sh
index c7ff90a90e4e..7e47a7cbc195 100644
--- a/tools/perf/perf-with-kcore.sh
+++ b/tools/perf/perf-with-kcore.sh
@@ -50,7 +50,7 @@ copy_kcore()
 	fi
 
 	rm -f perf.data.junk
-	("$PERF" record -o perf.data.junk $PERF_OPTIONS -- sleep 60) >/dev/null 2>/dev/null &
+	("$PERF" record -o perf.data.junk "${PERF_OPTIONS[@]}" -- sleep 60) >/dev/null 2>/dev/null &
 	PERF_PID=$!
 
 	# Need to make sure that perf has started
@@ -160,18 +160,18 @@ record()
 			echo "*** WARNING *** /proc/sys/kernel/kptr_restrict prevents access to kernel addresses" >&2
 		fi
 
-		if echo "$PERF_OPTIONS" | grep -q ' -a \|^-a \| -a$\|^-a$\| --all-cpus \|^--all-cpus \| --all-cpus$\|^--all-cpus$' ; then
+		if echo "${PERF_OPTIONS[@]}" | grep -q ' -a \|^-a \| -a$\|^-a$\| --all-cpus \|^--all-cpus \| --all-cpus$\|^--all-cpus$' ; then
 			echo "*** WARNING *** system-wide tracing without root access will not be able to read all necessary information from /proc" >&2
 		fi
 
-		if echo "$PERF_OPTIONS" | grep -q 'intel_pt\|intel_bts\| -I\|^-I' ; then
+		if echo "${PERF_OPTIONS[@]}" | grep -q 'intel_pt\|intel_bts\| -I\|^-I' ; then
 			if [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt -1 ] ; then
 				echo "*** WARNING *** /proc/sys/kernel/perf_event_paranoid restricts buffer size and tracepoint (sched_switch) use" >&2
 			fi
 
-			if echo "$PERF_OPTIONS" | grep -q ' --per-thread \|^--per-thread \| --per-thread$\|^--per-thread$' ; then
+			if echo "${PERF_OPTIONS[@]}" | grep -q ' --per-thread \|^--per-thread \| --per-thread$\|^--per-thread$' ; then
 				true
-			elif echo "$PERF_OPTIONS" | grep -q ' -t \|^-t \| -t$\|^-t$' ; then
+			elif echo "${PERF_OPTIONS[@]}" | grep -q ' -t \|^-t \| -t$\|^-t$' ; then
 				true
 			elif [ ! -r /sys/kernel/debug -o ! -x /sys/kernel/debug ] ; then
 				echo "*** WARNING *** /sys/kernel/debug permissions prevent tracepoint (sched_switch) use" >&2
@@ -193,8 +193,8 @@ record()
 
 	mkdir "$PERF_DATA_DIR"
 
-	echo "$PERF record -o $PERF_DATA_DIR/perf.data $PERF_OPTIONS -- $*"
-	"$PERF" record -o "$PERF_DATA_DIR/perf.data" $PERF_OPTIONS -- $* || true
+	echo "$PERF record -o $PERF_DATA_DIR/perf.data ${PERF_OPTIONS[@]} -- $@"
+	"$PERF" record -o "$PERF_DATA_DIR/perf.data" "${PERF_OPTIONS[@]}" -- "$@" || true
 
 	if rmdir "$PERF_DATA_DIR" > /dev/null 2>/dev/null ; then
 		exit 1
@@ -209,8 +209,8 @@ subcommand()
 {
 	find_perf
 	check_buildid_cache_permissions
-	echo "$PERF $PERF_SUB_COMMAND -i $PERF_DATA_DIR/perf.data --kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms $*"
-	"$PERF" $PERF_SUB_COMMAND -i "$PERF_DATA_DIR/perf.data" "--kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms" $*
+	echo "$PERF $PERF_SUB_COMMAND -i $PERF_DATA_DIR/perf.data --kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms $@"
+	"$PERF" $PERF_SUB_COMMAND -i "$PERF_DATA_DIR/perf.data" "--kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms" "$@"
 }
 
 if [ "$1" = "fix_buildid_cache_permissions" ] ; then
@@ -234,7 +234,7 @@ fi
 case "$PERF_SUB_COMMAND" in
 "record")
 	while [ "$1" != "--" ] ; do
-		PERF_OPTIONS+="$1 "
+		PERF_OPTIONS+=("$1")
 		shift || break
 	done
 	if [ "$1" != "--" ] ; then
@@ -242,16 +242,16 @@ case "$PERF_SUB_COMMAND" in
 		usage
 	fi
 	shift
-	record $*
+	record "$@"
 ;;
 "script")
-	subcommand $*
+	subcommand "$@"
 ;;
 "report")
-	subcommand $*
+	subcommand "$@"
 ;;
 "inject")
-	subcommand $*
+	subcommand "$@"
 ;;
 *)
 	usage
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 13/25] perf tools: Fix Intel PT 'instructions' sample period
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (11 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 12/25] perf tools: Fix perf-with-kcore handling of arguments containing spaces Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-28  6:37   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 14/25] perf tools: Add perf_pmu__format_bits() Adrian Hunter
                   ` (11 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

The period on synthesized 'instructions' samples was
being set to a fixed value, whereas the correct value
is the number of instructions since the last sample,
which is a value that the decoder can provide.  So
do it that way.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 3 +++
 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h | 1 +
 tools/perf/util/intel-pt.c                          | 5 ++++-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index f8ac462fec1a..56790ea1e88e 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -108,6 +108,7 @@ struct intel_pt_decoder {
 	uint64_t sign_bits;
 	uint64_t period;
 	enum intel_pt_period_type period_type;
+	uint64_t tot_insn_cnt;
 	uint64_t period_insn_cnt;
 	uint64_t period_mask;
 	uint64_t period_ticks;
@@ -559,6 +560,7 @@ static int intel_pt_walk_insn(struct intel_pt_decoder *decoder,
 	err = decoder->walk_insn(intel_pt_insn, &insn_cnt, &decoder->ip, ip,
 				 max_insn_cnt, decoder->data);
 
+	decoder->tot_insn_cnt += insn_cnt;
 	decoder->timestamp_insn_cnt += insn_cnt;
 	decoder->period_insn_cnt += insn_cnt;
 
@@ -1529,6 +1531,7 @@ const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder)
 	decoder->state.timestamp = decoder->timestamp;
 	decoder->state.est_timestamp = intel_pt_est_timestamp(decoder);
 	decoder->state.cr3 = decoder->cr3;
+	decoder->state.tot_insn_cnt = decoder->tot_insn_cnt;
 
 	if (err)
 		decoder->state.from_ip = decoder->ip;
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
index 4c4880230cc9..cbf57044c385 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -58,6 +58,7 @@ struct intel_pt_state {
 	uint64_t from_ip;
 	uint64_t to_ip;
 	uint64_t cr3;
+	uint64_t tot_insn_cnt;
 	uint64_t timestamp;
 	uint64_t est_timestamp;
 	uint64_t trace_nr;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 2a4a4120473b..64c8352d2a2b 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -126,6 +126,7 @@ struct intel_pt_queue {
 	u64 timestamp;
 	u32 flags;
 	u16 insn_len;
+	u64 last_insn_cnt;
 };
 
 static void intel_pt_dump(struct intel_pt *pt __maybe_unused,
@@ -920,11 +921,13 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 	sample.addr = ptq->state->to_ip;
 	sample.id = ptq->pt->instructions_id;
 	sample.stream_id = ptq->pt->instructions_id;
-	sample.period = ptq->pt->instructions_sample_period;
+	sample.period = ptq->state->tot_insn_cnt - ptq->last_insn_cnt;
 	sample.cpu = ptq->cpu;
 	sample.flags = ptq->flags;
 	sample.insn_len = ptq->insn_len;
 
+	ptq->last_insn_cnt = ptq->state->tot_insn_cnt;
+
 	if (pt->synth_opts.callchain) {
 		thread_stack__sample(ptq->thread, ptq->chain,
 				     pt->synth_opts.callchain_sz, sample.ip);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 14/25] perf tools: Add perf_pmu__format_bits()
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (12 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 13/25] perf tools: Fix Intel PT 'instructions' sample period Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:23   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 15/25] perf tools: Validate config term maximum value Adrian Hunter
                   ` (10 subsequent siblings)
  24 siblings, 2 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add perf_pmu__format_bits() to get the format bits
for a PMU config term.  Intel PT will use this to
validate terms and to record format bits to enable
later interpreting the config from the attribute
stored in the perf.data file.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/pmu.c | 17 ++++++++++++++++-
 tools/perf/util/pmu.h |  1 +
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 52d569cda606..d26ff0ab8410 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -538,7 +538,7 @@ struct perf_pmu *perf_pmu__find(const char *name)
 }
 
 static struct perf_pmu_format *
-pmu_find_format(struct list_head *formats, char *name)
+pmu_find_format(struct list_head *formats, const char *name)
 {
 	struct perf_pmu_format *format;
 
@@ -549,6 +549,21 @@ pmu_find_format(struct list_head *formats, char *name)
 	return NULL;
 }
 
+__u64 perf_pmu__format_bits(struct list_head *formats, const char *name)
+{
+	struct perf_pmu_format *format = pmu_find_format(formats, name);
+	__u64 bits = 0;
+	int fbit;
+
+	if (!format)
+		return 0;
+
+	for_each_set_bit(fbit, format->bits, PERF_PMU_FORMAT_BITS)
+		bits |= 1ULL << fbit;
+
+	return bits;
+}
+
 /*
  * Sets value based on the format definition (format parameter)
  * and unformated value (value parameter).
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 7b9c8cf8ae3e..5d7e84466bee 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -54,6 +54,7 @@ int perf_pmu__config_terms(struct list_head *formats,
 			   struct perf_event_attr *attr,
 			   struct list_head *head_terms,
 			   bool zero, struct parse_events_error *error);
+__u64 perf_pmu__format_bits(struct list_head *formats, const char *name);
 int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 			  struct perf_pmu_info *info);
 struct list_head *perf_pmu__alias(struct perf_pmu *pmu,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 15/25] perf tools: Validate config term maximum value
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (13 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 14/25] perf tools: Add perf_pmu__format_bits() Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:23   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 16/25] perf tools: Extend the event parser maximum error index Adrian Hunter
                   ` (9 subsequent siblings)
  24 siblings, 2 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Currently the value of a PMU config term is silently
truncated if it is too big. This is an impediment to
validating the value for other criteria later on i.e.
the user provides an invalid value that gets truncated
to a valid one.

The maximum value validation is only done for the
parser where the error is passed back to the user. In
other cases the silent truncation continues so as not
to affect tools that perhaps rely on it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/pmu.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index d26ff0ab8410..29797fd9bedc 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -585,6 +585,18 @@ static void pmu_format_value(unsigned long *format, __u64 value, __u64 *v,
 	}
 }
 
+static __u64 pmu_format_max_value(const unsigned long *format)
+{
+	int w;
+
+	w = bitmap_weight(format, PERF_PMU_FORMAT_BITS);
+	if (!w)
+		return 0;
+	if (w < 64)
+		return (1ULL << w) - 1;
+	return -1;
+}
+
 /*
  * Term is a string term, and might be a param-term. Try to look up it's value
  * in the remaining terms.
@@ -658,7 +670,7 @@ static int pmu_config_term(struct list_head *formats,
 {
 	struct perf_pmu_format *format;
 	__u64 *vp;
-	__u64 val;
+	__u64 val, max_val;
 
 	/*
 	 * If this is a parameter we've already used for parameterized-eval,
@@ -724,6 +736,22 @@ static int pmu_config_term(struct list_head *formats,
 	} else
 		return -EINVAL;
 
+	max_val = pmu_format_max_value(format->bits);
+	if (val > max_val) {
+		if (err) {
+			err->idx = term->err_val;
+			if (asprintf(&err->str,
+				     "value too big for format, maximum is %llu",
+				     (unsigned long long)max_val) < 0)
+				err->str = strdup("value too big for format");
+			return -EINVAL;
+		}
+		/*
+		 * Assume we don't care if !err, in which case the value will be
+		 * silently truncated.
+		 */
+	}
+
 	pmu_format_value(format->bits, val, vp, zero);
 	return 0;
 }
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 16/25] perf tools: Extend the event parser maximum error index
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (14 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 15/25] perf tools: Validate config term maximum value Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:23   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 17/25] perf tools: Add Intel PT support for PSB periods Adrian Hunter
                   ` (8 subsequent siblings)
  24 siblings, 2 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Extend the event parser maximum error index from 10
to 13.  That allows PMU config terms of up to 10
characters to display un-truncated in the error
message.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/parse-events.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index a71eeb279ed2..dbe43acb5fe0 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1105,7 +1105,7 @@ static void parse_events_print_error(struct parse_events_error *err,
 		 * Maximum error index indent, we will cut
 		 * the event string if it's bigger.
 		 */
-		int max_err_idx = 10;
+		int max_err_idx = 13;
 
 		/*
 		 * Let's be specific with the message when
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 17/25] perf tools: Add Intel PT support for PSB periods
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (15 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 16/25] perf tools: Extend the event parser maximum error index Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-28  6:38   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 18/25] perf tools: Add new Intel PT packet definitions Adrian Hunter
                   ` (7 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

The PSB packet is a synchronization packet that
provides a starting point for decoding or recovery
from errors.

This patch adds support for a new Intel PT feature
that allows the frequency of PSB packets to be specified.

Support for this feature is indicated by
/sys/bus/event_source/devices/intel_pt/caps/psb_cyc
which contains "1" if the feature is supported and
"0" otherwise.

The PSB period can be specified as a PMU config term
e.g. perf record -e intel_pt/psb_period=2/u sleep 1

The default value is 3 or the nearest lower value
that is supported.  0 is always supported.

Valid values are given by:
/sys/bus/event_source/devices/intel_pt/caps/psb_periods
which contains a hexadecimal value, the bits of which
represent valid values e.g. bit 2 set means value 2
is valid.

The value is converted to the approximate number of
trace bytes between PSB packets as:

	2 ^ (value + 11)

e.g. value 3 means 16KiB bytes between PSBs

If an invalid value is entered, the error message
will give a list of valid values e.g.

	$ perf record -e intel_pt/psb_period=15/u uname
	Invalid psb_period for intel_pt. Valid values are: 0-5

tools/perf/Documentation/intel-pt.txt is updated in
a later patch as there are a number of new features
being added.

For more information about PSB periods refer to the
Intel 64 and IA-32 Architectures SDM Chapter 36
Intel Processor Trace from June 2015 or later.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 217 ++++++++++++++++++++++++++++++++++--
 1 file changed, 210 insertions(+), 7 deletions(-)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index da7d2c15e611..145975b003a7 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -99,17 +99,121 @@ static int intel_pt_parse_terms(struct list_head *formats, const char *str,
 	return intel_pt_parse_terms_with_default(formats, str, config);
 }
 
-static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu __maybe_unused,
-				  struct perf_evlist *evlist __maybe_unused)
+static u64 intel_pt_masked_bits(u64 mask, u64 bits)
 {
-	return 256;
+	const u64 top_bit = 1ULL << 63;
+	u64 res = 0;
+	int i;
+
+	for (i = 0; i < 64; i++) {
+		if (mask & top_bit) {
+			res <<= 1;
+			if (bits & top_bit)
+				res |= 1;
+		}
+		mask <<= 1;
+		bits <<= 1;
+	}
+
+	return res;
+}
+
+static int intel_pt_read_config(struct perf_pmu *intel_pt_pmu, const char *str,
+				struct perf_evlist *evlist, u64 *res)
+{
+	struct perf_evsel *evsel;
+	u64 mask;
+
+	*res = 0;
+
+	mask = perf_pmu__format_bits(&intel_pt_pmu->format, str);
+	if (!mask)
+		return -EINVAL;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_pt_pmu->type) {
+			*res = intel_pt_masked_bits(mask, evsel->attr.config);
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu,
+				  struct perf_evlist *evlist)
+{
+	u64 val;
+	int err, topa_multiple_entries;
+	size_t psb_period;
+
+	if (perf_pmu__scan_file(intel_pt_pmu, "caps/topa_multiple_entries",
+				"%d", &topa_multiple_entries) != 1)
+		topa_multiple_entries = 0;
+
+	/*
+	 * Use caps/topa_multiple_entries to indicate early hardware that had
+	 * extra frequent PSBs.
+	 */
+	if (!topa_multiple_entries) {
+		psb_period = 256;
+		goto out;
+	}
+
+	err = intel_pt_read_config(intel_pt_pmu, "psb_period", evlist, &val);
+	if (err)
+		val = 0;
+
+	psb_period = 1 << (val + 11);
+out:
+	pr_debug2("%s psb_period %zu\n", intel_pt_pmu->name, psb_period);
+	return psb_period;
+}
+
+static int intel_pt_pick_bit(int bits, int target)
+{
+	int pos, pick = -1;
+
+	for (pos = 0; bits; bits >>= 1, pos++) {
+		if (bits & 1) {
+			if (pos <= target || pick < 0)
+				pick = pos;
+			if (pos >= target)
+				break;
+		}
+	}
+
+	return pick;
 }
 
 static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
 {
+	char buf[256];
+	int psb_cyc, psb_periods, psb_period;
+	int pos = 0;
 	u64 config;
 
-	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &config);
+	pos += scnprintf(buf + pos, sizeof(buf) - pos, "tsc");
+
+	if (perf_pmu__scan_file(intel_pt_pmu, "caps/psb_cyc", "%d",
+				&psb_cyc) != 1)
+		psb_cyc = 1;
+
+	if (psb_cyc) {
+		if (perf_pmu__scan_file(intel_pt_pmu, "caps/psb_periods", "%x",
+					&psb_periods) != 1)
+			psb_periods = 0;
+		if (psb_periods) {
+			psb_period = intel_pt_pick_bit(psb_periods, 3);
+			pos += scnprintf(buf + pos, sizeof(buf) - pos,
+					 ",psb_period=%d", psb_period);
+		}
+	}
+
+	pr_debug2("%s default config: %s\n", intel_pt_pmu->name, buf);
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, buf, &config);
+
 	return config;
 }
 
@@ -239,6 +343,103 @@ static int intel_pt_track_switches(struct perf_evlist *evlist)
 	return 0;
 }
 
+static void intel_pt_valid_str(char *str, size_t len, u64 valid)
+{
+	unsigned int val, last = 0, state = 1;
+	int p = 0;
+
+	str[0] = '\0';
+
+	for (val = 0; val <= 64; val++, valid >>= 1) {
+		if (valid & 1) {
+			last = val;
+			switch (state) {
+			case 0:
+				p += scnprintf(str + p, len - p, ",");
+				/* Fall through */
+			case 1:
+				p += scnprintf(str + p, len - p, "%u", val);
+				state = 2;
+				break;
+			case 2:
+				state = 3;
+				break;
+			case 3:
+				state = 4;
+				break;
+			default:
+				break;
+			}
+		} else {
+			switch (state) {
+			case 3:
+				p += scnprintf(str + p, len - p, ",%u", last);
+				state = 0;
+				break;
+			case 4:
+				p += scnprintf(str + p, len - p, "-%u", last);
+				state = 0;
+				break;
+			default:
+				break;
+			}
+			if (state != 1)
+				state = 0;
+		}
+	}
+}
+
+static int intel_pt_val_config_term(struct perf_pmu *intel_pt_pmu,
+				    const char *caps, const char *name,
+				    const char *supported, u64 config)
+{
+	char valid_str[256];
+	unsigned int shift;
+	unsigned long long valid;
+	u64 bits;
+	int ok;
+
+	if (perf_pmu__scan_file(intel_pt_pmu, caps, "%llx", &valid) != 1)
+		valid = 0;
+
+	if (supported &&
+	    perf_pmu__scan_file(intel_pt_pmu, supported, "%d", &ok) == 1 && !ok)
+		valid = 0;
+
+	valid |= 1;
+
+	bits = perf_pmu__format_bits(&intel_pt_pmu->format, name);
+
+	config &= bits;
+
+	for (shift = 0; bits && !(bits & 1); shift++)
+		bits >>= 1;
+
+	config >>= shift;
+
+	if (config > 63)
+		goto out_err;
+
+	if (valid & (1 << config))
+		return 0;
+out_err:
+	intel_pt_valid_str(valid_str, sizeof(valid_str), valid);
+	pr_err("Invalid %s for %s. Valid values are: %s\n",
+	       name, INTEL_PT_PMU_NAME, valid_str);
+	return -EINVAL;
+}
+
+static int intel_pt_validate_config(struct perf_pmu *intel_pt_pmu,
+				    struct perf_evsel *evsel)
+{
+	if (!evsel)
+		return 0;
+
+	return intel_pt_val_config_term(intel_pt_pmu, "caps/psb_periods",
+					"psb_period", "caps/psb_cyc",
+					evsel->attr.config);
+}
+
 static int intel_pt_recording_options(struct auxtrace_record *itr,
 				      struct perf_evlist *evlist,
 				      struct record_opts *opts)
@@ -251,6 +452,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	const struct cpu_map *cpus = evlist->cpus;
 	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
 	u64 tsc_bit;
+	int err;
 
 	ptr->evlist = evlist;
 	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
@@ -281,6 +483,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	if (!opts->full_auxtrace)
 		return 0;
 
+	err = intel_pt_validate_config(intel_pt_pmu, intel_pt_evsel);
+	if (err)
+		return err;
+
 	/* Set default sizes for snapshot mode */
 	if (opts->auxtrace_snapshot_mode) {
 		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
@@ -366,8 +572,6 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	 * threads.
 	 */
 	if (have_timing_info && !cpu_map__empty(cpus)) {
-		int err;
-
 		err = intel_pt_track_switches(evlist);
 		if (err == -EPERM)
 			pr_debug2("Unable to select sched:sched_switch\n");
@@ -394,7 +598,6 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	/* Add dummy event to keep tracking */
 	if (opts->full_auxtrace) {
 		struct perf_evsel *tracking_evsel;
-		int err;
 
 		err = parse_events(evlist, "dummy:u", NULL);
 		if (err)
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 18/25] perf tools: Add new Intel PT packet definitions
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (16 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 17/25] perf tools: Add Intel PT support for PSB periods Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-28  6:38   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 19/25] perf tools: Pass Intel PT information for decoding MTC and CYC Adrian Hunter
                   ` (6 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

New features have been added to Intel PT which include a number
of new packet definitions.

This patch adds packet definitions for new packets: TMA, MTC, CYC,
VMCS, TRACESTOP and MNT.  Also another bit in PIP is defined.

This patch only adds support for the definitions. Later patches
add support for decoding TMA, MTC, CYC and TRACESTOP which is
where those packets are explained.

VMCS and the newly defined bit in PIP are used with virtualization
which is not supported yet.  MNT is a maintenance packet which the
decoder should ignore.

For details, refer to the June 2015 or later Intel 64 and
IA-32 Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  |  70 +++++++++-
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   | 142 +++++++++++++++++++--
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |   6 +
 3 files changed, 201 insertions(+), 17 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 56790ea1e88e..4a0e9fb1d173 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -923,6 +923,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 		case INTEL_PT_TIP_PGE:
 		case INTEL_PT_TIP:
 		case INTEL_PT_TNT:
+		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_BAD:
 		case INTEL_PT_PSB:
 			intel_pt_log("ERROR: Unexpected packet\n");
@@ -935,6 +936,9 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
+		case INTEL_PT_TMA:
+			break;
+
 		case INTEL_PT_CBR:
 			decoder->cbr = decoder->packet.payload;
 			break;
@@ -944,7 +948,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
 			break;
 
 		case INTEL_PT_FUP:
@@ -956,6 +960,12 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			intel_pt_update_in_tx(decoder);
 			break;
 
+		case INTEL_PT_MTC:
+			break;
+
+		case INTEL_PT_CYC:
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 		default:
 			break;
@@ -983,8 +993,10 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
 		switch (decoder->packet.type) {
 		case INTEL_PT_TNT:
 		case INTEL_PT_FUP:
+		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_PSB:
 		case INTEL_PT_TSC:
+		case INTEL_PT_TMA:
 		case INTEL_PT_CBR:
 		case INTEL_PT_MODE_TSX:
 		case INTEL_PT_BAD:
@@ -1032,13 +1044,21 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
 			return 0;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
+			break;
+
+		case INTEL_PT_MTC:
+			break;
+
+		case INTEL_PT_CYC:
 			break;
 
 		case INTEL_PT_MODE_EXEC:
 			decoder->exec_mode = decoder->packet.payload;
 			break;
 
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 			break;
 
@@ -1122,6 +1142,9 @@ next:
 			}
 			return intel_pt_walk_fup_tip(decoder);
 
+		case INTEL_PT_TRACESTOP:
+			break;
+
 		case INTEL_PT_PSB:
 			intel_pt_clear_stack(&decoder->stack);
 			err = intel_pt_walk_psbend(decoder);
@@ -1132,13 +1155,22 @@ next:
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
+			break;
+
+		case INTEL_PT_MTC:
 			break;
 
 		case INTEL_PT_TSC:
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
+		case INTEL_PT_TMA:
+			break;
+
+		case INTEL_PT_CYC:
+			break;
+
 		case INTEL_PT_CBR:
 			decoder->cbr = decoder->packet.payload;
 			break;
@@ -1162,6 +1194,8 @@ next:
 			return intel_pt_bug(decoder);
 
 		case INTEL_PT_PSBEND:
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 			break;
 
@@ -1202,16 +1236,25 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			}
 			break;
 
+		case INTEL_PT_MTC:
+			break;
+
 		case INTEL_PT_TSC:
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
+		case INTEL_PT_TMA:
+			break;
+
+		case INTEL_PT_CYC:
+			break;
+
 		case INTEL_PT_CBR:
 			decoder->cbr = decoder->packet.payload;
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1222,6 +1265,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			intel_pt_update_in_tx(decoder);
 			break;
 
+		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_TNT:
 			intel_pt_log("ERROR: Unexpected packet\n");
 			if (decoder->ip)
@@ -1240,6 +1284,8 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			return 0;
 
 		case INTEL_PT_PSB:
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 		default:
 			break;
@@ -1282,16 +1328,25 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 				intel_pt_set_last_ip(decoder);
 			break;
 
+		case INTEL_PT_MTC:
+			break;
+
 		case INTEL_PT_TSC:
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
+		case INTEL_PT_TMA:
+			break;
+
+		case INTEL_PT_CYC:
+			break;
+
 		case INTEL_PT_CBR:
 			decoder->cbr = decoder->packet.payload;
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1308,6 +1363,9 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 		case INTEL_PT_BAD: /* Does not happen */
 			return intel_pt_bug(decoder);
 
+		case INTEL_PT_TRACESTOP:
+			break;
+
 		case INTEL_PT_PSB:
 			err = intel_pt_walk_psb(decoder);
 			if (err)
@@ -1321,6 +1379,8 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 
 		case INTEL_PT_TNT:
 		case INTEL_PT_PSBEND:
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 		default:
 			break;
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
index 988c82c6652d..b1257c816310 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
@@ -24,6 +24,8 @@
 
 #define BIT63		((uint64_t)1 << 63)
 
+#define NR_FLAG		BIT63
+
 #if __BYTE_ORDER == __BIG_ENDIAN
 #define le16_to_cpu bswap_16
 #define le32_to_cpu bswap_32
@@ -46,15 +48,21 @@ static const char * const packet_name[] = {
 	[INTEL_PT_TIP_PGD]	= "TIP.PGD",
 	[INTEL_PT_TIP_PGE]	= "TIP.PGE",
 	[INTEL_PT_TSC]		= "TSC",
+	[INTEL_PT_TMA]		= "TMA",
 	[INTEL_PT_MODE_EXEC]	= "MODE.Exec",
 	[INTEL_PT_MODE_TSX]	= "MODE.TSX",
+	[INTEL_PT_MTC]		= "MTC",
 	[INTEL_PT_TIP]		= "TIP",
 	[INTEL_PT_FUP]		= "FUP",
+	[INTEL_PT_CYC]		= "CYC",
+	[INTEL_PT_VMCS]		= "VMCS",
 	[INTEL_PT_PSB]		= "PSB",
 	[INTEL_PT_PSBEND]	= "PSBEND",
 	[INTEL_PT_CBR]		= "CBR",
+	[INTEL_PT_TRACESTOP]	= "TraceSTOP",
 	[INTEL_PT_PIP]		= "PIP",
 	[INTEL_PT_OVF]		= "OVF",
+	[INTEL_PT_MNT]		= "MNT",
 };
 
 const char *intel_pt_pkt_name(enum intel_pt_pkt_type type)
@@ -96,10 +104,18 @@ static int intel_pt_get_pip(const unsigned char *buf, size_t len,
 	packet->type = INTEL_PT_PIP;
 	memcpy_le64(&payload, buf + 2, 6);
 	packet->payload = payload >> 1;
+	if (payload & 1)
+		packet->payload |= NR_FLAG;
 
 	return 8;
 }
 
+static int intel_pt_get_tracestop(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_TRACESTOP;
+	return 2;
+}
+
 static int intel_pt_get_cbr(const unsigned char *buf, size_t len,
 			    struct intel_pt_pkt *packet)
 {
@@ -110,6 +126,24 @@ static int intel_pt_get_cbr(const unsigned char *buf, size_t len,
 	return 4;
 }
 
+static int intel_pt_get_vmcs(const unsigned char *buf, size_t len,
+			     struct intel_pt_pkt *packet)
+{
+	unsigned int count = (52 - 5) >> 3;
+
+	if (count < 1 || count > 7)
+		return INTEL_PT_BAD_PACKET;
+
+	if (len < count + 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	packet->type = INTEL_PT_VMCS;
+	packet->count = count;
+	memcpy_le64(&packet->payload, buf + 2, count);
+
+	return count + 2;
+}
+
 static int intel_pt_get_ovf(struct intel_pt_pkt *packet)
 {
 	packet->type = INTEL_PT_OVF;
@@ -139,12 +173,49 @@ static int intel_pt_get_psbend(struct intel_pt_pkt *packet)
 	return 2;
 }
 
+static int intel_pt_get_tma(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 7)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	packet->type = INTEL_PT_TMA;
+	packet->payload = buf[2] | (buf[3] << 8);
+	packet->count = buf[5] | ((buf[6] & BIT(0)) << 8);
+	return 7;
+}
+
 static int intel_pt_get_pad(struct intel_pt_pkt *packet)
 {
 	packet->type = INTEL_PT_PAD;
 	return 1;
 }
 
+static int intel_pt_get_mnt(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 11)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_MNT;
+	memcpy_le64(&packet->payload, buf + 3, 8);
+	return 11
+;
+}
+
+static int intel_pt_get_3byte(const unsigned char *buf, size_t len,
+			      struct intel_pt_pkt *packet)
+{
+	if (len < 3)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[2]) {
+	case 0x88: /* MNT */
+		return intel_pt_get_mnt(buf, len, packet);
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
 static int intel_pt_get_ext(const unsigned char *buf, size_t len,
 			    struct intel_pt_pkt *packet)
 {
@@ -156,14 +227,22 @@ static int intel_pt_get_ext(const unsigned char *buf, size_t len,
 		return intel_pt_get_long_tnt(buf, len, packet);
 	case 0x43: /* PIP */
 		return intel_pt_get_pip(buf, len, packet);
+	case 0x83: /* TraceStop */
+		return intel_pt_get_tracestop(packet);
 	case 0x03: /* CBR */
 		return intel_pt_get_cbr(buf, len, packet);
+	case 0xc8: /* VMCS */
+		return intel_pt_get_vmcs(buf, len, packet);
 	case 0xf3: /* OVF */
 		return intel_pt_get_ovf(packet);
 	case 0x82: /* PSB */
 		return intel_pt_get_psb(buf, len, packet);
 	case 0x23: /* PSBEND */
 		return intel_pt_get_psbend(packet);
+	case 0x73: /* TMA */
+		return intel_pt_get_tma(buf, len, packet);
+	case 0xC3: /* 3-byte header */
+		return intel_pt_get_3byte(buf, len, packet);
 	default:
 		return INTEL_PT_BAD_PACKET;
 	}
@@ -187,6 +266,28 @@ static int intel_pt_get_short_tnt(unsigned int byte,
 	return 1;
 }
 
+static int intel_pt_get_cyc(unsigned int byte, const unsigned char *buf,
+			    size_t len, struct intel_pt_pkt *packet)
+{
+	unsigned int offs = 1, shift;
+	uint64_t payload = byte >> 3;
+
+	byte >>= 2;
+	len -= 1;
+	for (shift = 5; byte & 1; shift += 7) {
+		if (offs > 9)
+			return INTEL_PT_BAD_PACKET;
+		if (len < offs)
+			return INTEL_PT_NEED_MORE_BYTES;
+		byte = buf[offs++];
+		payload |= (byte >> 1) << shift;
+	}
+
+	packet->type = INTEL_PT_CYC;
+	packet->payload = payload;
+	return offs;
+}
+
 static int intel_pt_get_ip(enum intel_pt_pkt_type type, unsigned int byte,
 			   const unsigned char *buf, size_t len,
 			   struct intel_pt_pkt *packet)
@@ -269,6 +370,16 @@ static int intel_pt_get_tsc(const unsigned char *buf, size_t len,
 	return 8;
 }
 
+static int intel_pt_get_mtc(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_MTC;
+	packet->payload = buf[1];
+	return 2;
+}
+
 static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
 				  struct intel_pt_pkt *packet)
 {
@@ -288,6 +399,9 @@ static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
 		return intel_pt_get_short_tnt(byte, packet);
 	}
 
+	if ((byte & 2))
+		return intel_pt_get_cyc(byte, buf, len, packet);
+
 	switch (byte & 0x1f) {
 	case 0x0D:
 		return intel_pt_get_ip(INTEL_PT_TIP, byte, buf, len, packet);
@@ -305,6 +419,8 @@ static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
 			return intel_pt_get_mode(buf, len, packet);
 		case 0x19:
 			return intel_pt_get_tsc(buf, len, packet);
+		case 0x59:
+			return intel_pt_get_mtc(buf, len, packet);
 		default:
 			return INTEL_PT_BAD_PACKET;
 		}
@@ -329,7 +445,7 @@ int intel_pt_get_packet(const unsigned char *buf, size_t len,
 int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
 		      size_t buf_len)
 {
-	int ret, i;
+	int ret, i, nr;
 	unsigned long long payload = packet->payload;
 	const char *name = intel_pt_pkt_name(packet->type);
 
@@ -338,6 +454,7 @@ int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
 	case INTEL_PT_PAD:
 	case INTEL_PT_PSB:
 	case INTEL_PT_PSBEND:
+	case INTEL_PT_TRACESTOP:
 	case INTEL_PT_OVF:
 		return snprintf(buf, buf_len, "%s", name);
 	case INTEL_PT_TNT: {
@@ -371,17 +488,16 @@ int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
 	case INTEL_PT_FUP:
 		if (!(packet->count))
 			return snprintf(buf, buf_len, "%s no ip", name);
+	case INTEL_PT_CYC:
+	case INTEL_PT_VMCS:
+	case INTEL_PT_MTC:
+	case INTEL_PT_MNT:
 	case INTEL_PT_CBR:
-		return snprintf(buf, buf_len, "%s 0x%llx", name, payload);
 	case INTEL_PT_TSC:
-		if (packet->count)
-			return snprintf(buf, buf_len,
-					"%s 0x%llx CTC 0x%x FC 0x%x",
-					name, payload, packet->count & 0xffff,
-					(packet->count >> 16) & 0x1ff);
-		else
-			return snprintf(buf, buf_len, "%s 0x%llx",
-					name, payload);
+		return snprintf(buf, buf_len, "%s 0x%llx", name, payload);
+	case INTEL_PT_TMA:
+		return snprintf(buf, buf_len, "%s CTC 0x%x FC 0x%x", name,
+				(unsigned)payload, packet->count);
 	case INTEL_PT_MODE_EXEC:
 		return snprintf(buf, buf_len, "%s %lld", name, payload);
 	case INTEL_PT_MODE_TSX:
@@ -389,8 +505,10 @@ int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
 				name, (unsigned)(payload >> 1) & 1,
 				(unsigned)payload & 1);
 	case INTEL_PT_PIP:
-		ret = snprintf(buf, buf_len, "%s 0x%llx",
-			       name, payload);
+		nr = packet->payload & NR_FLAG ? 1 : 0;
+		payload &= ~NR_FLAG;
+		ret = snprintf(buf, buf_len, "%s 0x%llx (NR=%d)",
+			       name, payload, nr);
 		return ret;
 	default:
 		break;
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
index 53404fa942b3..781bb79883bd 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
@@ -37,15 +37,21 @@ enum intel_pt_pkt_type {
 	INTEL_PT_TIP_PGD,
 	INTEL_PT_TIP_PGE,
 	INTEL_PT_TSC,
+	INTEL_PT_TMA,
 	INTEL_PT_MODE_EXEC,
 	INTEL_PT_MODE_TSX,
+	INTEL_PT_MTC,
 	INTEL_PT_TIP,
 	INTEL_PT_FUP,
+	INTEL_PT_CYC,
+	INTEL_PT_VMCS,
 	INTEL_PT_PSB,
 	INTEL_PT_PSBEND,
 	INTEL_PT_CBR,
+	INTEL_PT_TRACESTOP,
 	INTEL_PT_PIP,
 	INTEL_PT_OVF,
+	INTEL_PT_MNT,
 };
 
 struct intel_pt_pkt {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 19/25] perf tools: Pass Intel PT information for decoding MTC and CYC
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (17 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 18/25] perf tools: Add new Intel PT packet definitions Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-28  6:38   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 20/25] perf tools: Add Intel PT support for decoding MTC packets Adrian Hunter
                   ` (5 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Record additional information in the AUXTRACE_INFO
event in preparation for decoding MTC and CYC packets.
Pass the information to the decoder.

The AUXTRACE_INFO record can be extended by using the
size to indicate the presence of new members.

The additional information includes PMU config bit
positions and the TSC to CTC (hardware crystal clock)
ratio needed to decode MTC packets.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c                | 24 ++++++++-
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  3 ++
 tools/perf/util/intel-pt.c                         | 62 ++++++++++++++++++----
 tools/perf/util/intel-pt.h                         |  5 ++
 4 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 145975b003a7..faae9289bcf6 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -18,6 +18,7 @@
 #include <linux/types.h>
 #include <linux/bitops.h>
 #include <linux/log2.h>
+#include <cpuid.h>
 
 #include "../../perf.h"
 #include "../../util/session.h"
@@ -261,6 +262,15 @@ static size_t intel_pt_info_priv_size(struct auxtrace_record *itr __maybe_unused
 	return INTEL_PT_AUXTRACE_PRIV_SIZE;
 }
 
+static void intel_pt_tsc_ctc_ratio(u32 *n, u32 *d)
+{
+	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
+
+	__get_cpuid(0x15, &eax, &ebx, &ecx, &edx);
+	*n = ebx;
+	*d = eax;
+}
+
 static int intel_pt_info_fill(struct auxtrace_record *itr,
 			      struct perf_session *session,
 			      struct auxtrace_info_event *auxtrace_info,
@@ -272,7 +282,8 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
 	struct perf_event_mmap_page *pc;
 	struct perf_tsc_conversion tc = { .time_mult = 0, };
 	bool cap_user_time_zero = false, per_cpu_mmaps;
-	u64 tsc_bit, noretcomp_bit;
+	u64 tsc_bit, mtc_bit, mtc_freq_bits, cyc_bit, noretcomp_bit;
+	u32 tsc_ctc_ratio_n, tsc_ctc_ratio_d;
 	int err;
 
 	if (priv_size != INTEL_PT_AUXTRACE_PRIV_SIZE)
@@ -281,6 +292,12 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
 	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
 	intel_pt_parse_terms(&intel_pt_pmu->format, "noretcomp",
 			     &noretcomp_bit);
+	intel_pt_parse_terms(&intel_pt_pmu->format, "mtc", &mtc_bit);
+	mtc_freq_bits = perf_pmu__format_bits(&intel_pt_pmu->format,
+					      "mtc_period");
+	intel_pt_parse_terms(&intel_pt_pmu->format, "cyc", &cyc_bit);
+
+	intel_pt_tsc_ctc_ratio(&tsc_ctc_ratio_n, &tsc_ctc_ratio_d);
 
 	if (!session->evlist->nr_mmaps)
 		return -EINVAL;
@@ -311,6 +328,11 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
 	auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH] = ptr->have_sched_switch;
 	auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE] = ptr->snapshot_mode;
 	auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS] = per_cpu_mmaps;
+	auxtrace_info->priv[INTEL_PT_MTC_BIT] = mtc_bit;
+	auxtrace_info->priv[INTEL_PT_MTC_FREQ_BITS] = mtc_freq_bits;
+	auxtrace_info->priv[INTEL_PT_TSC_CTC_N] = tsc_ctc_ratio_n;
+	auxtrace_info->priv[INTEL_PT_TSC_CTC_D] = tsc_ctc_ratio_d;
+	auxtrace_info->priv[INTEL_PT_CYC_BIT] = cyc_bit;
 
 	return 0;
 }
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
index cbf57044c385..56cc47baca11 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -87,6 +87,9 @@ struct intel_pt_params {
 	uint64_t period;
 	enum intel_pt_period_type period_type;
 	unsigned max_non_turbo_ratio;
+	unsigned int mtc_period;
+	uint32_t tsc_ctc_ratio_n;
+	uint32_t tsc_ctc_ratio_d;
 };
 
 struct intel_pt_decoder;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 64c8352d2a2b..4bae958096d4 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -91,6 +91,11 @@ struct intel_pt {
 	bool synth_needs_swap;
 
 	u64 tsc_bit;
+	u64 mtc_bit;
+	u64 mtc_freq_bits;
+	u32 tsc_ctc_ratio_n;
+	u32 tsc_ctc_ratio_d;
+	u64 cyc_bit;
 	u64 noretcomp_bit;
 	unsigned max_non_turbo_ratio;
 };
@@ -568,6 +573,25 @@ static bool intel_pt_return_compression(struct intel_pt *pt)
 	return true;
 }
 
+static unsigned int intel_pt_mtc_period(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	unsigned int shift;
+	u64 config;
+
+	if (!pt->mtc_freq_bits)
+		return 0;
+
+	for (shift = 0, config = pt->mtc_freq_bits; !(config & 1); shift++)
+		config >>= 1;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config))
+			return (config & pt->mtc_freq_bits) >> shift;
+	}
+	return 0;
+}
+
 static bool intel_pt_timeless_decoding(struct intel_pt *pt)
 {
 	struct perf_evsel *evsel;
@@ -668,6 +692,9 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 	params.data = ptq;
 	params.return_compression = intel_pt_return_compression(pt);
 	params.max_non_turbo_ratio = pt->max_non_turbo_ratio;
+	params.mtc_period = intel_pt_mtc_period(pt);
+	params.tsc_ctc_ratio_n = pt->tsc_ctc_ratio_n;
+	params.tsc_ctc_ratio_d = pt->tsc_ctc_ratio_d;
 
 	if (pt->synth_opts.instructions) {
 		if (pt->synth_opts.period) {
@@ -1751,16 +1778,20 @@ static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
 }
 
 static const char * const intel_pt_info_fmts[] = {
-	[INTEL_PT_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
-	[INTEL_PT_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
-	[INTEL_PT_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
-	[INTEL_PT_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
-	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
-	[INTEL_PT_TSC_BIT]		= "  TSC bit            %#"PRIx64"\n",
-	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit      %#"PRIx64"\n",
-	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch  %"PRId64"\n",
-	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
-	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps       %"PRId64"\n",
+	[INTEL_PT_PMU_TYPE]		= "  PMU Type            %"PRId64"\n",
+	[INTEL_PT_TIME_SHIFT]		= "  Time Shift          %"PRIu64"\n",
+	[INTEL_PT_TIME_MULT]		= "  Time Muliplier      %"PRIu64"\n",
+	[INTEL_PT_TIME_ZERO]		= "  Time Zero           %"PRIu64"\n",
+	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero       %"PRId64"\n",
+	[INTEL_PT_TSC_BIT]		= "  TSC bit             %#"PRIx64"\n",
+	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit       %#"PRIx64"\n",
+	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch   %"PRId64"\n",
+	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode       %"PRId64"\n",
+	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps        %"PRId64"\n",
+	[INTEL_PT_MTC_BIT]		= "  MTC bit             %#"PRIx64"\n",
+	[INTEL_PT_TSC_CTC_N]		= "  TSC:CTC numerator   %"PRIu64"\n",
+	[INTEL_PT_TSC_CTC_D]		= "  TSC:CTC denominator %"PRIu64"\n",
+	[INTEL_PT_CYC_BIT]		= "  CYC bit             %#"PRIx64"\n",
 };
 
 static void intel_pt_print_info(u64 *arr, int start, int finish)
@@ -1812,6 +1843,17 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_PMU_TYPE,
 			    INTEL_PT_PER_CPU_MMAPS);
 
+	if (auxtrace_info->header.size >= sizeof(struct auxtrace_info_event) +
+					(sizeof(u64) * INTEL_PT_CYC_BIT)) {
+		pt->mtc_bit = auxtrace_info->priv[INTEL_PT_MTC_BIT];
+		pt->mtc_freq_bits = auxtrace_info->priv[INTEL_PT_MTC_FREQ_BITS];
+		pt->tsc_ctc_ratio_n = auxtrace_info->priv[INTEL_PT_TSC_CTC_N];
+		pt->tsc_ctc_ratio_d = auxtrace_info->priv[INTEL_PT_TSC_CTC_D];
+		pt->cyc_bit = auxtrace_info->priv[INTEL_PT_CYC_BIT];
+		intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_MTC_BIT,
+				    INTEL_PT_CYC_BIT);
+	}
+
 	pt->timeless_decoding = intel_pt_timeless_decoding(pt);
 	pt->have_tsc = intel_pt_have_tsc(pt);
 	pt->sampling_mode = false;
diff --git a/tools/perf/util/intel-pt.h b/tools/perf/util/intel-pt.h
index a1bfe93473ba..0065949df693 100644
--- a/tools/perf/util/intel-pt.h
+++ b/tools/perf/util/intel-pt.h
@@ -29,6 +29,11 @@ enum {
 	INTEL_PT_HAVE_SCHED_SWITCH,
 	INTEL_PT_SNAPSHOT_MODE,
 	INTEL_PT_PER_CPU_MMAPS,
+	INTEL_PT_MTC_BIT,
+	INTEL_PT_MTC_FREQ_BITS,
+	INTEL_PT_TSC_CTC_N,
+	INTEL_PT_TSC_CTC_D,
+	INTEL_PT_CYC_BIT,
 	INTEL_PT_AUXTRACE_PRIV_MAX,
 };
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 20/25] perf tools: Add Intel PT support for decoding MTC packets
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (18 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 19/25] perf tools: Pass Intel PT information for decoding MTC and CYC Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-28  6:39   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 21/25] perf tools: Add Intel PT support for using " Adrian Hunter
                   ` (4 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

MTC packets provide finer grain timestamp information
than TSC packets.  MTC packets record time using the
hardware crystal clock (CTC) which is related to TSC
packets using a TMA packet.

This patch just adds decoder support.

Support for a default value and validation of values
is provided by a later patch. Also documentation is
updated in a separate patch.

For details refer to the June 2015 or later Intel 64 and
IA-32 Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 162 ++++++++++++++++++++-
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |   1 +
 2 files changed, 159 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 4a0e9fb1d173..f7119a11a4b6 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -85,7 +85,9 @@ struct intel_pt_decoder {
 	const unsigned char *buf;
 	size_t len;
 	bool return_compression;
+	bool mtc_insn;
 	bool pge;
+	bool have_tma;
 	uint64_t pos;
 	uint64_t last_ip;
 	uint64_t ip;
@@ -94,6 +96,15 @@ struct intel_pt_decoder {
 	uint64_t tsc_timestamp;
 	uint64_t ref_timestamp;
 	uint64_t ret_addr;
+	uint64_t ctc_timestamp;
+	uint64_t ctc_delta;
+	uint32_t last_mtc;
+	uint32_t tsc_ctc_ratio_n;
+	uint32_t tsc_ctc_ratio_d;
+	uint32_t tsc_ctc_mult;
+	uint32_t tsc_slip;
+	uint32_t ctc_rem_mask;
+	int mtc_shift;
 	struct intel_pt_stack stack;
 	enum intel_pt_pkt_state pkt_state;
 	struct intel_pt_pkt packet;
@@ -149,6 +160,13 @@ static void intel_pt_setup_period(struct intel_pt_decoder *decoder)
 	}
 }
 
+static uint64_t multdiv(uint64_t t, uint32_t n, uint32_t d)
+{
+	if (!d)
+		return 0;
+	return (t / d) * n + ((t % d) * n) / d;
+}
+
 struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
 {
 	struct intel_pt_decoder *decoder;
@@ -175,6 +193,39 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
 
 	intel_pt_setup_period(decoder);
 
+	decoder->mtc_shift = params->mtc_period;
+	decoder->ctc_rem_mask = (1 << decoder->mtc_shift) - 1;
+
+	decoder->tsc_ctc_ratio_n = params->tsc_ctc_ratio_n;
+	decoder->tsc_ctc_ratio_d = params->tsc_ctc_ratio_d;
+
+	if (!decoder->tsc_ctc_ratio_n)
+		decoder->tsc_ctc_ratio_d = 0;
+
+	if (decoder->tsc_ctc_ratio_d) {
+		if (!(decoder->tsc_ctc_ratio_n % decoder->tsc_ctc_ratio_d))
+			decoder->tsc_ctc_mult = decoder->tsc_ctc_ratio_n /
+						decoder->tsc_ctc_ratio_d;
+
+		/*
+		 * Allow for timestamps appearing to backwards because a TSC
+		 * packet has slipped past a MTC packet, so allow 2 MTC ticks
+		 * or ...
+		 */
+		decoder->tsc_slip = multdiv(2 << decoder->mtc_shift,
+					decoder->tsc_ctc_ratio_n,
+					decoder->tsc_ctc_ratio_d);
+	}
+	/* ... or 0x100 paranoia */
+	if (decoder->tsc_slip < 0x100)
+		decoder->tsc_slip = 0x100;
+
+	intel_pt_log("timestamp: mtc_shift %u\n", decoder->mtc_shift);
+	intel_pt_log("timestamp: tsc_ctc_ratio_n %u\n", decoder->tsc_ctc_ratio_n);
+	intel_pt_log("timestamp: tsc_ctc_ratio_d %u\n", decoder->tsc_ctc_ratio_d);
+	intel_pt_log("timestamp: tsc_ctc_mult %u\n", decoder->tsc_ctc_mult);
+	intel_pt_log("timestamp: tsc_slip %#x\n", decoder->tsc_slip);
+
 	return decoder;
 }
 
@@ -368,6 +419,7 @@ static inline void intel_pt_update_in_tx(struct intel_pt_decoder *decoder)
 static int intel_pt_bad_packet(struct intel_pt_decoder *decoder)
 {
 	intel_pt_clear_tx_flags(decoder);
+	decoder->have_tma = false;
 	decoder->pkt_len = 1;
 	decoder->pkt_step = 1;
 	intel_pt_decoder_log_packet(decoder);
@@ -400,6 +452,7 @@ static int intel_pt_get_data(struct intel_pt_decoder *decoder)
 		decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
 		decoder->ref_timestamp = buffer.ref_timestamp;
 		decoder->timestamp = 0;
+		decoder->have_tma = false;
 		decoder->state.trace_nr = buffer.trace_nr;
 		intel_pt_log("Reference timestamp 0x%" PRIx64 "\n",
 			     decoder->ref_timestamp);
@@ -523,6 +576,7 @@ static uint64_t intel_pt_next_sample(struct intel_pt_decoder *decoder)
 	case INTEL_PT_PERIOD_TICKS:
 		return intel_pt_next_period(decoder);
 	case INTEL_PT_PERIOD_NONE:
+	case INTEL_PT_PERIOD_MTC:
 	default:
 		return 0;
 	}
@@ -542,6 +596,7 @@ static void intel_pt_sample_insn(struct intel_pt_decoder *decoder)
 		decoder->last_masked_timestamp = masked_timestamp;
 		break;
 	case INTEL_PT_PERIOD_NONE:
+	case INTEL_PT_PERIOD_MTC:
 	default:
 		break;
 	}
@@ -555,6 +610,9 @@ static int intel_pt_walk_insn(struct intel_pt_decoder *decoder,
 	uint64_t max_insn_cnt, insn_cnt = 0;
 	int err;
 
+	if (!decoder->mtc_insn)
+		decoder->mtc_insn = true;
+
 	max_insn_cnt = intel_pt_next_sample(decoder);
 
 	err = decoder->walk_insn(intel_pt_insn, &insn_cnt, &decoder->ip, ip,
@@ -861,6 +919,8 @@ static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
 {
 	uint64_t timestamp;
 
+	decoder->have_tma = false;
+
 	if (decoder->ref_timestamp) {
 		timestamp = decoder->packet.payload |
 			    (decoder->ref_timestamp & (0xffULL << 56));
@@ -878,17 +938,18 @@ static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
 	} else if (decoder->timestamp) {
 		timestamp = decoder->packet.payload |
 			    (decoder->timestamp & (0xffULL << 56));
+		decoder->tsc_timestamp = timestamp;
 		if (timestamp < decoder->timestamp &&
-		    decoder->timestamp - timestamp < 0x100) {
-			intel_pt_log_to("ERROR: Suppressing backwards timestamp",
+		    decoder->timestamp - timestamp < decoder->tsc_slip) {
+			intel_pt_log_to("Suppressing backwards timestamp",
 					timestamp);
 			timestamp = decoder->timestamp;
 		}
 		while (timestamp < decoder->timestamp) {
 			intel_pt_log_to("Wraparound timestamp", timestamp);
 			timestamp += (1ULL << 56);
+			decoder->tsc_timestamp = timestamp;
 		}
-		decoder->tsc_timestamp = timestamp;
 		decoder->timestamp = timestamp;
 		decoder->timestamp_insn_cnt = 0;
 	}
@@ -900,11 +961,73 @@ static int intel_pt_overflow(struct intel_pt_decoder *decoder)
 {
 	intel_pt_log("ERROR: Buffer overflow\n");
 	intel_pt_clear_tx_flags(decoder);
+	decoder->have_tma = false;
 	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
 	decoder->overflow = true;
 	return -EOVERFLOW;
 }
 
+static void intel_pt_calc_tma(struct intel_pt_decoder *decoder)
+{
+	uint32_t ctc = decoder->packet.payload;
+	uint32_t fc = decoder->packet.count;
+	uint32_t ctc_rem = ctc & decoder->ctc_rem_mask;
+
+	if (!decoder->tsc_ctc_ratio_d)
+		return;
+
+	decoder->last_mtc = (ctc >> decoder->mtc_shift) & 0xff;
+	decoder->ctc_timestamp = decoder->tsc_timestamp - fc;
+	if (decoder->tsc_ctc_mult) {
+		decoder->ctc_timestamp -= ctc_rem * decoder->tsc_ctc_mult;
+	} else {
+		decoder->ctc_timestamp -= multdiv(ctc_rem,
+						  decoder->tsc_ctc_ratio_n,
+						  decoder->tsc_ctc_ratio_d);
+	}
+	decoder->ctc_delta = 0;
+	decoder->have_tma = true;
+	intel_pt_log("CTC timestamp " x64_fmt " last MTC %#x  CTC rem %#x\n",
+		     decoder->ctc_timestamp, decoder->last_mtc, ctc_rem);
+}
+
+static void intel_pt_calc_mtc_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp;
+	uint32_t mtc, mtc_delta;
+
+	if (!decoder->have_tma)
+		return;
+
+	mtc = decoder->packet.payload;
+
+	if (mtc > decoder->last_mtc)
+		mtc_delta = mtc - decoder->last_mtc;
+	else
+		mtc_delta = mtc + 256 - decoder->last_mtc;
+
+	decoder->ctc_delta += mtc_delta << decoder->mtc_shift;
+
+	if (decoder->tsc_ctc_mult) {
+		timestamp = decoder->ctc_timestamp +
+			    decoder->ctc_delta * decoder->tsc_ctc_mult;
+	} else {
+		timestamp = decoder->ctc_timestamp +
+			    multdiv(decoder->ctc_delta,
+				    decoder->tsc_ctc_ratio_n,
+				    decoder->tsc_ctc_ratio_d);
+	}
+
+	if (timestamp < decoder->timestamp)
+		intel_pt_log("Suppressing MTC timestamp " x64_fmt " less than current timestamp " x64_fmt "\n",
+			     timestamp, decoder->timestamp);
+	else
+		decoder->timestamp = timestamp;
+
+	decoder->timestamp_insn_cnt = 0;
+	decoder->last_mtc = mtc;
+}
+
 /* Walk PSB+ packets when already in sync. */
 static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 {
@@ -926,6 +1049,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_BAD:
 		case INTEL_PT_PSB:
+			decoder->have_tma = false;
 			intel_pt_log("ERROR: Unexpected packet\n");
 			return -EAGAIN;
 
@@ -937,6 +1061,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_TMA:
+			intel_pt_calc_tma(decoder);
 			break;
 
 		case INTEL_PT_CBR:
@@ -961,6 +1086,9 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_MTC:
+			intel_pt_calc_mtc_timestamp(decoder);
+			if (decoder->period_type == INTEL_PT_PERIOD_MTC)
+				decoder->state.type |= INTEL_PT_INSTRUCTION;
 			break;
 
 		case INTEL_PT_CYC:
@@ -1048,6 +1176,9 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_MTC:
+			intel_pt_calc_mtc_timestamp(decoder);
+			if (decoder->period_type == INTEL_PT_PERIOD_MTC)
+				decoder->state.type |= INTEL_PT_INSTRUCTION;
 			break;
 
 		case INTEL_PT_CYC:
@@ -1159,13 +1290,31 @@ next:
 			break;
 
 		case INTEL_PT_MTC:
-			break;
+			intel_pt_calc_mtc_timestamp(decoder);
+			if (decoder->period_type != INTEL_PT_PERIOD_MTC)
+				break;
+			/*
+			 * Ensure that there has been an instruction since the
+			 * last MTC.
+			 */
+			if (!decoder->mtc_insn)
+				break;
+			decoder->mtc_insn = false;
+			/* Ensure that there is a timestamp */
+			if (!decoder->timestamp)
+				break;
+			decoder->state.type = INTEL_PT_INSTRUCTION;
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			decoder->mtc_insn = false;
+			return 0;
 
 		case INTEL_PT_TSC:
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_TMA:
+			intel_pt_calc_tma(decoder);
 			break;
 
 		case INTEL_PT_CYC:
@@ -1237,6 +1386,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_MTC:
+			intel_pt_calc_mtc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_TSC:
@@ -1244,6 +1394,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_TMA:
+			intel_pt_calc_tma(decoder);
 			break;
 
 		case INTEL_PT_CYC:
@@ -1267,6 +1418,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 
 		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_TNT:
+			decoder->have_tma = false;
 			intel_pt_log("ERROR: Unexpected packet\n");
 			if (decoder->ip)
 				decoder->pkt_state = INTEL_PT_STATE_ERR4;
@@ -1329,6 +1481,7 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_MTC:
+			intel_pt_calc_mtc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_TSC:
@@ -1336,6 +1489,7 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_TMA:
+			intel_pt_calc_tma(decoder);
 			break;
 
 		case INTEL_PT_CYC:
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
index 56cc47baca11..02c38fec1c37 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -36,6 +36,7 @@ enum intel_pt_period_type {
 	INTEL_PT_PERIOD_NONE,
 	INTEL_PT_PERIOD_INSTRUCTIONS,
 	INTEL_PT_PERIOD_TICKS,
+	INTEL_PT_PERIOD_MTC,
 };
 
 enum {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 21/25] perf tools: Add Intel PT support for using MTC packets
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (19 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 20/25] perf tools: Add Intel PT support for decoding MTC packets Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-28  6:39   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 22/25] perf tools: Add Intel PT support for decoding CYC packets Adrian Hunter
                   ` (3 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

MTC packets are a new Intel PT feature.

MTC packets provide finer grain timestamp information
than TSC packets.

Support for this feature is indicated by
/sys/bus/event_source/devices/intel_pt/caps/mtc
which contains "1" if the feature is supported and
"0" otherwise.

MTC packets can be requested using a PMU config term
e.g. perf record -e intel_pt/mtc/u sleep 1

The frequency of MTC packets can also be specified.
e.g. perf record -e intel_pt/mtc,mtc_period=2/u sleep 1

The default value is 3 or the nearest lower value
that is supported.  0 is always supported.

Valid values are given by:
/sys/bus/event_source/devices/intel_pt/caps/mtc_periods
which contains a hexadecimal value, the bits of which
represent valid values e.g. bit 2 set means value 2
is valid.

The value is converted to the MTC frequency as:

	CTC-frequency / (2 ^ value)

e.g. value 3 means one eighth of CTC-frequency

Where CTC is the hardware crystal clock, the frequency of
which can be related to TSC via values provided in cpuid
leaf 0x15.

If an invalid value is entered, the error message
will give a list of valid values e.g.

	$ perf record -e intel_pt/mtc_period=15/u uname
	Invalid mtc_period for intel_pt. Valid values are: 0,3,6,9

tools/perf/Documentation/intel-pt.txt is updated in
a later patch as there are a number of new features
being added.

For more information refer to the June 2015 or later Intel 64 and
IA-32 Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index faae9289bcf6..a5de01dad868 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -190,17 +190,33 @@ static int intel_pt_pick_bit(int bits, int target)
 static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
 {
 	char buf[256];
+	int mtc, mtc_periods = 0, mtc_period;
 	int psb_cyc, psb_periods, psb_period;
 	int pos = 0;
 	u64 config;
 
 	pos += scnprintf(buf + pos, sizeof(buf) - pos, "tsc");
 
+	if (perf_pmu__scan_file(intel_pt_pmu, "caps/mtc", "%d",
+				&mtc) != 1)
+		mtc = 1;
+
+	if (mtc) {
+		if (perf_pmu__scan_file(intel_pt_pmu, "caps/mtc_periods", "%x",
+					&mtc_periods) != 1)
+			mtc_periods = 0;
+		if (mtc_periods) {
+			mtc_period = intel_pt_pick_bit(mtc_periods, 3);
+			pos += scnprintf(buf + pos, sizeof(buf) - pos,
+					 ",mtc,mtc_period=%d", mtc_period);
+		}
+	}
+
 	if (perf_pmu__scan_file(intel_pt_pmu, "caps/psb_cyc", "%d",
 				&psb_cyc) != 1)
 		psb_cyc = 1;
 
-	if (psb_cyc) {
+	if (psb_cyc && mtc_periods) {
 		if (perf_pmu__scan_file(intel_pt_pmu, "caps/psb_periods", "%x",
 					&psb_periods) != 1)
 			psb_periods = 0;
@@ -454,9 +470,17 @@ out_err:
 static int intel_pt_validate_config(struct perf_pmu *intel_pt_pmu,
 				    struct perf_evsel *evsel)
 {
+	int err;
+
 	if (!evsel)
 		return 0;
 
+	err = intel_pt_val_config_term(intel_pt_pmu, "caps/mtc_periods",
+				       "mtc_period", "caps/mtc",
+				       evsel->attr.config);
+	if (err)
+		return err;
+
 	return intel_pt_val_config_term(intel_pt_pmu, "caps/psb_periods",
 					"psb_period", "caps/psb_cyc",
 					evsel->attr.config);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 22/25] perf tools: Add Intel PT support for decoding CYC packets
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (20 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 21/25] perf tools: Add Intel PT support for using " Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-28  6:39   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 23/25] perf tools: Add Intel PT support for using " Adrian Hunter
                   ` (2 subsequent siblings)
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

CYC packets provide even finer grain timestamp information
than MTC and TSC packets.  A CYC packet contains the number
of CPU cycles since the last CYC packet.

This patch just adds decoder support.  The CPU frequency can
be related to TSC using the Maximum Non-Turbo Ratio in
combination with the CBR (core-to-bus ratio) packet.
However more accuracy is achieved by simply interpolating the
number of cycles between other timing packets like MTC or
TSC.  This patch takes the latter approach.

Support for a default value and validation of values
is provided by a later patch. Also documentation is
updated in a separate patch.

For details refer to the June 2015 or later Intel 64 and
IA-32 Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 311 ++++++++++++++++++++-
 1 file changed, 306 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index f7119a11a4b6..0845c5e6ad1d 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -88,6 +88,7 @@ struct intel_pt_decoder {
 	bool mtc_insn;
 	bool pge;
 	bool have_tma;
+	bool have_cyc;
 	uint64_t pos;
 	uint64_t last_ip;
 	uint64_t ip;
@@ -98,6 +99,8 @@ struct intel_pt_decoder {
 	uint64_t ret_addr;
 	uint64_t ctc_timestamp;
 	uint64_t ctc_delta;
+	uint64_t cycle_cnt;
+	uint64_t cyc_ref_timestamp;
 	uint32_t last_mtc;
 	uint32_t tsc_ctc_ratio_n;
 	uint32_t tsc_ctc_ratio_d;
@@ -111,8 +114,13 @@ struct intel_pt_decoder {
 	struct intel_pt_pkt tnt;
 	int pkt_step;
 	int pkt_len;
+	int last_packet_type;
 	unsigned int cbr;
 	unsigned int max_non_turbo_ratio;
+	double max_non_turbo_ratio_fp;
+	double cbr_cyc_to_tsc;
+	double calc_cyc_to_tsc;
+	bool have_calc_cyc_to_tsc;
 	int exec_mode;
 	unsigned int insn_bytes;
 	uint64_t sign_bit;
@@ -189,7 +197,8 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
 	decoder->period             = params->period;
 	decoder->period_type        = params->period_type;
 
-	decoder->max_non_turbo_ratio = params->max_non_turbo_ratio;
+	decoder->max_non_turbo_ratio    = params->max_non_turbo_ratio;
+	decoder->max_non_turbo_ratio_fp = params->max_non_turbo_ratio;
 
 	intel_pt_setup_period(decoder);
 
@@ -514,10 +523,247 @@ static int intel_pt_get_split_packet(struct intel_pt_decoder *decoder)
 	return ret;
 }
 
+struct intel_pt_pkt_info {
+	struct intel_pt_decoder	  *decoder;
+	struct intel_pt_pkt       packet;
+	uint64_t                  pos;
+	int                       pkt_len;
+	int                       last_packet_type;
+	void                      *data;
+};
+
+typedef int (*intel_pt_pkt_cb_t)(struct intel_pt_pkt_info *pkt_info);
+
+/* Lookahead packets in current buffer */
+static int intel_pt_pkt_lookahead(struct intel_pt_decoder *decoder,
+				  intel_pt_pkt_cb_t cb, void *data)
+{
+	struct intel_pt_pkt_info pkt_info;
+	const unsigned char *buf = decoder->buf;
+	size_t len = decoder->len;
+	int ret;
+
+	pkt_info.decoder          = decoder;
+	pkt_info.pos              = decoder->pos;
+	pkt_info.pkt_len          = decoder->pkt_step;
+	pkt_info.last_packet_type = decoder->last_packet_type;
+	pkt_info.data             = data;
+
+	while (1) {
+		do {
+			pkt_info.pos += pkt_info.pkt_len;
+			buf          += pkt_info.pkt_len;
+			len          -= pkt_info.pkt_len;
+
+			if (!len)
+				return INTEL_PT_NEED_MORE_BYTES;
+
+			ret = intel_pt_get_packet(buf, len, &pkt_info.packet);
+			if (!ret)
+				return INTEL_PT_NEED_MORE_BYTES;
+			if (ret < 0)
+				return ret;
+
+			pkt_info.pkt_len = ret;
+		} while (pkt_info.packet.type == INTEL_PT_PAD);
+
+		ret = cb(&pkt_info);
+		if (ret)
+			return 0;
+
+		pkt_info.last_packet_type = pkt_info.packet.type;
+	}
+}
+
+struct intel_pt_calc_cyc_to_tsc_info {
+	uint64_t        cycle_cnt;
+	unsigned int    cbr;
+	uint32_t        last_mtc;
+	uint64_t        ctc_timestamp;
+	uint64_t        ctc_delta;
+	uint64_t        tsc_timestamp;
+	uint64_t        timestamp;
+	bool            have_tma;
+	bool            from_mtc;
+	double          cbr_cyc_to_tsc;
+};
+
+static int intel_pt_calc_cyc_cb(struct intel_pt_pkt_info *pkt_info)
+{
+	struct intel_pt_decoder *decoder = pkt_info->decoder;
+	struct intel_pt_calc_cyc_to_tsc_info *data = pkt_info->data;
+	uint64_t timestamp;
+	double cyc_to_tsc;
+	unsigned int cbr;
+	uint32_t mtc, mtc_delta, ctc, fc, ctc_rem;
+
+	switch (pkt_info->packet.type) {
+	case INTEL_PT_TNT:
+	case INTEL_PT_TIP_PGE:
+	case INTEL_PT_TIP:
+	case INTEL_PT_FUP:
+	case INTEL_PT_PSB:
+	case INTEL_PT_PIP:
+	case INTEL_PT_MODE_EXEC:
+	case INTEL_PT_MODE_TSX:
+	case INTEL_PT_PSBEND:
+	case INTEL_PT_PAD:
+	case INTEL_PT_VMCS:
+	case INTEL_PT_MNT:
+		return 0;
+
+	case INTEL_PT_MTC:
+		if (!data->have_tma)
+			return 0;
+
+		mtc = pkt_info->packet.payload;
+		if (mtc > data->last_mtc)
+			mtc_delta = mtc - data->last_mtc;
+		else
+			mtc_delta = mtc + 256 - data->last_mtc;
+		data->ctc_delta += mtc_delta << decoder->mtc_shift;
+		data->last_mtc = mtc;
+
+		if (decoder->tsc_ctc_mult) {
+			timestamp = data->ctc_timestamp +
+				data->ctc_delta * decoder->tsc_ctc_mult;
+		} else {
+			timestamp = data->ctc_timestamp +
+				multdiv(data->ctc_delta,
+					decoder->tsc_ctc_ratio_n,
+					decoder->tsc_ctc_ratio_d);
+		}
+
+		if (timestamp < data->timestamp)
+			return 1;
+
+		if (pkt_info->last_packet_type != INTEL_PT_CYC) {
+			data->timestamp = timestamp;
+			return 0;
+		}
+
+		break;
+
+	case INTEL_PT_TSC:
+		timestamp = pkt_info->packet.payload |
+			    (data->timestamp & (0xffULL << 56));
+		if (data->from_mtc && timestamp < data->timestamp &&
+		    data->timestamp - timestamp < decoder->tsc_slip)
+			return 1;
+		while (timestamp < data->timestamp)
+			timestamp += (1ULL << 56);
+		if (pkt_info->last_packet_type != INTEL_PT_CYC) {
+			if (data->from_mtc)
+				return 1;
+			data->tsc_timestamp = timestamp;
+			data->timestamp = timestamp;
+			return 0;
+		}
+		break;
+
+	case INTEL_PT_TMA:
+		if (data->from_mtc)
+			return 1;
+
+		if (!decoder->tsc_ctc_ratio_d)
+			return 0;
+
+		ctc = pkt_info->packet.payload;
+		fc = pkt_info->packet.count;
+		ctc_rem = ctc & decoder->ctc_rem_mask;
+
+		data->last_mtc = (ctc >> decoder->mtc_shift) & 0xff;
+
+		data->ctc_timestamp = data->tsc_timestamp - fc;
+		if (decoder->tsc_ctc_mult) {
+			data->ctc_timestamp -= ctc_rem * decoder->tsc_ctc_mult;
+		} else {
+			data->ctc_timestamp -=
+				multdiv(ctc_rem, decoder->tsc_ctc_ratio_n,
+					decoder->tsc_ctc_ratio_d);
+		}
+
+		data->ctc_delta = 0;
+		data->have_tma = true;
+
+		return 0;
+
+	case INTEL_PT_CYC:
+		data->cycle_cnt += pkt_info->packet.payload;
+		return 0;
+
+	case INTEL_PT_CBR:
+		cbr = pkt_info->packet.payload;
+		if (data->cbr && data->cbr != cbr)
+			return 1;
+		data->cbr = cbr;
+		data->cbr_cyc_to_tsc = decoder->max_non_turbo_ratio_fp / cbr;
+		return 0;
+
+	case INTEL_PT_TIP_PGD:
+	case INTEL_PT_TRACESTOP:
+	case INTEL_PT_OVF:
+	case INTEL_PT_BAD: /* Does not happen */
+	default:
+		return 1;
+	}
+
+	if (!data->cbr && decoder->cbr) {
+		data->cbr = decoder->cbr;
+		data->cbr_cyc_to_tsc = decoder->cbr_cyc_to_tsc;
+	}
+
+	if (!data->cycle_cnt)
+		return 1;
+
+	cyc_to_tsc = (double)(timestamp - decoder->timestamp) / data->cycle_cnt;
+
+	if (data->cbr && cyc_to_tsc > data->cbr_cyc_to_tsc &&
+	    cyc_to_tsc / data->cbr_cyc_to_tsc > 1.25) {
+		intel_pt_log("Timestamp: calculated %g TSC ticks per cycle too big (c.f. CBR-based value %g), pos " x64_fmt "\n",
+			     cyc_to_tsc, data->cbr_cyc_to_tsc, pkt_info->pos);
+		return 1;
+	}
+
+	decoder->calc_cyc_to_tsc = cyc_to_tsc;
+	decoder->have_calc_cyc_to_tsc = true;
+
+	if (data->cbr) {
+		intel_pt_log("Timestamp: calculated %g TSC ticks per cycle c.f. CBR-based value %g, pos " x64_fmt "\n",
+			     cyc_to_tsc, data->cbr_cyc_to_tsc, pkt_info->pos);
+	} else {
+		intel_pt_log("Timestamp: calculated %g TSC ticks per cycle c.f. unknown CBR-based value, pos " x64_fmt "\n",
+			     cyc_to_tsc, pkt_info->pos);
+	}
+
+	return 1;
+}
+
+static void intel_pt_calc_cyc_to_tsc(struct intel_pt_decoder *decoder,
+				     bool from_mtc)
+{
+	struct intel_pt_calc_cyc_to_tsc_info data = {
+		.cycle_cnt      = 0,
+		.cbr            = 0,
+		.last_mtc       = decoder->last_mtc,
+		.ctc_timestamp  = decoder->ctc_timestamp,
+		.ctc_delta      = decoder->ctc_delta,
+		.tsc_timestamp  = decoder->tsc_timestamp,
+		.timestamp      = decoder->timestamp,
+		.have_tma       = decoder->have_tma,
+		.from_mtc       = from_mtc,
+		.cbr_cyc_to_tsc = 0,
+	};
+
+	intel_pt_pkt_lookahead(decoder, intel_pt_calc_cyc_cb, &data);
+}
+
 static int intel_pt_get_next_packet(struct intel_pt_decoder *decoder)
 {
 	int ret;
 
+	decoder->last_packet_type = decoder->packet.type;
+
 	do {
 		decoder->pos += decoder->pkt_step;
 		decoder->buf += decoder->pkt_step;
@@ -954,6 +1200,13 @@ static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
 		decoder->timestamp_insn_cnt = 0;
 	}
 
+	if (decoder->last_packet_type == INTEL_PT_CYC) {
+		decoder->cyc_ref_timestamp = decoder->timestamp;
+		decoder->cycle_cnt = 0;
+		decoder->have_calc_cyc_to_tsc = false;
+		intel_pt_calc_cyc_to_tsc(decoder, false);
+	}
+
 	intel_pt_log_to("Setting timestamp", decoder->timestamp);
 }
 
@@ -962,6 +1215,7 @@ static int intel_pt_overflow(struct intel_pt_decoder *decoder)
 	intel_pt_log("ERROR: Buffer overflow\n");
 	intel_pt_clear_tx_flags(decoder);
 	decoder->have_tma = false;
+	decoder->cbr = 0;
 	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
 	decoder->overflow = true;
 	return -EOVERFLOW;
@@ -1026,6 +1280,49 @@ static void intel_pt_calc_mtc_timestamp(struct intel_pt_decoder *decoder)
 
 	decoder->timestamp_insn_cnt = 0;
 	decoder->last_mtc = mtc;
+
+	if (decoder->last_packet_type == INTEL_PT_CYC) {
+		decoder->cyc_ref_timestamp = decoder->timestamp;
+		decoder->cycle_cnt = 0;
+		decoder->have_calc_cyc_to_tsc = false;
+		intel_pt_calc_cyc_to_tsc(decoder, true);
+	}
+}
+
+static void intel_pt_calc_cbr(struct intel_pt_decoder *decoder)
+{
+	unsigned int cbr = decoder->packet.payload;
+
+	if (decoder->cbr == cbr)
+		return;
+
+	decoder->cbr = cbr;
+	decoder->cbr_cyc_to_tsc = decoder->max_non_turbo_ratio_fp / cbr;
+}
+
+static void intel_pt_calc_cyc_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp = decoder->cyc_ref_timestamp;
+
+	decoder->have_cyc = true;
+
+	decoder->cycle_cnt += decoder->packet.payload;
+
+	if (!decoder->cyc_ref_timestamp)
+		return;
+
+	if (decoder->have_calc_cyc_to_tsc)
+		timestamp += decoder->cycle_cnt * decoder->calc_cyc_to_tsc;
+	else if (decoder->cbr)
+		timestamp += decoder->cycle_cnt * decoder->cbr_cyc_to_tsc;
+	else
+		return;
+
+	if (timestamp < decoder->timestamp)
+		intel_pt_log("Suppressing CYC timestamp " x64_fmt " less than current timestamp " x64_fmt "\n",
+			     timestamp, decoder->timestamp);
+	else
+		decoder->timestamp = timestamp;
 }
 
 /* Walk PSB+ packets when already in sync. */
@@ -1065,7 +1362,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_CBR:
-			decoder->cbr = decoder->packet.payload;
+			intel_pt_calc_cbr(decoder);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1182,6 +1479,7 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_CYC:
+			intel_pt_calc_cyc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1318,10 +1616,11 @@ next:
 			break;
 
 		case INTEL_PT_CYC:
+			intel_pt_calc_cyc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_CBR:
-			decoder->cbr = decoder->packet.payload;
+			intel_pt_calc_cbr(decoder);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1398,10 +1697,11 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_CYC:
+			intel_pt_calc_cyc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_CBR:
-			decoder->cbr = decoder->packet.payload;
+			intel_pt_calc_cbr(decoder);
 			break;
 
 		case INTEL_PT_PIP:
@@ -1493,10 +1793,11 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_CYC:
+			intel_pt_calc_cyc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_CBR:
-			decoder->cbr = decoder->packet.payload;
+			intel_pt_calc_cbr(decoder);
 			break;
 
 		case INTEL_PT_PIP:
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 23/25] perf tools: Add Intel PT support for using CYC packets
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (21 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 22/25] perf tools: Add Intel PT support for decoding CYC packets Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-28  6:40   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:33 ` [PATCH V8 24/25] perf tools: Add Intel PT support for decoding TRACESTOP packets Adrian Hunter
  2015-07-17 16:34 ` [PATCH V8 25/25] perf tools: Update Intel PT documentation Adrian Hunter
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

CYC packets are a new Intel PT feature.

CYC packets provide even finer grain timestamp information
than MTC and TSC packets.  A CYC packet contains the number
of CPU cycles since the last CYC packet. Unlike MTC and TSC
packets, CYC packets are only sent when another packet is
also sent.

Support for this feature is indicated by
/sys/bus/event_source/devices/intel_pt/caps/psb_cyc
which contains "1" if the feature is supported and
"0" otherwise.

CYC packets can be requested using a PMU config term
e.g. perf record -e intel_pt/cyc/u sleep 1

The frequency of CYC packets can also be specified.
e.g. perf record -e intel_pt/cyc,cyc_thresh=2/u sleep 1

CYC packets are not requested by default.

Valid cyc_thresh values are given by:
/sys/bus/event_source/devices/intel_pt/caps/cycle_thresholds
which contains a hexadecimal value, the bits of which
represent valid values e.g. bit 2 set means value 2
is valid.

The value represents the minimum number of CPU cycles
that must have passed before a CYC packet can be sent.
The number of CPU cycles is:

    2 ^ (value - 1)

e.g. value 4 means 8 CPU cycles must pass before a CYC packet
can be sent.  Note a CYC packet is still only sent when another
packet is sent, not at, e.g. every 8 CPU cycles.

If an invalid value is entered, the error message
will give a list of valid values e.g.

    $ perf record -e intel_pt/cyc,cyc_thresh=15/u uname
    Invalid cyc_thresh for intel_pt. Valid values are: 0-12

tools/perf/Documentation/intel-pt.txt is updated in
a later patch as there are a number of new features
being added.

For more information refer to the June 2015 or later Intel 64 and
IA-32 Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index a5de01dad868..2ca10d796c0b 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -475,6 +475,12 @@ static int intel_pt_validate_config(struct perf_pmu *intel_pt_pmu,
 	if (!evsel)
 		return 0;
 
+	err = intel_pt_val_config_term(intel_pt_pmu, "caps/cycle_thresholds",
+				       "cyc_thresh", "caps/psb_cyc",
+				       evsel->attr.config);
+	if (err)
+		return err;
+
 	err = intel_pt_val_config_term(intel_pt_pmu, "caps/mtc_periods",
 				       "mtc_period", "caps/mtc",
 				       evsel->attr.config);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 24/25] perf tools: Add Intel PT support for decoding TRACESTOP packets
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (22 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 23/25] perf tools: Add Intel PT support for using " Adrian Hunter
@ 2015-07-17 16:33 ` Adrian Hunter
  2015-08-28  6:40   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-07-17 16:34 ` [PATCH V8 25/25] perf tools: Update Intel PT documentation Adrian Hunter
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

A TRACESTOP packet is produced when an Intel PT trace
enters a defined region of the address space at which
point the tracing stops.

This patch just adds decoder support.

Support for specifying TRACESTOP regions is left until
later.

For details refer to the June 2015 or later Intel 64 and
IA-32 Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 0845c5e6ad1d..22ba50224319 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -1572,6 +1572,10 @@ next:
 			return intel_pt_walk_fup_tip(decoder);
 
 		case INTEL_PT_TRACESTOP:
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			intel_pt_clear_tx_flags(decoder);
+			decoder->have_tma = false;
 			break;
 
 		case INTEL_PT_PSB:
@@ -1717,6 +1721,9 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_TRACESTOP:
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			intel_pt_clear_tx_flags(decoder);
 		case INTEL_PT_TNT:
 			decoder->have_tma = false;
 			intel_pt_log("ERROR: Unexpected packet\n");
@@ -1819,6 +1826,10 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 			return intel_pt_bug(decoder);
 
 		case INTEL_PT_TRACESTOP:
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			intel_pt_clear_tx_flags(decoder);
+			decoder->have_tma = false;
 			break;
 
 		case INTEL_PT_PSB:
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [PATCH V8 25/25] perf tools: Update Intel PT documentation
  2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (23 preceding siblings ...)
  2015-07-17 16:33 ` [PATCH V8 24/25] perf tools: Add Intel PT support for decoding TRACESTOP packets Adrian Hunter
@ 2015-07-17 16:34 ` Adrian Hunter
  2015-08-28  6:40   ` [tip:perf/core] " tip-bot for Adrian Hunter
  24 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-07-17 16:34 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Update Intel PT documentation to describe new
features.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/intel-pt.txt | 194 ++++++++++++++++++++++++++++++++--
 1 file changed, 186 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index 2866b62eb293..4a0501d7a3b4 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -142,19 +142,21 @@ which is the same as
 
 	-e intel_pt/tsc=1,noretcomp=0/
 
+Note there are now new config terms - see section 'config terms' further below.
+
 The config terms are listed in /sys/devices/intel_pt/format.  They are bit
 fields within the config member of the struct perf_event_attr which is
 passed to the kernel by the perf_event_open system call.  They correspond to bit
 fields in the IA32_RTIT_CTL MSR.  Here is a list of them and their definitions:
 
-	$ for f in `ls /sys/devices/intel_pt/format`;do
-	> echo $f
-	> cat /sys/devices/intel_pt/format/$f
-	> done
-	noretcomp
-	config:11
-	tsc
-	config:10
+	$ grep -H . /sys/bus/event_source/devices/intel_pt/format/*
+	/sys/bus/event_source/devices/intel_pt/format/cyc:config:1
+	/sys/bus/event_source/devices/intel_pt/format/cyc_thresh:config:19-22
+	/sys/bus/event_source/devices/intel_pt/format/mtc:config:9
+	/sys/bus/event_source/devices/intel_pt/format/mtc_period:config:14-17
+	/sys/bus/event_source/devices/intel_pt/format/noretcomp:config:11
+	/sys/bus/event_source/devices/intel_pt/format/psb_period:config:24-27
+	/sys/bus/event_source/devices/intel_pt/format/tsc:config:10
 
 Note that the default config must be overridden for each term i.e.
 
@@ -209,9 +211,185 @@ perf_event_attr is displayed if the -vv option is used e.g.
 	------------------------------------------------------------
 
 
+config terms
+------------
+
+The June 2015 version of Intel 64 and IA-32 Architectures Software Developer
+Manuals, Chapter 36 Intel Processor Trace, defined new Intel PT features.
+Some of the features are reflect in new config terms.  All the config terms are
+described below.
+
+tsc		Always supported.  Produces TSC timestamp packets to provide
+		timing information.  In some cases it is possible to decode
+		without timing information, for example a per-thread context
+		that does not overlap executable memory maps.
+
+		The default config selects tsc (i.e. tsc=1).
+
+noretcomp	Always supported.  Disables "return compression" so a TIP packet
+		is produced when a function returns.  Causes more packets to be
+		produced but might make decoding more reliable.
+
+		The default config does not select noretcomp (i.e. noretcomp=0).
+
+psb_period	Allows the frequency of PSB packets to be specified.
+
+		The PSB packet is a synchronization packet that provides a
+		starting point for decoding or recovery from errors.
+
+		Support for psb_period is indicated by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/psb_cyc
+
+		which contains "1" if the feature is supported and "0"
+		otherwise.
+
+		Valid values are given by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/psb_periods
+
+		which contains a hexadecimal value, the bits of which represent
+		valid values e.g. bit 2 set means value 2 is valid.
+
+		The psb_period value is converted to the approximate number of
+		trace bytes between PSB packets as:
+
+			2 ^ (value + 11)
+
+		e.g. value 3 means 16KiB bytes between PSBs
+
+		If an invalid value is entered, the error message
+		will give a list of valid values e.g.
+
+			$ perf record -e intel_pt/psb_period=15/u uname
+			Invalid psb_period for intel_pt. Valid values are: 0-5
+
+		If MTC packets are selected, the default config selects a value
+		of 3 (i.e. psb_period=3) or the nearest lower value that is
+		supported (0 is always supported).  Otherwise the default is 0.
+
+		If decoding is expected to be reliable and the buffer is large
+		then a large PSB period can be used.
+
+		Because a TSC packet is produced with PSB, the PSB period can
+		also affect the granularity to timing information in the absence
+		of MTC or CYC.
+
+mtc		Produces MTC timing packets.
+
+		MTC packets provide finer grain timestamp information than TSC
+		packets.  MTC packets record time using the hardware crystal
+		clock (CTC) which is related to TSC packets using a TMA packet.
+
+		Support for this feature is indicated by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/mtc
+
+		which contains "1" if the feature is supported and
+		"0" otherwise.
+
+		The frequency of MTC packets can also be specified - see
+		mtc_period below.
+
+mtc_period	Specifies how frequently MTC packets are produced - see mtc
+		above for how to determine if MTC packets are supported.
+
+		Valid values are given by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/mtc_periods
+
+		which contains a hexadecimal value, the bits of which represent
+		valid values e.g. bit 2 set means value 2 is valid.
+
+		The mtc_period value is converted to the MTC frequency as:
+
+			CTC-frequency / (2 ^ value)
+
+		e.g. value 3 means one eighth of CTC-frequency
+
+		Where CTC is the hardware crystal clock, the frequency of which
+		can be related to TSC via values provided in cpuid leaf 0x15.
+
+		If an invalid value is entered, the error message
+		will give a list of valid values e.g.
+
+			$ perf record -e intel_pt/mtc_period=15/u uname
+			Invalid mtc_period for intel_pt. Valid values are: 0,3,6,9
+
+		The default value is 3 or the nearest lower value
+		that is supported (0 is always supported).
+
+cyc		Produces CYC timing packets.
+
+		CYC packets provide even finer grain timestamp information than
+		MTC and TSC packets.  A CYC packet contains the number of CPU
+		cycles since the last CYC packet. Unlike MTC and TSC packets,
+		CYC packets are only sent when another packet is also sent.
+
+		Support for this feature is indicated by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/psb_cyc
+
+		which contains "1" if the feature is supported and
+		"0" otherwise.
+
+		The number of CYC packets produced can be reduced by specifying
+		a threshold - see cyc_thresh below.
+
+cyc_thresh	Specifies how frequently CYC packets are produced - see cyc
+		above for how to determine if CYC packets are supported.
+
+		Valid cyc_thresh values are given by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/cycle_thresholds
+
+		which contains a hexadecimal value, the bits of which represent
+		valid values e.g. bit 2 set means value 2 is valid.
+
+		The cyc_thresh value represents the minimum number of CPU cycles
+		that must have passed before a CYC packet can be sent.  The
+		number of CPU cycles is:
+
+			2 ^ (value - 1)
+
+		e.g. value 4 means 8 CPU cycles must pass before a CYC packet
+		can be sent.  Note a CYC packet is still only sent when another
+		packet is sent, not at, e.g. every 8 CPU cycles.
+
+		If an invalid value is entered, the error message
+		will give a list of valid values e.g.
+
+			$ perf record -e intel_pt/cyc,cyc_thresh=15/u uname
+			Invalid cyc_thresh for intel_pt. Valid values are: 0-12
+
+		CYC packets are not requested by default.
+
+no_force_psb	This is a driver option and is not in the IA32_RTIT_CTL MSR.
+
+		It stops the driver resetting the byte count to zero whenever
+		enabling the trace (for example on context switches) which in
+		turn results in no PSB being forced.  However some processors
+		will produce a PSB anyway.
+
+		In any case, there is still a PSB when the trace is enabled for
+		the first time.
+
+		no_force_psb can be used to slightly decrease the trace size but
+		may make it harder for the decoder to recover from errors.
+
+		no_force_psb is not selected by default.
+
+
 new snapshot option
 -------------------
 
+The difference between full trace and snapshot from the kernel's perspective is
+that in full trace we don't overwrite trace data that the user hasn't collected
+yet (and indicated that by advancing aux_tail), whereas in snapshot mode we let
+the trace run and overwrite older data in the buffer so that whenever something
+interesting happens, we can stop it and grab a snapshot of what was going on
+around that interesting moment.
+
 To select snapshot mode a new option has been added:
 
 	-S
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 11/25] perf auxtrace: Fix period type 'i' not working
  2015-07-17 16:33 ` [PATCH V8 11/25] perf auxtrace: Fix period type 'i' not working Adrian Hunter
@ 2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:22   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-06 19:50 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jul 17, 2015 at 07:33:46PM +0300, Adrian Hunter escreveu:
> PERF_ITRACE_PERIOD_INSTRUCTIONS is zero so it
> got overwritten by the default period type.
> Fix by checking if the period type was set
> rather than if the value was zero when applying
> the default.

Applied
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/auxtrace.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index 3f3b40fa87c8..4ec3acf6fabb 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -949,6 +949,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
>  	struct itrace_synth_opts *synth_opts = opt->value;
>  	const char *p;
>  	char *endptr;
> +	bool period_type_set = false;
>  
>  	synth_opts->set = true;
>  
> @@ -977,10 +978,12 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
>  				case 'i':
>  					synth_opts->period_type =
>  						PERF_ITRACE_PERIOD_INSTRUCTIONS;
> +					period_type_set = true;
>  					break;
>  				case 't':
>  					synth_opts->period_type =
>  						PERF_ITRACE_PERIOD_TICKS;
> +					period_type_set = true;
>  					break;
>  				case 'm':
>  					synth_opts->period *= 1000;
> @@ -993,6 +996,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
>  						goto out_err;
>  					synth_opts->period_type =
>  						PERF_ITRACE_PERIOD_NANOSECS;
> +					period_type_set = true;
>  					break;
>  				case '\0':
>  					goto out;
> @@ -1046,7 +1050,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
>  	}
>  out:
>  	if (synth_opts->instructions) {
> -		if (!synth_opts->period_type)
> +		if (!period_type_set)
>  			synth_opts->period_type =
>  					PERF_ITRACE_DEFAULT_PERIOD_TYPE;
>  		if (!synth_opts->period)
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 12/25] perf tools: Fix perf-with-kcore handling of arguments containing spaces
  2015-07-17 16:33 ` [PATCH V8 12/25] perf tools: Fix perf-with-kcore handling of arguments containing spaces Adrian Hunter
@ 2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:22   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-06 19:50 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jul 17, 2015 at 07:33:47PM +0300, Adrian Hunter escreveu:
> Fix the perf-with-kcore script so that it doesn't
> split arguments that contain spaces.

Applied
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/perf-with-kcore.sh | 28 ++++++++++++++--------------
>  1 file changed, 14 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/perf/perf-with-kcore.sh b/tools/perf/perf-with-kcore.sh
> index c7ff90a90e4e..7e47a7cbc195 100644
> --- a/tools/perf/perf-with-kcore.sh
> +++ b/tools/perf/perf-with-kcore.sh
> @@ -50,7 +50,7 @@ copy_kcore()
>  	fi
>  
>  	rm -f perf.data.junk
> -	("$PERF" record -o perf.data.junk $PERF_OPTIONS -- sleep 60) >/dev/null 2>/dev/null &
> +	("$PERF" record -o perf.data.junk "${PERF_OPTIONS[@]}" -- sleep 60) >/dev/null 2>/dev/null &
>  	PERF_PID=$!
>  
>  	# Need to make sure that perf has started
> @@ -160,18 +160,18 @@ record()
>  			echo "*** WARNING *** /proc/sys/kernel/kptr_restrict prevents access to kernel addresses" >&2
>  		fi
>  
> -		if echo "$PERF_OPTIONS" | grep -q ' -a \|^-a \| -a$\|^-a$\| --all-cpus \|^--all-cpus \| --all-cpus$\|^--all-cpus$' ; then
> +		if echo "${PERF_OPTIONS[@]}" | grep -q ' -a \|^-a \| -a$\|^-a$\| --all-cpus \|^--all-cpus \| --all-cpus$\|^--all-cpus$' ; then
>  			echo "*** WARNING *** system-wide tracing without root access will not be able to read all necessary information from /proc" >&2
>  		fi
>  
> -		if echo "$PERF_OPTIONS" | grep -q 'intel_pt\|intel_bts\| -I\|^-I' ; then
> +		if echo "${PERF_OPTIONS[@]}" | grep -q 'intel_pt\|intel_bts\| -I\|^-I' ; then
>  			if [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt -1 ] ; then
>  				echo "*** WARNING *** /proc/sys/kernel/perf_event_paranoid restricts buffer size and tracepoint (sched_switch) use" >&2
>  			fi
>  
> -			if echo "$PERF_OPTIONS" | grep -q ' --per-thread \|^--per-thread \| --per-thread$\|^--per-thread$' ; then
> +			if echo "${PERF_OPTIONS[@]}" | grep -q ' --per-thread \|^--per-thread \| --per-thread$\|^--per-thread$' ; then
>  				true
> -			elif echo "$PERF_OPTIONS" | grep -q ' -t \|^-t \| -t$\|^-t$' ; then
> +			elif echo "${PERF_OPTIONS[@]}" | grep -q ' -t \|^-t \| -t$\|^-t$' ; then
>  				true
>  			elif [ ! -r /sys/kernel/debug -o ! -x /sys/kernel/debug ] ; then
>  				echo "*** WARNING *** /sys/kernel/debug permissions prevent tracepoint (sched_switch) use" >&2
> @@ -193,8 +193,8 @@ record()
>  
>  	mkdir "$PERF_DATA_DIR"
>  
> -	echo "$PERF record -o $PERF_DATA_DIR/perf.data $PERF_OPTIONS -- $*"
> -	"$PERF" record -o "$PERF_DATA_DIR/perf.data" $PERF_OPTIONS -- $* || true
> +	echo "$PERF record -o $PERF_DATA_DIR/perf.data ${PERF_OPTIONS[@]} -- $@"
> +	"$PERF" record -o "$PERF_DATA_DIR/perf.data" "${PERF_OPTIONS[@]}" -- "$@" || true
>  
>  	if rmdir "$PERF_DATA_DIR" > /dev/null 2>/dev/null ; then
>  		exit 1
> @@ -209,8 +209,8 @@ subcommand()
>  {
>  	find_perf
>  	check_buildid_cache_permissions
> -	echo "$PERF $PERF_SUB_COMMAND -i $PERF_DATA_DIR/perf.data --kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms $*"
> -	"$PERF" $PERF_SUB_COMMAND -i "$PERF_DATA_DIR/perf.data" "--kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms" $*
> +	echo "$PERF $PERF_SUB_COMMAND -i $PERF_DATA_DIR/perf.data --kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms $@"
> +	"$PERF" $PERF_SUB_COMMAND -i "$PERF_DATA_DIR/perf.data" "--kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms" "$@"
>  }
>  
>  if [ "$1" = "fix_buildid_cache_permissions" ] ; then
> @@ -234,7 +234,7 @@ fi
>  case "$PERF_SUB_COMMAND" in
>  "record")
>  	while [ "$1" != "--" ] ; do
> -		PERF_OPTIONS+="$1 "
> +		PERF_OPTIONS+=("$1")
>  		shift || break
>  	done
>  	if [ "$1" != "--" ] ; then
> @@ -242,16 +242,16 @@ case "$PERF_SUB_COMMAND" in
>  		usage
>  	fi
>  	shift
> -	record $*
> +	record "$@"
>  ;;
>  "script")
> -	subcommand $*
> +	subcommand "$@"
>  ;;
>  "report")
> -	subcommand $*
> +	subcommand "$@"
>  ;;
>  "inject")
> -	subcommand $*
> +	subcommand "$@"
>  ;;
>  *)
>  	usage
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 14/25] perf tools: Add perf_pmu__format_bits()
  2015-07-17 16:33 ` [PATCH V8 14/25] perf tools: Add perf_pmu__format_bits() Adrian Hunter
@ 2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:23   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-06 19:50 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jul 17, 2015 at 07:33:49PM +0300, Adrian Hunter escreveu:
> Add perf_pmu__format_bits() to get the format bits
> for a PMU config term.  Intel PT will use this to
> validate terms and to record format bits to enable
> later interpreting the config from the attribute
> stored in the perf.data file.

Applied
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/pmu.c | 17 ++++++++++++++++-
>  tools/perf/util/pmu.h |  1 +
>  2 files changed, 17 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index 52d569cda606..d26ff0ab8410 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -538,7 +538,7 @@ struct perf_pmu *perf_pmu__find(const char *name)
>  }
>  
>  static struct perf_pmu_format *
> -pmu_find_format(struct list_head *formats, char *name)
> +pmu_find_format(struct list_head *formats, const char *name)
>  {
>  	struct perf_pmu_format *format;
>  
> @@ -549,6 +549,21 @@ pmu_find_format(struct list_head *formats, char *name)
>  	return NULL;
>  }
>  
> +__u64 perf_pmu__format_bits(struct list_head *formats, const char *name)
> +{
> +	struct perf_pmu_format *format = pmu_find_format(formats, name);
> +	__u64 bits = 0;
> +	int fbit;
> +
> +	if (!format)
> +		return 0;
> +
> +	for_each_set_bit(fbit, format->bits, PERF_PMU_FORMAT_BITS)
> +		bits |= 1ULL << fbit;
> +
> +	return bits;
> +}
> +
>  /*
>   * Sets value based on the format definition (format parameter)
>   * and unformated value (value parameter).
> diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
> index 7b9c8cf8ae3e..5d7e84466bee 100644
> --- a/tools/perf/util/pmu.h
> +++ b/tools/perf/util/pmu.h
> @@ -54,6 +54,7 @@ int perf_pmu__config_terms(struct list_head *formats,
>  			   struct perf_event_attr *attr,
>  			   struct list_head *head_terms,
>  			   bool zero, struct parse_events_error *error);
> +__u64 perf_pmu__format_bits(struct list_head *formats, const char *name);
>  int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
>  			  struct perf_pmu_info *info);
>  struct list_head *perf_pmu__alias(struct perf_pmu *pmu,
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 15/25] perf tools: Validate config term maximum value
  2015-07-17 16:33 ` [PATCH V8 15/25] perf tools: Validate config term maximum value Adrian Hunter
@ 2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:23   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-06 19:50 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jul 17, 2015 at 07:33:50PM +0300, Adrian Hunter escreveu:
> Currently the value of a PMU config term is silently
> truncated if it is too big. This is an impediment to
> validating the value for other criteria later on i.e.
> the user provides an invalid value that gets truncated
> to a valid one.

Applied
 
> The maximum value validation is only done for the
> parser where the error is passed back to the user. In
> other cases the silent truncation continues so as not
> to affect tools that perhaps rely on it.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/pmu.c | 30 +++++++++++++++++++++++++++++-
>  1 file changed, 29 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index d26ff0ab8410..29797fd9bedc 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -585,6 +585,18 @@ static void pmu_format_value(unsigned long *format, __u64 value, __u64 *v,
>  	}
>  }
>  
> +static __u64 pmu_format_max_value(const unsigned long *format)
> +{
> +	int w;
> +
> +	w = bitmap_weight(format, PERF_PMU_FORMAT_BITS);
> +	if (!w)
> +		return 0;
> +	if (w < 64)
> +		return (1ULL << w) - 1;
> +	return -1;
> +}
> +
>  /*
>   * Term is a string term, and might be a param-term. Try to look up it's value
>   * in the remaining terms.
> @@ -658,7 +670,7 @@ static int pmu_config_term(struct list_head *formats,
>  {
>  	struct perf_pmu_format *format;
>  	__u64 *vp;
> -	__u64 val;
> +	__u64 val, max_val;
>  
>  	/*
>  	 * If this is a parameter we've already used for parameterized-eval,
> @@ -724,6 +736,22 @@ static int pmu_config_term(struct list_head *formats,
>  	} else
>  		return -EINVAL;
>  
> +	max_val = pmu_format_max_value(format->bits);
> +	if (val > max_val) {
> +		if (err) {
> +			err->idx = term->err_val;
> +			if (asprintf(&err->str,
> +				     "value too big for format, maximum is %llu",
> +				     (unsigned long long)max_val) < 0)
> +				err->str = strdup("value too big for format");
> +			return -EINVAL;
> +		}
> +		/*
> +		 * Assume we don't care if !err, in which case the value will be
> +		 * silently truncated.
> +		 */
> +	}
> +
>  	pmu_format_value(format->bits, val, vp, zero);
>  	return 0;
>  }
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 16/25] perf tools: Extend the event parser maximum error index
  2015-07-17 16:33 ` [PATCH V8 16/25] perf tools: Extend the event parser maximum error index Adrian Hunter
@ 2015-08-06 19:50   ` Arnaldo Carvalho de Melo
  2015-08-07  7:23   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-06 19:50 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jul 17, 2015 at 07:33:51PM +0300, Adrian Hunter escreveu:
> Extend the event parser maximum error index from 10
> to 13.  That allows PMU config terms of up to 10
> characters to display un-truncated in the error
> message.

Applied
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/parse-events.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index a71eeb279ed2..dbe43acb5fe0 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -1105,7 +1105,7 @@ static void parse_events_print_error(struct parse_events_error *err,
>  		 * Maximum error index indent, we will cut
>  		 * the event string if it's bigger.
>  		 */
> -		int max_err_idx = 10;
> +		int max_err_idx = 13;
>  
>  		/*
>  		 * Let's be specific with the message when
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf auxtrace: Fix period type 'i' not working
  2015-07-17 16:33 ` [PATCH V8 11/25] perf auxtrace: Fix period type 'i' not working Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
@ 2015-08-07  7:22   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-07  7:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, tglx, acme, hpa, jolsa, linux-kernel, adrian.hunter

Commit-ID:  f70cfa07e3675a115265e32d6357272275358cdb
Gitweb:     http://git.kernel.org/tip/f70cfa07e3675a115265e32d6357272275358cdb
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:46 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:47:58 -0300

perf auxtrace: Fix period type 'i' not working

PERF_ITRACE_PERIOD_INSTRUCTIONS is zero so it got overwritten by the
default period type.

Fix by checking if the period type was set rather than if the value was
zero when applying the default.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-12-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/auxtrace.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 83d9dd9..a25b360 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -942,6 +942,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	struct itrace_synth_opts *synth_opts = opt->value;
 	const char *p;
 	char *endptr;
+	bool period_type_set = false;
 
 	synth_opts->set = true;
 
@@ -970,10 +971,12 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 				case 'i':
 					synth_opts->period_type =
 						PERF_ITRACE_PERIOD_INSTRUCTIONS;
+					period_type_set = true;
 					break;
 				case 't':
 					synth_opts->period_type =
 						PERF_ITRACE_PERIOD_TICKS;
+					period_type_set = true;
 					break;
 				case 'm':
 					synth_opts->period *= 1000;
@@ -986,6 +989,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 						goto out_err;
 					synth_opts->period_type =
 						PERF_ITRACE_PERIOD_NANOSECS;
+					period_type_set = true;
 					break;
 				case '\0':
 					goto out;
@@ -1039,7 +1043,7 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 	}
 out:
 	if (synth_opts->instructions) {
-		if (!synth_opts->period_type)
+		if (!period_type_set)
 			synth_opts->period_type =
 					PERF_ITRACE_DEFAULT_PERIOD_TYPE;
 		if (!synth_opts->period)

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Fix perf-with-kcore handling of arguments containing spaces
  2015-07-17 16:33 ` [PATCH V8 12/25] perf tools: Fix perf-with-kcore handling of arguments containing spaces Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
@ 2015-08-07  7:22   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-07  7:22 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, mingo, acme, jolsa, adrian.hunter, tglx, hpa

Commit-ID:  8bd1b2d2578ca2688969352ed1f8a0a8f10dbb63
Gitweb:     http://git.kernel.org/tip/8bd1b2d2578ca2688969352ed1f8a0a8f10dbb63
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:47 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:48:27 -0300

perf tools: Fix perf-with-kcore handling of arguments containing spaces

Fix the perf-with-kcore script so that it doesn't split arguments that
contain spaces.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-13-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/perf-with-kcore.sh | 28 ++++++++++++++--------------
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/tools/perf/perf-with-kcore.sh b/tools/perf/perf-with-kcore.sh
index c7ff90a..7e47a7c 100644
--- a/tools/perf/perf-with-kcore.sh
+++ b/tools/perf/perf-with-kcore.sh
@@ -50,7 +50,7 @@ copy_kcore()
 	fi
 
 	rm -f perf.data.junk
-	("$PERF" record -o perf.data.junk $PERF_OPTIONS -- sleep 60) >/dev/null 2>/dev/null &
+	("$PERF" record -o perf.data.junk "${PERF_OPTIONS[@]}" -- sleep 60) >/dev/null 2>/dev/null &
 	PERF_PID=$!
 
 	# Need to make sure that perf has started
@@ -160,18 +160,18 @@ record()
 			echo "*** WARNING *** /proc/sys/kernel/kptr_restrict prevents access to kernel addresses" >&2
 		fi
 
-		if echo "$PERF_OPTIONS" | grep -q ' -a \|^-a \| -a$\|^-a$\| --all-cpus \|^--all-cpus \| --all-cpus$\|^--all-cpus$' ; then
+		if echo "${PERF_OPTIONS[@]}" | grep -q ' -a \|^-a \| -a$\|^-a$\| --all-cpus \|^--all-cpus \| --all-cpus$\|^--all-cpus$' ; then
 			echo "*** WARNING *** system-wide tracing without root access will not be able to read all necessary information from /proc" >&2
 		fi
 
-		if echo "$PERF_OPTIONS" | grep -q 'intel_pt\|intel_bts\| -I\|^-I' ; then
+		if echo "${PERF_OPTIONS[@]}" | grep -q 'intel_pt\|intel_bts\| -I\|^-I' ; then
 			if [ "$(cat /proc/sys/kernel/perf_event_paranoid)" -gt -1 ] ; then
 				echo "*** WARNING *** /proc/sys/kernel/perf_event_paranoid restricts buffer size and tracepoint (sched_switch) use" >&2
 			fi
 
-			if echo "$PERF_OPTIONS" | grep -q ' --per-thread \|^--per-thread \| --per-thread$\|^--per-thread$' ; then
+			if echo "${PERF_OPTIONS[@]}" | grep -q ' --per-thread \|^--per-thread \| --per-thread$\|^--per-thread$' ; then
 				true
-			elif echo "$PERF_OPTIONS" | grep -q ' -t \|^-t \| -t$\|^-t$' ; then
+			elif echo "${PERF_OPTIONS[@]}" | grep -q ' -t \|^-t \| -t$\|^-t$' ; then
 				true
 			elif [ ! -r /sys/kernel/debug -o ! -x /sys/kernel/debug ] ; then
 				echo "*** WARNING *** /sys/kernel/debug permissions prevent tracepoint (sched_switch) use" >&2
@@ -193,8 +193,8 @@ record()
 
 	mkdir "$PERF_DATA_DIR"
 
-	echo "$PERF record -o $PERF_DATA_DIR/perf.data $PERF_OPTIONS -- $*"
-	"$PERF" record -o "$PERF_DATA_DIR/perf.data" $PERF_OPTIONS -- $* || true
+	echo "$PERF record -o $PERF_DATA_DIR/perf.data ${PERF_OPTIONS[@]} -- $@"
+	"$PERF" record -o "$PERF_DATA_DIR/perf.data" "${PERF_OPTIONS[@]}" -- "$@" || true
 
 	if rmdir "$PERF_DATA_DIR" > /dev/null 2>/dev/null ; then
 		exit 1
@@ -209,8 +209,8 @@ subcommand()
 {
 	find_perf
 	check_buildid_cache_permissions
-	echo "$PERF $PERF_SUB_COMMAND -i $PERF_DATA_DIR/perf.data --kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms $*"
-	"$PERF" $PERF_SUB_COMMAND -i "$PERF_DATA_DIR/perf.data" "--kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms" $*
+	echo "$PERF $PERF_SUB_COMMAND -i $PERF_DATA_DIR/perf.data --kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms $@"
+	"$PERF" $PERF_SUB_COMMAND -i "$PERF_DATA_DIR/perf.data" "--kallsyms=$PERF_DATA_DIR/kcore_dir/kallsyms" "$@"
 }
 
 if [ "$1" = "fix_buildid_cache_permissions" ] ; then
@@ -234,7 +234,7 @@ fi
 case "$PERF_SUB_COMMAND" in
 "record")
 	while [ "$1" != "--" ] ; do
-		PERF_OPTIONS+="$1 "
+		PERF_OPTIONS+=("$1")
 		shift || break
 	done
 	if [ "$1" != "--" ] ; then
@@ -242,16 +242,16 @@ case "$PERF_SUB_COMMAND" in
 		usage
 	fi
 	shift
-	record $*
+	record "$@"
 ;;
 "script")
-	subcommand $*
+	subcommand "$@"
 ;;
 "report")
-	subcommand $*
+	subcommand "$@"
 ;;
 "inject")
-	subcommand $*
+	subcommand "$@"
 ;;
 *)
 	usage

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add perf_pmu__format_bits()
  2015-07-17 16:33 ` [PATCH V8 14/25] perf tools: Add perf_pmu__format_bits() Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
@ 2015-08-07  7:23   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-07  7:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, tglx, mingo, adrian.hunter, jolsa, acme, linux-kernel

Commit-ID:  09ff607176ab2bf7e038150100fdf9290a6fbe47
Gitweb:     http://git.kernel.org/tip/09ff607176ab2bf7e038150100fdf9290a6fbe47
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:49 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:49:01 -0300

perf tools: Add perf_pmu__format_bits()

Add perf_pmu__format_bits() to get the format bits for a PMU config
term.  Intel PT will use this to validate terms and to record format
bits to enable later interpreting the config from the attribute stored
in the perf.data file.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-15-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/pmu.c | 17 ++++++++++++++++-
 tools/perf/util/pmu.h |  1 +
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index b615cdf..c548ec8 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -542,7 +542,7 @@ struct perf_pmu *perf_pmu__find(const char *name)
 }
 
 static struct perf_pmu_format *
-pmu_find_format(struct list_head *formats, char *name)
+pmu_find_format(struct list_head *formats, const char *name)
 {
 	struct perf_pmu_format *format;
 
@@ -553,6 +553,21 @@ pmu_find_format(struct list_head *formats, char *name)
 	return NULL;
 }
 
+__u64 perf_pmu__format_bits(struct list_head *formats, const char *name)
+{
+	struct perf_pmu_format *format = pmu_find_format(formats, name);
+	__u64 bits = 0;
+	int fbit;
+
+	if (!format)
+		return 0;
+
+	for_each_set_bit(fbit, format->bits, PERF_PMU_FORMAT_BITS)
+		bits |= 1ULL << fbit;
+
+	return bits;
+}
+
 /*
  * Sets value based on the format definition (format parameter)
  * and unformated value (value parameter).
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 7b9c8cf..5d7e844 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -54,6 +54,7 @@ int perf_pmu__config_terms(struct list_head *formats,
 			   struct perf_event_attr *attr,
 			   struct list_head *head_terms,
 			   bool zero, struct parse_events_error *error);
+__u64 perf_pmu__format_bits(struct list_head *formats, const char *name);
 int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 			  struct perf_pmu_info *info);
 struct list_head *perf_pmu__alias(struct perf_pmu *pmu,

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Validate config term maximum value
  2015-07-17 16:33 ` [PATCH V8 15/25] perf tools: Validate config term maximum value Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
@ 2015-08-07  7:23   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-07  7:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adrian.hunter, mingo, linux-kernel, jolsa, hpa, acme, tglx

Commit-ID:  0efe6b67690b6546daa0d2f34a17eb3ca46c9dea
Gitweb:     http://git.kernel.org/tip/0efe6b67690b6546daa0d2f34a17eb3ca46c9dea
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:50 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:49:28 -0300

perf tools: Validate config term maximum value

Currently the value of a PMU config term is silently truncated if it is
too big. This is an impediment to validating the value for other
criteria later on i.e.  the user provides an invalid value that gets
truncated to a valid one.

The maximum value validation is only done for the parser where the error
is passed back to the user. In other cases the silent truncation
continues so as not to affect tools that perhaps rely on it.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-16-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/pmu.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index c548ec8..d4b0e64 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -589,6 +589,18 @@ static void pmu_format_value(unsigned long *format, __u64 value, __u64 *v,
 	}
 }
 
+static __u64 pmu_format_max_value(const unsigned long *format)
+{
+	int w;
+
+	w = bitmap_weight(format, PERF_PMU_FORMAT_BITS);
+	if (!w)
+		return 0;
+	if (w < 64)
+		return (1ULL << w) - 1;
+	return -1;
+}
+
 /*
  * Term is a string term, and might be a param-term. Try to look up it's value
  * in the remaining terms.
@@ -662,7 +674,7 @@ static int pmu_config_term(struct list_head *formats,
 {
 	struct perf_pmu_format *format;
 	__u64 *vp;
-	__u64 val;
+	__u64 val, max_val;
 
 	/*
 	 * If this is a parameter we've already used for parameterized-eval,
@@ -728,6 +740,22 @@ static int pmu_config_term(struct list_head *formats,
 	} else
 		return -EINVAL;
 
+	max_val = pmu_format_max_value(format->bits);
+	if (val > max_val) {
+		if (err) {
+			err->idx = term->err_val;
+			if (asprintf(&err->str,
+				     "value too big for format, maximum is %llu",
+				     (unsigned long long)max_val) < 0)
+				err->str = strdup("value too big for format");
+			return -EINVAL;
+		}
+		/*
+		 * Assume we don't care if !err, in which case the value will be
+		 * silently truncated.
+		 */
+	}
+
 	pmu_format_value(format->bits, val, vp, zero);
 	return 0;
 }

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Extend the event parser maximum error index
  2015-07-17 16:33 ` [PATCH V8 16/25] perf tools: Extend the event parser maximum error index Adrian Hunter
  2015-08-06 19:50   ` Arnaldo Carvalho de Melo
@ 2015-08-07  7:23   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-07  7:23 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, jolsa, adrian.hunter, hpa, acme, mingo, tglx

Commit-ID:  141b2d3161f19a774b3ceaa8faed5e63484a4684
Gitweb:     http://git.kernel.org/tip/141b2d3161f19a774b3ceaa8faed5e63484a4684
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:51 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Thu, 6 Aug 2015 16:49:44 -0300

perf tools: Extend the event parser maximum error index

Extend the event parser maximum error index from 10 to 13.  That allows
PMU config terms of up to 10 characters to display un-truncated in the
error message.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-17-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/parse-events.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index a6cb9af..828936d 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -1168,7 +1168,7 @@ static void parse_events_print_error(struct parse_events_error *err,
 		 * Maximum error index indent, we will cut
 		 * the event string if it's bigger.
 		 */
-		int max_err_idx = 10;
+		int max_err_idx = 13;
 
 		/*
 		 * Let's be specific with the message when

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 03/25] perf tools: Add Intel PT instruction decoder
  2015-07-17 16:33 ` [PATCH V8 03/25] perf tools: Add Intel PT instruction decoder Adrian Hunter
@ 2015-08-12 20:55   ` Arnaldo Carvalho de Melo
  2015-08-13  6:48     ` Adrian Hunter
  0 siblings, 1 reply; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-12 20:55 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jul 17, 2015 at 07:33:38PM +0300, Adrian Hunter escreveu:
> Add support for decoding instructions for Intel Processor Trace.  The
> kernel x86 instruction decoder is copied for this.
> 
> This essentially provides intel_pt_get_insn() which takes a binary
> buffer, uses the kernel's x86 instruction decoder to get details of the
> instruction and then categorizes it for consumption by an Intel PT
> decoder.


[acme@felicio linux]$ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -C tools/perf O=/tmp/build/perf install-bin
make: Entering directory `/home/acme/git/linux/tools/perf'
  BUILD:   Doing 'make -j4' parallel build

Auto-detecting system features:
...                         dwarf: [ on  ]
...                         glibc: [ on  ]
...                          gtk2: [ on  ]
...                      libaudit: [ on  ]
...                        libbfd: [ on  ]
...                        libelf: [ on  ]
...                       libnuma: [ on  ]
...                       libperl: [ on  ]
...                     libpython: [ on  ]
...                      libslang: [ on  ]
...                     libunwind: [ on  ]
...            libdw-dwarf-unwind: [ on  ]
...                          zlib: [ on  ]
...                          lzma: [ on  ]

<SNIP>

  CC       /tmp/build/perf/util/thread-stack.o
  MKDIR    /tmp/build/perf/ui/tui/
  CC       /tmp/build/perf/ui/tui/setup.o
  CC       /tmp/build/perf/util/auxtrace.o
  MKDIR    /tmp/build/perf/ui/browsers/
  MKDIR    /tmp/build/perf/util/intel-pt-decoder/
  CC       /tmp/build/perf/ui/browsers/hists.o
  CC       /tmp/build/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.o
  MKDIR    /tmp/build/perf/ui/tui/
  CC       /tmp/build/perf/ui/tui/util.o
  GEN      /tmp/build/perf/util/intel-pt-decoder/inat-tables.c
  MKDIR    /tmp/build/perf/util/scripting-engines/
  CC       /tmp/build/perf/util/scripting-engines/trace-event-perl.o
  CC       /tmp/build/perf/util/intel-pt-decoder/intel-pt-insn-decoder.o
  CC       /tmp/build/perf/ui/tui/helpline.o
  CC       /tmp/build/perf/ui/browsers/map.o
  CC       /tmp/build/perf/ui/browsers/scripts.o
  CC       /tmp/build/perf/ui/tui/progress.o
  MKDIR    /tmp/build/perf/util/scripting-engines/
  CC       /tmp/build/perf/util/scripting-engines/trace-event-python.o
util/intel-pt-decoder/intel-pt-insn-decoder.c:26:18: fatal error: insn.c: No such file or directory
 #include <insn.c>
                  ^
compilation terminated.
<SNIP>
make[4]: *** [/tmp/build/perf/util/intel-pt-decoder/intel-pt-insn-decoder.o] Error 1
make[3]: *** [intel-pt-decoder] Error 2
make[3]: *** Waiting for unfinished jobs....
  MKDIR    /tmp/build/perf/scripts/python/Perf-Trace-Util/
  CC       /tmp/build/perf/scripts/python/Perf-Trace-Util/Context.o
make[2]: *** [util] Error 2
make[2]: *** Waiting for unfinished jobs....
  LD       /tmp/build/perf/scripts/python/Perf-Trace-Util/libperf-in.o
  LD       /tmp/build/perf/scripts/perl/Perf-Trace-Util/libperf-in.o
  LD       /tmp/build/perf/scripts/libperf-in.o
make[1]: *** [/tmp/build/perf/libperf-in.o] Error 2
make: *** [install-bin] Error 2
make: Leaving directory `/home/acme/git/linux/tools/perf'
[acme@felicio linux]$ 
 
It works if I just do:

   cd tools/perf
   make

- Arnaldo

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 03/25] perf tools: Add Intel PT instruction decoder
  2015-08-12 20:55   ` Arnaldo Carvalho de Melo
@ 2015-08-13  6:48     ` Adrian Hunter
  2015-08-13  7:14       ` [PATCH V9 " Adrian Hunter
  0 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-08-13  6:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 12/08/15 23:55, Arnaldo Carvalho de Melo wrote:
> Em Fri, Jul 17, 2015 at 07:33:38PM +0300, Adrian Hunter escreveu:
>> Add support for decoding instructions for Intel Processor Trace.  The
>> kernel x86 instruction decoder is copied for this.
>>
>> This essentially provides intel_pt_get_insn() which takes a binary
>> buffer, uses the kernel's x86 instruction decoder to get details of the
>> instruction and then categorizes it for consumption by an Intel PT
>> decoder.
> 
> 
> [acme@felicio linux]$ rm -rf /tmp/build/perf ; mkdir -p /tmp/build/perf ; make -C tools/perf O=/tmp/build/perf install-bin
> make: Entering directory `/home/acme/git/linux/tools/perf'
>   BUILD:   Doing 'make -j4' parallel build
> 
> Auto-detecting system features:
> ...                         dwarf: [ on  ]
> ...                         glibc: [ on  ]
> ...                          gtk2: [ on  ]
> ...                      libaudit: [ on  ]
> ...                        libbfd: [ on  ]
> ...                        libelf: [ on  ]
> ...                       libnuma: [ on  ]
> ...                       libperl: [ on  ]
> ...                     libpython: [ on  ]
> ...                      libslang: [ on  ]
> ...                     libunwind: [ on  ]
> ...            libdw-dwarf-unwind: [ on  ]
> ...                          zlib: [ on  ]
> ...                          lzma: [ on  ]
> 
> <SNIP>
> 
>   CC       /tmp/build/perf/util/thread-stack.o
>   MKDIR    /tmp/build/perf/ui/tui/
>   CC       /tmp/build/perf/ui/tui/setup.o
>   CC       /tmp/build/perf/util/auxtrace.o
>   MKDIR    /tmp/build/perf/ui/browsers/
>   MKDIR    /tmp/build/perf/util/intel-pt-decoder/
>   CC       /tmp/build/perf/ui/browsers/hists.o
>   CC       /tmp/build/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.o
>   MKDIR    /tmp/build/perf/ui/tui/
>   CC       /tmp/build/perf/ui/tui/util.o
>   GEN      /tmp/build/perf/util/intel-pt-decoder/inat-tables.c
>   MKDIR    /tmp/build/perf/util/scripting-engines/
>   CC       /tmp/build/perf/util/scripting-engines/trace-event-perl.o
>   CC       /tmp/build/perf/util/intel-pt-decoder/intel-pt-insn-decoder.o
>   CC       /tmp/build/perf/ui/tui/helpline.o
>   CC       /tmp/build/perf/ui/browsers/map.o
>   CC       /tmp/build/perf/ui/browsers/scripts.o
>   CC       /tmp/build/perf/ui/tui/progress.o
>   MKDIR    /tmp/build/perf/util/scripting-engines/
>   CC       /tmp/build/perf/util/scripting-engines/trace-event-python.o
> util/intel-pt-decoder/intel-pt-insn-decoder.c:26:18: fatal error: insn.c: No such file or directory
>  #include <insn.c>
>                   ^
> compilation terminated.

I had a fix for this already but for some reason it fell in with the "for
later" patches.

I will roll the fix in and send V9 of this patch.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [PATCH V9 03/25] perf tools: Add Intel PT instruction decoder
  2015-08-13  6:48     ` Adrian Hunter
@ 2015-08-13  7:14       ` Adrian Hunter
  2015-08-13 12:37         ` Arnaldo Carvalho de Melo
  2015-08-20  9:57         ` [tip:perf/core] " tip-bot for Adrian Hunter
  0 siblings, 2 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-08-13  7:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: linux-kernel, Ingo Molnar, Jiri Olsa

Add support for decoding instructions for Intel Processor Trace.  The
kernel x86 instruction decoder is copied for this.

This essentially provides intel_pt_get_insn() which takes a binary
buffer, uses the kernel's x86 instruction decoder to get details of the
instruction and then categorizes it for consumption by an Intel PT
decoder.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---


Changes in V9:

	insn.c is in the tools/perf sources and so should
	be included as "insn.c" not <insn.c>.

	inat_types.h is in the kernel headers so is not needed
	in the tools/perf sources.


 tools/build/Makefile.build                         |   2 +
 tools/perf/.gitignore                              |   1 +
 tools/perf/Makefile.perf                           |  12 +-
 tools/perf/util/intel-pt-decoder/Build             |  12 +-
 .../util/intel-pt-decoder/gen-insn-attr-x86.awk    | 386 ++++++++
 tools/perf/util/intel-pt-decoder/inat.c            |  96 ++
 tools/perf/util/intel-pt-decoder/inat.h            | 221 +++++
 tools/perf/util/intel-pt-decoder/insn.c            | 594 +++++++++++++
 tools/perf/util/intel-pt-decoder/insn.h            | 201 +++++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  | 246 ++++++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |  65 ++
 .../perf/util/intel-pt-decoder/x86-opcode-map.txt  | 970 +++++++++++++++++++++
 12 files changed, 2803 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
 create mode 100644 tools/perf/util/intel-pt-decoder/inat.c
 create mode 100644 tools/perf/util/intel-pt-decoder/inat.h
 create mode 100644 tools/perf/util/intel-pt-decoder/insn.c
 create mode 100644 tools/perf/util/intel-pt-decoder/insn.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/x86-opcode-map.txt

diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index faca2bf6a430..8120af9c0341 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -57,6 +57,8 @@ quiet_cmd_cc_i_c = CPP      $@
 quiet_cmd_cc_s_c = AS       $@
       cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
 
+quiet_cmd_gen = GEN      $@
+
 # Link agregate command
 # If there's nothing to link, create empty $@ object.
 quiet_cmd_ld_multi = LD       $@
diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore
index 09db62ba5786..3d1bb802dbf4 100644
--- a/tools/perf/.gitignore
+++ b/tools/perf/.gitignore
@@ -29,3 +29,4 @@ config.mak.autogen
 *.pyc
 *.pyo
 .config-detected
+util/intel-pt-decoder/inat-tables.c
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 4b58daeff881..d9863cb96f59 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -76,6 +76,12 @@ include config/utilities.mak
 #
 # Define NO_AUXTRACE if you do not want AUX area tracing support
 
+# As per kernel Makefile, avoid funny character set dependencies
+unexport LC_ALL
+LC_COLLATE=C
+LC_NUMERIC=C
+export LC_COLLATE LC_NUMERIC
+
 ifeq ($(srctree),)
 srctree := $(patsubst %/,%,$(dir $(shell pwd)))
 srctree := $(patsubst %/,%,$(dir $(srctree)))
@@ -135,6 +141,7 @@ INSTALL = install
 FLEX    = flex
 BISON   = bison
 STRIP   = strip
+AWK     = awk
 
 LIB_DIR          = $(srctree)/tools/lib/api/
 TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
@@ -289,7 +296,7 @@ strip: $(PROGRAMS) $(OUTPUT)perf
 
 PERF_IN := $(OUTPUT)perf-in.o
 
-export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX
+export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
 build := -f $(srctree)/tools/build/Makefile.build dir=. obj
 
 $(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
@@ -565,7 +572,8 @@ clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
 	$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
 	$(Q)$(RM) $(OUTPUT).config-detected
 	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
-	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
+	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
+		$(OUTPUT)util/intel-pt-decoder/inat-tables.c
 	$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
 	$(python-clean)
 
diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 9d67381a9bd3..5a46ce13c1f8 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1 +1,11 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
+
+inat_tables_script = util/intel-pt-decoder/gen-insn-attr-x86.awk
+inat_tables_maps = util/intel-pt-decoder/x86-opcode-map.txt
+
+$(OUTPUT)util/intel-pt-decoder/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
+	@$(call echo-cmd,gen)$(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
+
+$(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o: util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
+
+CFLAGS_intel-pt-insn-decoder.o += -I$(OUTPUT)util/intel-pt-decoder -Wno-override-init
diff --git a/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
new file mode 100644
index 000000000000..517567347aac
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
@@ -0,0 +1,386 @@
+#!/bin/awk -f
+# gen-insn-attr-x86.awk: Instruction attribute table generator
+# Written by Masami Hiramatsu <mhiramat@redhat.com>
+#
+# Usage: awk -f gen-insn-attr-x86.awk x86-opcode-map.txt > inat-tables.c
+
+# Awk implementation sanity check
+function check_awk_implement() {
+	if (sprintf("%x", 0) != "0")
+		return "Your awk has a printf-format problem."
+	return ""
+}
+
+# Clear working vars
+function clear_vars() {
+	delete table
+	delete lptable2
+	delete lptable1
+	delete lptable3
+	eid = -1 # escape id
+	gid = -1 # group id
+	aid = -1 # AVX id
+	tname = ""
+}
+
+BEGIN {
+	# Implementation error checking
+	awkchecked = check_awk_implement()
+	if (awkchecked != "") {
+		print "Error: " awkchecked > "/dev/stderr"
+		print "Please try to use gawk." > "/dev/stderr"
+		exit 1
+	}
+
+	# Setup generating tables
+	print "/* x86 opcode map generated from x86-opcode-map.txt */"
+	print "/* Do not change this code. */\n"
+	ggid = 1
+	geid = 1
+	gaid = 0
+	delete etable
+	delete gtable
+	delete atable
+
+	opnd_expr = "^[A-Za-z/]"
+	ext_expr = "^\\("
+	sep_expr = "^\\|$"
+	group_expr = "^Grp[0-9A-Za-z]+"
+
+	imm_expr = "^[IJAOL][a-z]"
+	imm_flag["Ib"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
+	imm_flag["Jb"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
+	imm_flag["Iw"] = "INAT_MAKE_IMM(INAT_IMM_WORD)"
+	imm_flag["Id"] = "INAT_MAKE_IMM(INAT_IMM_DWORD)"
+	imm_flag["Iq"] = "INAT_MAKE_IMM(INAT_IMM_QWORD)"
+	imm_flag["Ap"] = "INAT_MAKE_IMM(INAT_IMM_PTR)"
+	imm_flag["Iz"] = "INAT_MAKE_IMM(INAT_IMM_VWORD32)"
+	imm_flag["Jz"] = "INAT_MAKE_IMM(INAT_IMM_VWORD32)"
+	imm_flag["Iv"] = "INAT_MAKE_IMM(INAT_IMM_VWORD)"
+	imm_flag["Ob"] = "INAT_MOFFSET"
+	imm_flag["Ov"] = "INAT_MOFFSET"
+	imm_flag["Lx"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
+
+	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
+	force64_expr = "\\([df]64\\)"
+	rex_expr = "^REX(\\.[XRWB]+)*"
+	fpu_expr = "^ESC" # TODO
+
+	lprefix1_expr = "\\((66|!F3)\\)"
+	lprefix2_expr = "\\(F3\\)"
+	lprefix3_expr = "\\((F2|!F3|66\\&F2)\\)"
+	lprefix_expr = "\\((66|F2|F3)\\)"
+	max_lprefix = 4
+
+	# All opcodes starting with lower-case 'v' or with (v1) superscript
+	# accepts VEX prefix
+	vexok_opcode_expr = "^v.*"
+	vexok_expr = "\\(v1\\)"
+	# All opcodes with (v) superscript supports *only* VEX prefix
+	vexonly_expr = "\\(v\\)"
+
+	prefix_expr = "\\(Prefix\\)"
+	prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
+	prefix_num["REPNE"] = "INAT_PFX_REPNE"
+	prefix_num["REP/REPE"] = "INAT_PFX_REPE"
+	prefix_num["XACQUIRE"] = "INAT_PFX_REPNE"
+	prefix_num["XRELEASE"] = "INAT_PFX_REPE"
+	prefix_num["LOCK"] = "INAT_PFX_LOCK"
+	prefix_num["SEG=CS"] = "INAT_PFX_CS"
+	prefix_num["SEG=DS"] = "INAT_PFX_DS"
+	prefix_num["SEG=ES"] = "INAT_PFX_ES"
+	prefix_num["SEG=FS"] = "INAT_PFX_FS"
+	prefix_num["SEG=GS"] = "INAT_PFX_GS"
+	prefix_num["SEG=SS"] = "INAT_PFX_SS"
+	prefix_num["Address-Size"] = "INAT_PFX_ADDRSZ"
+	prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
+	prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
+
+	clear_vars()
+}
+
+function semantic_error(msg) {
+	print "Semantic error at " NR ": " msg > "/dev/stderr"
+	exit 1
+}
+
+function debug(msg) {
+	print "DEBUG: " msg
+}
+
+function array_size(arr,   i,c) {
+	c = 0
+	for (i in arr)
+		c++
+	return c
+}
+
+/^Table:/ {
+	print "/* " $0 " */"
+	if (tname != "")
+		semantic_error("Hit Table: before EndTable:.");
+}
+
+/^Referrer:/ {
+	if (NF != 1) {
+		# escape opcode table
+		ref = ""
+		for (i = 2; i <= NF; i++)
+			ref = ref $i
+		eid = escape[ref]
+		tname = sprintf("inat_escape_table_%d", eid)
+	}
+}
+
+/^AVXcode:/ {
+	if (NF != 1) {
+		# AVX/escape opcode table
+		aid = $2
+		if (gaid <= aid)
+			gaid = aid + 1
+		if (tname == "")	# AVX only opcode table
+			tname = sprintf("inat_avx_table_%d", $2)
+	}
+	if (aid == -1 && eid == -1)	# primary opcode table
+		tname = "inat_primary_table"
+}
+
+/^GrpTable:/ {
+	print "/* " $0 " */"
+	if (!($2 in group))
+		semantic_error("No group: " $2 )
+	gid = group[$2]
+	tname = "inat_group_table_" gid
+}
+
+function print_table(tbl,name,fmt,n)
+{
+	print "const insn_attr_t " name " = {"
+	for (i = 0; i < n; i++) {
+		id = sprintf(fmt, i)
+		if (tbl[id])
+			print "	[" id "] = " tbl[id] ","
+	}
+	print "};"
+}
+
+/^EndTable/ {
+	if (gid != -1) {
+		# print group tables
+		if (array_size(table) != 0) {
+			print_table(table, tname "[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,0] = tname
+		}
+		if (array_size(lptable1) != 0) {
+			print_table(lptable1, tname "_1[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,1] = tname "_1"
+		}
+		if (array_size(lptable2) != 0) {
+			print_table(lptable2, tname "_2[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,2] = tname "_2"
+		}
+		if (array_size(lptable3) != 0) {
+			print_table(lptable3, tname "_3[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,3] = tname "_3"
+		}
+	} else {
+		# print primary/escaped tables
+		if (array_size(table) != 0) {
+			print_table(table, tname "[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,0] = tname
+			if (aid >= 0)
+				atable[aid,0] = tname
+		}
+		if (array_size(lptable1) != 0) {
+			print_table(lptable1,tname "_1[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,1] = tname "_1"
+			if (aid >= 0)
+				atable[aid,1] = tname "_1"
+		}
+		if (array_size(lptable2) != 0) {
+			print_table(lptable2,tname "_2[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,2] = tname "_2"
+			if (aid >= 0)
+				atable[aid,2] = tname "_2"
+		}
+		if (array_size(lptable3) != 0) {
+			print_table(lptable3,tname "_3[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,3] = tname "_3"
+			if (aid >= 0)
+				atable[aid,3] = tname "_3"
+		}
+	}
+	print ""
+	clear_vars()
+}
+
+function add_flags(old,new) {
+	if (old && new)
+		return old " | " new
+	else if (old)
+		return old
+	else
+		return new
+}
+
+# convert operands to flags.
+function convert_operands(count,opnd,       i,j,imm,mod)
+{
+	imm = null
+	mod = null
+	for (j = 1; j <= count; j++) {
+		i = opnd[j]
+		if (match(i, imm_expr) == 1) {
+			if (!imm_flag[i])
+				semantic_error("Unknown imm opnd: " i)
+			if (imm) {
+				if (i != "Ib")
+					semantic_error("Second IMM error")
+				imm = add_flags(imm, "INAT_SCNDIMM")
+			} else
+				imm = imm_flag[i]
+		} else if (match(i, modrm_expr))
+			mod = "INAT_MODRM"
+	}
+	return add_flags(imm, mod)
+}
+
+/^[0-9a-f]+\:/ {
+	if (NR == 1)
+		next
+	# get index
+	idx = "0x" substr($1, 1, index($1,":") - 1)
+	if (idx in table)
+		semantic_error("Redefine " idx " in " tname)
+
+	# check if escaped opcode
+	if ("escape" == $2) {
+		if ($3 != "#")
+			semantic_error("No escaped name")
+		ref = ""
+		for (i = 4; i <= NF; i++)
+			ref = ref $i
+		if (ref in escape)
+			semantic_error("Redefine escape (" ref ")")
+		escape[ref] = geid
+		geid++
+		table[idx] = "INAT_MAKE_ESCAPE(" escape[ref] ")"
+		next
+	}
+
+	variant = null
+	# converts
+	i = 2
+	while (i <= NF) {
+		opcode = $(i++)
+		delete opnds
+		ext = null
+		flags = null
+		opnd = null
+		# parse one opcode
+		if (match($i, opnd_expr)) {
+			opnd = $i
+			count = split($(i++), opnds, ",")
+			flags = convert_operands(count, opnds)
+		}
+		if (match($i, ext_expr))
+			ext = $(i++)
+		if (match($i, sep_expr))
+			i++
+		else if (i < NF)
+			semantic_error($i " is not a separator")
+
+		# check if group opcode
+		if (match(opcode, group_expr)) {
+			if (!(opcode in group)) {
+				group[opcode] = ggid
+				ggid++
+			}
+			flags = add_flags(flags, "INAT_MAKE_GROUP(" group[opcode] ")")
+		}
+		# check force(or default) 64bit
+		if (match(ext, force64_expr))
+			flags = add_flags(flags, "INAT_FORCE64")
+
+		# check REX prefix
+		if (match(opcode, rex_expr))
+			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
+
+		# check coprocessor escape : TODO
+		if (match(opcode, fpu_expr))
+			flags = add_flags(flags, "INAT_MODRM")
+
+		# check VEX codes
+		if (match(ext, vexonly_expr))
+			flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
+		else if (match(ext, vexok_expr) || match(opcode, vexok_opcode_expr))
+			flags = add_flags(flags, "INAT_VEXOK")
+
+		# check prefixes
+		if (match(ext, prefix_expr)) {
+			if (!prefix_num[opcode])
+				semantic_error("Unknown prefix: " opcode)
+			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
+		}
+		if (length(flags) == 0)
+			continue
+		# check if last prefix
+		if (match(ext, lprefix1_expr)) {
+			lptable1[idx] = add_flags(lptable1[idx],flags)
+			variant = "INAT_VARIANT"
+		}
+		if (match(ext, lprefix2_expr)) {
+			lptable2[idx] = add_flags(lptable2[idx],flags)
+			variant = "INAT_VARIANT"
+		}
+		if (match(ext, lprefix3_expr)) {
+			lptable3[idx] = add_flags(lptable3[idx],flags)
+			variant = "INAT_VARIANT"
+		}
+		if (!match(ext, lprefix_expr)){
+			table[idx] = add_flags(table[idx],flags)
+		}
+	}
+	if (variant)
+		table[idx] = add_flags(table[idx],variant)
+}
+
+END {
+	if (awkchecked != "")
+		exit 1
+	# print escape opcode map's array
+	print "/* Escape opcode map array */"
+	print "const insn_attr_t * const inat_escape_tables[INAT_ESC_MAX + 1]" \
+	      "[INAT_LSTPFX_MAX + 1] = {"
+	for (i = 0; i < geid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (etable[i,j])
+				print "	["i"]["j"] = "etable[i,j]","
+	print "};\n"
+	# print group opcode map's array
+	print "/* Group opcode map array */"
+	print "const insn_attr_t * const inat_group_tables[INAT_GRP_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1] = {"
+	for (i = 0; i < ggid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (gtable[i,j])
+				print "	["i"]["j"] = "gtable[i,j]","
+	print "};\n"
+	# print AVX opcode map's array
+	print "/* AVX opcode map array */"
+	print "const insn_attr_t * const inat_avx_tables[X86_VEX_M_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1] = {"
+	for (i = 0; i < gaid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (atable[i,j])
+				print "	["i"]["j"] = "atable[i,j]","
+	print "};"
+}
diff --git a/tools/perf/util/intel-pt-decoder/inat.c b/tools/perf/util/intel-pt-decoder/inat.c
new file mode 100644
index 000000000000..feeaa509dfe4
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/inat.c
@@ -0,0 +1,96 @@
+/*
+ * x86 instruction attribute tables
+ *
+ * Written by Masami Hiramatsu <mhiramat@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <asm/insn.h>
+
+/* Attribute tables are generated from opcode map */
+#include "inat-tables.c"
+
+/* Attribute search APIs */
+insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode)
+{
+	return inat_primary_table[opcode];
+}
+
+int inat_get_last_prefix_id(insn_byte_t last_pfx)
+{
+	insn_attr_t lpfx_attr;
+
+	lpfx_attr = inat_get_opcode_attribute(last_pfx);
+	return inat_last_prefix_id(lpfx_attr);
+}
+
+insn_attr_t inat_get_escape_attribute(insn_byte_t opcode, int lpfx_id,
+				      insn_attr_t esc_attr)
+{
+	const insn_attr_t *table;
+	int n;
+
+	n = inat_escape_id(esc_attr);
+
+	table = inat_escape_tables[n][0];
+	if (!table)
+		return 0;
+	if (inat_has_variant(table[opcode]) && lpfx_id) {
+		table = inat_escape_tables[n][lpfx_id];
+		if (!table)
+			return 0;
+	}
+	return table[opcode];
+}
+
+insn_attr_t inat_get_group_attribute(insn_byte_t modrm, int lpfx_id,
+				     insn_attr_t grp_attr)
+{
+	const insn_attr_t *table;
+	int n;
+
+	n = inat_group_id(grp_attr);
+
+	table = inat_group_tables[n][0];
+	if (!table)
+		return inat_group_common_attribute(grp_attr);
+	if (inat_has_variant(table[X86_MODRM_REG(modrm)]) && lpfx_id) {
+		table = inat_group_tables[n][lpfx_id];
+		if (!table)
+			return inat_group_common_attribute(grp_attr);
+	}
+	return table[X86_MODRM_REG(modrm)] |
+	       inat_group_common_attribute(grp_attr);
+}
+
+insn_attr_t inat_get_avx_attribute(insn_byte_t opcode, insn_byte_t vex_m,
+				   insn_byte_t vex_p)
+{
+	const insn_attr_t *table;
+	if (vex_m > X86_VEX_M_MAX || vex_p > INAT_LSTPFX_MAX)
+		return 0;
+	/* At first, this checks the master table */
+	table = inat_avx_tables[vex_m][0];
+	if (!table)
+		return 0;
+	if (!inat_is_group(table[opcode]) && vex_p) {
+		/* If this is not a group, get attribute directly */
+		table = inat_avx_tables[vex_m][vex_p];
+		if (!table)
+			return 0;
+	}
+	return table[opcode];
+}
diff --git a/tools/perf/util/intel-pt-decoder/inat.h b/tools/perf/util/intel-pt-decoder/inat.h
new file mode 100644
index 000000000000..74a2e312e8a2
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/inat.h
@@ -0,0 +1,221 @@
+#ifndef _ASM_X86_INAT_H
+#define _ASM_X86_INAT_H
+/*
+ * x86 instruction attributes
+ *
+ * Written by Masami Hiramatsu <mhiramat@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <asm/inat_types.h>
+
+/*
+ * Internal bits. Don't use bitmasks directly, because these bits are
+ * unstable. You should use checking functions.
+ */
+
+#define INAT_OPCODE_TABLE_SIZE 256
+#define INAT_GROUP_TABLE_SIZE 8
+
+/* Legacy last prefixes */
+#define INAT_PFX_OPNDSZ	1	/* 0x66 */ /* LPFX1 */
+#define INAT_PFX_REPE	2	/* 0xF3 */ /* LPFX2 */
+#define INAT_PFX_REPNE	3	/* 0xF2 */ /* LPFX3 */
+/* Other Legacy prefixes */
+#define INAT_PFX_LOCK	4	/* 0xF0 */
+#define INAT_PFX_CS	5	/* 0x2E */
+#define INAT_PFX_DS	6	/* 0x3E */
+#define INAT_PFX_ES	7	/* 0x26 */
+#define INAT_PFX_FS	8	/* 0x64 */
+#define INAT_PFX_GS	9	/* 0x65 */
+#define INAT_PFX_SS	10	/* 0x36 */
+#define INAT_PFX_ADDRSZ	11	/* 0x67 */
+/* x86-64 REX prefix */
+#define INAT_PFX_REX	12	/* 0x4X */
+/* AVX VEX prefixes */
+#define INAT_PFX_VEX2	13	/* 2-bytes VEX prefix */
+#define INAT_PFX_VEX3	14	/* 3-bytes VEX prefix */
+
+#define INAT_LSTPFX_MAX	3
+#define INAT_LGCPFX_MAX	11
+
+/* Immediate size */
+#define INAT_IMM_BYTE		1
+#define INAT_IMM_WORD		2
+#define INAT_IMM_DWORD		3
+#define INAT_IMM_QWORD		4
+#define INAT_IMM_PTR		5
+#define INAT_IMM_VWORD32	6
+#define INAT_IMM_VWORD		7
+
+/* Legacy prefix */
+#define INAT_PFX_OFFS	0
+#define INAT_PFX_BITS	4
+#define INAT_PFX_MAX    ((1 << INAT_PFX_BITS) - 1)
+#define INAT_PFX_MASK	(INAT_PFX_MAX << INAT_PFX_OFFS)
+/* Escape opcodes */
+#define INAT_ESC_OFFS	(INAT_PFX_OFFS + INAT_PFX_BITS)
+#define INAT_ESC_BITS	2
+#define INAT_ESC_MAX	((1 << INAT_ESC_BITS) - 1)
+#define INAT_ESC_MASK	(INAT_ESC_MAX << INAT_ESC_OFFS)
+/* Group opcodes (1-16) */
+#define INAT_GRP_OFFS	(INAT_ESC_OFFS + INAT_ESC_BITS)
+#define INAT_GRP_BITS	5
+#define INAT_GRP_MAX	((1 << INAT_GRP_BITS) - 1)
+#define INAT_GRP_MASK	(INAT_GRP_MAX << INAT_GRP_OFFS)
+/* Immediates */
+#define INAT_IMM_OFFS	(INAT_GRP_OFFS + INAT_GRP_BITS)
+#define INAT_IMM_BITS	3
+#define INAT_IMM_MASK	(((1 << INAT_IMM_BITS) - 1) << INAT_IMM_OFFS)
+/* Flags */
+#define INAT_FLAG_OFFS	(INAT_IMM_OFFS + INAT_IMM_BITS)
+#define INAT_MODRM	(1 << (INAT_FLAG_OFFS))
+#define INAT_FORCE64	(1 << (INAT_FLAG_OFFS + 1))
+#define INAT_SCNDIMM	(1 << (INAT_FLAG_OFFS + 2))
+#define INAT_MOFFSET	(1 << (INAT_FLAG_OFFS + 3))
+#define INAT_VARIANT	(1 << (INAT_FLAG_OFFS + 4))
+#define INAT_VEXOK	(1 << (INAT_FLAG_OFFS + 5))
+#define INAT_VEXONLY	(1 << (INAT_FLAG_OFFS + 6))
+/* Attribute making macros for attribute tables */
+#define INAT_MAKE_PREFIX(pfx)	(pfx << INAT_PFX_OFFS)
+#define INAT_MAKE_ESCAPE(esc)	(esc << INAT_ESC_OFFS)
+#define INAT_MAKE_GROUP(grp)	((grp << INAT_GRP_OFFS) | INAT_MODRM)
+#define INAT_MAKE_IMM(imm)	(imm << INAT_IMM_OFFS)
+
+/* Attribute search APIs */
+extern insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode);
+extern int inat_get_last_prefix_id(insn_byte_t last_pfx);
+extern insn_attr_t inat_get_escape_attribute(insn_byte_t opcode,
+					     int lpfx_id,
+					     insn_attr_t esc_attr);
+extern insn_attr_t inat_get_group_attribute(insn_byte_t modrm,
+					    int lpfx_id,
+					    insn_attr_t esc_attr);
+extern insn_attr_t inat_get_avx_attribute(insn_byte_t opcode,
+					  insn_byte_t vex_m,
+					  insn_byte_t vex_pp);
+
+/* Attribute checking functions */
+static inline int inat_is_legacy_prefix(insn_attr_t attr)
+{
+	attr &= INAT_PFX_MASK;
+	return attr && attr <= INAT_LGCPFX_MAX;
+}
+
+static inline int inat_is_address_size_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_ADDRSZ;
+}
+
+static inline int inat_is_operand_size_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_OPNDSZ;
+}
+
+static inline int inat_is_rex_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_REX;
+}
+
+static inline int inat_last_prefix_id(insn_attr_t attr)
+{
+	if ((attr & INAT_PFX_MASK) > INAT_LSTPFX_MAX)
+		return 0;
+	else
+		return attr & INAT_PFX_MASK;
+}
+
+static inline int inat_is_vex_prefix(insn_attr_t attr)
+{
+	attr &= INAT_PFX_MASK;
+	return attr == INAT_PFX_VEX2 || attr == INAT_PFX_VEX3;
+}
+
+static inline int inat_is_vex3_prefix(insn_attr_t attr)
+{
+	return (attr & INAT_PFX_MASK) == INAT_PFX_VEX3;
+}
+
+static inline int inat_is_escape(insn_attr_t attr)
+{
+	return attr & INAT_ESC_MASK;
+}
+
+static inline int inat_escape_id(insn_attr_t attr)
+{
+	return (attr & INAT_ESC_MASK) >> INAT_ESC_OFFS;
+}
+
+static inline int inat_is_group(insn_attr_t attr)
+{
+	return attr & INAT_GRP_MASK;
+}
+
+static inline int inat_group_id(insn_attr_t attr)
+{
+	return (attr & INAT_GRP_MASK) >> INAT_GRP_OFFS;
+}
+
+static inline int inat_group_common_attribute(insn_attr_t attr)
+{
+	return attr & ~INAT_GRP_MASK;
+}
+
+static inline int inat_has_immediate(insn_attr_t attr)
+{
+	return attr & INAT_IMM_MASK;
+}
+
+static inline int inat_immediate_size(insn_attr_t attr)
+{
+	return (attr & INAT_IMM_MASK) >> INAT_IMM_OFFS;
+}
+
+static inline int inat_has_modrm(insn_attr_t attr)
+{
+	return attr & INAT_MODRM;
+}
+
+static inline int inat_is_force64(insn_attr_t attr)
+{
+	return attr & INAT_FORCE64;
+}
+
+static inline int inat_has_second_immediate(insn_attr_t attr)
+{
+	return attr & INAT_SCNDIMM;
+}
+
+static inline int inat_has_moffset(insn_attr_t attr)
+{
+	return attr & INAT_MOFFSET;
+}
+
+static inline int inat_has_variant(insn_attr_t attr)
+{
+	return attr & INAT_VARIANT;
+}
+
+static inline int inat_accept_vex(insn_attr_t attr)
+{
+	return attr & INAT_VEXOK;
+}
+
+static inline int inat_must_vex(insn_attr_t attr)
+{
+	return attr & INAT_VEXONLY;
+}
+#endif
diff --git a/tools/perf/util/intel-pt-decoder/insn.c b/tools/perf/util/intel-pt-decoder/insn.c
new file mode 100644
index 000000000000..8f72b334aea0
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/insn.c
@@ -0,0 +1,594 @@
+/*
+ * x86 instruction analysis
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2002, 2004, 2009
+ */
+
+#ifdef __KERNEL__
+#include <linux/string.h>
+#else
+#include <string.h>
+#endif
+#include <asm/inat.h>
+#include <asm/insn.h>
+
+/* Verify next sizeof(t) bytes can be on the same instruction */
+#define validate_next(t, insn, n)	\
+	((insn)->next_byte + sizeof(t) + n <= (insn)->end_kaddr)
+
+#define __get_next(t, insn)	\
+	({ t r = *(t*)insn->next_byte; insn->next_byte += sizeof(t); r; })
+
+#define __peek_nbyte_next(t, insn, n)	\
+	({ t r = *(t*)((insn)->next_byte + n); r; })
+
+#define get_next(t, insn)	\
+	({ if (unlikely(!validate_next(t, insn, 0))) goto err_out; __get_next(t, insn); })
+
+#define peek_nbyte_next(t, insn, n)	\
+	({ if (unlikely(!validate_next(t, insn, n))) goto err_out; __peek_nbyte_next(t, insn, n); })
+
+#define peek_next(t, insn)	peek_nbyte_next(t, insn, 0)
+
+/**
+ * insn_init() - initialize struct insn
+ * @insn:	&struct insn to be initialized
+ * @kaddr:	address (in kernel memory) of instruction (or copy thereof)
+ * @x86_64:	!0 for 64-bit kernel or 64-bit app
+ */
+void insn_init(struct insn *insn, const void *kaddr, int buf_len, int x86_64)
+{
+	/*
+	 * Instructions longer than MAX_INSN_SIZE (15 bytes) are invalid
+	 * even if the input buffer is long enough to hold them.
+	 */
+	if (buf_len > MAX_INSN_SIZE)
+		buf_len = MAX_INSN_SIZE;
+
+	memset(insn, 0, sizeof(*insn));
+	insn->kaddr = kaddr;
+	insn->end_kaddr = kaddr + buf_len;
+	insn->next_byte = kaddr;
+	insn->x86_64 = x86_64 ? 1 : 0;
+	insn->opnd_bytes = 4;
+	if (x86_64)
+		insn->addr_bytes = 8;
+	else
+		insn->addr_bytes = 4;
+}
+
+/**
+ * insn_get_prefixes - scan x86 instruction prefix bytes
+ * @insn:	&struct insn containing instruction
+ *
+ * Populates the @insn->prefixes bitmap, and updates @insn->next_byte
+ * to point to the (first) opcode.  No effect if @insn->prefixes.got
+ * is already set.
+ */
+void insn_get_prefixes(struct insn *insn)
+{
+	struct insn_field *prefixes = &insn->prefixes;
+	insn_attr_t attr;
+	insn_byte_t b, lb;
+	int i, nb;
+
+	if (prefixes->got)
+		return;
+
+	nb = 0;
+	lb = 0;
+	b = peek_next(insn_byte_t, insn);
+	attr = inat_get_opcode_attribute(b);
+	while (inat_is_legacy_prefix(attr)) {
+		/* Skip if same prefix */
+		for (i = 0; i < nb; i++)
+			if (prefixes->bytes[i] == b)
+				goto found;
+		if (nb == 4)
+			/* Invalid instruction */
+			break;
+		prefixes->bytes[nb++] = b;
+		if (inat_is_address_size_prefix(attr)) {
+			/* address size switches 2/4 or 4/8 */
+			if (insn->x86_64)
+				insn->addr_bytes ^= 12;
+			else
+				insn->addr_bytes ^= 6;
+		} else if (inat_is_operand_size_prefix(attr)) {
+			/* oprand size switches 2/4 */
+			insn->opnd_bytes ^= 6;
+		}
+found:
+		prefixes->nbytes++;
+		insn->next_byte++;
+		lb = b;
+		b = peek_next(insn_byte_t, insn);
+		attr = inat_get_opcode_attribute(b);
+	}
+	/* Set the last prefix */
+	if (lb && lb != insn->prefixes.bytes[3]) {
+		if (unlikely(insn->prefixes.bytes[3])) {
+			/* Swap the last prefix */
+			b = insn->prefixes.bytes[3];
+			for (i = 0; i < nb; i++)
+				if (prefixes->bytes[i] == lb)
+					prefixes->bytes[i] = b;
+		}
+		insn->prefixes.bytes[3] = lb;
+	}
+
+	/* Decode REX prefix */
+	if (insn->x86_64) {
+		b = peek_next(insn_byte_t, insn);
+		attr = inat_get_opcode_attribute(b);
+		if (inat_is_rex_prefix(attr)) {
+			insn->rex_prefix.value = b;
+			insn->rex_prefix.nbytes = 1;
+			insn->next_byte++;
+			if (X86_REX_W(b))
+				/* REX.W overrides opnd_size */
+				insn->opnd_bytes = 8;
+		}
+	}
+	insn->rex_prefix.got = 1;
+
+	/* Decode VEX prefix */
+	b = peek_next(insn_byte_t, insn);
+	attr = inat_get_opcode_attribute(b);
+	if (inat_is_vex_prefix(attr)) {
+		insn_byte_t b2 = peek_nbyte_next(insn_byte_t, insn, 1);
+		if (!insn->x86_64) {
+			/*
+			 * In 32-bits mode, if the [7:6] bits (mod bits of
+			 * ModRM) on the second byte are not 11b, it is
+			 * LDS or LES.
+			 */
+			if (X86_MODRM_MOD(b2) != 3)
+				goto vex_end;
+		}
+		insn->vex_prefix.bytes[0] = b;
+		insn->vex_prefix.bytes[1] = b2;
+		if (inat_is_vex3_prefix(attr)) {
+			b2 = peek_nbyte_next(insn_byte_t, insn, 2);
+			insn->vex_prefix.bytes[2] = b2;
+			insn->vex_prefix.nbytes = 3;
+			insn->next_byte += 3;
+			if (insn->x86_64 && X86_VEX_W(b2))
+				/* VEX.W overrides opnd_size */
+				insn->opnd_bytes = 8;
+		} else {
+			/*
+			 * For VEX2, fake VEX3-like byte#2.
+			 * Makes it easier to decode vex.W, vex.vvvv,
+			 * vex.L and vex.pp. Masking with 0x7f sets vex.W == 0.
+			 */
+			insn->vex_prefix.bytes[2] = b2 & 0x7f;
+			insn->vex_prefix.nbytes = 2;
+			insn->next_byte += 2;
+		}
+	}
+vex_end:
+	insn->vex_prefix.got = 1;
+
+	prefixes->got = 1;
+
+err_out:
+	return;
+}
+
+/**
+ * insn_get_opcode - collect opcode(s)
+ * @insn:	&struct insn containing instruction
+ *
+ * Populates @insn->opcode, updates @insn->next_byte to point past the
+ * opcode byte(s), and set @insn->attr (except for groups).
+ * If necessary, first collects any preceding (prefix) bytes.
+ * Sets @insn->opcode.value = opcode1.  No effect if @insn->opcode.got
+ * is already 1.
+ */
+void insn_get_opcode(struct insn *insn)
+{
+	struct insn_field *opcode = &insn->opcode;
+	insn_byte_t op;
+	int pfx_id;
+	if (opcode->got)
+		return;
+	if (!insn->prefixes.got)
+		insn_get_prefixes(insn);
+
+	/* Get first opcode */
+	op = get_next(insn_byte_t, insn);
+	opcode->bytes[0] = op;
+	opcode->nbytes = 1;
+
+	/* Check if there is VEX prefix or not */
+	if (insn_is_avx(insn)) {
+		insn_byte_t m, p;
+		m = insn_vex_m_bits(insn);
+		p = insn_vex_p_bits(insn);
+		insn->attr = inat_get_avx_attribute(op, m, p);
+		if (!inat_accept_vex(insn->attr) && !inat_is_group(insn->attr))
+			insn->attr = 0;	/* This instruction is bad */
+		goto end;	/* VEX has only 1 byte for opcode */
+	}
+
+	insn->attr = inat_get_opcode_attribute(op);
+	while (inat_is_escape(insn->attr)) {
+		/* Get escaped opcode */
+		op = get_next(insn_byte_t, insn);
+		opcode->bytes[opcode->nbytes++] = op;
+		pfx_id = insn_last_prefix_id(insn);
+		insn->attr = inat_get_escape_attribute(op, pfx_id, insn->attr);
+	}
+	if (inat_must_vex(insn->attr))
+		insn->attr = 0;	/* This instruction is bad */
+end:
+	opcode->got = 1;
+
+err_out:
+	return;
+}
+
+/**
+ * insn_get_modrm - collect ModRM byte, if any
+ * @insn:	&struct insn containing instruction
+ *
+ * Populates @insn->modrm and updates @insn->next_byte to point past the
+ * ModRM byte, if any.  If necessary, first collects the preceding bytes
+ * (prefixes and opcode(s)).  No effect if @insn->modrm.got is already 1.
+ */
+void insn_get_modrm(struct insn *insn)
+{
+	struct insn_field *modrm = &insn->modrm;
+	insn_byte_t pfx_id, mod;
+	if (modrm->got)
+		return;
+	if (!insn->opcode.got)
+		insn_get_opcode(insn);
+
+	if (inat_has_modrm(insn->attr)) {
+		mod = get_next(insn_byte_t, insn);
+		modrm->value = mod;
+		modrm->nbytes = 1;
+		if (inat_is_group(insn->attr)) {
+			pfx_id = insn_last_prefix_id(insn);
+			insn->attr = inat_get_group_attribute(mod, pfx_id,
+							      insn->attr);
+			if (insn_is_avx(insn) && !inat_accept_vex(insn->attr))
+				insn->attr = 0;	/* This is bad */
+		}
+	}
+
+	if (insn->x86_64 && inat_is_force64(insn->attr))
+		insn->opnd_bytes = 8;
+	modrm->got = 1;
+
+err_out:
+	return;
+}
+
+
+/**
+ * insn_rip_relative() - Does instruction use RIP-relative addressing mode?
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * ModRM byte.  No effect if @insn->x86_64 is 0.
+ */
+int insn_rip_relative(struct insn *insn)
+{
+	struct insn_field *modrm = &insn->modrm;
+
+	if (!insn->x86_64)
+		return 0;
+	if (!modrm->got)
+		insn_get_modrm(insn);
+	/*
+	 * For rip-relative instructions, the mod field (top 2 bits)
+	 * is zero and the r/m field (bottom 3 bits) is 0x5.
+	 */
+	return (modrm->nbytes && (modrm->value & 0xc7) == 0x5);
+}
+
+/**
+ * insn_get_sib() - Get the SIB byte of instruction
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * ModRM byte.
+ */
+void insn_get_sib(struct insn *insn)
+{
+	insn_byte_t modrm;
+
+	if (insn->sib.got)
+		return;
+	if (!insn->modrm.got)
+		insn_get_modrm(insn);
+	if (insn->modrm.nbytes) {
+		modrm = (insn_byte_t)insn->modrm.value;
+		if (insn->addr_bytes != 2 &&
+		    X86_MODRM_MOD(modrm) != 3 && X86_MODRM_RM(modrm) == 4) {
+			insn->sib.value = get_next(insn_byte_t, insn);
+			insn->sib.nbytes = 1;
+		}
+	}
+	insn->sib.got = 1;
+
+err_out:
+	return;
+}
+
+
+/**
+ * insn_get_displacement() - Get the displacement of instruction
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * SIB byte.
+ * Displacement value is sign-expanded.
+ */
+void insn_get_displacement(struct insn *insn)
+{
+	insn_byte_t mod, rm, base;
+
+	if (insn->displacement.got)
+		return;
+	if (!insn->sib.got)
+		insn_get_sib(insn);
+	if (insn->modrm.nbytes) {
+		/*
+		 * Interpreting the modrm byte:
+		 * mod = 00 - no displacement fields (exceptions below)
+		 * mod = 01 - 1-byte displacement field
+		 * mod = 10 - displacement field is 4 bytes, or 2 bytes if
+		 * 	address size = 2 (0x67 prefix in 32-bit mode)
+		 * mod = 11 - no memory operand
+		 *
+		 * If address size = 2...
+		 * mod = 00, r/m = 110 - displacement field is 2 bytes
+		 *
+		 * If address size != 2...
+		 * mod != 11, r/m = 100 - SIB byte exists
+		 * mod = 00, SIB base = 101 - displacement field is 4 bytes
+		 * mod = 00, r/m = 101 - rip-relative addressing, displacement
+		 * 	field is 4 bytes
+		 */
+		mod = X86_MODRM_MOD(insn->modrm.value);
+		rm = X86_MODRM_RM(insn->modrm.value);
+		base = X86_SIB_BASE(insn->sib.value);
+		if (mod == 3)
+			goto out;
+		if (mod == 1) {
+			insn->displacement.value = get_next(char, insn);
+			insn->displacement.nbytes = 1;
+		} else if (insn->addr_bytes == 2) {
+			if ((mod == 0 && rm == 6) || mod == 2) {
+				insn->displacement.value =
+					 get_next(short, insn);
+				insn->displacement.nbytes = 2;
+			}
+		} else {
+			if ((mod == 0 && rm == 5) || mod == 2 ||
+			    (mod == 0 && base == 5)) {
+				insn->displacement.value = get_next(int, insn);
+				insn->displacement.nbytes = 4;
+			}
+		}
+	}
+out:
+	insn->displacement.got = 1;
+
+err_out:
+	return;
+}
+
+/* Decode moffset16/32/64. Return 0 if failed */
+static int __get_moffset(struct insn *insn)
+{
+	switch (insn->addr_bytes) {
+	case 2:
+		insn->moffset1.value = get_next(short, insn);
+		insn->moffset1.nbytes = 2;
+		break;
+	case 4:
+		insn->moffset1.value = get_next(int, insn);
+		insn->moffset1.nbytes = 4;
+		break;
+	case 8:
+		insn->moffset1.value = get_next(int, insn);
+		insn->moffset1.nbytes = 4;
+		insn->moffset2.value = get_next(int, insn);
+		insn->moffset2.nbytes = 4;
+		break;
+	default:	/* opnd_bytes must be modified manually */
+		goto err_out;
+	}
+	insn->moffset1.got = insn->moffset2.got = 1;
+
+	return 1;
+
+err_out:
+	return 0;
+}
+
+/* Decode imm v32(Iz). Return 0 if failed */
+static int __get_immv32(struct insn *insn)
+{
+	switch (insn->opnd_bytes) {
+	case 2:
+		insn->immediate.value = get_next(short, insn);
+		insn->immediate.nbytes = 2;
+		break;
+	case 4:
+	case 8:
+		insn->immediate.value = get_next(int, insn);
+		insn->immediate.nbytes = 4;
+		break;
+	default:	/* opnd_bytes must be modified manually */
+		goto err_out;
+	}
+
+	return 1;
+
+err_out:
+	return 0;
+}
+
+/* Decode imm v64(Iv/Ov), Return 0 if failed */
+static int __get_immv(struct insn *insn)
+{
+	switch (insn->opnd_bytes) {
+	case 2:
+		insn->immediate1.value = get_next(short, insn);
+		insn->immediate1.nbytes = 2;
+		break;
+	case 4:
+		insn->immediate1.value = get_next(int, insn);
+		insn->immediate1.nbytes = 4;
+		break;
+	case 8:
+		insn->immediate1.value = get_next(int, insn);
+		insn->immediate1.nbytes = 4;
+		insn->immediate2.value = get_next(int, insn);
+		insn->immediate2.nbytes = 4;
+		break;
+	default:	/* opnd_bytes must be modified manually */
+		goto err_out;
+	}
+	insn->immediate1.got = insn->immediate2.got = 1;
+
+	return 1;
+err_out:
+	return 0;
+}
+
+/* Decode ptr16:16/32(Ap) */
+static int __get_immptr(struct insn *insn)
+{
+	switch (insn->opnd_bytes) {
+	case 2:
+		insn->immediate1.value = get_next(short, insn);
+		insn->immediate1.nbytes = 2;
+		break;
+	case 4:
+		insn->immediate1.value = get_next(int, insn);
+		insn->immediate1.nbytes = 4;
+		break;
+	case 8:
+		/* ptr16:64 is not exist (no segment) */
+		return 0;
+	default:	/* opnd_bytes must be modified manually */
+		goto err_out;
+	}
+	insn->immediate2.value = get_next(unsigned short, insn);
+	insn->immediate2.nbytes = 2;
+	insn->immediate1.got = insn->immediate2.got = 1;
+
+	return 1;
+err_out:
+	return 0;
+}
+
+/**
+ * insn_get_immediate() - Get the immediates of instruction
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * displacement bytes.
+ * Basically, most of immediates are sign-expanded. Unsigned-value can be
+ * get by bit masking with ((1 << (nbytes * 8)) - 1)
+ */
+void insn_get_immediate(struct insn *insn)
+{
+	if (insn->immediate.got)
+		return;
+	if (!insn->displacement.got)
+		insn_get_displacement(insn);
+
+	if (inat_has_moffset(insn->attr)) {
+		if (!__get_moffset(insn))
+			goto err_out;
+		goto done;
+	}
+
+	if (!inat_has_immediate(insn->attr))
+		/* no immediates */
+		goto done;
+
+	switch (inat_immediate_size(insn->attr)) {
+	case INAT_IMM_BYTE:
+		insn->immediate.value = get_next(char, insn);
+		insn->immediate.nbytes = 1;
+		break;
+	case INAT_IMM_WORD:
+		insn->immediate.value = get_next(short, insn);
+		insn->immediate.nbytes = 2;
+		break;
+	case INAT_IMM_DWORD:
+		insn->immediate.value = get_next(int, insn);
+		insn->immediate.nbytes = 4;
+		break;
+	case INAT_IMM_QWORD:
+		insn->immediate1.value = get_next(int, insn);
+		insn->immediate1.nbytes = 4;
+		insn->immediate2.value = get_next(int, insn);
+		insn->immediate2.nbytes = 4;
+		break;
+	case INAT_IMM_PTR:
+		if (!__get_immptr(insn))
+			goto err_out;
+		break;
+	case INAT_IMM_VWORD32:
+		if (!__get_immv32(insn))
+			goto err_out;
+		break;
+	case INAT_IMM_VWORD:
+		if (!__get_immv(insn))
+			goto err_out;
+		break;
+	default:
+		/* Here, insn must have an immediate, but failed */
+		goto err_out;
+	}
+	if (inat_has_second_immediate(insn->attr)) {
+		insn->immediate2.value = get_next(char, insn);
+		insn->immediate2.nbytes = 1;
+	}
+done:
+	insn->immediate.got = 1;
+
+err_out:
+	return;
+}
+
+/**
+ * insn_get_length() - Get the length of instruction
+ * @insn:	&struct insn containing instruction
+ *
+ * If necessary, first collects the instruction up to and including the
+ * immediates bytes.
+ */
+void insn_get_length(struct insn *insn)
+{
+	if (insn->length)
+		return;
+	if (!insn->immediate.got)
+		insn_get_immediate(insn);
+	insn->length = (unsigned char)((unsigned long)insn->next_byte
+				     - (unsigned long)insn->kaddr);
+}
diff --git a/tools/perf/util/intel-pt-decoder/insn.h b/tools/perf/util/intel-pt-decoder/insn.h
new file mode 100644
index 000000000000..e7814b74caf8
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/insn.h
@@ -0,0 +1,201 @@
+#ifndef _ASM_X86_INSN_H
+#define _ASM_X86_INSN_H
+/*
+ * x86 instruction analysis
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ * Copyright (C) IBM Corporation, 2009
+ */
+
+/* insn_attr_t is defined in inat.h */
+#include <asm/inat.h>
+
+struct insn_field {
+	union {
+		insn_value_t value;
+		insn_byte_t bytes[4];
+	};
+	/* !0 if we've run insn_get_xxx() for this field */
+	unsigned char got;
+	unsigned char nbytes;
+};
+
+struct insn {
+	struct insn_field prefixes;	/*
+					 * Prefixes
+					 * prefixes.bytes[3]: last prefix
+					 */
+	struct insn_field rex_prefix;	/* REX prefix */
+	struct insn_field vex_prefix;	/* VEX prefix */
+	struct insn_field opcode;	/*
+					 * opcode.bytes[0]: opcode1
+					 * opcode.bytes[1]: opcode2
+					 * opcode.bytes[2]: opcode3
+					 */
+	struct insn_field modrm;
+	struct insn_field sib;
+	struct insn_field displacement;
+	union {
+		struct insn_field immediate;
+		struct insn_field moffset1;	/* for 64bit MOV */
+		struct insn_field immediate1;	/* for 64bit imm or off16/32 */
+	};
+	union {
+		struct insn_field moffset2;	/* for 64bit MOV */
+		struct insn_field immediate2;	/* for 64bit imm or seg16 */
+	};
+
+	insn_attr_t attr;
+	unsigned char opnd_bytes;
+	unsigned char addr_bytes;
+	unsigned char length;
+	unsigned char x86_64;
+
+	const insn_byte_t *kaddr;	/* kernel address of insn to analyze */
+	const insn_byte_t *end_kaddr;	/* kernel address of last insn in buffer */
+	const insn_byte_t *next_byte;
+};
+
+#define MAX_INSN_SIZE	15
+
+#define X86_MODRM_MOD(modrm) (((modrm) & 0xc0) >> 6)
+#define X86_MODRM_REG(modrm) (((modrm) & 0x38) >> 3)
+#define X86_MODRM_RM(modrm) ((modrm) & 0x07)
+
+#define X86_SIB_SCALE(sib) (((sib) & 0xc0) >> 6)
+#define X86_SIB_INDEX(sib) (((sib) & 0x38) >> 3)
+#define X86_SIB_BASE(sib) ((sib) & 0x07)
+
+#define X86_REX_W(rex) ((rex) & 8)
+#define X86_REX_R(rex) ((rex) & 4)
+#define X86_REX_X(rex) ((rex) & 2)
+#define X86_REX_B(rex) ((rex) & 1)
+
+/* VEX bit flags  */
+#define X86_VEX_W(vex)	((vex) & 0x80)	/* VEX3 Byte2 */
+#define X86_VEX_R(vex)	((vex) & 0x80)	/* VEX2/3 Byte1 */
+#define X86_VEX_X(vex)	((vex) & 0x40)	/* VEX3 Byte1 */
+#define X86_VEX_B(vex)	((vex) & 0x20)	/* VEX3 Byte1 */
+#define X86_VEX_L(vex)	((vex) & 0x04)	/* VEX3 Byte2, VEX2 Byte1 */
+/* VEX bit fields */
+#define X86_VEX3_M(vex)	((vex) & 0x1f)		/* VEX3 Byte1 */
+#define X86_VEX2_M	1			/* VEX2.M always 1 */
+#define X86_VEX_V(vex)	(((vex) & 0x78) >> 3)	/* VEX3 Byte2, VEX2 Byte1 */
+#define X86_VEX_P(vex)	((vex) & 0x03)		/* VEX3 Byte2, VEX2 Byte1 */
+#define X86_VEX_M_MAX	0x1f			/* VEX3.M Maximum value */
+
+extern void insn_init(struct insn *insn, const void *kaddr, int buf_len, int x86_64);
+extern void insn_get_prefixes(struct insn *insn);
+extern void insn_get_opcode(struct insn *insn);
+extern void insn_get_modrm(struct insn *insn);
+extern void insn_get_sib(struct insn *insn);
+extern void insn_get_displacement(struct insn *insn);
+extern void insn_get_immediate(struct insn *insn);
+extern void insn_get_length(struct insn *insn);
+
+/* Attribute will be determined after getting ModRM (for opcode groups) */
+static inline void insn_get_attribute(struct insn *insn)
+{
+	insn_get_modrm(insn);
+}
+
+/* Instruction uses RIP-relative addressing */
+extern int insn_rip_relative(struct insn *insn);
+
+/* Init insn for kernel text */
+static inline void kernel_insn_init(struct insn *insn,
+				    const void *kaddr, int buf_len)
+{
+#ifdef CONFIG_X86_64
+	insn_init(insn, kaddr, buf_len, 1);
+#else /* CONFIG_X86_32 */
+	insn_init(insn, kaddr, buf_len, 0);
+#endif
+}
+
+static inline int insn_is_avx(struct insn *insn)
+{
+	if (!insn->prefixes.got)
+		insn_get_prefixes(insn);
+	return (insn->vex_prefix.value != 0);
+}
+
+/* Ensure this instruction is decoded completely */
+static inline int insn_complete(struct insn *insn)
+{
+	return insn->opcode.got && insn->modrm.got && insn->sib.got &&
+		insn->displacement.got && insn->immediate.got;
+}
+
+static inline insn_byte_t insn_vex_m_bits(struct insn *insn)
+{
+	if (insn->vex_prefix.nbytes == 2)	/* 2 bytes VEX */
+		return X86_VEX2_M;
+	else
+		return X86_VEX3_M(insn->vex_prefix.bytes[1]);
+}
+
+static inline insn_byte_t insn_vex_p_bits(struct insn *insn)
+{
+	if (insn->vex_prefix.nbytes == 2)	/* 2 bytes VEX */
+		return X86_VEX_P(insn->vex_prefix.bytes[1]);
+	else
+		return X86_VEX_P(insn->vex_prefix.bytes[2]);
+}
+
+/* Get the last prefix id from last prefix or VEX prefix */
+static inline int insn_last_prefix_id(struct insn *insn)
+{
+	if (insn_is_avx(insn))
+		return insn_vex_p_bits(insn);	/* VEX_p is a SIMD prefix id */
+
+	if (insn->prefixes.bytes[3])
+		return inat_get_last_prefix_id(insn->prefixes.bytes[3]);
+
+	return 0;
+}
+
+/* Offset of each field from kaddr */
+static inline int insn_offset_rex_prefix(struct insn *insn)
+{
+	return insn->prefixes.nbytes;
+}
+static inline int insn_offset_vex_prefix(struct insn *insn)
+{
+	return insn_offset_rex_prefix(insn) + insn->rex_prefix.nbytes;
+}
+static inline int insn_offset_opcode(struct insn *insn)
+{
+	return insn_offset_vex_prefix(insn) + insn->vex_prefix.nbytes;
+}
+static inline int insn_offset_modrm(struct insn *insn)
+{
+	return insn_offset_opcode(insn) + insn->opcode.nbytes;
+}
+static inline int insn_offset_sib(struct insn *insn)
+{
+	return insn_offset_modrm(insn) + insn->modrm.nbytes;
+}
+static inline int insn_offset_displacement(struct insn *insn)
+{
+	return insn_offset_sib(insn) + insn->sib.nbytes;
+}
+static inline int insn_offset_immediate(struct insn *insn)
+{
+	return insn_offset_displacement(insn) + insn->displacement.nbytes;
+}
+
+#endif /* _ASM_X86_INSN_H */
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
new file mode 100644
index 000000000000..46980fc663ac
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
@@ -0,0 +1,246 @@
+/*
+ * intel_pt_insn_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "event.h"
+
+#include <asm/insn.h>
+
+#include "inat.c"
+#include "insn.c"
+
+#include "intel-pt-insn-decoder.h"
+
+/* Based on branch_type() from perf_event_intel_lbr.c */
+static void intel_pt_insn_decoder(struct insn *insn,
+				  struct intel_pt_insn *intel_pt_insn)
+{
+	enum intel_pt_insn_op op = INTEL_PT_OP_OTHER;
+	enum intel_pt_insn_branch branch = INTEL_PT_BR_NO_BRANCH;
+	int ext;
+
+	if (insn_is_avx(insn)) {
+		intel_pt_insn->op = INTEL_PT_OP_OTHER;
+		intel_pt_insn->branch = INTEL_PT_BR_NO_BRANCH;
+		intel_pt_insn->length = insn->length;
+		return;
+	}
+
+	switch (insn->opcode.bytes[0]) {
+	case 0xf:
+		switch (insn->opcode.bytes[1]) {
+		case 0x05: /* syscall */
+		case 0x34: /* sysenter */
+			op = INTEL_PT_OP_SYSCALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x07: /* sysret */
+		case 0x35: /* sysexit */
+			op = INTEL_PT_OP_SYSRET;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x80 ... 0x8f: /* jcc */
+			op = INTEL_PT_OP_JCC;
+			branch = INTEL_PT_BR_CONDITIONAL;
+			break;
+		default:
+			break;
+		}
+		break;
+	case 0x70 ... 0x7f: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xc2: /* near ret */
+	case 0xc3: /* near ret */
+	case 0xca: /* far ret */
+	case 0xcb: /* far ret */
+		op = INTEL_PT_OP_RET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcf: /* iret */
+		op = INTEL_PT_OP_IRET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcc ... 0xce: /* int */
+		op = INTEL_PT_OP_INT;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe8: /* call near rel */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0x9a: /* call far absolute */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe0 ... 0xe2: /* loop */
+		op = INTEL_PT_OP_LOOP;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe3: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe9: /* jmp */
+	case 0xeb: /* jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0xea: /* far jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xff: /* call near absolute, call far absolute ind */
+		ext = (insn->modrm.bytes[0] >> 3) & 0x7;
+		switch (ext) {
+		case 2: /* near ind call */
+		case 3: /* far ind call */
+			op = INTEL_PT_OP_CALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 4:
+		case 5:
+			op = INTEL_PT_OP_JMP;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+
+	intel_pt_insn->op = op;
+	intel_pt_insn->branch = branch;
+	intel_pt_insn->length = insn->length;
+
+	if (branch == INTEL_PT_BR_CONDITIONAL ||
+	    branch == INTEL_PT_BR_UNCONDITIONAL) {
+#if __BYTE_ORDER == __BIG_ENDIAN
+		switch (insn->immediate.nbytes) {
+		case 1:
+			intel_pt_insn->rel = insn->immediate.value;
+			break;
+		case 2:
+			intel_pt_insn->rel =
+					bswap_16((short)insn->immediate.value);
+			break;
+		case 4:
+			intel_pt_insn->rel = bswap_32(insn->immediate.value);
+			break;
+		}
+#else
+		intel_pt_insn->rel = insn->immediate.value;
+#endif
+	}
+}
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn)
+{
+	struct insn insn;
+
+	insn_init(&insn, buf, len, x86_64);
+	insn_get_length(&insn);
+	if (!insn_complete(&insn) || insn.length > len)
+		return -1;
+	intel_pt_insn_decoder(&insn, intel_pt_insn);
+	if (insn.length < INTEL_PT_INSN_DBG_BUF_SZ)
+		memcpy(intel_pt_insn->buf, buf, insn.length);
+	else
+		memcpy(intel_pt_insn->buf, buf, INTEL_PT_INSN_DBG_BUF_SZ);
+	return 0;
+}
+
+const char *branch_name[] = {
+	[INTEL_PT_OP_OTHER]	= "Other",
+	[INTEL_PT_OP_CALL]	= "Call",
+	[INTEL_PT_OP_RET]	= "Ret",
+	[INTEL_PT_OP_JCC]	= "Jcc",
+	[INTEL_PT_OP_JMP]	= "Jmp",
+	[INTEL_PT_OP_LOOP]	= "Loop",
+	[INTEL_PT_OP_IRET]	= "IRet",
+	[INTEL_PT_OP_INT]	= "Int",
+	[INTEL_PT_OP_SYSCALL]	= "Syscall",
+	[INTEL_PT_OP_SYSRET]	= "Sysret",
+};
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op)
+{
+	return branch_name[op];
+}
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len)
+{
+	switch (intel_pt_insn->branch) {
+	case INTEL_PT_BR_CONDITIONAL:
+	case INTEL_PT_BR_UNCONDITIONAL:
+		return snprintf(buf, buf_len, "%s %s%d",
+				intel_pt_insn_name(intel_pt_insn->op),
+				intel_pt_insn->rel > 0 ? "+" : "",
+				intel_pt_insn->rel);
+	case INTEL_PT_BR_NO_BRANCH:
+	case INTEL_PT_BR_INDIRECT:
+		return snprintf(buf, buf_len, "%s",
+				intel_pt_insn_name(intel_pt_insn->op));
+	default:
+		break;
+	}
+	return 0;
+}
+
+size_t intel_pt_insn_max_size(void)
+{
+	return MAX_INSN_SIZE;
+}
+
+int intel_pt_insn_type(enum intel_pt_insn_op op)
+{
+	switch (op) {
+	case INTEL_PT_OP_OTHER:
+		return 0;
+	case INTEL_PT_OP_CALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL;
+	case INTEL_PT_OP_RET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN;
+	case INTEL_PT_OP_JCC:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_JMP:
+		return PERF_IP_FLAG_BRANCH;
+	case INTEL_PT_OP_LOOP:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_IRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_INT:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_SYSCALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_SYSCALLRET;
+	case INTEL_PT_OP_SYSRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_SYSCALLRET;
+	default:
+		return 0;
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
new file mode 100644
index 000000000000..b0adbf37323e
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
@@ -0,0 +1,65 @@
+/*
+ * intel_pt_insn_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_INSN_DECODER_H__
+#define INCLUDE__INTEL_PT_INSN_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_INSN_DESC_MAX		32
+#define INTEL_PT_INSN_DBG_BUF_SZ	16
+
+enum intel_pt_insn_op {
+	INTEL_PT_OP_OTHER,
+	INTEL_PT_OP_CALL,
+	INTEL_PT_OP_RET,
+	INTEL_PT_OP_JCC,
+	INTEL_PT_OP_JMP,
+	INTEL_PT_OP_LOOP,
+	INTEL_PT_OP_IRET,
+	INTEL_PT_OP_INT,
+	INTEL_PT_OP_SYSCALL,
+	INTEL_PT_OP_SYSRET,
+};
+
+enum intel_pt_insn_branch {
+	INTEL_PT_BR_NO_BRANCH,
+	INTEL_PT_BR_INDIRECT,
+	INTEL_PT_BR_CONDITIONAL,
+	INTEL_PT_BR_UNCONDITIONAL,
+};
+
+struct intel_pt_insn {
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+	unsigned char			buf[INTEL_PT_INSN_DBG_BUF_SZ];
+};
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn);
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op);
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len);
+
+size_t intel_pt_insn_max_size(void);
+
+int intel_pt_insn_type(enum intel_pt_insn_op op);
+
+#endif
diff --git a/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
new file mode 100644
index 000000000000..816488c0b97e
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
@@ -0,0 +1,970 @@
+# x86 Opcode Maps
+#
+# This is (mostly) based on following documentations.
+# - Intel(R) 64 and IA-32 Architectures Software Developer's Manual Vol.2C
+#   (#326018-047US, June 2013)
+#
+#<Opcode maps>
+# Table: table-name
+# Referrer: escaped-name
+# AVXcode: avx-code
+# opcode: mnemonic|GrpXXX [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
+# (or)
+# opcode: escape # escaped-name
+# EndTable
+#
+#<group maps>
+# GrpTable: GrpXXX
+# reg:  mnemonic [operand1[,operand2...]] [(extra1)[,(extra2)...] [| 2nd-mnemonic ...]
+# EndTable
+#
+# AVX Superscripts
+#  (v): this opcode requires VEX prefix.
+#  (v1): this opcode only supports 128bit VEX.
+#
+# Last Prefix Superscripts
+#  - (66): the last prefix is 0x66
+#  - (F3): the last prefix is 0xF3
+#  - (F2): the last prefix is 0xF2
+#  - (!F3) : the last prefix is not 0xF3 (including non-last prefix case)
+#  - (66&F2): Both 0x66 and 0xF2 prefixes are specified.
+
+Table: one byte opcode
+Referrer:
+AVXcode:
+# 0x00 - 0x0f
+00: ADD Eb,Gb
+01: ADD Ev,Gv
+02: ADD Gb,Eb
+03: ADD Gv,Ev
+04: ADD AL,Ib
+05: ADD rAX,Iz
+06: PUSH ES (i64)
+07: POP ES (i64)
+08: OR Eb,Gb
+09: OR Ev,Gv
+0a: OR Gb,Eb
+0b: OR Gv,Ev
+0c: OR AL,Ib
+0d: OR rAX,Iz
+0e: PUSH CS (i64)
+0f: escape # 2-byte escape
+# 0x10 - 0x1f
+10: ADC Eb,Gb
+11: ADC Ev,Gv
+12: ADC Gb,Eb
+13: ADC Gv,Ev
+14: ADC AL,Ib
+15: ADC rAX,Iz
+16: PUSH SS (i64)
+17: POP SS (i64)
+18: SBB Eb,Gb
+19: SBB Ev,Gv
+1a: SBB Gb,Eb
+1b: SBB Gv,Ev
+1c: SBB AL,Ib
+1d: SBB rAX,Iz
+1e: PUSH DS (i64)
+1f: POP DS (i64)
+# 0x20 - 0x2f
+20: AND Eb,Gb
+21: AND Ev,Gv
+22: AND Gb,Eb
+23: AND Gv,Ev
+24: AND AL,Ib
+25: AND rAx,Iz
+26: SEG=ES (Prefix)
+27: DAA (i64)
+28: SUB Eb,Gb
+29: SUB Ev,Gv
+2a: SUB Gb,Eb
+2b: SUB Gv,Ev
+2c: SUB AL,Ib
+2d: SUB rAX,Iz
+2e: SEG=CS (Prefix)
+2f: DAS (i64)
+# 0x30 - 0x3f
+30: XOR Eb,Gb
+31: XOR Ev,Gv
+32: XOR Gb,Eb
+33: XOR Gv,Ev
+34: XOR AL,Ib
+35: XOR rAX,Iz
+36: SEG=SS (Prefix)
+37: AAA (i64)
+38: CMP Eb,Gb
+39: CMP Ev,Gv
+3a: CMP Gb,Eb
+3b: CMP Gv,Ev
+3c: CMP AL,Ib
+3d: CMP rAX,Iz
+3e: SEG=DS (Prefix)
+3f: AAS (i64)
+# 0x40 - 0x4f
+40: INC eAX (i64) | REX (o64)
+41: INC eCX (i64) | REX.B (o64)
+42: INC eDX (i64) | REX.X (o64)
+43: INC eBX (i64) | REX.XB (o64)
+44: INC eSP (i64) | REX.R (o64)
+45: INC eBP (i64) | REX.RB (o64)
+46: INC eSI (i64) | REX.RX (o64)
+47: INC eDI (i64) | REX.RXB (o64)
+48: DEC eAX (i64) | REX.W (o64)
+49: DEC eCX (i64) | REX.WB (o64)
+4a: DEC eDX (i64) | REX.WX (o64)
+4b: DEC eBX (i64) | REX.WXB (o64)
+4c: DEC eSP (i64) | REX.WR (o64)
+4d: DEC eBP (i64) | REX.WRB (o64)
+4e: DEC eSI (i64) | REX.WRX (o64)
+4f: DEC eDI (i64) | REX.WRXB (o64)
+# 0x50 - 0x5f
+50: PUSH rAX/r8 (d64)
+51: PUSH rCX/r9 (d64)
+52: PUSH rDX/r10 (d64)
+53: PUSH rBX/r11 (d64)
+54: PUSH rSP/r12 (d64)
+55: PUSH rBP/r13 (d64)
+56: PUSH rSI/r14 (d64)
+57: PUSH rDI/r15 (d64)
+58: POP rAX/r8 (d64)
+59: POP rCX/r9 (d64)
+5a: POP rDX/r10 (d64)
+5b: POP rBX/r11 (d64)
+5c: POP rSP/r12 (d64)
+5d: POP rBP/r13 (d64)
+5e: POP rSI/r14 (d64)
+5f: POP rDI/r15 (d64)
+# 0x60 - 0x6f
+60: PUSHA/PUSHAD (i64)
+61: POPA/POPAD (i64)
+62: BOUND Gv,Ma (i64)
+63: ARPL Ew,Gw (i64) | MOVSXD Gv,Ev (o64)
+64: SEG=FS (Prefix)
+65: SEG=GS (Prefix)
+66: Operand-Size (Prefix)
+67: Address-Size (Prefix)
+68: PUSH Iz (d64)
+69: IMUL Gv,Ev,Iz
+6a: PUSH Ib (d64)
+6b: IMUL Gv,Ev,Ib
+6c: INS/INSB Yb,DX
+6d: INS/INSW/INSD Yz,DX
+6e: OUTS/OUTSB DX,Xb
+6f: OUTS/OUTSW/OUTSD DX,Xz
+# 0x70 - 0x7f
+70: JO Jb
+71: JNO Jb
+72: JB/JNAE/JC Jb
+73: JNB/JAE/JNC Jb
+74: JZ/JE Jb
+75: JNZ/JNE Jb
+76: JBE/JNA Jb
+77: JNBE/JA Jb
+78: JS Jb
+79: JNS Jb
+7a: JP/JPE Jb
+7b: JNP/JPO Jb
+7c: JL/JNGE Jb
+7d: JNL/JGE Jb
+7e: JLE/JNG Jb
+7f: JNLE/JG Jb
+# 0x80 - 0x8f
+80: Grp1 Eb,Ib (1A)
+81: Grp1 Ev,Iz (1A)
+82: Grp1 Eb,Ib (1A),(i64)
+83: Grp1 Ev,Ib (1A)
+84: TEST Eb,Gb
+85: TEST Ev,Gv
+86: XCHG Eb,Gb
+87: XCHG Ev,Gv
+88: MOV Eb,Gb
+89: MOV Ev,Gv
+8a: MOV Gb,Eb
+8b: MOV Gv,Ev
+8c: MOV Ev,Sw
+8d: LEA Gv,M
+8e: MOV Sw,Ew
+8f: Grp1A (1A) | POP Ev (d64)
+# 0x90 - 0x9f
+90: NOP | PAUSE (F3) | XCHG r8,rAX
+91: XCHG rCX/r9,rAX
+92: XCHG rDX/r10,rAX
+93: XCHG rBX/r11,rAX
+94: XCHG rSP/r12,rAX
+95: XCHG rBP/r13,rAX
+96: XCHG rSI/r14,rAX
+97: XCHG rDI/r15,rAX
+98: CBW/CWDE/CDQE
+99: CWD/CDQ/CQO
+9a: CALLF Ap (i64)
+9b: FWAIT/WAIT
+9c: PUSHF/D/Q Fv (d64)
+9d: POPF/D/Q Fv (d64)
+9e: SAHF
+9f: LAHF
+# 0xa0 - 0xaf
+a0: MOV AL,Ob
+a1: MOV rAX,Ov
+a2: MOV Ob,AL
+a3: MOV Ov,rAX
+a4: MOVS/B Yb,Xb
+a5: MOVS/W/D/Q Yv,Xv
+a6: CMPS/B Xb,Yb
+a7: CMPS/W/D Xv,Yv
+a8: TEST AL,Ib
+a9: TEST rAX,Iz
+aa: STOS/B Yb,AL
+ab: STOS/W/D/Q Yv,rAX
+ac: LODS/B AL,Xb
+ad: LODS/W/D/Q rAX,Xv
+ae: SCAS/B AL,Yb
+# Note: The May 2011 Intel manual shows Xv for the second parameter of the
+# next instruction but Yv is correct
+af: SCAS/W/D/Q rAX,Yv
+# 0xb0 - 0xbf
+b0: MOV AL/R8L,Ib
+b1: MOV CL/R9L,Ib
+b2: MOV DL/R10L,Ib
+b3: MOV BL/R11L,Ib
+b4: MOV AH/R12L,Ib
+b5: MOV CH/R13L,Ib
+b6: MOV DH/R14L,Ib
+b7: MOV BH/R15L,Ib
+b8: MOV rAX/r8,Iv
+b9: MOV rCX/r9,Iv
+ba: MOV rDX/r10,Iv
+bb: MOV rBX/r11,Iv
+bc: MOV rSP/r12,Iv
+bd: MOV rBP/r13,Iv
+be: MOV rSI/r14,Iv
+bf: MOV rDI/r15,Iv
+# 0xc0 - 0xcf
+c0: Grp2 Eb,Ib (1A)
+c1: Grp2 Ev,Ib (1A)
+c2: RETN Iw (f64)
+c3: RETN
+c4: LES Gz,Mp (i64) | VEX+2byte (Prefix)
+c5: LDS Gz,Mp (i64) | VEX+1byte (Prefix)
+c6: Grp11A Eb,Ib (1A)
+c7: Grp11B Ev,Iz (1A)
+c8: ENTER Iw,Ib
+c9: LEAVE (d64)
+ca: RETF Iw
+cb: RETF
+cc: INT3
+cd: INT Ib
+ce: INTO (i64)
+cf: IRET/D/Q
+# 0xd0 - 0xdf
+d0: Grp2 Eb,1 (1A)
+d1: Grp2 Ev,1 (1A)
+d2: Grp2 Eb,CL (1A)
+d3: Grp2 Ev,CL (1A)
+d4: AAM Ib (i64)
+d5: AAD Ib (i64)
+d6:
+d7: XLAT/XLATB
+d8: ESC
+d9: ESC
+da: ESC
+db: ESC
+dc: ESC
+dd: ESC
+de: ESC
+df: ESC
+# 0xe0 - 0xef
+# Note: "forced64" is Intel CPU behavior: they ignore 0x66 prefix
+# in 64-bit mode. AMD CPUs accept 0x66 prefix, it causes RIP truncation
+# to 16 bits. In 32-bit mode, 0x66 is accepted by both Intel and AMD.
+e0: LOOPNE/LOOPNZ Jb (f64)
+e1: LOOPE/LOOPZ Jb (f64)
+e2: LOOP Jb (f64)
+e3: JrCXZ Jb (f64)
+e4: IN AL,Ib
+e5: IN eAX,Ib
+e6: OUT Ib,AL
+e7: OUT Ib,eAX
+# With 0x66 prefix in 64-bit mode, for AMD CPUs immediate offset
+# in "near" jumps and calls is 16-bit. For CALL,
+# push of return address is 16-bit wide, RSP is decremented by 2
+# but is not truncated to 16 bits, unlike RIP.
+e8: CALL Jz (f64)
+e9: JMP-near Jz (f64)
+ea: JMP-far Ap (i64)
+eb: JMP-short Jb (f64)
+ec: IN AL,DX
+ed: IN eAX,DX
+ee: OUT DX,AL
+ef: OUT DX,eAX
+# 0xf0 - 0xff
+f0: LOCK (Prefix)
+f1:
+f2: REPNE (Prefix) | XACQUIRE (Prefix)
+f3: REP/REPE (Prefix) | XRELEASE (Prefix)
+f4: HLT
+f5: CMC
+f6: Grp3_1 Eb (1A)
+f7: Grp3_2 Ev (1A)
+f8: CLC
+f9: STC
+fa: CLI
+fb: STI
+fc: CLD
+fd: STD
+fe: Grp4 (1A)
+ff: Grp5 (1A)
+EndTable
+
+Table: 2-byte opcode (0x0f)
+Referrer: 2-byte escape
+AVXcode: 1
+# 0x0f 0x00-0x0f
+00: Grp6 (1A)
+01: Grp7 (1A)
+02: LAR Gv,Ew
+03: LSL Gv,Ew
+04:
+05: SYSCALL (o64)
+06: CLTS
+07: SYSRET (o64)
+08: INVD
+09: WBINVD
+0a:
+0b: UD2 (1B)
+0c:
+# AMD's prefetch group. Intel supports prefetchw(/1) only.
+0d: GrpP
+0e: FEMMS
+# 3DNow! uses the last imm byte as opcode extension.
+0f: 3DNow! Pq,Qq,Ib
+# 0x0f 0x10-0x1f
+# NOTE: According to Intel SDM opcode map, vmovups and vmovupd has no operands
+# but it actually has operands. And also, vmovss and vmovsd only accept 128bit.
+# MOVSS/MOVSD has too many forms(3) on SDM. This map just shows a typical form.
+# Many AVX instructions lack v1 superscript, according to Intel AVX-Prgramming
+# Reference A.1
+10: vmovups Vps,Wps | vmovupd Vpd,Wpd (66) | vmovss Vx,Hx,Wss (F3),(v1) | vmovsd Vx,Hx,Wsd (F2),(v1)
+11: vmovups Wps,Vps | vmovupd Wpd,Vpd (66) | vmovss Wss,Hx,Vss (F3),(v1) | vmovsd Wsd,Hx,Vsd (F2),(v1)
+12: vmovlps Vq,Hq,Mq (v1) | vmovhlps Vq,Hq,Uq (v1) | vmovlpd Vq,Hq,Mq (66),(v1) | vmovsldup Vx,Wx (F3) | vmovddup Vx,Wx (F2)
+13: vmovlps Mq,Vq (v1) | vmovlpd Mq,Vq (66),(v1)
+14: vunpcklps Vx,Hx,Wx | vunpcklpd Vx,Hx,Wx (66)
+15: vunpckhps Vx,Hx,Wx | vunpckhpd Vx,Hx,Wx (66)
+16: vmovhps Vdq,Hq,Mq (v1) | vmovlhps Vdq,Hq,Uq (v1) | vmovhpd Vdq,Hq,Mq (66),(v1) | vmovshdup Vx,Wx (F3)
+17: vmovhps Mq,Vq (v1) | vmovhpd Mq,Vq (66),(v1)
+18: Grp16 (1A)
+19:
+1a: BNDCL Ev,Gv | BNDCU Ev,Gv | BNDMOV Gv,Ev | BNDLDX Gv,Ev,Gv
+1b: BNDCN Ev,Gv | BNDMOV Ev,Gv | BNDMK Gv,Ev | BNDSTX Ev,GV,Gv
+1c:
+1d:
+1e:
+1f: NOP Ev
+# 0x0f 0x20-0x2f
+20: MOV Rd,Cd
+21: MOV Rd,Dd
+22: MOV Cd,Rd
+23: MOV Dd,Rd
+24:
+25:
+26:
+27:
+28: vmovaps Vps,Wps | vmovapd Vpd,Wpd (66)
+29: vmovaps Wps,Vps | vmovapd Wpd,Vpd (66)
+2a: cvtpi2ps Vps,Qpi | cvtpi2pd Vpd,Qpi (66) | vcvtsi2ss Vss,Hss,Ey (F3),(v1) | vcvtsi2sd Vsd,Hsd,Ey (F2),(v1)
+2b: vmovntps Mps,Vps | vmovntpd Mpd,Vpd (66)
+2c: cvttps2pi Ppi,Wps | cvttpd2pi Ppi,Wpd (66) | vcvttss2si Gy,Wss (F3),(v1) | vcvttsd2si Gy,Wsd (F2),(v1)
+2d: cvtps2pi Ppi,Wps | cvtpd2pi Qpi,Wpd (66) | vcvtss2si Gy,Wss (F3),(v1) | vcvtsd2si Gy,Wsd (F2),(v1)
+2e: vucomiss Vss,Wss (v1) | vucomisd  Vsd,Wsd (66),(v1)
+2f: vcomiss Vss,Wss (v1) | vcomisd  Vsd,Wsd (66),(v1)
+# 0x0f 0x30-0x3f
+30: WRMSR
+31: RDTSC
+32: RDMSR
+33: RDPMC
+34: SYSENTER
+35: SYSEXIT
+36:
+37: GETSEC
+38: escape # 3-byte escape 1
+39:
+3a: escape # 3-byte escape 2
+3b:
+3c:
+3d:
+3e:
+3f:
+# 0x0f 0x40-0x4f
+40: CMOVO Gv,Ev
+41: CMOVNO Gv,Ev
+42: CMOVB/C/NAE Gv,Ev
+43: CMOVAE/NB/NC Gv,Ev
+44: CMOVE/Z Gv,Ev
+45: CMOVNE/NZ Gv,Ev
+46: CMOVBE/NA Gv,Ev
+47: CMOVA/NBE Gv,Ev
+48: CMOVS Gv,Ev
+49: CMOVNS Gv,Ev
+4a: CMOVP/PE Gv,Ev
+4b: CMOVNP/PO Gv,Ev
+4c: CMOVL/NGE Gv,Ev
+4d: CMOVNL/GE Gv,Ev
+4e: CMOVLE/NG Gv,Ev
+4f: CMOVNLE/G Gv,Ev
+# 0x0f 0x50-0x5f
+50: vmovmskps Gy,Ups | vmovmskpd Gy,Upd (66)
+51: vsqrtps Vps,Wps | vsqrtpd Vpd,Wpd (66) | vsqrtss Vss,Hss,Wss (F3),(v1) | vsqrtsd Vsd,Hsd,Wsd (F2),(v1)
+52: vrsqrtps Vps,Wps | vrsqrtss Vss,Hss,Wss (F3),(v1)
+53: vrcpps Vps,Wps | vrcpss Vss,Hss,Wss (F3),(v1)
+54: vandps Vps,Hps,Wps | vandpd Vpd,Hpd,Wpd (66)
+55: vandnps Vps,Hps,Wps | vandnpd Vpd,Hpd,Wpd (66)
+56: vorps Vps,Hps,Wps | vorpd Vpd,Hpd,Wpd (66)
+57: vxorps Vps,Hps,Wps | vxorpd Vpd,Hpd,Wpd (66)
+58: vaddps Vps,Hps,Wps | vaddpd Vpd,Hpd,Wpd (66) | vaddss Vss,Hss,Wss (F3),(v1) | vaddsd Vsd,Hsd,Wsd (F2),(v1)
+59: vmulps Vps,Hps,Wps | vmulpd Vpd,Hpd,Wpd (66) | vmulss Vss,Hss,Wss (F3),(v1) | vmulsd Vsd,Hsd,Wsd (F2),(v1)
+5a: vcvtps2pd Vpd,Wps | vcvtpd2ps Vps,Wpd (66) | vcvtss2sd Vsd,Hx,Wss (F3),(v1) | vcvtsd2ss Vss,Hx,Wsd (F2),(v1)
+5b: vcvtdq2ps Vps,Wdq | vcvtps2dq Vdq,Wps (66) | vcvttps2dq Vdq,Wps (F3)
+5c: vsubps Vps,Hps,Wps | vsubpd Vpd,Hpd,Wpd (66) | vsubss Vss,Hss,Wss (F3),(v1) | vsubsd Vsd,Hsd,Wsd (F2),(v1)
+5d: vminps Vps,Hps,Wps | vminpd Vpd,Hpd,Wpd (66) | vminss Vss,Hss,Wss (F3),(v1) | vminsd Vsd,Hsd,Wsd (F2),(v1)
+5e: vdivps Vps,Hps,Wps | vdivpd Vpd,Hpd,Wpd (66) | vdivss Vss,Hss,Wss (F3),(v1) | vdivsd Vsd,Hsd,Wsd (F2),(v1)
+5f: vmaxps Vps,Hps,Wps | vmaxpd Vpd,Hpd,Wpd (66) | vmaxss Vss,Hss,Wss (F3),(v1) | vmaxsd Vsd,Hsd,Wsd (F2),(v1)
+# 0x0f 0x60-0x6f
+60: punpcklbw Pq,Qd | vpunpcklbw Vx,Hx,Wx (66),(v1)
+61: punpcklwd Pq,Qd | vpunpcklwd Vx,Hx,Wx (66),(v1)
+62: punpckldq Pq,Qd | vpunpckldq Vx,Hx,Wx (66),(v1)
+63: packsswb Pq,Qq | vpacksswb Vx,Hx,Wx (66),(v1)
+64: pcmpgtb Pq,Qq | vpcmpgtb Vx,Hx,Wx (66),(v1)
+65: pcmpgtw Pq,Qq | vpcmpgtw Vx,Hx,Wx (66),(v1)
+66: pcmpgtd Pq,Qq | vpcmpgtd Vx,Hx,Wx (66),(v1)
+67: packuswb Pq,Qq | vpackuswb Vx,Hx,Wx (66),(v1)
+68: punpckhbw Pq,Qd | vpunpckhbw Vx,Hx,Wx (66),(v1)
+69: punpckhwd Pq,Qd | vpunpckhwd Vx,Hx,Wx (66),(v1)
+6a: punpckhdq Pq,Qd | vpunpckhdq Vx,Hx,Wx (66),(v1)
+6b: packssdw Pq,Qd | vpackssdw Vx,Hx,Wx (66),(v1)
+6c: vpunpcklqdq Vx,Hx,Wx (66),(v1)
+6d: vpunpckhqdq Vx,Hx,Wx (66),(v1)
+6e: movd/q Pd,Ey | vmovd/q Vy,Ey (66),(v1)
+6f: movq Pq,Qq | vmovdqa Vx,Wx (66) | vmovdqu Vx,Wx (F3)
+# 0x0f 0x70-0x7f
+70: pshufw Pq,Qq,Ib | vpshufd Vx,Wx,Ib (66),(v1) | vpshufhw Vx,Wx,Ib (F3),(v1) | vpshuflw Vx,Wx,Ib (F2),(v1)
+71: Grp12 (1A)
+72: Grp13 (1A)
+73: Grp14 (1A)
+74: pcmpeqb Pq,Qq | vpcmpeqb Vx,Hx,Wx (66),(v1)
+75: pcmpeqw Pq,Qq | vpcmpeqw Vx,Hx,Wx (66),(v1)
+76: pcmpeqd Pq,Qq | vpcmpeqd Vx,Hx,Wx (66),(v1)
+# Note: Remove (v), because vzeroall and vzeroupper becomes emms without VEX.
+77: emms | vzeroupper | vzeroall
+78: VMREAD Ey,Gy
+79: VMWRITE Gy,Ey
+7a:
+7b:
+7c: vhaddpd Vpd,Hpd,Wpd (66) | vhaddps Vps,Hps,Wps (F2)
+7d: vhsubpd Vpd,Hpd,Wpd (66) | vhsubps Vps,Hps,Wps (F2)
+7e: movd/q Ey,Pd | vmovd/q Ey,Vy (66),(v1) | vmovq Vq,Wq (F3),(v1)
+7f: movq Qq,Pq | vmovdqa Wx,Vx (66) | vmovdqu Wx,Vx (F3)
+# 0x0f 0x80-0x8f
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
+80: JO Jz (f64)
+81: JNO Jz (f64)
+82: JB/JC/JNAE Jz (f64)
+83: JAE/JNB/JNC Jz (f64)
+84: JE/JZ Jz (f64)
+85: JNE/JNZ Jz (f64)
+86: JBE/JNA Jz (f64)
+87: JA/JNBE Jz (f64)
+88: JS Jz (f64)
+89: JNS Jz (f64)
+8a: JP/JPE Jz (f64)
+8b: JNP/JPO Jz (f64)
+8c: JL/JNGE Jz (f64)
+8d: JNL/JGE Jz (f64)
+8e: JLE/JNG Jz (f64)
+8f: JNLE/JG Jz (f64)
+# 0x0f 0x90-0x9f
+90: SETO Eb
+91: SETNO Eb
+92: SETB/C/NAE Eb
+93: SETAE/NB/NC Eb
+94: SETE/Z Eb
+95: SETNE/NZ Eb
+96: SETBE/NA Eb
+97: SETA/NBE Eb
+98: SETS Eb
+99: SETNS Eb
+9a: SETP/PE Eb
+9b: SETNP/PO Eb
+9c: SETL/NGE Eb
+9d: SETNL/GE Eb
+9e: SETLE/NG Eb
+9f: SETNLE/G Eb
+# 0x0f 0xa0-0xaf
+a0: PUSH FS (d64)
+a1: POP FS (d64)
+a2: CPUID
+a3: BT Ev,Gv
+a4: SHLD Ev,Gv,Ib
+a5: SHLD Ev,Gv,CL
+a6: GrpPDLK
+a7: GrpRNG
+a8: PUSH GS (d64)
+a9: POP GS (d64)
+aa: RSM
+ab: BTS Ev,Gv
+ac: SHRD Ev,Gv,Ib
+ad: SHRD Ev,Gv,CL
+ae: Grp15 (1A),(1C)
+af: IMUL Gv,Ev
+# 0x0f 0xb0-0xbf
+b0: CMPXCHG Eb,Gb
+b1: CMPXCHG Ev,Gv
+b2: LSS Gv,Mp
+b3: BTR Ev,Gv
+b4: LFS Gv,Mp
+b5: LGS Gv,Mp
+b6: MOVZX Gv,Eb
+b7: MOVZX Gv,Ew
+b8: JMPE (!F3) | POPCNT Gv,Ev (F3)
+b9: Grp10 (1A)
+ba: Grp8 Ev,Ib (1A)
+bb: BTC Ev,Gv
+bc: BSF Gv,Ev (!F3) | TZCNT Gv,Ev (F3)
+bd: BSR Gv,Ev (!F3) | LZCNT Gv,Ev (F3)
+be: MOVSX Gv,Eb
+bf: MOVSX Gv,Ew
+# 0x0f 0xc0-0xcf
+c0: XADD Eb,Gb
+c1: XADD Ev,Gv
+c2: vcmpps Vps,Hps,Wps,Ib | vcmppd Vpd,Hpd,Wpd,Ib (66) | vcmpss Vss,Hss,Wss,Ib (F3),(v1) | vcmpsd Vsd,Hsd,Wsd,Ib (F2),(v1)
+c3: movnti My,Gy
+c4: pinsrw Pq,Ry/Mw,Ib | vpinsrw Vdq,Hdq,Ry/Mw,Ib (66),(v1)
+c5: pextrw Gd,Nq,Ib | vpextrw Gd,Udq,Ib (66),(v1)
+c6: vshufps Vps,Hps,Wps,Ib | vshufpd Vpd,Hpd,Wpd,Ib (66)
+c7: Grp9 (1A)
+c8: BSWAP RAX/EAX/R8/R8D
+c9: BSWAP RCX/ECX/R9/R9D
+ca: BSWAP RDX/EDX/R10/R10D
+cb: BSWAP RBX/EBX/R11/R11D
+cc: BSWAP RSP/ESP/R12/R12D
+cd: BSWAP RBP/EBP/R13/R13D
+ce: BSWAP RSI/ESI/R14/R14D
+cf: BSWAP RDI/EDI/R15/R15D
+# 0x0f 0xd0-0xdf
+d0: vaddsubpd Vpd,Hpd,Wpd (66) | vaddsubps Vps,Hps,Wps (F2)
+d1: psrlw Pq,Qq | vpsrlw Vx,Hx,Wx (66),(v1)
+d2: psrld Pq,Qq | vpsrld Vx,Hx,Wx (66),(v1)
+d3: psrlq Pq,Qq | vpsrlq Vx,Hx,Wx (66),(v1)
+d4: paddq Pq,Qq | vpaddq Vx,Hx,Wx (66),(v1)
+d5: pmullw Pq,Qq | vpmullw Vx,Hx,Wx (66),(v1)
+d6: vmovq Wq,Vq (66),(v1) | movq2dq Vdq,Nq (F3) | movdq2q Pq,Uq (F2)
+d7: pmovmskb Gd,Nq | vpmovmskb Gd,Ux (66),(v1)
+d8: psubusb Pq,Qq | vpsubusb Vx,Hx,Wx (66),(v1)
+d9: psubusw Pq,Qq | vpsubusw Vx,Hx,Wx (66),(v1)
+da: pminub Pq,Qq | vpminub Vx,Hx,Wx (66),(v1)
+db: pand Pq,Qq | vpand Vx,Hx,Wx (66),(v1)
+dc: paddusb Pq,Qq | vpaddusb Vx,Hx,Wx (66),(v1)
+dd: paddusw Pq,Qq | vpaddusw Vx,Hx,Wx (66),(v1)
+de: pmaxub Pq,Qq | vpmaxub Vx,Hx,Wx (66),(v1)
+df: pandn Pq,Qq | vpandn Vx,Hx,Wx (66),(v1)
+# 0x0f 0xe0-0xef
+e0: pavgb Pq,Qq | vpavgb Vx,Hx,Wx (66),(v1)
+e1: psraw Pq,Qq | vpsraw Vx,Hx,Wx (66),(v1)
+e2: psrad Pq,Qq | vpsrad Vx,Hx,Wx (66),(v1)
+e3: pavgw Pq,Qq | vpavgw Vx,Hx,Wx (66),(v1)
+e4: pmulhuw Pq,Qq | vpmulhuw Vx,Hx,Wx (66),(v1)
+e5: pmulhw Pq,Qq | vpmulhw Vx,Hx,Wx (66),(v1)
+e6: vcvttpd2dq Vx,Wpd (66) | vcvtdq2pd Vx,Wdq (F3) | vcvtpd2dq Vx,Wpd (F2)
+e7: movntq Mq,Pq | vmovntdq Mx,Vx (66)
+e8: psubsb Pq,Qq | vpsubsb Vx,Hx,Wx (66),(v1)
+e9: psubsw Pq,Qq | vpsubsw Vx,Hx,Wx (66),(v1)
+ea: pminsw Pq,Qq | vpminsw Vx,Hx,Wx (66),(v1)
+eb: por Pq,Qq | vpor Vx,Hx,Wx (66),(v1)
+ec: paddsb Pq,Qq | vpaddsb Vx,Hx,Wx (66),(v1)
+ed: paddsw Pq,Qq | vpaddsw Vx,Hx,Wx (66),(v1)
+ee: pmaxsw Pq,Qq | vpmaxsw Vx,Hx,Wx (66),(v1)
+ef: pxor Pq,Qq | vpxor Vx,Hx,Wx (66),(v1)
+# 0x0f 0xf0-0xff
+f0: vlddqu Vx,Mx (F2)
+f1: psllw Pq,Qq | vpsllw Vx,Hx,Wx (66),(v1)
+f2: pslld Pq,Qq | vpslld Vx,Hx,Wx (66),(v1)
+f3: psllq Pq,Qq | vpsllq Vx,Hx,Wx (66),(v1)
+f4: pmuludq Pq,Qq | vpmuludq Vx,Hx,Wx (66),(v1)
+f5: pmaddwd Pq,Qq | vpmaddwd Vx,Hx,Wx (66),(v1)
+f6: psadbw Pq,Qq | vpsadbw Vx,Hx,Wx (66),(v1)
+f7: maskmovq Pq,Nq | vmaskmovdqu Vx,Ux (66),(v1)
+f8: psubb Pq,Qq | vpsubb Vx,Hx,Wx (66),(v1)
+f9: psubw Pq,Qq | vpsubw Vx,Hx,Wx (66),(v1)
+fa: psubd Pq,Qq | vpsubd Vx,Hx,Wx (66),(v1)
+fb: psubq Pq,Qq | vpsubq Vx,Hx,Wx (66),(v1)
+fc: paddb Pq,Qq | vpaddb Vx,Hx,Wx (66),(v1)
+fd: paddw Pq,Qq | vpaddw Vx,Hx,Wx (66),(v1)
+fe: paddd Pq,Qq | vpaddd Vx,Hx,Wx (66),(v1)
+ff:
+EndTable
+
+Table: 3-byte opcode 1 (0x0f 0x38)
+Referrer: 3-byte escape 1
+AVXcode: 2
+# 0x0f 0x38 0x00-0x0f
+00: pshufb Pq,Qq | vpshufb Vx,Hx,Wx (66),(v1)
+01: phaddw Pq,Qq | vphaddw Vx,Hx,Wx (66),(v1)
+02: phaddd Pq,Qq | vphaddd Vx,Hx,Wx (66),(v1)
+03: phaddsw Pq,Qq | vphaddsw Vx,Hx,Wx (66),(v1)
+04: pmaddubsw Pq,Qq | vpmaddubsw Vx,Hx,Wx (66),(v1)
+05: phsubw Pq,Qq | vphsubw Vx,Hx,Wx (66),(v1)
+06: phsubd Pq,Qq | vphsubd Vx,Hx,Wx (66),(v1)
+07: phsubsw Pq,Qq | vphsubsw Vx,Hx,Wx (66),(v1)
+08: psignb Pq,Qq | vpsignb Vx,Hx,Wx (66),(v1)
+09: psignw Pq,Qq | vpsignw Vx,Hx,Wx (66),(v1)
+0a: psignd Pq,Qq | vpsignd Vx,Hx,Wx (66),(v1)
+0b: pmulhrsw Pq,Qq | vpmulhrsw Vx,Hx,Wx (66),(v1)
+0c: vpermilps Vx,Hx,Wx (66),(v)
+0d: vpermilpd Vx,Hx,Wx (66),(v)
+0e: vtestps Vx,Wx (66),(v)
+0f: vtestpd Vx,Wx (66),(v)
+# 0x0f 0x38 0x10-0x1f
+10: pblendvb Vdq,Wdq (66)
+11:
+12:
+13: vcvtph2ps Vx,Wx,Ib (66),(v)
+14: blendvps Vdq,Wdq (66)
+15: blendvpd Vdq,Wdq (66)
+16: vpermps Vqq,Hqq,Wqq (66),(v)
+17: vptest Vx,Wx (66)
+18: vbroadcastss Vx,Wd (66),(v)
+19: vbroadcastsd Vqq,Wq (66),(v)
+1a: vbroadcastf128 Vqq,Mdq (66),(v)
+1b:
+1c: pabsb Pq,Qq | vpabsb Vx,Wx (66),(v1)
+1d: pabsw Pq,Qq | vpabsw Vx,Wx (66),(v1)
+1e: pabsd Pq,Qq | vpabsd Vx,Wx (66),(v1)
+1f:
+# 0x0f 0x38 0x20-0x2f
+20: vpmovsxbw Vx,Ux/Mq (66),(v1)
+21: vpmovsxbd Vx,Ux/Md (66),(v1)
+22: vpmovsxbq Vx,Ux/Mw (66),(v1)
+23: vpmovsxwd Vx,Ux/Mq (66),(v1)
+24: vpmovsxwq Vx,Ux/Md (66),(v1)
+25: vpmovsxdq Vx,Ux/Mq (66),(v1)
+26:
+27:
+28: vpmuldq Vx,Hx,Wx (66),(v1)
+29: vpcmpeqq Vx,Hx,Wx (66),(v1)
+2a: vmovntdqa Vx,Mx (66),(v1)
+2b: vpackusdw Vx,Hx,Wx (66),(v1)
+2c: vmaskmovps Vx,Hx,Mx (66),(v)
+2d: vmaskmovpd Vx,Hx,Mx (66),(v)
+2e: vmaskmovps Mx,Hx,Vx (66),(v)
+2f: vmaskmovpd Mx,Hx,Vx (66),(v)
+# 0x0f 0x38 0x30-0x3f
+30: vpmovzxbw Vx,Ux/Mq (66),(v1)
+31: vpmovzxbd Vx,Ux/Md (66),(v1)
+32: vpmovzxbq Vx,Ux/Mw (66),(v1)
+33: vpmovzxwd Vx,Ux/Mq (66),(v1)
+34: vpmovzxwq Vx,Ux/Md (66),(v1)
+35: vpmovzxdq Vx,Ux/Mq (66),(v1)
+36: vpermd Vqq,Hqq,Wqq (66),(v)
+37: vpcmpgtq Vx,Hx,Wx (66),(v1)
+38: vpminsb Vx,Hx,Wx (66),(v1)
+39: vpminsd Vx,Hx,Wx (66),(v1)
+3a: vpminuw Vx,Hx,Wx (66),(v1)
+3b: vpminud Vx,Hx,Wx (66),(v1)
+3c: vpmaxsb Vx,Hx,Wx (66),(v1)
+3d: vpmaxsd Vx,Hx,Wx (66),(v1)
+3e: vpmaxuw Vx,Hx,Wx (66),(v1)
+3f: vpmaxud Vx,Hx,Wx (66),(v1)
+# 0x0f 0x38 0x40-0x8f
+40: vpmulld Vx,Hx,Wx (66),(v1)
+41: vphminposuw Vdq,Wdq (66),(v1)
+42:
+43:
+44:
+45: vpsrlvd/q Vx,Hx,Wx (66),(v)
+46: vpsravd Vx,Hx,Wx (66),(v)
+47: vpsllvd/q Vx,Hx,Wx (66),(v)
+# Skip 0x48-0x57
+58: vpbroadcastd Vx,Wx (66),(v)
+59: vpbroadcastq Vx,Wx (66),(v)
+5a: vbroadcasti128 Vqq,Mdq (66),(v)
+# Skip 0x5b-0x77
+78: vpbroadcastb Vx,Wx (66),(v)
+79: vpbroadcastw Vx,Wx (66),(v)
+# Skip 0x7a-0x7f
+80: INVEPT Gy,Mdq (66)
+81: INVPID Gy,Mdq (66)
+82: INVPCID Gy,Mdq (66)
+8c: vpmaskmovd/q Vx,Hx,Mx (66),(v)
+8e: vpmaskmovd/q Mx,Vx,Hx (66),(v)
+# 0x0f 0x38 0x90-0xbf (FMA)
+90: vgatherdd/q Vx,Hx,Wx (66),(v)
+91: vgatherqd/q Vx,Hx,Wx (66),(v)
+92: vgatherdps/d Vx,Hx,Wx (66),(v)
+93: vgatherqps/d Vx,Hx,Wx (66),(v)
+94:
+95:
+96: vfmaddsub132ps/d Vx,Hx,Wx (66),(v)
+97: vfmsubadd132ps/d Vx,Hx,Wx (66),(v)
+98: vfmadd132ps/d Vx,Hx,Wx (66),(v)
+99: vfmadd132ss/d Vx,Hx,Wx (66),(v),(v1)
+9a: vfmsub132ps/d Vx,Hx,Wx (66),(v)
+9b: vfmsub132ss/d Vx,Hx,Wx (66),(v),(v1)
+9c: vfnmadd132ps/d Vx,Hx,Wx (66),(v)
+9d: vfnmadd132ss/d Vx,Hx,Wx (66),(v),(v1)
+9e: vfnmsub132ps/d Vx,Hx,Wx (66),(v)
+9f: vfnmsub132ss/d Vx,Hx,Wx (66),(v),(v1)
+a6: vfmaddsub213ps/d Vx,Hx,Wx (66),(v)
+a7: vfmsubadd213ps/d Vx,Hx,Wx (66),(v)
+a8: vfmadd213ps/d Vx,Hx,Wx (66),(v)
+a9: vfmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
+aa: vfmsub213ps/d Vx,Hx,Wx (66),(v)
+ab: vfmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
+ac: vfnmadd213ps/d Vx,Hx,Wx (66),(v)
+ad: vfnmadd213ss/d Vx,Hx,Wx (66),(v),(v1)
+ae: vfnmsub213ps/d Vx,Hx,Wx (66),(v)
+af: vfnmsub213ss/d Vx,Hx,Wx (66),(v),(v1)
+b6: vfmaddsub231ps/d Vx,Hx,Wx (66),(v)
+b7: vfmsubadd231ps/d Vx,Hx,Wx (66),(v)
+b8: vfmadd231ps/d Vx,Hx,Wx (66),(v)
+b9: vfmadd231ss/d Vx,Hx,Wx (66),(v),(v1)
+ba: vfmsub231ps/d Vx,Hx,Wx (66),(v)
+bb: vfmsub231ss/d Vx,Hx,Wx (66),(v),(v1)
+bc: vfnmadd231ps/d Vx,Hx,Wx (66),(v)
+bd: vfnmadd231ss/d Vx,Hx,Wx (66),(v),(v1)
+be: vfnmsub231ps/d Vx,Hx,Wx (66),(v)
+bf: vfnmsub231ss/d Vx,Hx,Wx (66),(v),(v1)
+# 0x0f 0x38 0xc0-0xff
+db: VAESIMC Vdq,Wdq (66),(v1)
+dc: VAESENC Vdq,Hdq,Wdq (66),(v1)
+dd: VAESENCLAST Vdq,Hdq,Wdq (66),(v1)
+de: VAESDEC Vdq,Hdq,Wdq (66),(v1)
+df: VAESDECLAST Vdq,Hdq,Wdq (66),(v1)
+f0: MOVBE Gy,My | MOVBE Gw,Mw (66) | CRC32 Gd,Eb (F2) | CRC32 Gd,Eb (66&F2)
+f1: MOVBE My,Gy | MOVBE Mw,Gw (66) | CRC32 Gd,Ey (F2) | CRC32 Gd,Ew (66&F2)
+f2: ANDN Gy,By,Ey (v)
+f3: Grp17 (1A)
+f5: BZHI Gy,Ey,By (v) | PEXT Gy,By,Ey (F3),(v) | PDEP Gy,By,Ey (F2),(v)
+f6: ADCX Gy,Ey (66) | ADOX Gy,Ey (F3) | MULX By,Gy,rDX,Ey (F2),(v)
+f7: BEXTR Gy,Ey,By (v) | SHLX Gy,Ey,By (66),(v) | SARX Gy,Ey,By (F3),(v) | SHRX Gy,Ey,By (F2),(v)
+EndTable
+
+Table: 3-byte opcode 2 (0x0f 0x3a)
+Referrer: 3-byte escape 2
+AVXcode: 3
+# 0x0f 0x3a 0x00-0xff
+00: vpermq Vqq,Wqq,Ib (66),(v)
+01: vpermpd Vqq,Wqq,Ib (66),(v)
+02: vpblendd Vx,Hx,Wx,Ib (66),(v)
+03:
+04: vpermilps Vx,Wx,Ib (66),(v)
+05: vpermilpd Vx,Wx,Ib (66),(v)
+06: vperm2f128 Vqq,Hqq,Wqq,Ib (66),(v)
+07:
+08: vroundps Vx,Wx,Ib (66)
+09: vroundpd Vx,Wx,Ib (66)
+0a: vroundss Vss,Wss,Ib (66),(v1)
+0b: vroundsd Vsd,Wsd,Ib (66),(v1)
+0c: vblendps Vx,Hx,Wx,Ib (66)
+0d: vblendpd Vx,Hx,Wx,Ib (66)
+0e: vpblendw Vx,Hx,Wx,Ib (66),(v1)
+0f: palignr Pq,Qq,Ib | vpalignr Vx,Hx,Wx,Ib (66),(v1)
+14: vpextrb Rd/Mb,Vdq,Ib (66),(v1)
+15: vpextrw Rd/Mw,Vdq,Ib (66),(v1)
+16: vpextrd/q Ey,Vdq,Ib (66),(v1)
+17: vextractps Ed,Vdq,Ib (66),(v1)
+18: vinsertf128 Vqq,Hqq,Wqq,Ib (66),(v)
+19: vextractf128 Wdq,Vqq,Ib (66),(v)
+1d: vcvtps2ph Wx,Vx,Ib (66),(v)
+20: vpinsrb Vdq,Hdq,Ry/Mb,Ib (66),(v1)
+21: vinsertps Vdq,Hdq,Udq/Md,Ib (66),(v1)
+22: vpinsrd/q Vdq,Hdq,Ey,Ib (66),(v1)
+38: vinserti128 Vqq,Hqq,Wqq,Ib (66),(v)
+39: vextracti128 Wdq,Vqq,Ib (66),(v)
+40: vdpps Vx,Hx,Wx,Ib (66)
+41: vdppd Vdq,Hdq,Wdq,Ib (66),(v1)
+42: vmpsadbw Vx,Hx,Wx,Ib (66),(v1)
+44: vpclmulqdq Vdq,Hdq,Wdq,Ib (66),(v1)
+46: vperm2i128 Vqq,Hqq,Wqq,Ib (66),(v)
+4a: vblendvps Vx,Hx,Wx,Lx (66),(v)
+4b: vblendvpd Vx,Hx,Wx,Lx (66),(v)
+4c: vpblendvb Vx,Hx,Wx,Lx (66),(v1)
+60: vpcmpestrm Vdq,Wdq,Ib (66),(v1)
+61: vpcmpestri Vdq,Wdq,Ib (66),(v1)
+62: vpcmpistrm Vdq,Wdq,Ib (66),(v1)
+63: vpcmpistri Vdq,Wdq,Ib (66),(v1)
+df: VAESKEYGEN Vdq,Wdq,Ib (66),(v1)
+f0: RORX Gy,Ey,Ib (F2),(v)
+EndTable
+
+GrpTable: Grp1
+0: ADD
+1: OR
+2: ADC
+3: SBB
+4: AND
+5: SUB
+6: XOR
+7: CMP
+EndTable
+
+GrpTable: Grp1A
+0: POP
+EndTable
+
+GrpTable: Grp2
+0: ROL
+1: ROR
+2: RCL
+3: RCR
+4: SHL/SAL
+5: SHR
+6:
+7: SAR
+EndTable
+
+GrpTable: Grp3_1
+0: TEST Eb,Ib
+1:
+2: NOT Eb
+3: NEG Eb
+4: MUL AL,Eb
+5: IMUL AL,Eb
+6: DIV AL,Eb
+7: IDIV AL,Eb
+EndTable
+
+GrpTable: Grp3_2
+0: TEST Ev,Iz
+1:
+2: NOT Ev
+3: NEG Ev
+4: MUL rAX,Ev
+5: IMUL rAX,Ev
+6: DIV rAX,Ev
+7: IDIV rAX,Ev
+EndTable
+
+GrpTable: Grp4
+0: INC Eb
+1: DEC Eb
+EndTable
+
+GrpTable: Grp5
+0: INC Ev
+1: DEC Ev
+# Note: "forced64" is Intel CPU behavior (see comment about CALL insn).
+2: CALLN Ev (f64)
+3: CALLF Ep
+4: JMPN Ev (f64)
+5: JMPF Mp
+6: PUSH Ev (d64)
+7:
+EndTable
+
+GrpTable: Grp6
+0: SLDT Rv/Mw
+1: STR Rv/Mw
+2: LLDT Ew
+3: LTR Ew
+4: VERR Ew
+5: VERW Ew
+EndTable
+
+GrpTable: Grp7
+0: SGDT Ms | VMCALL (001),(11B) | VMLAUNCH (010),(11B) | VMRESUME (011),(11B) | VMXOFF (100),(11B)
+1: SIDT Ms | MONITOR (000),(11B) | MWAIT (001),(11B) | CLAC (010),(11B) | STAC (011),(11B)
+2: LGDT Ms | XGETBV (000),(11B) | XSETBV (001),(11B) | VMFUNC (100),(11B) | XEND (101)(11B) | XTEST (110)(11B)
+3: LIDT Ms
+4: SMSW Mw/Rv
+5:
+6: LMSW Ew
+7: INVLPG Mb | SWAPGS (o64),(000),(11B) | RDTSCP (001),(11B)
+EndTable
+
+GrpTable: Grp8
+4: BT
+5: BTS
+6: BTR
+7: BTC
+EndTable
+
+GrpTable: Grp9
+1: CMPXCHG8B/16B Mq/Mdq
+6: VMPTRLD Mq | VMCLEAR Mq (66) | VMXON Mq (F3) | RDRAND Rv (11B)
+7: VMPTRST Mq | VMPTRST Mq (F3) | RDSEED Rv (11B)
+EndTable
+
+GrpTable: Grp10
+EndTable
+
+# Grp11A and Grp11B are expressed as Grp11 in Intel SDM
+GrpTable: Grp11A
+0: MOV Eb,Ib
+7: XABORT Ib (000),(11B)
+EndTable
+
+GrpTable: Grp11B
+0: MOV Eb,Iz
+7: XBEGIN Jz (000),(11B)
+EndTable
+
+GrpTable: Grp12
+2: psrlw Nq,Ib (11B) | vpsrlw Hx,Ux,Ib (66),(11B),(v1)
+4: psraw Nq,Ib (11B) | vpsraw Hx,Ux,Ib (66),(11B),(v1)
+6: psllw Nq,Ib (11B) | vpsllw Hx,Ux,Ib (66),(11B),(v1)
+EndTable
+
+GrpTable: Grp13
+2: psrld Nq,Ib (11B) | vpsrld Hx,Ux,Ib (66),(11B),(v1)
+4: psrad Nq,Ib (11B) | vpsrad Hx,Ux,Ib (66),(11B),(v1)
+6: pslld Nq,Ib (11B) | vpslld Hx,Ux,Ib (66),(11B),(v1)
+EndTable
+
+GrpTable: Grp14
+2: psrlq Nq,Ib (11B) | vpsrlq Hx,Ux,Ib (66),(11B),(v1)
+3: vpsrldq Hx,Ux,Ib (66),(11B),(v1)
+6: psllq Nq,Ib (11B) | vpsllq Hx,Ux,Ib (66),(11B),(v1)
+7: vpslldq Hx,Ux,Ib (66),(11B),(v1)
+EndTable
+
+GrpTable: Grp15
+0: fxsave | RDFSBASE Ry (F3),(11B)
+1: fxstor | RDGSBASE Ry (F3),(11B)
+2: vldmxcsr Md (v1) | WRFSBASE Ry (F3),(11B)
+3: vstmxcsr Md (v1) | WRGSBASE Ry (F3),(11B)
+4: XSAVE
+5: XRSTOR | lfence (11B)
+6: XSAVEOPT | mfence (11B)
+7: clflush | sfence (11B)
+EndTable
+
+GrpTable: Grp16
+0: prefetch NTA
+1: prefetch T0
+2: prefetch T1
+3: prefetch T2
+EndTable
+
+GrpTable: Grp17
+1: BLSR By,Ey (v)
+2: BLSMSK By,Ey (v)
+3: BLSI By,Ey (v)
+EndTable
+
+# AMD's Prefetch Group
+GrpTable: GrpP
+0: PREFETCH
+1: PREFETCHW
+EndTable
+
+GrpTable: GrpPDLK
+0: MONTMUL
+1: XSHA1
+2: XSHA2
+EndTable
+
+GrpTable: GrpRNG
+0: xstore-rng
+1: xcrypt-ecb
+2: xcrypt-cbc
+4: xcrypt-cfb
+5: xcrypt-ofb
+EndTable
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH V9 03/25] perf tools: Add Intel PT instruction decoder
  2015-08-13  7:14       ` [PATCH V9 " Adrian Hunter
@ 2015-08-13 12:37         ` Arnaldo Carvalho de Melo
  2015-08-20  9:57         ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-13 12:37 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: linux-kernel, Ingo Molnar, Jiri Olsa

Em Thu, Aug 13, 2015 at 10:14:55AM +0300, Adrian Hunter escreveu:
> Add support for decoding instructions for Intel Processor Trace.  The
> kernel x86 instruction decoder is copied for this.
> 
> This essentially provides intel_pt_get_insn() which takes a binary
> buffer, uses the kernel's x86 instruction decoder to get details of the
> instruction and then categorizes it for consumption by an Intel PT
> decoder.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
> 
> 
> Changes in V9:
> 
> 	insn.c is in the tools/perf sources and so should
> 	be included as "insn.c" not <insn.c>.
> 
> 	inat_types.h is in the kernel headers so is not needed
> 	in the tools/perf sources.

Thanks, now it builds with O=

- Arnaldo

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-07-17 16:33 ` [PATCH V8 08/25] perf tools: Add Intel BTS support Adrian Hunter
@ 2015-08-17 15:52   ` Arnaldo Carvalho de Melo
  2015-08-17 17:43     ` Adrian Hunter
  2015-08-22  6:52   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 1 reply; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-17 15:52 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jul 17, 2015 at 07:33:43PM +0300, Adrian Hunter escreveu:
> Intel BTS support fits within the new auxtrace infrastructure.
> Recording is supporting by identifying the Intel BTS PMU, parsing
> options and setting up events.
> 
> Decoding is supported by queuing up trace data by thread and then
> decoding synchronously delivering synthesized event samples into the
> session processing for tools to consume.

So, I am not being able to reproduce the results from last time I tried
it..

[root@zoo ~]# uname -r
4.2.0-rc5+

But all the DSOs are not being resolved :-\ Same machine, will try after lunch
with a tip/master built kernel, right now I get this, perhaps that "81649
instruction errors" message? I'll see what are the results with tip/master,
meanwhile what I have is at the tmp.perf/intel_pt branch in my tree.

- Arnaldo

[root@zoo ~]# perf record --per-thread -e intel_bts// usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.884 MB perf.data ]
[root@zoo ~]# perf evlist
intel_bts//
dummy:u
[root@zoo ~]# perf evlist -v
intel_bts//: type: 6, size: 112, { sample_period, sample_freq }: 1,
sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
dummy:u: type: 1, size: 112, config: 0x9, { sample_period, sample_freq
}: 1, sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, enable_on_exec: 1,
task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1
[root@zoo ~]#

Warning:
81649 instruction trace errors
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 0  of event 'intel_bts//'
# Event count (approx.): 0
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#


# Samples: 0  of event 'dummy:u'
# Event count (approx.): 0
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#


# Samples: 81K of event 'branches'
# Event count (approx.): 81649
#
# Overhead  Command  Shared Object     Symbol                
# ........  .......  ................  ......................
#
     2.71%  usleep   [unknown]         [.] 0x00007fa0ff695061
     2.62%  usleep   [unknown]         [.] 0x00007fa0ffb3de7e
     1.96%  usleep   [unknown]         [.] 0x00007fa0ffb2e726
     1.92%  usleep   [unknown]         [.] 0x00007fa0ff695086
     1.60%  usleep   [unknown]         [.] 0xffffffff811c91d0
     1.48%  usleep   [unknown]         [.] 0x00007fa0ffb3030d
     1.24%  usleep   [unknown]         [.] 0x00007fa0ff6950c7



 
> E.g:
> 
>   [root@zoo ~]# perf record --per-thread -e intel_bts// ls
>   anaconda-ks.cfg  b  bin  lib64	libexec  new  old  perf.data
>   perf.data.old  stream_test  tg.run
>   [ perf record: Woken up 2 times to write data ]
>   [ perf record: Captured and wrote 4.242 MB perf.data ]
>   [root@zoo ~]# perf evlist
>   intel_bts//
>   dummy:u
>   [root@zoo ~]# perf evlist -v
>   intel_bts//: type: 7, size: 112, { sample_period, sample_freq }: 1,
>   sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
>   enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1 dummy:u:
>   type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 1,
>   sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
>   exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, enable_on_exec: 1,
>   task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1
>   [root@zoo ~]# perf report --stdio
>   # To display the perf.data header info, please use --header/--header-only options.
>   #
>   #
>   # Total Lost Samples: 0
>   #
>   # Samples: 0  of event 'intel_bts//'
>   # Event count (approx.): 0
>   #
>   # Overhead  Command  Shared Object  Symbol
>   # ........  .......  .............  ......
>   #
> 
>   # Samples: 0  of event 'dummy:u'
>   # Event count (approx.): 0
>   #
>   # Overhead  Command  Shared Object  Symbol
>   # ........  .......  .............  ......
>   #
> 
>   # Samples: 184K of event 'branches'
>   # Event count (approx.): 184522
>   #
>   # Overhead  Command  Shared Object       Symbol
>   # ........  .......  ..................  ..............................................
>   #
>      7.36%  ls       [kernel.kallsyms]   [.] unmap_single_vma
>      4.69%  ls       ld-2.20.so          [.] strcmp
>      4.66%  ls       libc-2.20.so        [.] _dl_addr
>      3.43%  ls       [kernel.kallsyms]   [.] change_protection
>      3.01%  ls       ld-2.20.so          [.] do_lookup_x
>      2.18%  ls       [kernel.kallsyms]   [.] filemap_map_pages
>      2.09%  ls       ld-2.20.so          [.] _dl_name_match_p
>      1.93%  ls       ld-2.20.so          [.] _dl_lookup_symbol_x
>      1.91%  ls       [kernel.kallsyms]   [.] page_remove_rmap
>      1.72%  ls       [kernel.kallsyms]   [.] do_set_pte
>      1.48%  ls       ld-2.20.so          [.] _dl_relocate_object
>      1.34%  ls       [kernel.kallsyms]   [.] mem_cgroup_begin_page_stat
>      1.11%  ls       [kernel.kallsyms]   [.] page_add_file_rmap
>      1.00%  ls       [kernel.kallsyms]   [.] mark_page_accessed
>      0.97%  ls       [kernel.kallsyms]   [.] perf_event_aux
>      0.95%  ls       [kernel.kallsyms]   [.] handle_mm_fault
>      0.94%  ls       [kernel.kallsyms]   [.] get_page_from_freelist
>   <SNIP>
>   [root@zoo ~]# perf record --per-thread -e intel_bts//u ls
>   <SNIP>
>   [ perf record: Woken up 1 times to write data ]
>   [ perf record: Captured and wrote 1.278 MB perf.data ]
>   [root@zoo ~]# perf report --stdio
>   <SNIP>
>   # Samples: 55K of event 'branches:u'
>   # Event count (approx.): 55165
>   #
>   # Overhead  Command  Shared Object       Symbol
>   # ........  .......  ..................  ......................................
>   #
>     15.69%  ls       ld-2.20.so          [.] strcmp
>     15.58%  ls       libc-2.20.so        [.] _dl_addr
>     10.05%  ls       ld-2.20.so          [.] do_lookup_x
>      6.98%  ls       ld-2.20.so          [.] _dl_name_match_p
>      6.46%  ls       ld-2.20.so          [.] _dl_lookup_symbol_x
>      4.95%  ls       ld-2.20.so          [.] _dl_relocate_object
>      2.96%  ls       ls                  [.] quotearg_buffer_restyled
>      2.78%  ls       libc-2.20.so        [.] getenv
>      1.91%  ls       ld-2.20.so          [.] _dl_cache_libcmp
>      1.76%  ls       libc-2.20.so        [.] __memmove_sse2
>      1.75%  ls       ld-2.20.so          [.] check_match.isra.0
>      1.47%  ls       ld-2.20.so          [.] _dl_map_object_deps
>      1.27%  ls       ld-2.20.so          [.] _dl_map_object_from_fd
>      1.17%  ls       ls                  [.] quote_name
>      1.16%  ls       ld-2.20.so          [.] _dl_check_map_versions
>    <SNIP>
>      0.19%  ls       [kernel.kallsyms]   [.] entry_SYSCALL_64_fastpath
>      0.19%  ls       [kernel.kallsyms]   [.] native_irq_return_iret
>      0.19%  ls       libc-2.20.so        [.] _IO_file_xsputn@@GLIBC_2.2.5
>    <SNIP>
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/Documentation/intel-bts.txt |  86 +++
>  tools/perf/arch/x86/util/Build         |   1 +
>  tools/perf/arch/x86/util/auxtrace.c    |  49 +-
>  tools/perf/arch/x86/util/intel-bts.c   | 458 ++++++++++++++++
>  tools/perf/arch/x86/util/pmu.c         |   3 +
>  tools/perf/util/Build                  |   1 +
>  tools/perf/util/auxtrace.c             |   3 +
>  tools/perf/util/auxtrace.h             |   1 +
>  tools/perf/util/intel-bts.c            | 933 +++++++++++++++++++++++++++++++++
>  tools/perf/util/intel-bts.h            |  43 ++
>  tools/perf/util/pmu.c                  |   4 -
>  11 files changed, 1576 insertions(+), 6 deletions(-)
>  create mode 100644 tools/perf/Documentation/intel-bts.txt
>  create mode 100644 tools/perf/arch/x86/util/intel-bts.c
>  create mode 100644 tools/perf/util/intel-bts.c
>  create mode 100644 tools/perf/util/intel-bts.h
> 
> diff --git a/tools/perf/Documentation/intel-bts.txt b/tools/perf/Documentation/intel-bts.txt
> new file mode 100644
> index 000000000000..8bdc93bd7fdb
> --- /dev/null
> +++ b/tools/perf/Documentation/intel-bts.txt
> @@ -0,0 +1,86 @@
> +Intel Branch Trace Store
> +========================
> +
> +Overview
> +========
> +
> +Intel BTS could be regarded as a predecessor to Intel PT and has some
> +similarities because it can also identify every branch a program takes.  A
> +notable difference is that Intel BTS has no timing information and as a
> +consequence the present implementation is limited to per-thread recording.
> +
> +While decoding Intel BTS does not require walking the object code, the object
> +code is still needed to pair up calls and returns correctly, consequently much
> +of the Intel PT documentation applies also to Intel BTS.  Refer to the Intel PT
> +documentation and consider that the PMU 'intel_bts' can usually be used in
> +place of 'intel_pt' in the examples provided, with the proviso that per-thread
> +recording must also be stipulated i.e. the --per-thread option for
> +'perf record'.
> +
> +
> +perf record
> +===========
> +
> +new event
> +---------
> +
> +The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
> +option is:
> +
> +	-e intel_bts//
> +
> +Currently Intel BTS is limited to per-thread tracing so the --per-thread option
> +is also needed.
> +
> +
> +snapshot option
> +---------------
> +
> +The snapshot option is the same as Intel PT (refer Intel PT documentation).
> +
> +
> +auxtrace mmap size option
> +-----------------------
> +
> +The mmap size option is the same as Intel PT (refer Intel PT documentation).
> +
> +
> +perf script
> +===========
> +
> +By default, perf script will decode trace data found in the perf.data file.
> +This can be further controlled by option --itrace.  The --itrace option is
> +the same as Intel PT (refer Intel PT documentation) except that neither
> +"instructions" events nor "transactions" events (and consequently call
> +chains) are supported.
> +
> +To disable trace decoding entirely, use the option --no-itrace.
> +
> +
> +dump option
> +-----------
> +
> +perf script has an option (-D) to "dump" the events i.e. display the binary
> +data.
> +
> +When -D is used, Intel BTS packets are displayed.
> +
> +To disable the display of Intel BTS packets, combine the -D option with
> +--no-itrace.
> +
> +
> +perf report
> +===========
> +
> +By default, perf report will decode trace data found in the perf.data file.
> +This can be further controlled by new option --itrace exactly the same as
> +perf script.
> +
> +
> +perf inject
> +===========
> +
> +perf inject also accepts the --itrace option in which case tracing data is
> +removed and replaced with the synthesized events. e.g.
> +
> +	perf inject --itrace -i perf.data -o perf.data.new
> diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
> index a8be9f9d0462..2c55e1b336c5 100644
> --- a/tools/perf/arch/x86/util/Build
> +++ b/tools/perf/arch/x86/util/Build
> @@ -10,3 +10,4 @@ libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
>  
>  libperf-$(CONFIG_AUXTRACE) += auxtrace.o
>  libperf-$(CONFIG_AUXTRACE) += intel-pt.o
> +libperf-$(CONFIG_AUXTRACE) += intel-bts.o
> diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
> index e7654b506312..7a7805583e3f 100644
> --- a/tools/perf/arch/x86/util/auxtrace.c
> +++ b/tools/perf/arch/x86/util/auxtrace.c
> @@ -13,11 +13,56 @@
>   *
>   */
>  
> +#include <stdbool.h>
> +
>  #include "../../util/header.h"
> +#include "../../util/debug.h"
> +#include "../../util/pmu.h"
>  #include "../../util/auxtrace.h"
>  #include "../../util/intel-pt.h"
> +#include "../../util/intel-bts.h"
> +#include "../../util/evlist.h"
> +
> +static
> +struct auxtrace_record *auxtrace_record__init_intel(struct perf_evlist *evlist,
> +						    int *err)
> +{
> +	struct perf_pmu *intel_pt_pmu;
> +	struct perf_pmu *intel_bts_pmu;
> +	struct perf_evsel *evsel;
> +	bool found_pt = false;
> +	bool found_bts = false;
> +
> +	intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
> +	intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
> +
> +	if (evlist) {
> +		evlist__for_each(evlist, evsel) {
> +			if (intel_pt_pmu &&
> +			    evsel->attr.type == intel_pt_pmu->type)
> +				found_pt = true;
> +			if (intel_bts_pmu &&
> +			    evsel->attr.type == intel_bts_pmu->type)
> +				found_bts = true;
> +		}
> +	}
> +
> +	if (found_pt && found_bts) {
> +		pr_err("intel_pt and intel_bts may not be used together\n");
> +		*err = -EINVAL;
> +		return NULL;
> +	}
> +
> +	if (found_pt)
> +		return intel_pt_recording_init(err);
> +
> +	if (found_bts)
> +		return intel_bts_recording_init(err);
>  
> -struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe_unused,
> +	return NULL;
> +}
> +
> +struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist,
>  					      int *err)
>  {
>  	char buffer[64];
> @@ -32,7 +77,7 @@ struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe
>  	}
>  
>  	if (!strncmp(buffer, "GenuineIntel,", 13))
> -		return intel_pt_recording_init(err);
> +		return auxtrace_record__init_intel(evlist, err);
>  
>  	return NULL;
>  }
> diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
> new file mode 100644
> index 000000000000..9b94ce520917
> --- /dev/null
> +++ b/tools/perf/arch/x86/util/intel-bts.c
> @@ -0,0 +1,458 @@
> +/*
> + * intel-bts.c: Intel Processor Trace support
> + * Copyright (c) 2013-2015, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include <linux/log2.h>
> +
> +#include "../../util/cpumap.h"
> +#include "../../util/evsel.h"
> +#include "../../util/evlist.h"
> +#include "../../util/session.h"
> +#include "../../util/util.h"
> +#include "../../util/pmu.h"
> +#include "../../util/debug.h"
> +#include "../../util/tsc.h"
> +#include "../../util/auxtrace.h"
> +#include "../../util/intel-bts.h"
> +
> +#define KiB(x) ((x) * 1024)
> +#define MiB(x) ((x) * 1024 * 1024)
> +#define KiB_MASK(x) (KiB(x) - 1)
> +#define MiB_MASK(x) (MiB(x) - 1)
> +
> +#define INTEL_BTS_DFLT_SAMPLE_SIZE	KiB(4)
> +
> +#define INTEL_BTS_MAX_SAMPLE_SIZE	KiB(60)
> +
> +struct intel_bts_snapshot_ref {
> +	void	*ref_buf;
> +	size_t	ref_offset;
> +	bool	wrapped;
> +};
> +
> +struct intel_bts_recording {
> +	struct auxtrace_record		itr;
> +	struct perf_pmu			*intel_bts_pmu;
> +	struct perf_evlist		*evlist;
> +	bool				snapshot_mode;
> +	size_t				snapshot_size;
> +	int				snapshot_ref_cnt;
> +	struct intel_bts_snapshot_ref	*snapshot_refs;
> +};
> +
> +struct branch {
> +	u64 from;
> +	u64 to;
> +	u64 misc;
> +};
> +
> +static size_t intel_bts_info_priv_size(struct auxtrace_record *itr __maybe_unused)
> +{
> +	return INTEL_BTS_AUXTRACE_PRIV_SIZE;
> +}
> +
> +static int intel_bts_info_fill(struct auxtrace_record *itr,
> +			       struct perf_session *session,
> +			       struct auxtrace_info_event *auxtrace_info,
> +			       size_t priv_size)
> +{
> +	struct intel_bts_recording *btsr =
> +			container_of(itr, struct intel_bts_recording, itr);
> +	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
> +	struct perf_event_mmap_page *pc;
> +	struct perf_tsc_conversion tc = { .time_mult = 0, };
> +	bool cap_user_time_zero = false;
> +	int err;
> +
> +	if (priv_size != INTEL_BTS_AUXTRACE_PRIV_SIZE)
> +		return -EINVAL;
> +
> +	if (!session->evlist->nr_mmaps)
> +		return -EINVAL;
> +
> +	pc = session->evlist->mmap[0].base;
> +	if (pc) {
> +		err = perf_read_tsc_conversion(pc, &tc);
> +		if (err) {
> +			if (err != -EOPNOTSUPP)
> +				return err;
> +		} else {
> +			cap_user_time_zero = tc.time_mult != 0;
> +		}
> +		if (!cap_user_time_zero)
> +			ui__warning("Intel BTS: TSC not available\n");
> +	}
> +
> +	auxtrace_info->type = PERF_AUXTRACE_INTEL_BTS;
> +	auxtrace_info->priv[INTEL_BTS_PMU_TYPE] = intel_bts_pmu->type;
> +	auxtrace_info->priv[INTEL_BTS_TIME_SHIFT] = tc.time_shift;
> +	auxtrace_info->priv[INTEL_BTS_TIME_MULT] = tc.time_mult;
> +	auxtrace_info->priv[INTEL_BTS_TIME_ZERO] = tc.time_zero;
> +	auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO] = cap_user_time_zero;
> +	auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE] = btsr->snapshot_mode;
> +
> +	return 0;
> +}
> +
> +static int intel_bts_recording_options(struct auxtrace_record *itr,
> +				       struct perf_evlist *evlist,
> +				       struct record_opts *opts)
> +{
> +	struct intel_bts_recording *btsr =
> +			container_of(itr, struct intel_bts_recording, itr);
> +	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
> +	struct perf_evsel *evsel, *intel_bts_evsel = NULL;
> +	const struct cpu_map *cpus = evlist->cpus;
> +	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
> +
> +	btsr->evlist = evlist;
> +	btsr->snapshot_mode = opts->auxtrace_snapshot_mode;
> +
> +	evlist__for_each(evlist, evsel) {
> +		if (evsel->attr.type == intel_bts_pmu->type) {
> +			if (intel_bts_evsel) {
> +				pr_err("There may be only one " INTEL_BTS_PMU_NAME " event\n");
> +				return -EINVAL;
> +			}
> +			evsel->attr.freq = 0;
> +			evsel->attr.sample_period = 1;
> +			intel_bts_evsel = evsel;
> +			opts->full_auxtrace = true;
> +		}
> +	}
> +
> +	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
> +		pr_err("Snapshot mode (-S option) requires " INTEL_BTS_PMU_NAME " PMU event (-e " INTEL_BTS_PMU_NAME ")\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!opts->full_auxtrace)
> +		return 0;
> +
> +	if (opts->full_auxtrace && !cpu_map__empty(cpus)) {
> +		pr_err(INTEL_BTS_PMU_NAME " does not support per-cpu recording\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Set default sizes for snapshot mode */
> +	if (opts->auxtrace_snapshot_mode) {
> +		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
> +			if (privileged) {
> +				opts->auxtrace_mmap_pages = MiB(4) / page_size;
> +			} else {
> +				opts->auxtrace_mmap_pages = KiB(128) / page_size;
> +				if (opts->mmap_pages == UINT_MAX)
> +					opts->mmap_pages = KiB(256) / page_size;
> +			}
> +		} else if (!opts->auxtrace_mmap_pages && !privileged &&
> +			   opts->mmap_pages == UINT_MAX) {
> +			opts->mmap_pages = KiB(256) / page_size;
> +		}
> +		if (!opts->auxtrace_snapshot_size)
> +			opts->auxtrace_snapshot_size =
> +				opts->auxtrace_mmap_pages * (size_t)page_size;
> +		if (!opts->auxtrace_mmap_pages) {
> +			size_t sz = opts->auxtrace_snapshot_size;
> +
> +			sz = round_up(sz, page_size) / page_size;
> +			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
> +		}
> +		if (opts->auxtrace_snapshot_size >
> +				opts->auxtrace_mmap_pages * (size_t)page_size) {
> +			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
> +			       opts->auxtrace_snapshot_size,
> +			       opts->auxtrace_mmap_pages * (size_t)page_size);
> +			return -EINVAL;
> +		}
> +		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
> +			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
> +			return -EINVAL;
> +		}
> +		pr_debug2("Intel BTS snapshot size: %zu\n",
> +			  opts->auxtrace_snapshot_size);
> +	}
> +
> +	/* Set default sizes for full trace mode */
> +	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
> +		if (privileged) {
> +			opts->auxtrace_mmap_pages = MiB(4) / page_size;
> +		} else {
> +			opts->auxtrace_mmap_pages = KiB(128) / page_size;
> +			if (opts->mmap_pages == UINT_MAX)
> +				opts->mmap_pages = KiB(256) / page_size;
> +		}
> +	}
> +
> +	/* Validate auxtrace_mmap_pages */
> +	if (opts->auxtrace_mmap_pages) {
> +		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
> +		size_t min_sz;
> +
> +		if (opts->auxtrace_snapshot_mode)
> +			min_sz = KiB(4);
> +		else
> +			min_sz = KiB(8);
> +
> +		if (sz < min_sz || !is_power_of_2(sz)) {
> +			pr_err("Invalid mmap size for Intel BTS: must be at least %zuKiB and a power of 2\n",
> +			       min_sz / 1024);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	if (intel_bts_evsel) {
> +		/*
> +		 * To obtain the auxtrace buffer file descriptor, the auxtrace event
> +		 * must come first.
> +		 */
> +		perf_evlist__to_front(evlist, intel_bts_evsel);
> +		/*
> +		 * In the case of per-cpu mmaps, we need the CPU on the
> +		 * AUX event.
> +		 */
> +		if (!cpu_map__empty(cpus))
> +			perf_evsel__set_sample_bit(intel_bts_evsel, CPU);
> +	}
> +
> +	/* Add dummy event to keep tracking */
> +	if (opts->full_auxtrace) {
> +		struct perf_evsel *tracking_evsel;
> +		int err;
> +
> +		err = parse_events(evlist, "dummy:u", NULL);
> +		if (err)
> +			return err;
> +
> +		tracking_evsel = perf_evlist__last(evlist);
> +
> +		perf_evlist__set_tracking_event(evlist, tracking_evsel);
> +
> +		tracking_evsel->attr.freq = 0;
> +		tracking_evsel->attr.sample_period = 1;
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_bts_parse_snapshot_options(struct auxtrace_record *itr,
> +					    struct record_opts *opts,
> +					    const char *str)
> +{
> +	struct intel_bts_recording *btsr =
> +			container_of(itr, struct intel_bts_recording, itr);
> +	unsigned long long snapshot_size = 0;
> +	char *endptr;
> +
> +	if (str) {
> +		snapshot_size = strtoull(str, &endptr, 0);
> +		if (*endptr || snapshot_size > SIZE_MAX)
> +			return -1;
> +	}
> +
> +	opts->auxtrace_snapshot_mode = true;
> +	opts->auxtrace_snapshot_size = snapshot_size;
> +
> +	btsr->snapshot_size = snapshot_size;
> +
> +	return 0;
> +}
> +
> +static u64 intel_bts_reference(struct auxtrace_record *itr __maybe_unused)
> +{
> +	return rdtsc();
> +}
> +
> +static int intel_bts_alloc_snapshot_refs(struct intel_bts_recording *btsr,
> +					 int idx)
> +{
> +	const size_t sz = sizeof(struct intel_bts_snapshot_ref);
> +	int cnt = btsr->snapshot_ref_cnt, new_cnt = cnt * 2;
> +	struct intel_bts_snapshot_ref *refs;
> +
> +	if (!new_cnt)
> +		new_cnt = 16;
> +
> +	while (new_cnt <= idx)
> +		new_cnt *= 2;
> +
> +	refs = calloc(new_cnt, sz);
> +	if (!refs)
> +		return -ENOMEM;
> +
> +	memcpy(refs, btsr->snapshot_refs, cnt * sz);
> +
> +	btsr->snapshot_refs = refs;
> +	btsr->snapshot_ref_cnt = new_cnt;
> +
> +	return 0;
> +}
> +
> +static void intel_bts_free_snapshot_refs(struct intel_bts_recording *btsr)
> +{
> +	int i;
> +
> +	for (i = 0; i < btsr->snapshot_ref_cnt; i++)
> +		zfree(&btsr->snapshot_refs[i].ref_buf);
> +	zfree(&btsr->snapshot_refs);
> +}
> +
> +static void intel_bts_recording_free(struct auxtrace_record *itr)
> +{
> +	struct intel_bts_recording *btsr =
> +			container_of(itr, struct intel_bts_recording, itr);
> +
> +	intel_bts_free_snapshot_refs(btsr);
> +	free(btsr);
> +}
> +
> +static int intel_bts_snapshot_start(struct auxtrace_record *itr)
> +{
> +	struct intel_bts_recording *btsr =
> +			container_of(itr, struct intel_bts_recording, itr);
> +	struct perf_evsel *evsel;
> +
> +	evlist__for_each(btsr->evlist, evsel) {
> +		if (evsel->attr.type == btsr->intel_bts_pmu->type)
> +			return perf_evlist__disable_event(btsr->evlist, evsel);
> +	}
> +	return -EINVAL;
> +}
> +
> +static int intel_bts_snapshot_finish(struct auxtrace_record *itr)
> +{
> +	struct intel_bts_recording *btsr =
> +			container_of(itr, struct intel_bts_recording, itr);
> +	struct perf_evsel *evsel;
> +
> +	evlist__for_each(btsr->evlist, evsel) {
> +		if (evsel->attr.type == btsr->intel_bts_pmu->type)
> +			return perf_evlist__enable_event(btsr->evlist, evsel);
> +	}
> +	return -EINVAL;
> +}
> +
> +static bool intel_bts_first_wrap(u64 *data, size_t buf_size)
> +{
> +	int i, a, b;
> +
> +	b = buf_size >> 3;
> +	a = b - 512;
> +	if (a < 0)
> +		a = 0;
> +
> +	for (i = a; i < b; i++) {
> +		if (data[i])
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static int intel_bts_find_snapshot(struct auxtrace_record *itr, int idx,
> +				   struct auxtrace_mmap *mm, unsigned char *data,
> +				   u64 *head, u64 *old)
> +{
> +	struct intel_bts_recording *btsr =
> +			container_of(itr, struct intel_bts_recording, itr);
> +	bool wrapped;
> +	int err;
> +
> +	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
> +		  __func__, idx, (size_t)*old, (size_t)*head);
> +
> +	if (idx >= btsr->snapshot_ref_cnt) {
> +		err = intel_bts_alloc_snapshot_refs(btsr, idx);
> +		if (err)
> +			goto out_err;
> +	}
> +
> +	wrapped = btsr->snapshot_refs[idx].wrapped;
> +	if (!wrapped && intel_bts_first_wrap((u64 *)data, mm->len)) {
> +		btsr->snapshot_refs[idx].wrapped = true;
> +		wrapped = true;
> +	}
> +
> +	/*
> +	 * In full trace mode 'head' continually increases.  However in snapshot
> +	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
> +	 * are adjusted to match the full trace case which expects that 'old' is
> +	 * always less than 'head'.
> +	 */
> +	if (wrapped) {
> +		*old = *head;
> +		*head += mm->len;
> +	} else {
> +		if (mm->mask)
> +			*old &= mm->mask;
> +		else
> +			*old %= mm->len;
> +		if (*old > *head)
> +			*head += mm->len;
> +	}
> +
> +	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
> +		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
> +
> +	return 0;
> +
> +out_err:
> +	pr_err("%s: failed, error %d\n", __func__, err);
> +	return err;
> +}
> +
> +static int intel_bts_read_finish(struct auxtrace_record *itr, int idx)
> +{
> +	struct intel_bts_recording *btsr =
> +			container_of(itr, struct intel_bts_recording, itr);
> +	struct perf_evsel *evsel;
> +
> +	evlist__for_each(btsr->evlist, evsel) {
> +		if (evsel->attr.type == btsr->intel_bts_pmu->type)
> +			return perf_evlist__enable_event_idx(btsr->evlist,
> +							     evsel, idx);
> +	}
> +	return -EINVAL;
> +}
> +
> +struct auxtrace_record *intel_bts_recording_init(int *err)
> +{
> +	struct perf_pmu *intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
> +	struct intel_bts_recording *btsr;
> +
> +	if (!intel_bts_pmu)
> +		return NULL;
> +
> +	btsr = zalloc(sizeof(struct intel_bts_recording));
> +	if (!btsr) {
> +		*err = -ENOMEM;
> +		return NULL;
> +	}
> +
> +	btsr->intel_bts_pmu = intel_bts_pmu;
> +	btsr->itr.recording_options = intel_bts_recording_options;
> +	btsr->itr.info_priv_size = intel_bts_info_priv_size;
> +	btsr->itr.info_fill = intel_bts_info_fill;
> +	btsr->itr.free = intel_bts_recording_free;
> +	btsr->itr.snapshot_start = intel_bts_snapshot_start;
> +	btsr->itr.snapshot_finish = intel_bts_snapshot_finish;
> +	btsr->itr.find_snapshot = intel_bts_find_snapshot;
> +	btsr->itr.parse_snapshot_options = intel_bts_parse_snapshot_options;
> +	btsr->itr.reference = intel_bts_reference;
> +	btsr->itr.read_finish = intel_bts_read_finish;
> +	btsr->itr.alignment = sizeof(struct branch);
> +	return &btsr->itr;
> +}
> diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
> index fd11cc3ce780..79fe07158d00 100644
> --- a/tools/perf/arch/x86/util/pmu.c
> +++ b/tools/perf/arch/x86/util/pmu.c
> @@ -3,6 +3,7 @@
>  #include <linux/perf_event.h>
>  
>  #include "../../util/intel-pt.h"
> +#include "../../util/intel-bts.h"
>  #include "../../util/pmu.h"
>  
>  struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
> @@ -10,6 +11,8 @@ struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __mayb
>  #ifdef HAVE_AUXTRACE_SUPPORT
>  	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
>  		return intel_pt_pmu_default_config(pmu);
> +	if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME))
> +		pmu->selectable = true;
>  #endif
>  	return NULL;
>  }
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 7f5f2b6aff19..dc29c58cb300 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -78,6 +78,7 @@ libperf-y += thread-stack.o
>  libperf-$(CONFIG_AUXTRACE) += auxtrace.o
>  libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
>  libperf-$(CONFIG_AUXTRACE) += intel-pt.o
> +libperf-$(CONFIG_AUXTRACE) += intel-bts.o
>  libperf-y += parse-branch-options.o
>  
>  libperf-$(CONFIG_LIBELF) += symbol-elf.o
> diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
> index 1b3cc297290d..3f3b40fa87c8 100644
> --- a/tools/perf/util/auxtrace.c
> +++ b/tools/perf/util/auxtrace.c
> @@ -48,6 +48,7 @@
>  #include "parse-options.h"
>  
>  #include "intel-pt.h"
> +#include "intel-bts.h"
>  
>  int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
>  			struct auxtrace_mmap_params *mp,
> @@ -888,6 +889,8 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
>  	switch (type) {
>  	case PERF_AUXTRACE_INTEL_PT:
>  		return intel_pt_process_auxtrace_info(event, session);
> +	case PERF_AUXTRACE_INTEL_BTS:
> +		return intel_bts_process_auxtrace_info(event, session);
>  	case PERF_AUXTRACE_UNKNOWN:
>  	default:
>  		return -EINVAL;
> diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
> index 7d12f33a3a06..bf72b77a588a 100644
> --- a/tools/perf/util/auxtrace.h
> +++ b/tools/perf/util/auxtrace.h
> @@ -40,6 +40,7 @@ struct events_stats;
>  enum auxtrace_type {
>  	PERF_AUXTRACE_UNKNOWN,
>  	PERF_AUXTRACE_INTEL_PT,
> +	PERF_AUXTRACE_INTEL_BTS,
>  };
>  
>  enum itrace_period_type {
> diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
> new file mode 100644
> index 000000000000..dce99cfb1309
> --- /dev/null
> +++ b/tools/perf/util/intel-bts.c
> @@ -0,0 +1,933 @@
> +/*
> + * intel-bts.c: Intel Processor Trace support
> + * Copyright (c) 2013-2015, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#include <endian.h>
> +#include <byteswap.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include <linux/log2.h>
> +
> +#include "cpumap.h"
> +#include "color.h"
> +#include "evsel.h"
> +#include "evlist.h"
> +#include "machine.h"
> +#include "session.h"
> +#include "util.h"
> +#include "thread.h"
> +#include "thread-stack.h"
> +#include "debug.h"
> +#include "tsc.h"
> +#include "auxtrace.h"
> +#include "intel-pt-decoder/intel-pt-insn-decoder.h"
> +#include "intel-bts.h"
> +
> +#define MAX_TIMESTAMP (~0ULL)
> +
> +#define INTEL_BTS_ERR_NOINSN  5
> +#define INTEL_BTS_ERR_LOST    9
> +
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +#define le64_to_cpu bswap_64
> +#else
> +#define le64_to_cpu
> +#endif
> +
> +struct intel_bts {
> +	struct auxtrace			auxtrace;
> +	struct auxtrace_queues		queues;
> +	struct auxtrace_heap		heap;
> +	u32				auxtrace_type;
> +	struct perf_session		*session;
> +	struct machine			*machine;
> +	bool				sampling_mode;
> +	bool				snapshot_mode;
> +	bool				data_queued;
> +	u32				pmu_type;
> +	struct perf_tsc_conversion	tc;
> +	bool				cap_user_time_zero;
> +	struct itrace_synth_opts	synth_opts;
> +	bool				sample_branches;
> +	u32				branches_filter;
> +	u64				branches_sample_type;
> +	u64				branches_id;
> +	size_t				branches_event_size;
> +	bool				synth_needs_swap;
> +};
> +
> +struct intel_bts_queue {
> +	struct intel_bts	*bts;
> +	unsigned int		queue_nr;
> +	struct auxtrace_buffer	*buffer;
> +	bool			on_heap;
> +	bool			done;
> +	pid_t			pid;
> +	pid_t			tid;
> +	int			cpu;
> +	u64			time;
> +	struct intel_pt_insn	intel_pt_insn;
> +	u32			sample_flags;
> +};
> +
> +struct branch {
> +	u64 from;
> +	u64 to;
> +	u64 misc;
> +};
> +
> +static void intel_bts_dump(struct intel_bts *bts __maybe_unused,
> +			   unsigned char *buf, size_t len)
> +{
> +	struct branch *branch;
> +	size_t i, pos = 0, br_sz = sizeof(struct branch), sz;
> +	const char *color = PERF_COLOR_BLUE;
> +
> +	color_fprintf(stdout, color,
> +		      ". ... Intel BTS data: size %zu bytes\n",
> +		      len);
> +
> +	while (len) {
> +		if (len >= br_sz)
> +			sz = br_sz;
> +		else
> +			sz = len;
> +		printf(".");
> +		color_fprintf(stdout, color, "  %08x: ", pos);
> +		for (i = 0; i < sz; i++)
> +			color_fprintf(stdout, color, " %02x", buf[i]);
> +		for (; i < br_sz; i++)
> +			color_fprintf(stdout, color, "   ");
> +		if (len >= br_sz) {
> +			branch = (struct branch *)buf;
> +			color_fprintf(stdout, color, " %"PRIx64" -> %"PRIx64" %s\n",
> +				      le64_to_cpu(branch->from),
> +				      le64_to_cpu(branch->to),
> +				      le64_to_cpu(branch->misc) & 0x10 ?
> +							"pred" : "miss");
> +		} else {
> +			color_fprintf(stdout, color, " Bad record!\n");
> +		}
> +		pos += sz;
> +		buf += sz;
> +		len -= sz;
> +	}
> +}
> +
> +static void intel_bts_dump_event(struct intel_bts *bts, unsigned char *buf,
> +				 size_t len)
> +{
> +	printf(".\n");
> +	intel_bts_dump(bts, buf, len);
> +}
> +
> +static int intel_bts_lost(struct intel_bts *bts, struct perf_sample *sample)
> +{
> +	union perf_event event;
> +	int err;
> +
> +	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
> +			     INTEL_BTS_ERR_LOST, sample->cpu, sample->pid,
> +			     sample->tid, 0, "Lost trace data");
> +
> +	err = perf_session__deliver_synth_event(bts->session, &event, NULL);
> +	if (err)
> +		pr_err("Intel BTS: failed to deliver error event, error %d\n",
> +		       err);
> +
> +	return err;
> +}
> +
> +static struct intel_bts_queue *intel_bts_alloc_queue(struct intel_bts *bts,
> +						     unsigned int queue_nr)
> +{
> +	struct intel_bts_queue *btsq;
> +
> +	btsq = zalloc(sizeof(struct intel_bts_queue));
> +	if (!btsq)
> +		return NULL;
> +
> +	btsq->bts = bts;
> +	btsq->queue_nr = queue_nr;
> +	btsq->pid = -1;
> +	btsq->tid = -1;
> +	btsq->cpu = -1;
> +
> +	return btsq;
> +}
> +
> +static int intel_bts_setup_queue(struct intel_bts *bts,
> +				 struct auxtrace_queue *queue,
> +				 unsigned int queue_nr)
> +{
> +	struct intel_bts_queue *btsq = queue->priv;
> +
> +	if (list_empty(&queue->head))
> +		return 0;
> +
> +	if (!btsq) {
> +		btsq = intel_bts_alloc_queue(bts, queue_nr);
> +		if (!btsq)
> +			return -ENOMEM;
> +		queue->priv = btsq;
> +
> +		if (queue->cpu != -1)
> +			btsq->cpu = queue->cpu;
> +		btsq->tid = queue->tid;
> +	}
> +
> +	if (bts->sampling_mode)
> +		return 0;
> +
> +	if (!btsq->on_heap && !btsq->buffer) {
> +		int ret;
> +
> +		btsq->buffer = auxtrace_buffer__next(queue, NULL);
> +		if (!btsq->buffer)
> +			return 0;
> +
> +		ret = auxtrace_heap__add(&bts->heap, queue_nr,
> +					 btsq->buffer->reference);
> +		if (ret)
> +			return ret;
> +		btsq->on_heap = true;
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_bts_setup_queues(struct intel_bts *bts)
> +{
> +	unsigned int i;
> +	int ret;
> +
> +	for (i = 0; i < bts->queues.nr_queues; i++) {
> +		ret = intel_bts_setup_queue(bts, &bts->queues.queue_array[i],
> +					    i);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static inline int intel_bts_update_queues(struct intel_bts *bts)
> +{
> +	if (bts->queues.new_data) {
> +		bts->queues.new_data = false;
> +		return intel_bts_setup_queues(bts);
> +	}
> +	return 0;
> +}
> +
> +static unsigned char *intel_bts_find_overlap(unsigned char *buf_a, size_t len_a,
> +					     unsigned char *buf_b, size_t len_b)
> +{
> +	size_t offs, len;
> +
> +	if (len_a > len_b)
> +		offs = len_a - len_b;
> +	else
> +		offs = 0;
> +
> +	for (; offs < len_a; offs += sizeof(struct branch)) {
> +		len = len_a - offs;
> +		if (!memcmp(buf_a + offs, buf_b, len))
> +			return buf_b + len;
> +	}
> +
> +	return buf_b;
> +}
> +
> +static int intel_bts_do_fix_overlap(struct auxtrace_queue *queue,
> +				    struct auxtrace_buffer *b)
> +{
> +	struct auxtrace_buffer *a;
> +	void *start;
> +
> +	if (b->list.prev == &queue->head)
> +		return 0;
> +	a = list_entry(b->list.prev, struct auxtrace_buffer, list);
> +	start = intel_bts_find_overlap(a->data, a->size, b->data, b->size);
> +	if (!start)
> +		return -EINVAL;
> +	b->use_size = b->data + b->size - start;
> +	b->use_data = start;
> +	return 0;
> +}
> +
> +static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
> +					 struct branch *branch)
> +{
> +	int ret;
> +	struct intel_bts *bts = btsq->bts;
> +	union perf_event event;
> +	struct perf_sample sample = { .ip = 0, };
> +
> +	event.sample.header.type = PERF_RECORD_SAMPLE;
> +	event.sample.header.misc = PERF_RECORD_MISC_USER;
> +	event.sample.header.size = sizeof(struct perf_event_header);
> +
> +	sample.ip = le64_to_cpu(branch->from);
> +	sample.pid = btsq->pid;
> +	sample.tid = btsq->tid;
> +	sample.addr = le64_to_cpu(branch->to);
> +	sample.id = btsq->bts->branches_id;
> +	sample.stream_id = btsq->bts->branches_id;
> +	sample.period = 1;
> +	sample.cpu = btsq->cpu;
> +	sample.flags = btsq->sample_flags;
> +	sample.insn_len = btsq->intel_pt_insn.length;
> +
> +	if (bts->synth_opts.inject) {
> +		event.sample.header.size = bts->branches_event_size;
> +		ret = perf_event__synthesize_sample(&event,
> +						    bts->branches_sample_type,
> +						    0, &sample,
> +						    bts->synth_needs_swap);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ret = perf_session__deliver_synth_event(bts->session, &event, &sample);
> +	if (ret)
> +		pr_err("Intel BTS: failed to deliver branch event, error %d\n",
> +		       ret);
> +
> +	return ret;
> +}
> +
> +static int intel_bts_get_next_insn(struct intel_bts_queue *btsq, u64 ip)
> +{
> +	struct machine *machine = btsq->bts->machine;
> +	struct thread *thread;
> +	struct addr_location al;
> +	unsigned char buf[1024];
> +	size_t bufsz;
> +	ssize_t len;
> +	int x86_64;
> +	uint8_t cpumode;
> +	int err = -1;
> +
> +	bufsz = intel_pt_insn_max_size();
> +
> +	if (machine__kernel_ip(machine, ip))
> +		cpumode = PERF_RECORD_MISC_KERNEL;
> +	else
> +		cpumode = PERF_RECORD_MISC_USER;
> +
> +	thread = machine__find_thread(machine, -1, btsq->tid);
> +	if (!thread)
> +		return -1;
> +
> +	thread__find_addr_map(thread, cpumode, MAP__FUNCTION, ip, &al);
> +	if (!al.map || !al.map->dso)
> +		goto out_put;
> +
> +	len = dso__data_read_addr(al.map->dso, al.map, machine, ip, buf, bufsz);
> +	if (len <= 0)
> +		goto out_put;
> +
> +	/* Load maps to ensure dso->is_64_bit has been updated */
> +	map__load(al.map, machine->symbol_filter);
> +
> +	x86_64 = al.map->dso->is_64_bit;
> +
> +	if (intel_pt_get_insn(buf, len, x86_64, &btsq->intel_pt_insn))
> +		goto out_put;
> +
> +	err = 0;
> +out_put:
> +	thread__put(thread);
> +	return err;
> +}
> +
> +static int intel_bts_synth_error(struct intel_bts *bts, int cpu, pid_t pid,
> +				 pid_t tid, u64 ip)
> +{
> +	union perf_event event;
> +	int err;
> +
> +	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
> +			     INTEL_BTS_ERR_NOINSN, cpu, pid, tid, ip,
> +			     "Failed to get instruction");
> +
> +	err = perf_session__deliver_synth_event(bts->session, &event, NULL);
> +	if (err)
> +		pr_err("Intel BTS: failed to deliver error event, error %d\n",
> +		       err);
> +
> +	return err;
> +}
> +
> +static int intel_bts_get_branch_type(struct intel_bts_queue *btsq,
> +				     struct branch *branch)
> +{
> +	int err;
> +
> +	if (!branch->from) {
> +		if (branch->to)
> +			btsq->sample_flags = PERF_IP_FLAG_BRANCH |
> +					     PERF_IP_FLAG_TRACE_BEGIN;
> +		else
> +			btsq->sample_flags = 0;
> +		btsq->intel_pt_insn.length = 0;
> +	} else if (!branch->to) {
> +		btsq->sample_flags = PERF_IP_FLAG_BRANCH |
> +				     PERF_IP_FLAG_TRACE_END;
> +		btsq->intel_pt_insn.length = 0;
> +	} else {
> +		err = intel_bts_get_next_insn(btsq, branch->from);
> +		if (err) {
> +			btsq->sample_flags = 0;
> +			btsq->intel_pt_insn.length = 0;
> +			if (!btsq->bts->synth_opts.errors)
> +				return 0;
> +			err = intel_bts_synth_error(btsq->bts, btsq->cpu,
> +						    btsq->pid, btsq->tid,
> +						    branch->from);
> +			return err;
> +		}
> +		btsq->sample_flags = intel_pt_insn_type(btsq->intel_pt_insn.op);
> +		/* Check for an async branch into the kernel */
> +		if (!machine__kernel_ip(btsq->bts->machine, branch->from) &&
> +		    machine__kernel_ip(btsq->bts->machine, branch->to) &&
> +		    btsq->sample_flags != (PERF_IP_FLAG_BRANCH |
> +					   PERF_IP_FLAG_CALL |
> +					   PERF_IP_FLAG_SYSCALLRET))
> +			btsq->sample_flags = PERF_IP_FLAG_BRANCH |
> +					     PERF_IP_FLAG_CALL |
> +					     PERF_IP_FLAG_ASYNC |
> +					     PERF_IP_FLAG_INTERRUPT;
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
> +				    struct auxtrace_buffer *buffer)
> +{
> +	struct branch *branch;
> +	size_t sz, bsz = sizeof(struct branch);
> +	u32 filter = btsq->bts->branches_filter;
> +	int err = 0;
> +
> +	if (buffer->use_data) {
> +		sz = buffer->use_size;
> +		branch = buffer->use_data;
> +	} else {
> +		sz = buffer->size;
> +		branch = buffer->data;
> +	}
> +
> +	if (!btsq->bts->sample_branches)
> +		return 0;
> +
> +	for (; sz > bsz; branch += 1, sz -= bsz) {
> +		if (!branch->from && !branch->to)
> +			continue;
> +		intel_bts_get_branch_type(btsq, branch);
> +		if (filter && !(filter & btsq->sample_flags))
> +			continue;
> +		err = intel_bts_synth_branch_sample(btsq, branch);
> +		if (err)
> +			break;
> +	}
> +	return err;
> +}
> +
> +static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
> +{
> +	struct auxtrace_buffer *buffer = btsq->buffer, *old_buffer = buffer;
> +	struct auxtrace_queue *queue;
> +	struct thread *thread;
> +	int err;
> +
> +	if (btsq->done)
> +		return 1;
> +
> +	if (btsq->pid == -1) {
> +		thread = machine__find_thread(btsq->bts->machine, -1,
> +					      btsq->tid);
> +		if (thread)
> +			btsq->pid = thread->pid_;
> +	} else {
> +		thread = machine__findnew_thread(btsq->bts->machine, btsq->pid,
> +						 btsq->tid);
> +	}
> +
> +	queue = &btsq->bts->queues.queue_array[btsq->queue_nr];
> +
> +	if (!buffer)
> +		buffer = auxtrace_buffer__next(queue, NULL);
> +
> +	if (!buffer) {
> +		if (!btsq->bts->sampling_mode)
> +			btsq->done = 1;
> +		err = 1;
> +		goto out_put;
> +	}
> +
> +	/* Currently there is no support for split buffers */
> +	if (buffer->consecutive) {
> +		err = -EINVAL;
> +		goto out_put;
> +	}
> +
> +	if (!buffer->data) {
> +		int fd = perf_data_file__fd(btsq->bts->session->file);
> +
> +		buffer->data = auxtrace_buffer__get_data(buffer, fd);
> +		if (!buffer->data) {
> +			err = -ENOMEM;
> +			goto out_put;
> +		}
> +	}
> +
> +	if (btsq->bts->snapshot_mode && !buffer->consecutive &&
> +	    intel_bts_do_fix_overlap(queue, buffer)) {
> +		err = -ENOMEM;
> +		goto out_put;
> +	}
> +
> +	if (!btsq->bts->synth_opts.callchain && thread &&
> +	    (!old_buffer || btsq->bts->sampling_mode ||
> +	     (btsq->bts->snapshot_mode && !buffer->consecutive)))
> +		thread_stack__set_trace_nr(thread, buffer->buffer_nr + 1);
> +
> +	err = intel_bts_process_buffer(btsq, buffer);
> +
> +	auxtrace_buffer__drop_data(buffer);
> +
> +	btsq->buffer = auxtrace_buffer__next(queue, buffer);
> +	if (btsq->buffer) {
> +		if (timestamp)
> +			*timestamp = btsq->buffer->reference;
> +	} else {
> +		if (!btsq->bts->sampling_mode)
> +			btsq->done = 1;
> +	}
> +out_put:
> +	thread__put(thread);
> +	return err;
> +}
> +
> +static int intel_bts_flush_queue(struct intel_bts_queue *btsq)
> +{
> +	u64 ts = 0;
> +	int ret;
> +
> +	while (1) {
> +		ret = intel_bts_process_queue(btsq, &ts);
> +		if (ret < 0)
> +			return ret;
> +		if (ret)
> +			break;
> +	}
> +	return 0;
> +}
> +
> +static int intel_bts_process_tid_exit(struct intel_bts *bts, pid_t tid)
> +{
> +	struct auxtrace_queues *queues = &bts->queues;
> +	unsigned int i;
> +
> +	for (i = 0; i < queues->nr_queues; i++) {
> +		struct auxtrace_queue *queue = &bts->queues.queue_array[i];
> +		struct intel_bts_queue *btsq = queue->priv;
> +
> +		if (btsq && btsq->tid == tid)
> +			return intel_bts_flush_queue(btsq);
> +	}
> +	return 0;
> +}
> +
> +static int intel_bts_process_queues(struct intel_bts *bts, u64 timestamp)
> +{
> +	while (1) {
> +		unsigned int queue_nr;
> +		struct auxtrace_queue *queue;
> +		struct intel_bts_queue *btsq;
> +		u64 ts = 0;
> +		int ret;
> +
> +		if (!bts->heap.heap_cnt)
> +			return 0;
> +
> +		if (bts->heap.heap_array[0].ordinal > timestamp)
> +			return 0;
> +
> +		queue_nr = bts->heap.heap_array[0].queue_nr;
> +		queue = &bts->queues.queue_array[queue_nr];
> +		btsq = queue->priv;
> +
> +		auxtrace_heap__pop(&bts->heap);
> +
> +		ret = intel_bts_process_queue(btsq, &ts);
> +		if (ret < 0) {
> +			auxtrace_heap__add(&bts->heap, queue_nr, ts);
> +			return ret;
> +		}
> +
> +		if (!ret) {
> +			ret = auxtrace_heap__add(&bts->heap, queue_nr, ts);
> +			if (ret < 0)
> +				return ret;
> +		} else {
> +			btsq->on_heap = false;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_bts_process_event(struct perf_session *session,
> +				   union perf_event *event,
> +				   struct perf_sample *sample,
> +				   struct perf_tool *tool)
> +{
> +	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
> +					     auxtrace);
> +	u64 timestamp;
> +	int err;
> +
> +	if (dump_trace)
> +		return 0;
> +
> +	if (!tool->ordered_events) {
> +		pr_err("Intel BTS requires ordered events\n");
> +		return -EINVAL;
> +	}
> +
> +	if (sample->time)
> +		timestamp = perf_time_to_tsc(sample->time, &bts->tc);
> +	else
> +		timestamp = 0;
> +
> +	err = intel_bts_update_queues(bts);
> +	if (err)
> +		return err;
> +
> +	err = intel_bts_process_queues(bts, timestamp);
> +	if (err)
> +		return err;
> +	if (event->header.type == PERF_RECORD_EXIT) {
> +		err = intel_bts_process_tid_exit(bts, event->comm.tid);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (event->header.type == PERF_RECORD_AUX &&
> +	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
> +	    bts->synth_opts.errors)
> +		err = intel_bts_lost(bts, sample);
> +
> +	return err;
> +}
> +
> +static int intel_bts_process_auxtrace_event(struct perf_session *session,
> +					    union perf_event *event,
> +					    struct perf_tool *tool __maybe_unused)
> +{
> +	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
> +					     auxtrace);
> +
> +	if (bts->sampling_mode)
> +		return 0;
> +
> +	if (!bts->data_queued) {
> +		struct auxtrace_buffer *buffer;
> +		off_t data_offset;
> +		int fd = perf_data_file__fd(session->file);
> +		int err;
> +
> +		if (perf_data_file__is_pipe(session->file)) {
> +			data_offset = 0;
> +		} else {
> +			data_offset = lseek(fd, 0, SEEK_CUR);
> +			if (data_offset == -1)
> +				return -errno;
> +		}
> +
> +		err = auxtrace_queues__add_event(&bts->queues, session, event,
> +						 data_offset, &buffer);
> +		if (err)
> +			return err;
> +
> +		/* Dump here now we have copied a piped trace out of the pipe */
> +		if (dump_trace) {
> +			if (auxtrace_buffer__get_data(buffer, fd)) {
> +				intel_bts_dump_event(bts, buffer->data,
> +						     buffer->size);
> +				auxtrace_buffer__put_data(buffer);
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_bts_flush(struct perf_session *session __maybe_unused,
> +			   struct perf_tool *tool __maybe_unused)
> +{
> +	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
> +					     auxtrace);
> +	int ret;
> +
> +	if (dump_trace || bts->sampling_mode)
> +		return 0;
> +
> +	if (!tool->ordered_events)
> +		return -EINVAL;
> +
> +	ret = intel_bts_update_queues(bts);
> +	if (ret < 0)
> +		return ret;
> +
> +	return intel_bts_process_queues(bts, MAX_TIMESTAMP);
> +}
> +
> +static void intel_bts_free_queue(void *priv)
> +{
> +	struct intel_bts_queue *btsq = priv;
> +
> +	if (!btsq)
> +		return;
> +	free(btsq);
> +}
> +
> +static void intel_bts_free_events(struct perf_session *session)
> +{
> +	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
> +					     auxtrace);
> +	struct auxtrace_queues *queues = &bts->queues;
> +	unsigned int i;
> +
> +	for (i = 0; i < queues->nr_queues; i++) {
> +		intel_bts_free_queue(queues->queue_array[i].priv);
> +		queues->queue_array[i].priv = NULL;
> +	}
> +	auxtrace_queues__free(queues);
> +}
> +
> +static void intel_bts_free(struct perf_session *session)
> +{
> +	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
> +					     auxtrace);
> +
> +	auxtrace_heap__free(&bts->heap);
> +	intel_bts_free_events(session);
> +	session->auxtrace = NULL;
> +	free(bts);
> +}
> +
> +struct intel_bts_synth {
> +	struct perf_tool dummy_tool;
> +	struct perf_session *session;
> +};
> +
> +static int intel_bts_event_synth(struct perf_tool *tool,
> +				 union perf_event *event,
> +				 struct perf_sample *sample __maybe_unused,
> +				 struct machine *machine __maybe_unused)
> +{
> +	struct intel_bts_synth *intel_bts_synth =
> +			container_of(tool, struct intel_bts_synth, dummy_tool);
> +
> +	return perf_session__deliver_synth_event(intel_bts_synth->session,
> +						 event, NULL);
> +}
> +
> +static int intel_bts_synth_event(struct perf_session *session,
> +				 struct perf_event_attr *attr, u64 id)
> +{
> +	struct intel_bts_synth intel_bts_synth;
> +
> +	memset(&intel_bts_synth, 0, sizeof(struct intel_bts_synth));
> +	intel_bts_synth.session = session;
> +
> +	return perf_event__synthesize_attr(&intel_bts_synth.dummy_tool, attr, 1,
> +					   &id, intel_bts_event_synth);
> +}
> +
> +static int intel_bts_synth_events(struct intel_bts *bts,
> +				  struct perf_session *session)
> +{
> +	struct perf_evlist *evlist = session->evlist;
> +	struct perf_evsel *evsel;
> +	struct perf_event_attr attr;
> +	bool found = false;
> +	u64 id;
> +	int err;
> +
> +	evlist__for_each(evlist, evsel) {
> +		if (evsel->attr.type == bts->pmu_type && evsel->ids) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (!found) {
> +		pr_debug("There are no selected events with Intel BTS data\n");
> +		return 0;
> +	}
> +
> +	memset(&attr, 0, sizeof(struct perf_event_attr));
> +	attr.size = sizeof(struct perf_event_attr);
> +	attr.type = PERF_TYPE_HARDWARE;
> +	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
> +	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
> +			    PERF_SAMPLE_PERIOD;
> +	attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
> +	attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
> +	attr.exclude_user = evsel->attr.exclude_user;
> +	attr.exclude_kernel = evsel->attr.exclude_kernel;
> +	attr.exclude_hv = evsel->attr.exclude_hv;
> +	attr.exclude_host = evsel->attr.exclude_host;
> +	attr.exclude_guest = evsel->attr.exclude_guest;
> +	attr.sample_id_all = evsel->attr.sample_id_all;
> +	attr.read_format = evsel->attr.read_format;
> +
> +	id = evsel->id[0] + 1000000000;
> +	if (!id)
> +		id = 1;
> +
> +	if (bts->synth_opts.branches) {
> +		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
> +		attr.sample_period = 1;
> +		attr.sample_type |= PERF_SAMPLE_ADDR;
> +		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
> +			 id, (u64)attr.sample_type);
> +		err = intel_bts_synth_event(session, &attr, id);
> +		if (err) {
> +			pr_err("%s: failed to synthesize 'branches' event type\n",
> +			       __func__);
> +			return err;
> +		}
> +		bts->sample_branches = true;
> +		bts->branches_sample_type = attr.sample_type;
> +		bts->branches_id = id;
> +		/*
> +		 * We only use sample types from PERF_SAMPLE_MASK so we can use
> +		 * __perf_evsel__sample_size() here.
> +		 */
> +		bts->branches_event_size = sizeof(struct sample_event) +
> +				__perf_evsel__sample_size(attr.sample_type);
> +	}
> +
> +	bts->synth_needs_swap = evsel->needs_swap;
> +
> +	return 0;
> +}
> +
> +static const char * const intel_bts_info_fmts[] = {
> +	[INTEL_BTS_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
> +	[INTEL_BTS_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
> +	[INTEL_BTS_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
> +	[INTEL_BTS_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
> +	[INTEL_BTS_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
> +	[INTEL_BTS_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
> +};
> +
> +static void intel_bts_print_info(u64 *arr, int start, int finish)
> +{
> +	int i;
> +
> +	if (!dump_trace)
> +		return;
> +
> +	for (i = start; i <= finish; i++)
> +		fprintf(stdout, intel_bts_info_fmts[i], arr[i]);
> +}
> +
> +u64 intel_bts_auxtrace_info_priv[INTEL_BTS_AUXTRACE_PRIV_SIZE];
> +
> +int intel_bts_process_auxtrace_info(union perf_event *event,
> +				    struct perf_session *session)
> +{
> +	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
> +	size_t min_sz = sizeof(u64) * INTEL_BTS_SNAPSHOT_MODE;
> +	struct intel_bts *bts;
> +	int err;
> +
> +	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
> +					min_sz)
> +		return -EINVAL;
> +
> +	bts = zalloc(sizeof(struct intel_bts));
> +	if (!bts)
> +		return -ENOMEM;
> +
> +	err = auxtrace_queues__init(&bts->queues);
> +	if (err)
> +		goto err_free;
> +
> +	bts->session = session;
> +	bts->machine = &session->machines.host; /* No kvm support */
> +	bts->auxtrace_type = auxtrace_info->type;
> +	bts->pmu_type = auxtrace_info->priv[INTEL_BTS_PMU_TYPE];
> +	bts->tc.time_shift = auxtrace_info->priv[INTEL_BTS_TIME_SHIFT];
> +	bts->tc.time_mult = auxtrace_info->priv[INTEL_BTS_TIME_MULT];
> +	bts->tc.time_zero = auxtrace_info->priv[INTEL_BTS_TIME_ZERO];
> +	bts->cap_user_time_zero =
> +			auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO];
> +	bts->snapshot_mode = auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE];
> +
> +	bts->sampling_mode = false;
> +
> +	bts->auxtrace.process_event = intel_bts_process_event;
> +	bts->auxtrace.process_auxtrace_event = intel_bts_process_auxtrace_event;
> +	bts->auxtrace.flush_events = intel_bts_flush;
> +	bts->auxtrace.free_events = intel_bts_free_events;
> +	bts->auxtrace.free = intel_bts_free;
> +	session->auxtrace = &bts->auxtrace;
> +
> +	intel_bts_print_info(&auxtrace_info->priv[0], INTEL_BTS_PMU_TYPE,
> +			     INTEL_BTS_SNAPSHOT_MODE);
> +
> +	if (dump_trace)
> +		return 0;
> +
> +	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
> +		bts->synth_opts = *session->itrace_synth_opts;
> +	else
> +		itrace_synth_opts__set_default(&bts->synth_opts);
> +
> +	if (bts->synth_opts.calls)
> +		bts->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
> +					PERF_IP_FLAG_TRACE_END;
> +	if (bts->synth_opts.returns)
> +		bts->branches_filter |= PERF_IP_FLAG_RETURN |
> +					PERF_IP_FLAG_TRACE_BEGIN;
> +
> +	err = intel_bts_synth_events(bts, session);
> +	if (err)
> +		goto err_free_queues;
> +
> +	err = auxtrace_queues__process_index(&bts->queues, session);
> +	if (err)
> +		goto err_free_queues;
> +
> +	if (bts->queues.populated)
> +		bts->data_queued = true;
> +
> +	return 0;
> +
> +err_free_queues:
> +	auxtrace_queues__free(&bts->queues);
> +	session->auxtrace = NULL;
> +err_free:
> +	free(bts);
> +	return err;
> +}
> diff --git a/tools/perf/util/intel-bts.h b/tools/perf/util/intel-bts.h
> new file mode 100644
> index 000000000000..ca65e21b3e83
> --- /dev/null
> +++ b/tools/perf/util/intel-bts.h
> @@ -0,0 +1,43 @@
> +/*
> + * intel-bts.h: Intel Processor Trace support
> + * Copyright (c) 2013-2014, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#ifndef INCLUDE__PERF_INTEL_BTS_H__
> +#define INCLUDE__PERF_INTEL_BTS_H__
> +
> +#define INTEL_BTS_PMU_NAME "intel_bts"
> +
> +enum {
> +	INTEL_BTS_PMU_TYPE,
> +	INTEL_BTS_TIME_SHIFT,
> +	INTEL_BTS_TIME_MULT,
> +	INTEL_BTS_TIME_ZERO,
> +	INTEL_BTS_CAP_USER_TIME_ZERO,
> +	INTEL_BTS_SNAPSHOT_MODE,
> +	INTEL_BTS_AUXTRACE_PRIV_MAX,
> +};
> +
> +#define INTEL_BTS_AUXTRACE_PRIV_SIZE (INTEL_BTS_AUXTRACE_PRIV_MAX * sizeof(u64))
> +
> +struct auxtrace_record;
> +struct perf_tool;
> +union perf_event;
> +struct perf_session;
> +
> +struct auxtrace_record *intel_bts_recording_init(int *err);
> +
> +int intel_bts_process_auxtrace_info(union perf_event *event,
> +				    struct perf_session *session);
> +
> +#endif
> diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
> index b5ea26bc290d..52d569cda606 100644
> --- a/tools/perf/util/pmu.c
> +++ b/tools/perf/util/pmu.c
> @@ -462,10 +462,6 @@ static struct perf_pmu *pmu_lookup(const char *name)
>  	LIST_HEAD(aliases);
>  	__u32 type;
>  
> -	/* No support for intel_bts so disallow it */
> -	if (!strcmp(name, "intel_bts"))
> -		return NULL;
> -
>  	/*
>  	 * The pmu data we store & need consists of the pmu
>  	 * type value and format definitions. Load both right
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-08-17 15:52   ` Arnaldo Carvalho de Melo
@ 2015-08-17 17:43     ` Adrian Hunter
  2015-08-17 17:58       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-08-17 17:43 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 17/08/2015 6:52 p.m., Arnaldo Carvalho de Melo wrote:
> Em Fri, Jul 17, 2015 at 07:33:43PM +0300, Adrian Hunter escreveu:
>> Intel BTS support fits within the new auxtrace infrastructure.
>> Recording is supporting by identifying the Intel BTS PMU, parsing
>> options and setting up events.
>>
>> Decoding is supported by queuing up trace data by thread and then
>> decoding synchronously delivering synthesized event samples into the
>> session processing for tools to consume.
>
> So, I am not being able to reproduce the results from last time I tried
> it..
>
> [root@zoo ~]# uname -r
> 4.2.0-rc5+
>
> But all the DSOs are not being resolved :-\ Same machine, will try after lunch
> with a tip/master built kernel, right now I get this, perhaps that "81649
> instruction errors" message? I'll see what are the results with tip/master,
> meanwhile what I have is at the tmp.perf/intel_pt branch in my tree.
>
> - Arnaldo
>
> [root@zoo ~]# perf record --per-thread -e intel_bts// usleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.884 MB perf.data ]
> [root@zoo ~]# perf evlist
> intel_bts//
> dummy:u
> [root@zoo ~]# perf evlist -v
> intel_bts//: type: 6, size: 112, { sample_period, sample_freq }: 1,
> sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
> enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
> dummy:u: type: 1, size: 112, config: 0x9, { sample_period, sample_freq
> }: 1, sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
> exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, enable_on_exec: 1,
> task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1
> [root@zoo ~]#
>
> Warning:
> 81649 instruction trace errors
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 0  of event 'intel_bts//'
> # Event count (approx.): 0
> #
> # Overhead  Command  Shared Object  Symbol
> # ........  .......  .............  ......
> #
>
>
> # Samples: 0  of event 'dummy:u'
> # Event count (approx.): 0
> #
> # Overhead  Command  Shared Object  Symbol
> # ........  .......  .............  ......
> #
>
>
> # Samples: 81K of event 'branches'
> # Event count (approx.): 81649
> #
> # Overhead  Command  Shared Object     Symbol
> # ........  .......  ................  ......................
> #
>       2.71%  usleep   [unknown]         [.] 0x00007fa0ff695061
>       2.62%  usleep   [unknown]         [.] 0x00007fa0ffb3de7e
>       1.96%  usleep   [unknown]         [.] 0x00007fa0ffb2e726
>       1.92%  usleep   [unknown]         [.] 0x00007fa0ff695086
>       1.60%  usleep   [unknown]         [.] 0xffffffff811c91d0
>       1.48%  usleep   [unknown]         [.] 0x00007fa0ffb3030d
>       1.24%  usleep   [unknown]         [.] 0x00007fa0ff6950c7
>
>
>
>

It is very weird that it doesn't know the dso.

I presume there is nothing unusual about the environment e.g. in a chroot or anything

What if you try a different event e.g. perf record --per-thread -e cycles sleep 1

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-08-17 17:43     ` Adrian Hunter
@ 2015-08-17 17:58       ` Arnaldo Carvalho de Melo
  2015-08-17 19:09         ` Adrian Hunter
  0 siblings, 1 reply; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-17 17:58 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Mon, Aug 17, 2015 at 08:43:09PM +0300, Adrian Hunter escreveu:
> On 17/08/2015 6:52 p.m., Arnaldo Carvalho de Melo wrote:
> >      1.92%  usleep   [unknown]         [.] 0x00007fa0ff695086
> >      1.60%  usleep   [unknown]         [.] 0xffffffff811c91d0
> >      1.48%  usleep   [unknown]         [.] 0x00007fa0ffb3030d
> >      1.24%  usleep   [unknown]         [.] 0x00007fa0ff6950c7

> It is very weird that it doesn't know the dso.
 
> I presume there is nothing unusual about the environment e.g. in a chroot or anything
 
> What if you try a different event e.g. perf record --per-thread -e cycles sleep 1

[root@zoo ~]# perf record --per-thread -e cycles sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.014 MB perf.data (9 samples) ]
[root@zoo ~]# perf report | grep -v ^# | head -5
    42.57%  sleep    libc-2.20.so      [.] malloc_hook_ini      
    41.81%  sleep    [kernel.vmlinux]  [k] filemap_fault        
    14.35%  sleep    [kernel.vmlinux]  [k] flush_tlb_mm_range   
     1.18%  sleep    [kernel.vmlinux]  [k] strlcpy              
     0.09%  sleep    [kernel.vmlinux]  [k] native_write_msr_safe
[root@zoo ~]#

[root@zoo ~]# perf report --dsos libc-2.20.so | grep -v '^[#]'
    42.57%  sleep    [.] malloc_hook_ini


[root@zoo ~]# 

[root@zoo ~]# perf record --per-thread -e intel_bts// usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.825 MB perf.data ]
[root@zoo ~]# perf report | grep -v ^# | head -5
Warning:
79074 instruction trace errors




     2.80%  usleep   [unknown]         [.] 0x00007f48888aa061
[root@zoo ~]#

- Arnaldo

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-08-17 17:58       ` Arnaldo Carvalho de Melo
@ 2015-08-17 19:09         ` Adrian Hunter
  2015-08-17 19:58           ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-08-17 19:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 17/08/2015 8:58 p.m., Arnaldo Carvalho de Melo wrote:
> Em Mon, Aug 17, 2015 at 08:43:09PM +0300, Adrian Hunter escreveu:
>> On 17/08/2015 6:52 p.m., Arnaldo Carvalho de Melo wrote:
>>>       1.92%  usleep   [unknown]         [.] 0x00007fa0ff695086
>>>       1.60%  usleep   [unknown]         [.] 0xffffffff811c91d0
>>>       1.48%  usleep   [unknown]         [.] 0x00007fa0ffb3030d
>>>       1.24%  usleep   [unknown]         [.] 0x00007fa0ff6950c7
>
>> It is very weird that it doesn't know the dso.
>
>> I presume there is nothing unusual about the environment e.g. in a chroot or anything
>
>> What if you try a different event e.g. perf record --per-thread -e cycles sleep 1
>
> [root@zoo ~]# perf record --per-thread -e cycles sleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.014 MB perf.data (9 samples) ]
> [root@zoo ~]# perf report | grep -v ^# | head -5
>      42.57%  sleep    libc-2.20.so      [.] malloc_hook_ini
>      41.81%  sleep    [kernel.vmlinux]  [k] filemap_fault
>      14.35%  sleep    [kernel.vmlinux]  [k] flush_tlb_mm_range
>       1.18%  sleep    [kernel.vmlinux]  [k] strlcpy
>       0.09%  sleep    [kernel.vmlinux]  [k] native_write_msr_safe
> [root@zoo ~]#
>
> [root@zoo ~]# perf report --dsos libc-2.20.so | grep -v '^[#]'
>      42.57%  sleep    [.] malloc_hook_ini
>
>
> [root@zoo ~]#
>
> [root@zoo ~]# perf record --per-thread -e intel_bts// usleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.825 MB perf.data ]
> [root@zoo ~]# perf report | grep -v ^# | head -5
> Warning:
> 79074 instruction trace errors
>
>
>
>
>       2.80%  usleep   [unknown]         [.] 0x00007f48888aa061
> [root@zoo ~]#

Running out of ideas.  Can you somehow share or send me the offending perf.data file?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-08-17 19:09         ` Adrian Hunter
@ 2015-08-17 19:58           ` Arnaldo Carvalho de Melo
  2015-08-18  6:39             ` Adrian Hunter
  0 siblings, 1 reply; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-17 19:58 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Mon, Aug 17, 2015 at 10:09:26PM +0300, Adrian Hunter escreveu:
> On 17/08/2015 8:58 p.m., Arnaldo Carvalho de Melo wrote:
> >Em Mon, Aug 17, 2015 at 08:43:09PM +0300, Adrian Hunter escreveu:
> >>On 17/08/2015 6:52 p.m., Arnaldo Carvalho de Melo wrote:
> >>>      1.92%  usleep   [unknown]         [.] 0x00007fa0ff695086
> >>>      1.60%  usleep   [unknown]         [.] 0xffffffff811c91d0
> >>>      1.48%  usleep   [unknown]         [.] 0x00007fa0ffb3030d
> >>>      1.24%  usleep   [unknown]         [.] 0x00007fa0ff6950c7
> >
> >>It is very weird that it doesn't know the dso.
> >
> >>I presume there is nothing unusual about the environment e.g. in a chroot or anything
> >
> >>What if you try a different event e.g. perf record --per-thread -e cycles sleep 1
> >
> >[root@zoo ~]# perf record --per-thread -e cycles sleep 1
> >[ perf record: Woken up 1 times to write data ]
> >[ perf record: Captured and wrote 0.014 MB perf.data (9 samples) ]
> >[root@zoo ~]# perf report | grep -v ^# | head -5
> >     42.57%  sleep    libc-2.20.so      [.] malloc_hook_ini
> >     41.81%  sleep    [kernel.vmlinux]  [k] filemap_fault
> >     14.35%  sleep    [kernel.vmlinux]  [k] flush_tlb_mm_range
> >      1.18%  sleep    [kernel.vmlinux]  [k] strlcpy
> >      0.09%  sleep    [kernel.vmlinux]  [k] native_write_msr_safe
> >[root@zoo ~]#
> >
> >[root@zoo ~]# perf report --dsos libc-2.20.so | grep -v '^[#]'
> >     42.57%  sleep    [.] malloc_hook_ini
> >
> >[root@zoo ~]#
> >
> >[root@zoo ~]# perf record --per-thread -e intel_bts// usleep 1
> >[ perf record: Woken up 1 times to write data ]
> >[ perf record: Captured and wrote 1.825 MB perf.data ]
> >[root@zoo ~]# perf report | grep -v ^# | head -5
> >Warning:
> >79074 instruction trace errors
> >      2.80%  usleep   [unknown]         [.] 0x00007f48888aa061
> >[root@zoo ~]#
> 
> Running out of ideas.  Can you somehow share or send me the offending perf.data file?

http://vger.kernel.org/~acme/perf/perf.data.intel_bts-4.2.0-rc5+.xz

It works if I use a tip/master kernel:

[root@perf4 ~]# uname -r
4.2.0-rc7+
[root@perf4 ~]# uname -r
4.2.0-rc7+
[root@perf4 ~]# perf record --per-thread -e intel_bts// usleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.785 MB perf.data ]
[root@perf4 ~]# dmesg | grep Performance
[    0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell
events, full-width counters, Intel PMU driver.
[root@perf4 ~]# perf report --stdio | grep -v ^# | head -10




    10.81%  usleep   libc-2.17.so       [.] _dl_addr                            
     6.56%  usleep   [kernel.kallsyms]  [.] unmap_single_vma                    
     3.33%  usleep   ld-2.17.so         [.] strcmp                              
     2.53%  usleep   [kernel.kallsyms]  [.] mem_cgroup_begin_page_stat          
     2.49%  usleep   ld-2.17.so         [.] _dl_lookup_symbol_x                 
     2.39%  usleep   ld-2.17.so         [.] _dl_relocate_object                 
[root@perf4 ~]# 

Probably some fix for the kernel driver is missing?

- Arnaldo

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-08-17 19:58           ` Arnaldo Carvalho de Melo
@ 2015-08-18  6:39             ` Adrian Hunter
  2015-08-18  9:09               ` Adrian Hunter
  2015-08-18 16:10               ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-08-18  6:39 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 17/08/15 22:58, Arnaldo Carvalho de Melo wrote:
> Em Mon, Aug 17, 2015 at 10:09:26PM +0300, Adrian Hunter escreveu:
>> On 17/08/2015 8:58 p.m., Arnaldo Carvalho de Melo wrote:
>>> Em Mon, Aug 17, 2015 at 08:43:09PM +0300, Adrian Hunter escreveu:
>>>> On 17/08/2015 6:52 p.m., Arnaldo Carvalho de Melo wrote:
>>>>>      1.92%  usleep   [unknown]         [.] 0x00007fa0ff695086
>>>>>      1.60%  usleep   [unknown]         [.] 0xffffffff811c91d0
>>>>>      1.48%  usleep   [unknown]         [.] 0x00007fa0ffb3030d
>>>>>      1.24%  usleep   [unknown]         [.] 0x00007fa0ff6950c7
>>>
>>>> It is very weird that it doesn't know the dso.
>>>
>>>> I presume there is nothing unusual about the environment e.g. in a chroot or anything
>>>
>>>> What if you try a different event e.g. perf record --per-thread -e cycles sleep 1
>>>
>>> [root@zoo ~]# perf record --per-thread -e cycles sleep 1
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 0.014 MB perf.data (9 samples) ]
>>> [root@zoo ~]# perf report | grep -v ^# | head -5
>>>     42.57%  sleep    libc-2.20.so      [.] malloc_hook_ini
>>>     41.81%  sleep    [kernel.vmlinux]  [k] filemap_fault
>>>     14.35%  sleep    [kernel.vmlinux]  [k] flush_tlb_mm_range
>>>      1.18%  sleep    [kernel.vmlinux]  [k] strlcpy
>>>      0.09%  sleep    [kernel.vmlinux]  [k] native_write_msr_safe
>>> [root@zoo ~]#
>>>
>>> [root@zoo ~]# perf report --dsos libc-2.20.so | grep -v '^[#]'
>>>     42.57%  sleep    [.] malloc_hook_ini
>>>
>>> [root@zoo ~]#
>>>
>>> [root@zoo ~]# perf record --per-thread -e intel_bts// usleep 1
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 1.825 MB perf.data ]
>>> [root@zoo ~]# perf report | grep -v ^# | head -5
>>> Warning:
>>> 79074 instruction trace errors
>>>      2.80%  usleep   [unknown]         [.] 0x00007f48888aa061
>>> [root@zoo ~]#
>>
>> Running out of ideas.  Can you somehow share or send me the offending perf.data file?
> 
> http://vger.kernel.org/~acme/perf/perf.data.intel_bts-4.2.0-rc5+.xz

Says: You don't have permission to access
/~acme/perf/perf.data.intel_bts-4.2.0-rc5+.xz on this server.

> 
> It works if I use a tip/master kernel:
> 
> [root@perf4 ~]# uname -r
> 4.2.0-rc7+
> [root@perf4 ~]# uname -r
> 4.2.0-rc7+
> [root@perf4 ~]# perf record --per-thread -e intel_bts// usleep 1
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 1.785 MB perf.data ]
> [root@perf4 ~]# dmesg | grep Performance
> [    0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell
> events, full-width counters, Intel PMU driver.
> [root@perf4 ~]# perf report --stdio | grep -v ^# | head -10
> 
> 
> 
> 
>     10.81%  usleep   libc-2.17.so       [.] _dl_addr                            
>      6.56%  usleep   [kernel.kallsyms]  [.] unmap_single_vma                    
>      3.33%  usleep   ld-2.17.so         [.] strcmp                              
>      2.53%  usleep   [kernel.kallsyms]  [.] mem_cgroup_begin_page_stat          
>      2.49%  usleep   ld-2.17.so         [.] _dl_lookup_symbol_x                 
>      2.39%  usleep   ld-2.17.so         [.] _dl_relocate_object                 
> [root@perf4 ~]# 
> 
> Probably some fix for the kernel driver is missing?

Works for me though.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-08-18  6:39             ` Adrian Hunter
@ 2015-08-18  9:09               ` Adrian Hunter
  2015-08-18 16:10               ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 77+ messages in thread
From: Adrian Hunter @ 2015-08-18  9:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 18/08/15 09:39, Adrian Hunter wrote:
> On 17/08/15 22:58, Arnaldo Carvalho de Melo wrote:
>> Em Mon, Aug 17, 2015 at 10:09:26PM +0300, Adrian Hunter escreveu:
>>> On 17/08/2015 8:58 p.m., Arnaldo Carvalho de Melo wrote:
>>>> Em Mon, Aug 17, 2015 at 08:43:09PM +0300, Adrian Hunter escreveu:
>>>>> On 17/08/2015 6:52 p.m., Arnaldo Carvalho de Melo wrote:
>>>>>>      1.92%  usleep   [unknown]         [.] 0x00007fa0ff695086
>>>>>>      1.60%  usleep   [unknown]         [.] 0xffffffff811c91d0
>>>>>>      1.48%  usleep   [unknown]         [.] 0x00007fa0ffb3030d
>>>>>>      1.24%  usleep   [unknown]         [.] 0x00007fa0ff6950c7
>>>>
>>>>> It is very weird that it doesn't know the dso.
>>>>
>>>>> I presume there is nothing unusual about the environment e.g. in a chroot or anything
>>>>
>>>>> What if you try a different event e.g. perf record --per-thread -e cycles sleep 1
>>>>
>>>> [root@zoo ~]# perf record --per-thread -e cycles sleep 1
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 0.014 MB perf.data (9 samples) ]
>>>> [root@zoo ~]# perf report | grep -v ^# | head -5
>>>>     42.57%  sleep    libc-2.20.so      [.] malloc_hook_ini
>>>>     41.81%  sleep    [kernel.vmlinux]  [k] filemap_fault
>>>>     14.35%  sleep    [kernel.vmlinux]  [k] flush_tlb_mm_range
>>>>      1.18%  sleep    [kernel.vmlinux]  [k] strlcpy
>>>>      0.09%  sleep    [kernel.vmlinux]  [k] native_write_msr_safe
>>>> [root@zoo ~]#
>>>>
>>>> [root@zoo ~]# perf report --dsos libc-2.20.so | grep -v '^[#]'
>>>>     42.57%  sleep    [.] malloc_hook_ini
>>>>
>>>> [root@zoo ~]#
>>>>
>>>> [root@zoo ~]# perf record --per-thread -e intel_bts// usleep 1
>>>> [ perf record: Woken up 1 times to write data ]
>>>> [ perf record: Captured and wrote 1.825 MB perf.data ]
>>>> [root@zoo ~]# perf report | grep -v ^# | head -5
>>>> Warning:
>>>> 79074 instruction trace errors
>>>>      2.80%  usleep   [unknown]         [.] 0x00007f48888aa061
>>>> [root@zoo ~]#
>>>
>>> Running out of ideas.  Can you somehow share or send me the offending perf.data file?
>>
>> http://vger.kernel.org/~acme/perf/perf.data.intel_bts-4.2.0-rc5+.xz
> 
> Says: You don't have permission to access
> /~acme/perf/perf.data.intel_bts-4.2.0-rc5+.xz on this server.
> 
>>
>> It works if I use a tip/master kernel:
>>
>> [root@perf4 ~]# uname -r
>> 4.2.0-rc7+
>> [root@perf4 ~]# uname -r
>> 4.2.0-rc7+
>> [root@perf4 ~]# perf record --per-thread -e intel_bts// usleep 1
>> [ perf record: Woken up 1 times to write data ]
>> [ perf record: Captured and wrote 1.785 MB perf.data ]
>> [root@perf4 ~]# dmesg | grep Performance
>> [    0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell
>> events, full-width counters, Intel PMU driver.
>> [root@perf4 ~]# perf report --stdio | grep -v ^# | head -10
>>
>>
>>
>>
>>     10.81%  usleep   libc-2.17.so       [.] _dl_addr                            
>>      6.56%  usleep   [kernel.kallsyms]  [.] unmap_single_vma                    
>>      3.33%  usleep   ld-2.17.so         [.] strcmp                              
>>      2.53%  usleep   [kernel.kallsyms]  [.] mem_cgroup_begin_page_stat          
>>      2.49%  usleep   ld-2.17.so         [.] _dl_lookup_symbol_x                 
>>      2.39%  usleep   ld-2.17.so         [.] _dl_relocate_object                 
>> [root@perf4 ~]# 
>>
>> Probably some fix for the kernel driver is missing?
> 
> Works for me though.

I looked at the code and noticed a minor issue, but I don't think it is the
cause of the problem.  I sent a fix - see "perf tools: Fix use of wrong
event when processing exit events"


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-08-18  6:39             ` Adrian Hunter
  2015-08-18  9:09               ` Adrian Hunter
@ 2015-08-18 16:10               ` Arnaldo Carvalho de Melo
  2015-08-20  8:53                 ` Adrian Hunter
  1 sibling, 1 reply; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-18 16:10 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Tue, Aug 18, 2015 at 09:39:46AM +0300, Adrian Hunter escreveu:
> On 17/08/15 22:58, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Aug 17, 2015 at 10:09:26PM +0300, Adrian Hunter escreveu:
> >> On 17/08/2015 8:58 p.m., Arnaldo Carvalho de Melo wrote:
> >>> Em Mon, Aug 17, 2015 at 08:43:09PM +0300, Adrian Hunter escreveu:
> >>>> On 17/08/2015 6:52 p.m., Arnaldo Carvalho de Melo wrote:
> >>>>>      1.92%  usleep   [unknown]         [.] 0x00007fa0ff695086
> >>>>>      1.60%  usleep   [unknown]         [.] 0xffffffff811c91d0
> >>>>>      1.48%  usleep   [unknown]         [.] 0x00007fa0ffb3030d
> >>>>>      1.24%  usleep   [unknown]         [.] 0x00007fa0ff6950c7
> >>>
> >>>> It is very weird that it doesn't know the dso.
> >>>
> >>>> I presume there is nothing unusual about the environment e.g. in a chroot or anything
> >>>
> >>>> What if you try a different event e.g. perf record --per-thread -e cycles sleep 1
> >>>
> >>> [root@zoo ~]# perf record --per-thread -e cycles sleep 1
> >>> [ perf record: Woken up 1 times to write data ]
> >>> [ perf record: Captured and wrote 0.014 MB perf.data (9 samples) ]
> >>> [root@zoo ~]# perf report | grep -v ^# | head -5
> >>>     42.57%  sleep    libc-2.20.so      [.] malloc_hook_ini
> >>>     41.81%  sleep    [kernel.vmlinux]  [k] filemap_fault
> >>>     14.35%  sleep    [kernel.vmlinux]  [k] flush_tlb_mm_range
> >>>      1.18%  sleep    [kernel.vmlinux]  [k] strlcpy
> >>>      0.09%  sleep    [kernel.vmlinux]  [k] native_write_msr_safe
> >>> [root@zoo ~]#
> >>>
> >>> [root@zoo ~]# perf report --dsos libc-2.20.so | grep -v '^[#]'
> >>>     42.57%  sleep    [.] malloc_hook_ini
> >>>
> >>> [root@zoo ~]#
> >>>
> >>> [root@zoo ~]# perf record --per-thread -e intel_bts// usleep 1
> >>> [ perf record: Woken up 1 times to write data ]
> >>> [ perf record: Captured and wrote 1.825 MB perf.data ]
> >>> [root@zoo ~]# perf report | grep -v ^# | head -5
> >>> Warning:
> >>> 79074 instruction trace errors
> >>>      2.80%  usleep   [unknown]         [.] 0x00007f48888aa061
> >>> [root@zoo ~]#
> >>
> >> Running out of ideas.  Can you somehow share or send me the offending perf.data file?
> > 
> > http://vger.kernel.org/~acme/perf/perf.data.intel_bts-4.2.0-rc5+.xz
> 
> Says: You don't have permission to access
> /~acme/perf/perf.data.intel_bts-4.2.0-rc5+.xz on this server.

Ooops, fixed.
 
> > It works if I use a tip/master kernel:
> > 
> > [root@perf4 ~]# uname -r
> > 4.2.0-rc7+
> > [root@perf4 ~]# uname -r
> > 4.2.0-rc7+
> > [root@perf4 ~]# perf record --per-thread -e intel_bts// usleep 1
> > [ perf record: Woken up 1 times to write data ]
> > [ perf record: Captured and wrote 1.785 MB perf.data ]
> > [root@perf4 ~]# dmesg | grep Performance
> > [    0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell
> > events, full-width counters, Intel PMU driver.
> > [root@perf4 ~]# perf report --stdio | grep -v ^# | head -10
> > 
> > 
> > 
> > 
> >     10.81%  usleep   libc-2.17.so       [.] _dl_addr                            
> >      6.56%  usleep   [kernel.kallsyms]  [.] unmap_single_vma                    
> >      3.33%  usleep   ld-2.17.so         [.] strcmp                              
> >      2.53%  usleep   [kernel.kallsyms]  [.] mem_cgroup_begin_page_stat          
> >      2.49%  usleep   ld-2.17.so         [.] _dl_lookup_symbol_x                 
> >      2.39%  usleep   ld-2.17.so         [.] _dl_relocate_object                 
> > [root@perf4 ~]# 
> > 
> > Probably some fix for the kernel driver is missing?
> 
> Works for me though.

Works for you if you test in lockstep the tooling with the kernel
sources, right? It works for me as well, if I use the latest tip/master
tooling _and_ kernel, my report was for using the 4.2.0-rc latest for
the kernel and the tooling in tip/master.

I think this would be just a matter of having a better message, or
perhaps some fix that we need to send to the stable@kernel.org guys, so
that the new tooling works with a slightly older kernel.

Anyway, consider looking at that perf.data file that I fixed the
permissions on vger, and I will look at the fix you mentioned in another
message.

- Arnaldo

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-08-18 16:10               ` Arnaldo Carvalho de Melo
@ 2015-08-20  8:53                 ` Adrian Hunter
  2015-08-21 14:18                   ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-08-20  8:53 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 18/08/15 19:10, Arnaldo Carvalho de Melo wrote:
> Em Tue, Aug 18, 2015 at 09:39:46AM +0300, Adrian Hunter escreveu:
>> On 17/08/15 22:58, Arnaldo Carvalho de Melo wrote:
>>> Em Mon, Aug 17, 2015 at 10:09:26PM +0300, Adrian Hunter escreveu:
>>>> On 17/08/2015 8:58 p.m., Arnaldo Carvalho de Melo wrote:
>>>>> Em Mon, Aug 17, 2015 at 08:43:09PM +0300, Adrian Hunter escreveu:
>>>>>> On 17/08/2015 6:52 p.m., Arnaldo Carvalho de Melo wrote:
>>>>>>>      1.92%  usleep   [unknown]         [.] 0x00007fa0ff695086
>>>>>>>      1.60%  usleep   [unknown]         [.] 0xffffffff811c91d0
>>>>>>>      1.48%  usleep   [unknown]         [.] 0x00007fa0ffb3030d
>>>>>>>      1.24%  usleep   [unknown]         [.] 0x00007fa0ff6950c7
>>>>>
>>>>>> It is very weird that it doesn't know the dso.
>>>>>
>>>>>> I presume there is nothing unusual about the environment e.g. in a chroot or anything
>>>>>
>>>>>> What if you try a different event e.g. perf record --per-thread -e cycles sleep 1
>>>>>
>>>>> [root@zoo ~]# perf record --per-thread -e cycles sleep 1
>>>>> [ perf record: Woken up 1 times to write data ]
>>>>> [ perf record: Captured and wrote 0.014 MB perf.data (9 samples) ]
>>>>> [root@zoo ~]# perf report | grep -v ^# | head -5
>>>>>     42.57%  sleep    libc-2.20.so      [.] malloc_hook_ini
>>>>>     41.81%  sleep    [kernel.vmlinux]  [k] filemap_fault
>>>>>     14.35%  sleep    [kernel.vmlinux]  [k] flush_tlb_mm_range
>>>>>      1.18%  sleep    [kernel.vmlinux]  [k] strlcpy
>>>>>      0.09%  sleep    [kernel.vmlinux]  [k] native_write_msr_safe
>>>>> [root@zoo ~]#
>>>>>
>>>>> [root@zoo ~]# perf report --dsos libc-2.20.so | grep -v '^[#]'
>>>>>     42.57%  sleep    [.] malloc_hook_ini
>>>>>
>>>>> [root@zoo ~]#
>>>>>
>>>>> [root@zoo ~]# perf record --per-thread -e intel_bts// usleep 1
>>>>> [ perf record: Woken up 1 times to write data ]
>>>>> [ perf record: Captured and wrote 1.825 MB perf.data ]
>>>>> [root@zoo ~]# perf report | grep -v ^# | head -5
>>>>> Warning:
>>>>> 79074 instruction trace errors
>>>>>      2.80%  usleep   [unknown]         [.] 0x00007f48888aa061
>>>>> [root@zoo ~]#
>>>>
>>>> Running out of ideas.  Can you somehow share or send me the offending perf.data file?
>>>
>>> http://vger.kernel.org/~acme/perf/perf.data.intel_bts-4.2.0-rc5+.xz
>>
>> Says: You don't have permission to access
>> /~acme/perf/perf.data.intel_bts-4.2.0-rc5+.xz on this server.
> 
> Ooops, fixed.
>  
>>> It works if I use a tip/master kernel:
>>>
>>> [root@perf4 ~]# uname -r
>>> 4.2.0-rc7+
>>> [root@perf4 ~]# uname -r
>>> 4.2.0-rc7+
>>> [root@perf4 ~]# perf record --per-thread -e intel_bts// usleep 1
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 1.785 MB perf.data ]
>>> [root@perf4 ~]# dmesg | grep Performance
>>> [    0.188477] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell
>>> events, full-width counters, Intel PMU driver.
>>> [root@perf4 ~]# perf report --stdio | grep -v ^# | head -10
>>>
>>>
>>>
>>>
>>>     10.81%  usleep   libc-2.17.so       [.] _dl_addr                            
>>>      6.56%  usleep   [kernel.kallsyms]  [.] unmap_single_vma                    
>>>      3.33%  usleep   ld-2.17.so         [.] strcmp                              
>>>      2.53%  usleep   [kernel.kallsyms]  [.] mem_cgroup_begin_page_stat          
>>>      2.49%  usleep   ld-2.17.so         [.] _dl_lookup_symbol_x                 
>>>      2.39%  usleep   ld-2.17.so         [.] _dl_relocate_object                 
>>> [root@perf4 ~]# 
>>>
>>> Probably some fix for the kernel driver is missing?
>>
>> Works for me though.
> 
> Works for you if you test in lockstep the tooling with the kernel
> sources, right? It works for me as well, if I use the latest tip/master
> tooling _and_ kernel, my report was for using the 4.2.0-rc latest for
> the kernel and the tooling in tip/master.
> 
> I think this would be just a matter of having a better message, or
> perhaps some fix that we need to send to the stable@kernel.org guys, so
> that the new tooling works with a slightly older kernel.
> 
> Anyway, consider looking at that perf.data file that I fixed the
> permissions on vger, and I will look at the fix you mentioned in another
> message.

With your perf.data file I was able to find the problem.  I have sent patch
"perf tools: Fix Intel BTS/PT timestamp handling" to fix it.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf auxtrace: Add Intel PT as an AUX area tracing type
  2015-07-17 16:33 ` [PATCH V8 01/25] perf auxtrace: Add Intel PT as an AUX area tracing type Adrian Hunter
@ 2015-08-20  9:57   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-20  9:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, linux-kernel, tglx, jolsa, mingo, adrian.hunter, hpa, acme

Commit-ID:  55ea4ab4260f42b824450faa47fe4d129fce0918
Gitweb:     http://git.kernel.org/tip/55ea4ab4260f42b824450faa47fe4d129fce0918
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:36 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 17 Aug 2015 11:11:36 -0300

perf auxtrace: Add Intel PT as an AUX area tracing type

Add the Intel Processor Trace type constant PERF_AUXTRACE_INTEL_PT.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/auxtrace.c | 1 +
 tools/perf/util/auxtrace.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index a25b360..49dbfbe 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -884,6 +884,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 		fprintf(stdout, " type: %u\n", type);
 
 	switch (type) {
+	case PERF_AUXTRACE_INTEL_PT:
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 471aecb..7d12f33 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -39,6 +39,7 @@ struct events_stats;
 
 enum auxtrace_type {
 	PERF_AUXTRACE_UNKNOWN,
+	PERF_AUXTRACE_INTEL_PT,
 };
 
 enum itrace_period_type {

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT packet decoder
  2015-07-17 16:33 ` [PATCH V8 02/25] perf tools: Add Intel PT packet decoder Adrian Hunter
@ 2015-08-20  9:57   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-20  9:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, tglx, linux-kernel, adrian.hunter, hpa, mingo, acme

Commit-ID:  a4e925905c98fb83538c164878946d77d0df1433
Gitweb:     http://git.kernel.org/tip/a4e925905c98fb83538c164878946d77d0df1433
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:37 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 17 Aug 2015 11:11:36 -0300

perf tools: Add Intel PT packet decoder

Add support for decoding Intel Processor Trace packets.

This essentially provides intel_pt_get_packet() which takes a buffer of
binary data and returns the decoded packet.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/Build                              |   1 +
 tools/perf/util/intel-pt-decoder/Build             |   1 +
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   | 400 +++++++++++++++++++++
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |  64 ++++
 4 files changed, 466 insertions(+)

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 1ce0adc..615ca12 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -78,6 +78,7 @@ libperf-$(CONFIG_X86) += tsc.o
 libperf-y += cloexec.o
 libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
new file mode 100644
index 0000000..9d67381
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -0,0 +1 @@
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
new file mode 100644
index 0000000..988c82c
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
@@ -0,0 +1,400 @@
+/*
+ * intel_pt_pkt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "intel-pt-pkt-decoder.h"
+
+#define BIT(n)		(1 << (n))
+
+#define BIT63		((uint64_t)1 << 63)
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le16_to_cpu bswap_16
+#define le32_to_cpu bswap_32
+#define le64_to_cpu bswap_64
+#define memcpy_le64(d, s, n) do { \
+	memcpy((d), (s), (n));    \
+	*(d) = le64_to_cpu(*(d)); \
+} while (0)
+#else
+#define le16_to_cpu
+#define le32_to_cpu
+#define le64_to_cpu
+#define memcpy_le64 memcpy
+#endif
+
+static const char * const packet_name[] = {
+	[INTEL_PT_BAD]		= "Bad Packet!",
+	[INTEL_PT_PAD]		= "PAD",
+	[INTEL_PT_TNT]		= "TNT",
+	[INTEL_PT_TIP_PGD]	= "TIP.PGD",
+	[INTEL_PT_TIP_PGE]	= "TIP.PGE",
+	[INTEL_PT_TSC]		= "TSC",
+	[INTEL_PT_MODE_EXEC]	= "MODE.Exec",
+	[INTEL_PT_MODE_TSX]	= "MODE.TSX",
+	[INTEL_PT_TIP]		= "TIP",
+	[INTEL_PT_FUP]		= "FUP",
+	[INTEL_PT_PSB]		= "PSB",
+	[INTEL_PT_PSBEND]	= "PSBEND",
+	[INTEL_PT_CBR]		= "CBR",
+	[INTEL_PT_PIP]		= "PIP",
+	[INTEL_PT_OVF]		= "OVF",
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type type)
+{
+	return packet_name[type];
+}
+
+static int intel_pt_get_long_tnt(const unsigned char *buf, size_t len,
+				 struct intel_pt_pkt *packet)
+{
+	uint64_t payload;
+	int count;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	payload = le64_to_cpu(*(uint64_t *)buf);
+
+	for (count = 47; count; count--) {
+		if (payload & BIT63)
+			break;
+		payload <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = payload << 1;
+	return 8;
+}
+
+static int intel_pt_get_pip(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	uint64_t payload = 0;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	packet->type = INTEL_PT_PIP;
+	memcpy_le64(&payload, buf + 2, 6);
+	packet->payload = payload >> 1;
+
+	return 8;
+}
+
+static int intel_pt_get_cbr(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 4)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_CBR;
+	packet->payload = buf[2];
+	return 4;
+}
+
+static int intel_pt_get_ovf(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_OVF;
+	return 2;
+}
+
+static int intel_pt_get_psb(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	int i;
+
+	if (len < 16)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	for (i = 2; i < 16; i += 2) {
+		if (buf[i] != 2 || buf[i + 1] != 0x82)
+			return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = INTEL_PT_PSB;
+	return 16;
+}
+
+static int intel_pt_get_psbend(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PSBEND;
+	return 2;
+}
+
+static int intel_pt_get_pad(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PAD;
+	return 1;
+}
+
+static int intel_pt_get_ext(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1]) {
+	case 0xa3: /* Long TNT */
+		return intel_pt_get_long_tnt(buf, len, packet);
+	case 0x43: /* PIP */
+		return intel_pt_get_pip(buf, len, packet);
+	case 0x03: /* CBR */
+		return intel_pt_get_cbr(buf, len, packet);
+	case 0xf3: /* OVF */
+		return intel_pt_get_ovf(packet);
+	case 0x82: /* PSB */
+		return intel_pt_get_psb(buf, len, packet);
+	case 0x23: /* PSBEND */
+		return intel_pt_get_psbend(packet);
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+static int intel_pt_get_short_tnt(unsigned int byte,
+				  struct intel_pt_pkt *packet)
+{
+	int count;
+
+	for (count = 6; count; count--) {
+		if (byte & BIT(7))
+			break;
+		byte <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = (uint64_t)byte << 57;
+
+	return 1;
+}
+
+static int intel_pt_get_ip(enum intel_pt_pkt_type type, unsigned int byte,
+			   const unsigned char *buf, size_t len,
+			   struct intel_pt_pkt *packet)
+{
+	switch (byte >> 5) {
+	case 0:
+		packet->count = 0;
+		break;
+	case 1:
+		if (len < 3)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 2;
+		packet->payload = le16_to_cpu(*(uint16_t *)(buf + 1));
+		break;
+	case 2:
+		if (len < 5)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 4;
+		packet->payload = le32_to_cpu(*(uint32_t *)(buf + 1));
+		break;
+	case 3:
+	case 6:
+		if (len < 7)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 6;
+		memcpy_le64(&packet->payload, buf + 1, 6);
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = type;
+
+	return packet->count + 1;
+}
+
+static int intel_pt_get_mode(const unsigned char *buf, size_t len,
+			     struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1] >> 5) {
+	case 0:
+		packet->type = INTEL_PT_MODE_EXEC;
+		switch (buf[1] & 3) {
+		case 0:
+			packet->payload = 16;
+			break;
+		case 1:
+			packet->payload = 64;
+			break;
+		case 2:
+			packet->payload = 32;
+			break;
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+		break;
+	case 1:
+		packet->type = INTEL_PT_MODE_TSX;
+		if ((buf[1] & 3) == 3)
+			return INTEL_PT_BAD_PACKET;
+		packet->payload = buf[1] & 3;
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	return 2;
+}
+
+static int intel_pt_get_tsc(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_TSC;
+	memcpy_le64(&packet->payload, buf + 1, 7);
+	return 8;
+}
+
+static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
+				  struct intel_pt_pkt *packet)
+{
+	unsigned int byte;
+
+	memset(packet, 0, sizeof(struct intel_pt_pkt));
+
+	if (!len)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	byte = buf[0];
+	if (!(byte & BIT(0))) {
+		if (byte == 0)
+			return intel_pt_get_pad(packet);
+		if (byte == 2)
+			return intel_pt_get_ext(buf, len, packet);
+		return intel_pt_get_short_tnt(byte, packet);
+	}
+
+	switch (byte & 0x1f) {
+	case 0x0D:
+		return intel_pt_get_ip(INTEL_PT_TIP, byte, buf, len, packet);
+	case 0x11:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGE, byte, buf, len,
+				       packet);
+	case 0x01:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGD, byte, buf, len,
+				       packet);
+	case 0x1D:
+		return intel_pt_get_ip(INTEL_PT_FUP, byte, buf, len, packet);
+	case 0x19:
+		switch (byte) {
+		case 0x99:
+			return intel_pt_get_mode(buf, len, packet);
+		case 0x19:
+			return intel_pt_get_tsc(buf, len, packet);
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet)
+{
+	int ret;
+
+	ret = intel_pt_do_get_packet(buf, len, packet);
+	if (ret > 0) {
+		while (ret < 8 && len > (size_t)ret && !buf[ret])
+			ret += 1;
+	}
+	return ret;
+}
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
+		      size_t buf_len)
+{
+	int ret, i;
+	unsigned long long payload = packet->payload;
+	const char *name = intel_pt_pkt_name(packet->type);
+
+	switch (packet->type) {
+	case INTEL_PT_BAD:
+	case INTEL_PT_PAD:
+	case INTEL_PT_PSB:
+	case INTEL_PT_PSBEND:
+	case INTEL_PT_OVF:
+		return snprintf(buf, buf_len, "%s", name);
+	case INTEL_PT_TNT: {
+		size_t blen = buf_len;
+
+		ret = snprintf(buf, blen, "%s ", name);
+		if (ret < 0)
+			return ret;
+		buf += ret;
+		blen -= ret;
+		for (i = 0; i < packet->count; i++) {
+			if (payload & BIT63)
+				ret = snprintf(buf, blen, "T");
+			else
+				ret = snprintf(buf, blen, "N");
+			if (ret < 0)
+				return ret;
+			buf += ret;
+			blen -= ret;
+			payload <<= 1;
+		}
+		ret = snprintf(buf, blen, " (%d)", packet->count);
+		if (ret < 0)
+			return ret;
+		blen -= ret;
+		return buf_len - blen;
+	}
+	case INTEL_PT_TIP_PGD:
+	case INTEL_PT_TIP_PGE:
+	case INTEL_PT_TIP:
+	case INTEL_PT_FUP:
+		if (!(packet->count))
+			return snprintf(buf, buf_len, "%s no ip", name);
+	case INTEL_PT_CBR:
+		return snprintf(buf, buf_len, "%s 0x%llx", name, payload);
+	case INTEL_PT_TSC:
+		if (packet->count)
+			return snprintf(buf, buf_len,
+					"%s 0x%llx CTC 0x%x FC 0x%x",
+					name, payload, packet->count & 0xffff,
+					(packet->count >> 16) & 0x1ff);
+		else
+			return snprintf(buf, buf_len, "%s 0x%llx",
+					name, payload);
+	case INTEL_PT_MODE_EXEC:
+		return snprintf(buf, buf_len, "%s %lld", name, payload);
+	case INTEL_PT_MODE_TSX:
+		return snprintf(buf, buf_len, "%s TXAbort:%u InTX:%u",
+				name, (unsigned)(payload >> 1) & 1,
+				(unsigned)payload & 1);
+	case INTEL_PT_PIP:
+		ret = snprintf(buf, buf_len, "%s 0x%llx",
+			       name, payload);
+		return ret;
+	default:
+		break;
+	}
+	return snprintf(buf, buf_len, "%s 0x%llx (%d)",
+			name, payload, packet->count);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
new file mode 100644
index 0000000..53404fa
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
@@ -0,0 +1,64 @@
+/*
+ * intel_pt_pkt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_PKT_DECODER_H__
+#define INCLUDE__INTEL_PT_PKT_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_PKT_DESC_MAX	256
+
+#define INTEL_PT_NEED_MORE_BYTES	-1
+#define INTEL_PT_BAD_PACKET		-2
+
+#define INTEL_PT_PSB_STR		"\002\202\002\202\002\202\002\202" \
+					"\002\202\002\202\002\202\002\202"
+#define INTEL_PT_PSB_LEN		16
+
+#define INTEL_PT_PKT_MAX_SZ		16
+
+enum intel_pt_pkt_type {
+	INTEL_PT_BAD,
+	INTEL_PT_PAD,
+	INTEL_PT_TNT,
+	INTEL_PT_TIP_PGD,
+	INTEL_PT_TIP_PGE,
+	INTEL_PT_TSC,
+	INTEL_PT_MODE_EXEC,
+	INTEL_PT_MODE_TSX,
+	INTEL_PT_TIP,
+	INTEL_PT_FUP,
+	INTEL_PT_PSB,
+	INTEL_PT_PSBEND,
+	INTEL_PT_CBR,
+	INTEL_PT_PIP,
+	INTEL_PT_OVF,
+};
+
+struct intel_pt_pkt {
+	enum intel_pt_pkt_type	type;
+	int			count;
+	uint64_t		payload;
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type);
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet);
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf, size_t len);
+
+#endif

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT instruction decoder
  2015-08-13  7:14       ` [PATCH V9 " Adrian Hunter
  2015-08-13 12:37         ` Arnaldo Carvalho de Melo
@ 2015-08-20  9:57         ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-20  9:57 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, adrian.hunter, jolsa, hpa, mingo, tglx, acme

Commit-ID:  237fae79f50d2d0c7bdeb039bc2c87fc6d52c7e7
Gitweb:     http://git.kernel.org/tip/237fae79f50d2d0c7bdeb039bc2c87fc6d52c7e7
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Thu, 13 Aug 2015 10:14:55 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 17 Aug 2015 11:11:36 -0300

perf tools: Add Intel PT instruction decoder

Add support for decoding instructions for Intel Processor Trace.  The
kernel x86 instruction decoder is copied for this.

This essentially provides intel_pt_get_insn() which takes a binary
buffer, uses the kernel's x86 instruction decoder to get details of the
instruction and then categorizes it for consumption by an Intel PT
decoder.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1439450095-30122-1-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/build/Makefile.build                         |   2 +
 tools/perf/.gitignore                              |   1 +
 tools/perf/Makefile.perf                           |  12 +-
 tools/perf/util/intel-pt-decoder/Build             |  12 +-
 .../util/intel-pt-decoder/gen-insn-attr-x86.awk    | 386 +++++++++++++++++++++
 tools/perf/util/intel-pt-decoder/inat.c            |  96 +++++
 .../perf/util/intel-pt-decoder}/inat.h             |   0
 .../perf/util/intel-pt-decoder}/insn.c             |   0
 .../perf/util/intel-pt-decoder}/insn.h             |   0
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  | 246 +++++++++++++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |  65 ++++
 .../perf/util/intel-pt-decoder}/x86-opcode-map.txt |   0
 12 files changed, 817 insertions(+), 3 deletions(-)

diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index faca2bf..8120af9 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -57,6 +57,8 @@ quiet_cmd_cc_i_c = CPP      $@
 quiet_cmd_cc_s_c = AS       $@
       cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
 
+quiet_cmd_gen = GEN      $@
+
 # Link agregate command
 # If there's nothing to link, create empty $@ object.
 quiet_cmd_ld_multi = LD       $@
diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore
index 09db62b..3d1bb80 100644
--- a/tools/perf/.gitignore
+++ b/tools/perf/.gitignore
@@ -29,3 +29,4 @@ config.mak.autogen
 *.pyc
 *.pyo
 .config-detected
+util/intel-pt-decoder/inat-tables.c
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 4b58dae..d9863cb 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -76,6 +76,12 @@ include config/utilities.mak
 #
 # Define NO_AUXTRACE if you do not want AUX area tracing support
 
+# As per kernel Makefile, avoid funny character set dependencies
+unexport LC_ALL
+LC_COLLATE=C
+LC_NUMERIC=C
+export LC_COLLATE LC_NUMERIC
+
 ifeq ($(srctree),)
 srctree := $(patsubst %/,%,$(dir $(shell pwd)))
 srctree := $(patsubst %/,%,$(dir $(srctree)))
@@ -135,6 +141,7 @@ INSTALL = install
 FLEX    = flex
 BISON   = bison
 STRIP   = strip
+AWK     = awk
 
 LIB_DIR          = $(srctree)/tools/lib/api/
 TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
@@ -289,7 +296,7 @@ strip: $(PROGRAMS) $(OUTPUT)perf
 
 PERF_IN := $(OUTPUT)perf-in.o
 
-export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX
+export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
 build := -f $(srctree)/tools/build/Makefile.build dir=. obj
 
 $(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
@@ -565,7 +572,8 @@ clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
 	$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
 	$(Q)$(RM) $(OUTPUT).config-detected
 	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
-	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
+	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
+		$(OUTPUT)util/intel-pt-decoder/inat-tables.c
 	$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
 	$(python-clean)
 
diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 9d67381..5a46ce1 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1 +1,11 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
+
+inat_tables_script = util/intel-pt-decoder/gen-insn-attr-x86.awk
+inat_tables_maps = util/intel-pt-decoder/x86-opcode-map.txt
+
+$(OUTPUT)util/intel-pt-decoder/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
+	@$(call echo-cmd,gen)$(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
+
+$(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o: util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
+
+CFLAGS_intel-pt-insn-decoder.o += -I$(OUTPUT)util/intel-pt-decoder -Wno-override-init
diff --git a/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
new file mode 100644
index 0000000..51756734
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/gen-insn-attr-x86.awk
@@ -0,0 +1,386 @@
+#!/bin/awk -f
+# gen-insn-attr-x86.awk: Instruction attribute table generator
+# Written by Masami Hiramatsu <mhiramat@redhat.com>
+#
+# Usage: awk -f gen-insn-attr-x86.awk x86-opcode-map.txt > inat-tables.c
+
+# Awk implementation sanity check
+function check_awk_implement() {
+	if (sprintf("%x", 0) != "0")
+		return "Your awk has a printf-format problem."
+	return ""
+}
+
+# Clear working vars
+function clear_vars() {
+	delete table
+	delete lptable2
+	delete lptable1
+	delete lptable3
+	eid = -1 # escape id
+	gid = -1 # group id
+	aid = -1 # AVX id
+	tname = ""
+}
+
+BEGIN {
+	# Implementation error checking
+	awkchecked = check_awk_implement()
+	if (awkchecked != "") {
+		print "Error: " awkchecked > "/dev/stderr"
+		print "Please try to use gawk." > "/dev/stderr"
+		exit 1
+	}
+
+	# Setup generating tables
+	print "/* x86 opcode map generated from x86-opcode-map.txt */"
+	print "/* Do not change this code. */\n"
+	ggid = 1
+	geid = 1
+	gaid = 0
+	delete etable
+	delete gtable
+	delete atable
+
+	opnd_expr = "^[A-Za-z/]"
+	ext_expr = "^\\("
+	sep_expr = "^\\|$"
+	group_expr = "^Grp[0-9A-Za-z]+"
+
+	imm_expr = "^[IJAOL][a-z]"
+	imm_flag["Ib"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
+	imm_flag["Jb"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
+	imm_flag["Iw"] = "INAT_MAKE_IMM(INAT_IMM_WORD)"
+	imm_flag["Id"] = "INAT_MAKE_IMM(INAT_IMM_DWORD)"
+	imm_flag["Iq"] = "INAT_MAKE_IMM(INAT_IMM_QWORD)"
+	imm_flag["Ap"] = "INAT_MAKE_IMM(INAT_IMM_PTR)"
+	imm_flag["Iz"] = "INAT_MAKE_IMM(INAT_IMM_VWORD32)"
+	imm_flag["Jz"] = "INAT_MAKE_IMM(INAT_IMM_VWORD32)"
+	imm_flag["Iv"] = "INAT_MAKE_IMM(INAT_IMM_VWORD)"
+	imm_flag["Ob"] = "INAT_MOFFSET"
+	imm_flag["Ov"] = "INAT_MOFFSET"
+	imm_flag["Lx"] = "INAT_MAKE_IMM(INAT_IMM_BYTE)"
+
+	modrm_expr = "^([CDEGMNPQRSUVW/][a-z]+|NTA|T[012])"
+	force64_expr = "\\([df]64\\)"
+	rex_expr = "^REX(\\.[XRWB]+)*"
+	fpu_expr = "^ESC" # TODO
+
+	lprefix1_expr = "\\((66|!F3)\\)"
+	lprefix2_expr = "\\(F3\\)"
+	lprefix3_expr = "\\((F2|!F3|66\\&F2)\\)"
+	lprefix_expr = "\\((66|F2|F3)\\)"
+	max_lprefix = 4
+
+	# All opcodes starting with lower-case 'v' or with (v1) superscript
+	# accepts VEX prefix
+	vexok_opcode_expr = "^v.*"
+	vexok_expr = "\\(v1\\)"
+	# All opcodes with (v) superscript supports *only* VEX prefix
+	vexonly_expr = "\\(v\\)"
+
+	prefix_expr = "\\(Prefix\\)"
+	prefix_num["Operand-Size"] = "INAT_PFX_OPNDSZ"
+	prefix_num["REPNE"] = "INAT_PFX_REPNE"
+	prefix_num["REP/REPE"] = "INAT_PFX_REPE"
+	prefix_num["XACQUIRE"] = "INAT_PFX_REPNE"
+	prefix_num["XRELEASE"] = "INAT_PFX_REPE"
+	prefix_num["LOCK"] = "INAT_PFX_LOCK"
+	prefix_num["SEG=CS"] = "INAT_PFX_CS"
+	prefix_num["SEG=DS"] = "INAT_PFX_DS"
+	prefix_num["SEG=ES"] = "INAT_PFX_ES"
+	prefix_num["SEG=FS"] = "INAT_PFX_FS"
+	prefix_num["SEG=GS"] = "INAT_PFX_GS"
+	prefix_num["SEG=SS"] = "INAT_PFX_SS"
+	prefix_num["Address-Size"] = "INAT_PFX_ADDRSZ"
+	prefix_num["VEX+1byte"] = "INAT_PFX_VEX2"
+	prefix_num["VEX+2byte"] = "INAT_PFX_VEX3"
+
+	clear_vars()
+}
+
+function semantic_error(msg) {
+	print "Semantic error at " NR ": " msg > "/dev/stderr"
+	exit 1
+}
+
+function debug(msg) {
+	print "DEBUG: " msg
+}
+
+function array_size(arr,   i,c) {
+	c = 0
+	for (i in arr)
+		c++
+	return c
+}
+
+/^Table:/ {
+	print "/* " $0 " */"
+	if (tname != "")
+		semantic_error("Hit Table: before EndTable:.");
+}
+
+/^Referrer:/ {
+	if (NF != 1) {
+		# escape opcode table
+		ref = ""
+		for (i = 2; i <= NF; i++)
+			ref = ref $i
+		eid = escape[ref]
+		tname = sprintf("inat_escape_table_%d", eid)
+	}
+}
+
+/^AVXcode:/ {
+	if (NF != 1) {
+		# AVX/escape opcode table
+		aid = $2
+		if (gaid <= aid)
+			gaid = aid + 1
+		if (tname == "")	# AVX only opcode table
+			tname = sprintf("inat_avx_table_%d", $2)
+	}
+	if (aid == -1 && eid == -1)	# primary opcode table
+		tname = "inat_primary_table"
+}
+
+/^GrpTable:/ {
+	print "/* " $0 " */"
+	if (!($2 in group))
+		semantic_error("No group: " $2 )
+	gid = group[$2]
+	tname = "inat_group_table_" gid
+}
+
+function print_table(tbl,name,fmt,n)
+{
+	print "const insn_attr_t " name " = {"
+	for (i = 0; i < n; i++) {
+		id = sprintf(fmt, i)
+		if (tbl[id])
+			print "	[" id "] = " tbl[id] ","
+	}
+	print "};"
+}
+
+/^EndTable/ {
+	if (gid != -1) {
+		# print group tables
+		if (array_size(table) != 0) {
+			print_table(table, tname "[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,0] = tname
+		}
+		if (array_size(lptable1) != 0) {
+			print_table(lptable1, tname "_1[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,1] = tname "_1"
+		}
+		if (array_size(lptable2) != 0) {
+			print_table(lptable2, tname "_2[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,2] = tname "_2"
+		}
+		if (array_size(lptable3) != 0) {
+			print_table(lptable3, tname "_3[INAT_GROUP_TABLE_SIZE]",
+				    "0x%x", 8)
+			gtable[gid,3] = tname "_3"
+		}
+	} else {
+		# print primary/escaped tables
+		if (array_size(table) != 0) {
+			print_table(table, tname "[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,0] = tname
+			if (aid >= 0)
+				atable[aid,0] = tname
+		}
+		if (array_size(lptable1) != 0) {
+			print_table(lptable1,tname "_1[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,1] = tname "_1"
+			if (aid >= 0)
+				atable[aid,1] = tname "_1"
+		}
+		if (array_size(lptable2) != 0) {
+			print_table(lptable2,tname "_2[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,2] = tname "_2"
+			if (aid >= 0)
+				atable[aid,2] = tname "_2"
+		}
+		if (array_size(lptable3) != 0) {
+			print_table(lptable3,tname "_3[INAT_OPCODE_TABLE_SIZE]",
+				    "0x%02x", 256)
+			etable[eid,3] = tname "_3"
+			if (aid >= 0)
+				atable[aid,3] = tname "_3"
+		}
+	}
+	print ""
+	clear_vars()
+}
+
+function add_flags(old,new) {
+	if (old && new)
+		return old " | " new
+	else if (old)
+		return old
+	else
+		return new
+}
+
+# convert operands to flags.
+function convert_operands(count,opnd,       i,j,imm,mod)
+{
+	imm = null
+	mod = null
+	for (j = 1; j <= count; j++) {
+		i = opnd[j]
+		if (match(i, imm_expr) == 1) {
+			if (!imm_flag[i])
+				semantic_error("Unknown imm opnd: " i)
+			if (imm) {
+				if (i != "Ib")
+					semantic_error("Second IMM error")
+				imm = add_flags(imm, "INAT_SCNDIMM")
+			} else
+				imm = imm_flag[i]
+		} else if (match(i, modrm_expr))
+			mod = "INAT_MODRM"
+	}
+	return add_flags(imm, mod)
+}
+
+/^[0-9a-f]+\:/ {
+	if (NR == 1)
+		next
+	# get index
+	idx = "0x" substr($1, 1, index($1,":") - 1)
+	if (idx in table)
+		semantic_error("Redefine " idx " in " tname)
+
+	# check if escaped opcode
+	if ("escape" == $2) {
+		if ($3 != "#")
+			semantic_error("No escaped name")
+		ref = ""
+		for (i = 4; i <= NF; i++)
+			ref = ref $i
+		if (ref in escape)
+			semantic_error("Redefine escape (" ref ")")
+		escape[ref] = geid
+		geid++
+		table[idx] = "INAT_MAKE_ESCAPE(" escape[ref] ")"
+		next
+	}
+
+	variant = null
+	# converts
+	i = 2
+	while (i <= NF) {
+		opcode = $(i++)
+		delete opnds
+		ext = null
+		flags = null
+		opnd = null
+		# parse one opcode
+		if (match($i, opnd_expr)) {
+			opnd = $i
+			count = split($(i++), opnds, ",")
+			flags = convert_operands(count, opnds)
+		}
+		if (match($i, ext_expr))
+			ext = $(i++)
+		if (match($i, sep_expr))
+			i++
+		else if (i < NF)
+			semantic_error($i " is not a separator")
+
+		# check if group opcode
+		if (match(opcode, group_expr)) {
+			if (!(opcode in group)) {
+				group[opcode] = ggid
+				ggid++
+			}
+			flags = add_flags(flags, "INAT_MAKE_GROUP(" group[opcode] ")")
+		}
+		# check force(or default) 64bit
+		if (match(ext, force64_expr))
+			flags = add_flags(flags, "INAT_FORCE64")
+
+		# check REX prefix
+		if (match(opcode, rex_expr))
+			flags = add_flags(flags, "INAT_MAKE_PREFIX(INAT_PFX_REX)")
+
+		# check coprocessor escape : TODO
+		if (match(opcode, fpu_expr))
+			flags = add_flags(flags, "INAT_MODRM")
+
+		# check VEX codes
+		if (match(ext, vexonly_expr))
+			flags = add_flags(flags, "INAT_VEXOK | INAT_VEXONLY")
+		else if (match(ext, vexok_expr) || match(opcode, vexok_opcode_expr))
+			flags = add_flags(flags, "INAT_VEXOK")
+
+		# check prefixes
+		if (match(ext, prefix_expr)) {
+			if (!prefix_num[opcode])
+				semantic_error("Unknown prefix: " opcode)
+			flags = add_flags(flags, "INAT_MAKE_PREFIX(" prefix_num[opcode] ")")
+		}
+		if (length(flags) == 0)
+			continue
+		# check if last prefix
+		if (match(ext, lprefix1_expr)) {
+			lptable1[idx] = add_flags(lptable1[idx],flags)
+			variant = "INAT_VARIANT"
+		}
+		if (match(ext, lprefix2_expr)) {
+			lptable2[idx] = add_flags(lptable2[idx],flags)
+			variant = "INAT_VARIANT"
+		}
+		if (match(ext, lprefix3_expr)) {
+			lptable3[idx] = add_flags(lptable3[idx],flags)
+			variant = "INAT_VARIANT"
+		}
+		if (!match(ext, lprefix_expr)){
+			table[idx] = add_flags(table[idx],flags)
+		}
+	}
+	if (variant)
+		table[idx] = add_flags(table[idx],variant)
+}
+
+END {
+	if (awkchecked != "")
+		exit 1
+	# print escape opcode map's array
+	print "/* Escape opcode map array */"
+	print "const insn_attr_t * const inat_escape_tables[INAT_ESC_MAX + 1]" \
+	      "[INAT_LSTPFX_MAX + 1] = {"
+	for (i = 0; i < geid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (etable[i,j])
+				print "	["i"]["j"] = "etable[i,j]","
+	print "};\n"
+	# print group opcode map's array
+	print "/* Group opcode map array */"
+	print "const insn_attr_t * const inat_group_tables[INAT_GRP_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1] = {"
+	for (i = 0; i < ggid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (gtable[i,j])
+				print "	["i"]["j"] = "gtable[i,j]","
+	print "};\n"
+	# print AVX opcode map's array
+	print "/* AVX opcode map array */"
+	print "const insn_attr_t * const inat_avx_tables[X86_VEX_M_MAX + 1]"\
+	      "[INAT_LSTPFX_MAX + 1] = {"
+	for (i = 0; i < gaid; i++)
+		for (j = 0; j < max_lprefix; j++)
+			if (atable[i,j])
+				print "	["i"]["j"] = "atable[i,j]","
+	print "};"
+}
diff --git a/tools/perf/util/intel-pt-decoder/inat.c b/tools/perf/util/intel-pt-decoder/inat.c
new file mode 100644
index 0000000..feeaa50
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/inat.c
@@ -0,0 +1,96 @@
+/*
+ * x86 instruction attribute tables
+ *
+ * Written by Masami Hiramatsu <mhiramat@redhat.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, write to the Free Software
+ * Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+ *
+ */
+#include <asm/insn.h>
+
+/* Attribute tables are generated from opcode map */
+#include "inat-tables.c"
+
+/* Attribute search APIs */
+insn_attr_t inat_get_opcode_attribute(insn_byte_t opcode)
+{
+	return inat_primary_table[opcode];
+}
+
+int inat_get_last_prefix_id(insn_byte_t last_pfx)
+{
+	insn_attr_t lpfx_attr;
+
+	lpfx_attr = inat_get_opcode_attribute(last_pfx);
+	return inat_last_prefix_id(lpfx_attr);
+}
+
+insn_attr_t inat_get_escape_attribute(insn_byte_t opcode, int lpfx_id,
+				      insn_attr_t esc_attr)
+{
+	const insn_attr_t *table;
+	int n;
+
+	n = inat_escape_id(esc_attr);
+
+	table = inat_escape_tables[n][0];
+	if (!table)
+		return 0;
+	if (inat_has_variant(table[opcode]) && lpfx_id) {
+		table = inat_escape_tables[n][lpfx_id];
+		if (!table)
+			return 0;
+	}
+	return table[opcode];
+}
+
+insn_attr_t inat_get_group_attribute(insn_byte_t modrm, int lpfx_id,
+				     insn_attr_t grp_attr)
+{
+	const insn_attr_t *table;
+	int n;
+
+	n = inat_group_id(grp_attr);
+
+	table = inat_group_tables[n][0];
+	if (!table)
+		return inat_group_common_attribute(grp_attr);
+	if (inat_has_variant(table[X86_MODRM_REG(modrm)]) && lpfx_id) {
+		table = inat_group_tables[n][lpfx_id];
+		if (!table)
+			return inat_group_common_attribute(grp_attr);
+	}
+	return table[X86_MODRM_REG(modrm)] |
+	       inat_group_common_attribute(grp_attr);
+}
+
+insn_attr_t inat_get_avx_attribute(insn_byte_t opcode, insn_byte_t vex_m,
+				   insn_byte_t vex_p)
+{
+	const insn_attr_t *table;
+	if (vex_m > X86_VEX_M_MAX || vex_p > INAT_LSTPFX_MAX)
+		return 0;
+	/* At first, this checks the master table */
+	table = inat_avx_tables[vex_m][0];
+	if (!table)
+		return 0;
+	if (!inat_is_group(table[opcode]) && vex_p) {
+		/* If this is not a group, get attribute directly */
+		table = inat_avx_tables[vex_m][vex_p];
+		if (!table)
+			return 0;
+	}
+	return table[opcode];
+}
diff --git a/arch/x86/include/asm/inat.h b/tools/perf/util/intel-pt-decoder/inat.h
similarity index 100%
copy from arch/x86/include/asm/inat.h
copy to tools/perf/util/intel-pt-decoder/inat.h
diff --git a/arch/x86/lib/insn.c b/tools/perf/util/intel-pt-decoder/insn.c
similarity index 100%
copy from arch/x86/lib/insn.c
copy to tools/perf/util/intel-pt-decoder/insn.c
diff --git a/arch/x86/include/asm/insn.h b/tools/perf/util/intel-pt-decoder/insn.h
similarity index 100%
copy from arch/x86/include/asm/insn.h
copy to tools/perf/util/intel-pt-decoder/insn.h
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
new file mode 100644
index 0000000..46980fc
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
@@ -0,0 +1,246 @@
+/*
+ * intel_pt_insn_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "event.h"
+
+#include <asm/insn.h>
+
+#include "inat.c"
+#include "insn.c"
+
+#include "intel-pt-insn-decoder.h"
+
+/* Based on branch_type() from perf_event_intel_lbr.c */
+static void intel_pt_insn_decoder(struct insn *insn,
+				  struct intel_pt_insn *intel_pt_insn)
+{
+	enum intel_pt_insn_op op = INTEL_PT_OP_OTHER;
+	enum intel_pt_insn_branch branch = INTEL_PT_BR_NO_BRANCH;
+	int ext;
+
+	if (insn_is_avx(insn)) {
+		intel_pt_insn->op = INTEL_PT_OP_OTHER;
+		intel_pt_insn->branch = INTEL_PT_BR_NO_BRANCH;
+		intel_pt_insn->length = insn->length;
+		return;
+	}
+
+	switch (insn->opcode.bytes[0]) {
+	case 0xf:
+		switch (insn->opcode.bytes[1]) {
+		case 0x05: /* syscall */
+		case 0x34: /* sysenter */
+			op = INTEL_PT_OP_SYSCALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x07: /* sysret */
+		case 0x35: /* sysexit */
+			op = INTEL_PT_OP_SYSRET;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x80 ... 0x8f: /* jcc */
+			op = INTEL_PT_OP_JCC;
+			branch = INTEL_PT_BR_CONDITIONAL;
+			break;
+		default:
+			break;
+		}
+		break;
+	case 0x70 ... 0x7f: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xc2: /* near ret */
+	case 0xc3: /* near ret */
+	case 0xca: /* far ret */
+	case 0xcb: /* far ret */
+		op = INTEL_PT_OP_RET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcf: /* iret */
+		op = INTEL_PT_OP_IRET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcc ... 0xce: /* int */
+		op = INTEL_PT_OP_INT;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe8: /* call near rel */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0x9a: /* call far absolute */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe0 ... 0xe2: /* loop */
+		op = INTEL_PT_OP_LOOP;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe3: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe9: /* jmp */
+	case 0xeb: /* jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0xea: /* far jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xff: /* call near absolute, call far absolute ind */
+		ext = (insn->modrm.bytes[0] >> 3) & 0x7;
+		switch (ext) {
+		case 2: /* near ind call */
+		case 3: /* far ind call */
+			op = INTEL_PT_OP_CALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 4:
+		case 5:
+			op = INTEL_PT_OP_JMP;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+
+	intel_pt_insn->op = op;
+	intel_pt_insn->branch = branch;
+	intel_pt_insn->length = insn->length;
+
+	if (branch == INTEL_PT_BR_CONDITIONAL ||
+	    branch == INTEL_PT_BR_UNCONDITIONAL) {
+#if __BYTE_ORDER == __BIG_ENDIAN
+		switch (insn->immediate.nbytes) {
+		case 1:
+			intel_pt_insn->rel = insn->immediate.value;
+			break;
+		case 2:
+			intel_pt_insn->rel =
+					bswap_16((short)insn->immediate.value);
+			break;
+		case 4:
+			intel_pt_insn->rel = bswap_32(insn->immediate.value);
+			break;
+		}
+#else
+		intel_pt_insn->rel = insn->immediate.value;
+#endif
+	}
+}
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn)
+{
+	struct insn insn;
+
+	insn_init(&insn, buf, len, x86_64);
+	insn_get_length(&insn);
+	if (!insn_complete(&insn) || insn.length > len)
+		return -1;
+	intel_pt_insn_decoder(&insn, intel_pt_insn);
+	if (insn.length < INTEL_PT_INSN_DBG_BUF_SZ)
+		memcpy(intel_pt_insn->buf, buf, insn.length);
+	else
+		memcpy(intel_pt_insn->buf, buf, INTEL_PT_INSN_DBG_BUF_SZ);
+	return 0;
+}
+
+const char *branch_name[] = {
+	[INTEL_PT_OP_OTHER]	= "Other",
+	[INTEL_PT_OP_CALL]	= "Call",
+	[INTEL_PT_OP_RET]	= "Ret",
+	[INTEL_PT_OP_JCC]	= "Jcc",
+	[INTEL_PT_OP_JMP]	= "Jmp",
+	[INTEL_PT_OP_LOOP]	= "Loop",
+	[INTEL_PT_OP_IRET]	= "IRet",
+	[INTEL_PT_OP_INT]	= "Int",
+	[INTEL_PT_OP_SYSCALL]	= "Syscall",
+	[INTEL_PT_OP_SYSRET]	= "Sysret",
+};
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op)
+{
+	return branch_name[op];
+}
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len)
+{
+	switch (intel_pt_insn->branch) {
+	case INTEL_PT_BR_CONDITIONAL:
+	case INTEL_PT_BR_UNCONDITIONAL:
+		return snprintf(buf, buf_len, "%s %s%d",
+				intel_pt_insn_name(intel_pt_insn->op),
+				intel_pt_insn->rel > 0 ? "+" : "",
+				intel_pt_insn->rel);
+	case INTEL_PT_BR_NO_BRANCH:
+	case INTEL_PT_BR_INDIRECT:
+		return snprintf(buf, buf_len, "%s",
+				intel_pt_insn_name(intel_pt_insn->op));
+	default:
+		break;
+	}
+	return 0;
+}
+
+size_t intel_pt_insn_max_size(void)
+{
+	return MAX_INSN_SIZE;
+}
+
+int intel_pt_insn_type(enum intel_pt_insn_op op)
+{
+	switch (op) {
+	case INTEL_PT_OP_OTHER:
+		return 0;
+	case INTEL_PT_OP_CALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL;
+	case INTEL_PT_OP_RET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN;
+	case INTEL_PT_OP_JCC:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_JMP:
+		return PERF_IP_FLAG_BRANCH;
+	case INTEL_PT_OP_LOOP:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_IRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_INT:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_SYSCALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_SYSCALLRET;
+	case INTEL_PT_OP_SYSRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_SYSCALLRET;
+	default:
+		return 0;
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
new file mode 100644
index 0000000..b0adbf3
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
@@ -0,0 +1,65 @@
+/*
+ * intel_pt_insn_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_INSN_DECODER_H__
+#define INCLUDE__INTEL_PT_INSN_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_INSN_DESC_MAX		32
+#define INTEL_PT_INSN_DBG_BUF_SZ	16
+
+enum intel_pt_insn_op {
+	INTEL_PT_OP_OTHER,
+	INTEL_PT_OP_CALL,
+	INTEL_PT_OP_RET,
+	INTEL_PT_OP_JCC,
+	INTEL_PT_OP_JMP,
+	INTEL_PT_OP_LOOP,
+	INTEL_PT_OP_IRET,
+	INTEL_PT_OP_INT,
+	INTEL_PT_OP_SYSCALL,
+	INTEL_PT_OP_SYSRET,
+};
+
+enum intel_pt_insn_branch {
+	INTEL_PT_BR_NO_BRANCH,
+	INTEL_PT_BR_INDIRECT,
+	INTEL_PT_BR_CONDITIONAL,
+	INTEL_PT_BR_UNCONDITIONAL,
+};
+
+struct intel_pt_insn {
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+	unsigned char			buf[INTEL_PT_INSN_DBG_BUF_SZ];
+};
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn);
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op);
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len);
+
+size_t intel_pt_insn_max_size(void);
+
+int intel_pt_insn_type(enum intel_pt_insn_op op);
+
+#endif
diff --git a/arch/x86/lib/x86-opcode-map.txt b/tools/perf/util/intel-pt-decoder/x86-opcode-map.txt
similarity index 100%
copy from arch/x86/lib/x86-opcode-map.txt
copy to tools/perf/util/intel-pt-decoder/x86-opcode-map.txt

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT log
  2015-07-17 16:33 ` [PATCH V8 04/25] perf tools: Add Intel PT log Adrian Hunter
@ 2015-08-20  9:58   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-20  9:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, adrian.hunter, jolsa, linux-kernel, tglx, hpa, mingo

Commit-ID:  53af92849d793662e943d61bb16f7d3eb2d7a072
Gitweb:     http://git.kernel.org/tip/53af92849d793662e943d61bb16f7d3eb2d7a072
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:39 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 17 Aug 2015 11:11:36 -0300

perf tools: Add Intel PT log

Add a facility to log Intel Processor Trace decoding.  The log is
intended for debugging purposes only.

The log file name is "intel_pt.log" and is opened in the current
directory.  The log contains a record of all packets and instructions
decoded and can get very large (10 MB would be a small one).

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt-decoder/Build          |   2 +-
 tools/perf/util/intel-pt-decoder/intel-pt-log.c | 155 ++++++++++++++++++++++++
 tools/perf/util/intel-pt-decoder/intel-pt-log.h |  52 ++++++++
 3 files changed, 208 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 5a46ce1..3c717b4 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1,4 +1,4 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o
 
 inat_tables_script = util/intel-pt-decoder/gen-insn-attr-x86.awk
 inat_tables_maps = util/intel-pt-decoder/x86-opcode-map.txt
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.c b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
new file mode 100644
index 0000000..d09c7d9
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
@@ -0,0 +1,155 @@
+/*
+ * intel_pt_log.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <string.h>
+
+#include "intel-pt-log.h"
+#include "intel-pt-insn-decoder.h"
+
+#include "intel-pt-pkt-decoder.h"
+
+#define MAX_LOG_NAME 256
+
+static FILE *f;
+static char log_name[MAX_LOG_NAME];
+static bool enable_logging;
+
+void intel_pt_log_enable(void)
+{
+	enable_logging = true;
+}
+
+void intel_pt_log_disable(void)
+{
+	if (f)
+		fflush(f);
+	enable_logging = false;
+}
+
+void intel_pt_log_set_name(const char *name)
+{
+	strncpy(log_name, name, MAX_LOG_NAME - 5);
+	strcat(log_name, ".log");
+}
+
+static void intel_pt_print_data(const unsigned char *buf, int len, uint64_t pos,
+				int indent)
+{
+	int i;
+
+	for (i = 0; i < indent; i++)
+		fprintf(f, " ");
+
+	fprintf(f, "  %08" PRIx64 ": ", pos);
+	for (i = 0; i < len; i++)
+		fprintf(f, " %02x", buf[i]);
+	for (; i < 16; i++)
+		fprintf(f, "   ");
+	fprintf(f, " ");
+}
+
+static void intel_pt_print_no_data(uint64_t pos, int indent)
+{
+	int i;
+
+	for (i = 0; i < indent; i++)
+		fprintf(f, " ");
+
+	fprintf(f, "  %08" PRIx64 ": ", pos);
+	for (i = 0; i < 16; i++)
+		fprintf(f, "   ");
+	fprintf(f, " ");
+}
+
+static int intel_pt_log_open(void)
+{
+	if (!enable_logging)
+		return -1;
+
+	if (f)
+		return 0;
+
+	if (!log_name[0])
+		return -1;
+
+	f = fopen(log_name, "w+");
+	if (!f) {
+		enable_logging = false;
+		return -1;
+	}
+
+	return 0;
+}
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf)
+{
+	char desc[INTEL_PT_PKT_DESC_MAX];
+
+	if (intel_pt_log_open())
+		return;
+
+	intel_pt_print_data(buf, pkt_len, pos, 0);
+	intel_pt_pkt_desc(packet, desc, INTEL_PT_PKT_DESC_MAX);
+	fprintf(f, "%s\n", desc);
+}
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	char desc[INTEL_PT_INSN_DESC_MAX];
+	size_t len = intel_pt_insn->length;
+
+	if (intel_pt_log_open())
+		return;
+
+	if (len > INTEL_PT_INSN_DBG_BUF_SZ)
+		len = INTEL_PT_INSN_DBG_BUF_SZ;
+	intel_pt_print_data(intel_pt_insn->buf, len, ip, 8);
+	if (intel_pt_insn_desc(intel_pt_insn, desc, INTEL_PT_INSN_DESC_MAX) > 0)
+		fprintf(f, "%s\n", desc);
+	else
+		fprintf(f, "Bad instruction!\n");
+}
+
+void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	char desc[INTEL_PT_INSN_DESC_MAX];
+
+	if (intel_pt_log_open())
+		return;
+
+	intel_pt_print_no_data(ip, 8);
+	if (intel_pt_insn_desc(intel_pt_insn, desc, INTEL_PT_INSN_DESC_MAX) > 0)
+		fprintf(f, "%s\n", desc);
+	else
+		fprintf(f, "Bad instruction!\n");
+}
+
+void intel_pt_log(const char *fmt, ...)
+{
+	va_list args;
+
+	if (intel_pt_log_open())
+		return;
+
+	va_start(args, fmt);
+	vfprintf(f, fmt, args);
+	va_end(args);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.h b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
new file mode 100644
index 0000000..db3942f
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
@@ -0,0 +1,52 @@
+/*
+ * intel_pt_log.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_LOG_H__
+#define INCLUDE__INTEL_PT_LOG_H__
+
+#include <stdint.h>
+#include <inttypes.h>
+
+struct intel_pt_pkt;
+
+void intel_pt_log_enable(void);
+void intel_pt_log_disable(void);
+void intel_pt_log_set_name(const char *name);
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf);
+
+struct intel_pt_insn;
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip);
+void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
+			       uint64_t ip);
+
+__attribute__((format(printf, 1, 2)))
+void intel_pt_log(const char *fmt, ...);
+
+#define x64_fmt "0x%" PRIx64
+
+static inline void intel_pt_log_at(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s at " x64_fmt "\n", msg, u);
+}
+
+static inline void intel_pt_log_to(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s to " x64_fmt "\n", msg, u);
+}
+
+#endif

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT decoder
  2015-07-17 16:33 ` [PATCH V8 05/25] perf tools: Add Intel PT decoder Adrian Hunter
@ 2015-08-20  9:58   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-20  9:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mingo, adrian.hunter, acme, hpa, jolsa, linux-kernel

Commit-ID:  f4aa081949e7b6b01e711229c5a47ee3482a169c
Gitweb:     http://git.kernel.org/tip/f4aa081949e7b6b01e711229c5a47ee3482a169c
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:40 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 17 Aug 2015 11:11:36 -0300

perf tools: Add Intel PT decoder

Add support for decoding an Intel Processor Trace.

Intel PT trace data must be 'decoded' which involves walking the object
code and matching the trace data packets.

The decoder requests a buffer of binary data via a get_trace()
call-back, which it decodes using instruction information which it gets
via another call-back walk_insn().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-6-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt-decoder/Build             |    2 +-
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1816 ++++++++++++++++++++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  104 ++
 3 files changed, 1921 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 3c717b4..240730d 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1,4 +1,4 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o intel-pt-decoder.o
 
 inat_tables_script = util/intel-pt-decoder/gen-insn-attr-x86.awk
 inat_tables_maps = util/intel-pt-decoder/x86-opcode-map.txt
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
new file mode 100644
index 0000000..f8ac462
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -0,0 +1,1816 @@
+/*
+ * intel_pt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include "../cache.h"
+#include "../util.h"
+
+#include "intel-pt-insn-decoder.h"
+#include "intel-pt-pkt-decoder.h"
+#include "intel-pt-decoder.h"
+#include "intel-pt-log.h"
+
+#define INTEL_PT_BLK_SIZE 1024
+
+#define BIT63 (((uint64_t)1 << 63))
+
+#define INTEL_PT_RETURN 1
+
+/* Maximum number of loops with no packets consumed i.e. stuck in a loop */
+#define INTEL_PT_MAX_LOOPS 10000
+
+struct intel_pt_blk {
+	struct intel_pt_blk *prev;
+	uint64_t ip[INTEL_PT_BLK_SIZE];
+};
+
+struct intel_pt_stack {
+	struct intel_pt_blk *blk;
+	struct intel_pt_blk *spare;
+	int pos;
+};
+
+enum intel_pt_pkt_state {
+	INTEL_PT_STATE_NO_PSB,
+	INTEL_PT_STATE_NO_IP,
+	INTEL_PT_STATE_ERR_RESYNC,
+	INTEL_PT_STATE_IN_SYNC,
+	INTEL_PT_STATE_TNT,
+	INTEL_PT_STATE_TIP,
+	INTEL_PT_STATE_TIP_PGD,
+	INTEL_PT_STATE_FUP,
+	INTEL_PT_STATE_FUP_NO_TIP,
+};
+
+#ifdef INTEL_PT_STRICT
+#define INTEL_PT_STATE_ERR1	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_NO_PSB
+#else
+#define INTEL_PT_STATE_ERR1	(decoder->pkt_state)
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_IP
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_ERR_RESYNC
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_IN_SYNC
+#endif
+
+struct intel_pt_decoder {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*walk_insn)(struct intel_pt_insn *intel_pt_insn,
+			 uint64_t *insn_cnt_ptr, uint64_t *ip, uint64_t to_ip,
+			 uint64_t max_insn_cnt, void *data);
+	void *data;
+	struct intel_pt_state state;
+	const unsigned char *buf;
+	size_t len;
+	bool return_compression;
+	bool pge;
+	uint64_t pos;
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint64_t tsc_timestamp;
+	uint64_t ref_timestamp;
+	uint64_t ret_addr;
+	struct intel_pt_stack stack;
+	enum intel_pt_pkt_state pkt_state;
+	struct intel_pt_pkt packet;
+	struct intel_pt_pkt tnt;
+	int pkt_step;
+	int pkt_len;
+	unsigned int cbr;
+	unsigned int max_non_turbo_ratio;
+	int exec_mode;
+	unsigned int insn_bytes;
+	uint64_t sign_bit;
+	uint64_t sign_bits;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+	uint64_t period_insn_cnt;
+	uint64_t period_mask;
+	uint64_t period_ticks;
+	uint64_t last_masked_timestamp;
+	bool continuous_period;
+	bool overflow;
+	bool set_fup_tx_flags;
+	unsigned int fup_tx_flags;
+	unsigned int tx_flags;
+	uint64_t timestamp_insn_cnt;
+	uint64_t stuck_ip;
+	int no_progress;
+	int stuck_ip_prd;
+	int stuck_ip_cnt;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[INTEL_PT_PKT_MAX_SZ];
+};
+
+static uint64_t intel_pt_lower_power_of_2(uint64_t x)
+{
+	int i;
+
+	for (i = 0; x != 1; i++)
+		x >>= 1;
+
+	return x << i;
+}
+
+static void intel_pt_setup_period(struct intel_pt_decoder *decoder)
+{
+	if (decoder->period_type == INTEL_PT_PERIOD_TICKS) {
+		uint64_t period;
+
+		period = intel_pt_lower_power_of_2(decoder->period);
+		decoder->period_mask  = ~(period - 1);
+		decoder->period_ticks = period;
+	}
+}
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
+{
+	struct intel_pt_decoder *decoder;
+
+	if (!params->get_trace || !params->walk_insn)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct intel_pt_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->walk_insn          = params->walk_insn;
+	decoder->data               = params->data;
+	decoder->return_compression = params->return_compression;
+
+	decoder->sign_bit           = (uint64_t)1 << 47;
+	decoder->sign_bits          = ~(((uint64_t)1 << 48) - 1);
+
+	decoder->period             = params->period;
+	decoder->period_type        = params->period_type;
+
+	decoder->max_non_turbo_ratio = params->max_non_turbo_ratio;
+
+	intel_pt_setup_period(decoder);
+
+	return decoder;
+}
+
+static void intel_pt_pop_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk = stack->blk;
+
+	stack->blk = blk->prev;
+	if (!stack->spare)
+		stack->spare = blk;
+	else
+		free(blk);
+}
+
+static uint64_t intel_pt_pop(struct intel_pt_stack *stack)
+{
+	if (!stack->pos) {
+		if (!stack->blk)
+			return 0;
+		intel_pt_pop_blk(stack);
+		if (!stack->blk)
+			return 0;
+		stack->pos = INTEL_PT_BLK_SIZE;
+	}
+	return stack->blk->ip[--stack->pos];
+}
+
+static int intel_pt_alloc_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk;
+
+	if (stack->spare) {
+		blk = stack->spare;
+		stack->spare = NULL;
+	} else {
+		blk = malloc(sizeof(struct intel_pt_blk));
+		if (!blk)
+			return -ENOMEM;
+	}
+
+	blk->prev = stack->blk;
+	stack->blk = blk;
+	stack->pos = 0;
+	return 0;
+}
+
+static int intel_pt_push(struct intel_pt_stack *stack, uint64_t ip)
+{
+	int err;
+
+	if (!stack->blk || stack->pos == INTEL_PT_BLK_SIZE) {
+		err = intel_pt_alloc_blk(stack);
+		if (err)
+			return err;
+	}
+
+	stack->blk->ip[stack->pos++] = ip;
+	return 0;
+}
+
+static void intel_pt_clear_stack(struct intel_pt_stack *stack)
+{
+	while (stack->blk)
+		intel_pt_pop_blk(stack);
+	stack->pos = 0;
+}
+
+static void intel_pt_free_stack(struct intel_pt_stack *stack)
+{
+	intel_pt_clear_stack(stack);
+	zfree(&stack->blk);
+	zfree(&stack->spare);
+}
+
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder)
+{
+	intel_pt_free_stack(&decoder->stack);
+	free(decoder);
+}
+
+static int intel_pt_ext_err(int code)
+{
+	switch (code) {
+	case -ENOMEM:
+		return INTEL_PT_ERR_NOMEM;
+	case -ENOSYS:
+		return INTEL_PT_ERR_INTERN;
+	case -EBADMSG:
+		return INTEL_PT_ERR_BADPKT;
+	case -ENODATA:
+		return INTEL_PT_ERR_NODATA;
+	case -EILSEQ:
+		return INTEL_PT_ERR_NOINSN;
+	case -ENOENT:
+		return INTEL_PT_ERR_MISMAT;
+	case -EOVERFLOW:
+		return INTEL_PT_ERR_OVR;
+	case -ENOSPC:
+		return INTEL_PT_ERR_LOST;
+	case -ELOOP:
+		return INTEL_PT_ERR_NELOOP;
+	default:
+		return INTEL_PT_ERR_UNK;
+	}
+}
+
+static const char *intel_pt_err_msgs[] = {
+	[INTEL_PT_ERR_NOMEM]  = "Memory allocation failed",
+	[INTEL_PT_ERR_INTERN] = "Internal error",
+	[INTEL_PT_ERR_BADPKT] = "Bad packet",
+	[INTEL_PT_ERR_NODATA] = "No more data",
+	[INTEL_PT_ERR_NOINSN] = "Failed to get instruction",
+	[INTEL_PT_ERR_MISMAT] = "Trace doesn't match instruction",
+	[INTEL_PT_ERR_OVR]    = "Overflow packet",
+	[INTEL_PT_ERR_LOST]   = "Lost trace data",
+	[INTEL_PT_ERR_UNK]    = "Unknown error!",
+	[INTEL_PT_ERR_NELOOP] = "Never-ending loop",
+};
+
+int intel_pt__strerror(int code, char *buf, size_t buflen)
+{
+	if (code < 1 || code > INTEL_PT_ERR_MAX)
+		code = INTEL_PT_ERR_UNK;
+	strlcpy(buf, intel_pt_err_msgs[code], buflen);
+	return 0;
+}
+
+static uint64_t intel_pt_calc_ip(struct intel_pt_decoder *decoder,
+				 const struct intel_pt_pkt *packet,
+				 uint64_t last_ip)
+{
+	uint64_t ip;
+
+	switch (packet->count) {
+	case 2:
+		ip = (last_ip & (uint64_t)0xffffffffffff0000ULL) |
+		     packet->payload;
+		break;
+	case 4:
+		ip = (last_ip & (uint64_t)0xffffffff00000000ULL) |
+		     packet->payload;
+		break;
+	case 6:
+		ip = packet->payload;
+		break;
+	default:
+		return 0;
+	}
+
+	if (ip & decoder->sign_bit)
+		return ip | decoder->sign_bits;
+
+	return ip;
+}
+
+static inline void intel_pt_set_last_ip(struct intel_pt_decoder *decoder)
+{
+	decoder->last_ip = intel_pt_calc_ip(decoder, &decoder->packet,
+					    decoder->last_ip);
+}
+
+static inline void intel_pt_set_ip(struct intel_pt_decoder *decoder)
+{
+	intel_pt_set_last_ip(decoder);
+	decoder->ip = decoder->last_ip;
+}
+
+static void intel_pt_decoder_log_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log_packet(&decoder->packet, decoder->pkt_len, decoder->pos,
+			    decoder->buf);
+}
+
+static int intel_pt_bug(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Internal error\n");
+	decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+	return -ENOSYS;
+}
+
+static inline void intel_pt_clear_tx_flags(struct intel_pt_decoder *decoder)
+{
+	decoder->tx_flags = 0;
+}
+
+static inline void intel_pt_update_in_tx(struct intel_pt_decoder *decoder)
+{
+	decoder->tx_flags = decoder->packet.payload & INTEL_PT_IN_TX;
+}
+
+static int intel_pt_bad_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	intel_pt_decoder_log_packet(decoder);
+	if (decoder->pkt_state != INTEL_PT_STATE_NO_PSB) {
+		intel_pt_log("ERROR: Bad packet\n");
+		decoder->pkt_state = INTEL_PT_STATE_ERR1;
+	}
+	return -EBADMSG;
+}
+
+static int intel_pt_get_data(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	intel_pt_log("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		intel_pt_log("No more data\n");
+		return -ENODATA;
+	}
+	if (!buffer.consecutive) {
+		decoder->ip = 0;
+		decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+		decoder->ref_timestamp = buffer.ref_timestamp;
+		decoder->timestamp = 0;
+		decoder->state.trace_nr = buffer.trace_nr;
+		intel_pt_log("Reference timestamp 0x%" PRIx64 "\n",
+			     decoder->ref_timestamp);
+		return -ENOLINK;
+	}
+
+	return 0;
+}
+
+static int intel_pt_get_next_data(struct intel_pt_decoder *decoder)
+{
+	if (!decoder->next_buf)
+		return intel_pt_get_data(decoder);
+
+	decoder->buf = decoder->next_buf;
+	decoder->len = decoder->next_len;
+	decoder->next_buf = 0;
+	decoder->next_len = 0;
+	return 0;
+}
+
+static int intel_pt_get_split_packet(struct intel_pt_decoder *decoder)
+{
+	unsigned char *buf = decoder->temp_buf;
+	size_t old_len, len, n;
+	int ret;
+
+	old_len = decoder->len;
+	len = decoder->len;
+	memcpy(buf, decoder->buf, len);
+
+	ret = intel_pt_get_data(decoder);
+	if (ret) {
+		decoder->pos += old_len;
+		return ret < 0 ? ret : -EINVAL;
+	}
+
+	n = INTEL_PT_PKT_MAX_SZ - len;
+	if (n > decoder->len)
+		n = decoder->len;
+	memcpy(buf + len, decoder->buf, n);
+	len += n;
+
+	ret = intel_pt_get_packet(buf, len, &decoder->packet);
+	if (ret < (int)old_len) {
+		decoder->next_buf = decoder->buf;
+		decoder->next_len = decoder->len;
+		decoder->buf = buf;
+		decoder->len = old_len;
+		return intel_pt_bad_packet(decoder);
+	}
+
+	decoder->next_buf = decoder->buf + (ret - old_len);
+	decoder->next_len = decoder->len - (ret - old_len);
+
+	decoder->buf = buf;
+	decoder->len = ret;
+
+	return ret;
+}
+
+static int intel_pt_get_next_packet(struct intel_pt_decoder *decoder)
+{
+	int ret;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = intel_pt_get_packet(decoder->buf, decoder->len,
+					  &decoder->packet);
+		if (ret == INTEL_PT_NEED_MORE_BYTES &&
+		    decoder->len < INTEL_PT_PKT_MAX_SZ && !decoder->next_buf) {
+			ret = intel_pt_get_split_packet(decoder);
+			if (ret < 0)
+				return ret;
+		}
+		if (ret <= 0)
+			return intel_pt_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+		intel_pt_decoder_log_packet(decoder);
+	} while (decoder->packet.type == INTEL_PT_PAD);
+
+	return 0;
+}
+
+static uint64_t intel_pt_next_period(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp, masked_timestamp;
+
+	timestamp = decoder->timestamp + decoder->timestamp_insn_cnt;
+	masked_timestamp = timestamp & decoder->period_mask;
+	if (decoder->continuous_period) {
+		if (masked_timestamp != decoder->last_masked_timestamp)
+			return 1;
+	} else {
+		timestamp += 1;
+		masked_timestamp = timestamp & decoder->period_mask;
+		if (masked_timestamp != decoder->last_masked_timestamp) {
+			decoder->last_masked_timestamp = masked_timestamp;
+			decoder->continuous_period = true;
+		}
+	}
+	return decoder->period_ticks - (timestamp - masked_timestamp);
+}
+
+static uint64_t intel_pt_next_sample(struct intel_pt_decoder *decoder)
+{
+	switch (decoder->period_type) {
+	case INTEL_PT_PERIOD_INSTRUCTIONS:
+		return decoder->period - decoder->period_insn_cnt;
+	case INTEL_PT_PERIOD_TICKS:
+		return intel_pt_next_period(decoder);
+	case INTEL_PT_PERIOD_NONE:
+	default:
+		return 0;
+	}
+}
+
+static void intel_pt_sample_insn(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp, masked_timestamp;
+
+	switch (decoder->period_type) {
+	case INTEL_PT_PERIOD_INSTRUCTIONS:
+		decoder->period_insn_cnt = 0;
+		break;
+	case INTEL_PT_PERIOD_TICKS:
+		timestamp = decoder->timestamp + decoder->timestamp_insn_cnt;
+		masked_timestamp = timestamp & decoder->period_mask;
+		decoder->last_masked_timestamp = masked_timestamp;
+		break;
+	case INTEL_PT_PERIOD_NONE:
+	default:
+		break;
+	}
+
+	decoder->state.type |= INTEL_PT_INSTRUCTION;
+}
+
+static int intel_pt_walk_insn(struct intel_pt_decoder *decoder,
+			      struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	uint64_t max_insn_cnt, insn_cnt = 0;
+	int err;
+
+	max_insn_cnt = intel_pt_next_sample(decoder);
+
+	err = decoder->walk_insn(intel_pt_insn, &insn_cnt, &decoder->ip, ip,
+				 max_insn_cnt, decoder->data);
+
+	decoder->timestamp_insn_cnt += insn_cnt;
+	decoder->period_insn_cnt += insn_cnt;
+
+	if (err) {
+		decoder->no_progress = 0;
+		decoder->pkt_state = INTEL_PT_STATE_ERR2;
+		intel_pt_log_at("ERROR: Failed to get instruction",
+				decoder->ip);
+		if (err == -ENOENT)
+			return -ENOLINK;
+		return -EILSEQ;
+	}
+
+	if (ip && decoder->ip == ip) {
+		err = -EAGAIN;
+		goto out;
+	}
+
+	if (max_insn_cnt && insn_cnt >= max_insn_cnt)
+		intel_pt_sample_insn(decoder);
+
+	if (intel_pt_insn->branch == INTEL_PT_BR_NO_BRANCH) {
+		decoder->state.type = INTEL_PT_INSTRUCTION;
+		decoder->state.from_ip = decoder->ip;
+		decoder->state.to_ip = 0;
+		decoder->ip += intel_pt_insn->length;
+		err = INTEL_PT_RETURN;
+		goto out;
+	}
+
+	if (intel_pt_insn->op == INTEL_PT_OP_CALL) {
+		/* Zero-length calls are excluded */
+		if (intel_pt_insn->branch != INTEL_PT_BR_UNCONDITIONAL ||
+		    intel_pt_insn->rel) {
+			err = intel_pt_push(&decoder->stack, decoder->ip +
+					    intel_pt_insn->length);
+			if (err)
+				goto out;
+		}
+	} else if (intel_pt_insn->op == INTEL_PT_OP_RET) {
+		decoder->ret_addr = intel_pt_pop(&decoder->stack);
+	}
+
+	if (intel_pt_insn->branch == INTEL_PT_BR_UNCONDITIONAL) {
+		int cnt = decoder->no_progress++;
+
+		decoder->state.from_ip = decoder->ip;
+		decoder->ip += intel_pt_insn->length +
+				intel_pt_insn->rel;
+		decoder->state.to_ip = decoder->ip;
+		err = INTEL_PT_RETURN;
+
+		/*
+		 * Check for being stuck in a loop.  This can happen if a
+		 * decoder error results in the decoder erroneously setting the
+		 * ip to an address that is itself in an infinite loop that
+		 * consumes no packets.  When that happens, there must be an
+		 * unconditional branch.
+		 */
+		if (cnt) {
+			if (cnt == 1) {
+				decoder->stuck_ip = decoder->state.to_ip;
+				decoder->stuck_ip_prd = 1;
+				decoder->stuck_ip_cnt = 1;
+			} else if (cnt > INTEL_PT_MAX_LOOPS ||
+				   decoder->state.to_ip == decoder->stuck_ip) {
+				intel_pt_log_at("ERROR: Never-ending loop",
+						decoder->state.to_ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+				err = -ELOOP;
+				goto out;
+			} else if (!--decoder->stuck_ip_cnt) {
+				decoder->stuck_ip_prd += 1;
+				decoder->stuck_ip_cnt = decoder->stuck_ip_prd;
+				decoder->stuck_ip = decoder->state.to_ip;
+			}
+		}
+		goto out_no_progress;
+	}
+out:
+	decoder->no_progress = 0;
+out_no_progress:
+	decoder->state.insn_op = intel_pt_insn->op;
+	decoder->state.insn_len = intel_pt_insn->length;
+
+	if (decoder->tx_flags & INTEL_PT_IN_TX)
+		decoder->state.flags |= INTEL_PT_IN_TX;
+
+	return err;
+}
+
+static int intel_pt_walk_fup(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	uint64_t ip;
+	int err;
+
+	ip = decoder->last_ip;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, ip);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err == -EAGAIN) {
+			if (decoder->set_fup_tx_flags) {
+				decoder->set_fup_tx_flags = false;
+				decoder->tx_flags = decoder->fup_tx_flags;
+				decoder->state.type = INTEL_PT_TRANSACTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->state.flags = decoder->fup_tx_flags;
+				return 0;
+			}
+			return err;
+		}
+		decoder->set_fup_tx_flags = false;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			intel_pt_log_at("ERROR: Unexpected indirect branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			intel_pt_log_at("ERROR: Unexpected conditional branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		intel_pt_bug(decoder);
+	}
+}
+
+static int intel_pt_walk_tip(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+	if (err == INTEL_PT_RETURN)
+		return 0;
+	if (err)
+		return err;
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+		if (decoder->pkt_state == INTEL_PT_STATE_TIP_PGD) {
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0)
+				decoder->ip = decoder->last_ip;
+		} else {
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				decoder->state.to_ip = decoder->last_ip;
+				decoder->ip = decoder->last_ip;
+			}
+		}
+		return 0;
+	}
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+		intel_pt_log_at("ERROR: Conditional branch when expecting indirect branch",
+				decoder->ip);
+		decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+		return -ENOENT;
+	}
+
+	return intel_pt_bug(decoder);
+}
+
+static int intel_pt_walk_tnt(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.op == INTEL_PT_OP_RET) {
+			if (!decoder->return_compression) {
+				intel_pt_log_at("ERROR: RET when expecting conditional branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!decoder->ret_addr) {
+				intel_pt_log_at("ERROR: Bad RET compression (stack empty)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!(decoder->tnt.payload & BIT63)) {
+				intel_pt_log_at("ERROR: Bad RET compression (TNT=N)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->tnt.payload <<= 1;
+			decoder->state.from_ip = decoder->ip;
+			decoder->ip = decoder->ret_addr;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			/* Handle deferred TIPs */
+			err = intel_pt_get_next_packet(decoder);
+			if (err)
+				return err;
+			if (decoder->packet.type != INTEL_PT_TIP ||
+			    decoder->packet.count == 0) {
+				intel_pt_log_at("ERROR: Missing deferred TIP for indirect branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				decoder->pkt_step = 0;
+				return -ENOENT;
+			}
+			intel_pt_set_last_ip(decoder);
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = decoder->last_ip;
+			decoder->ip = decoder->last_ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			if (decoder->tnt.payload & BIT63) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.from_ip = decoder->ip;
+				decoder->ip += intel_pt_insn.length +
+					       intel_pt_insn.rel;
+				decoder->state.to_ip = decoder->ip;
+				return 0;
+			}
+			/* Instruction sample for a non-taken branch */
+			if (decoder->state.type & INTEL_PT_INSTRUCTION) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.type = INTEL_PT_INSTRUCTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->ip += intel_pt_insn.length;
+				return 0;
+			}
+			decoder->ip += intel_pt_insn.length;
+			if (!decoder->tnt.count)
+				return -EAGAIN;
+			decoder->tnt.payload <<= 1;
+			continue;
+		}
+
+		return intel_pt_bug(decoder);
+	}
+}
+
+static int intel_pt_mode_tsx(struct intel_pt_decoder *decoder, bool *no_tip)
+{
+	unsigned int fup_tx_flags;
+	int err;
+
+	fup_tx_flags = decoder->packet.payload &
+		       (INTEL_PT_IN_TX | INTEL_PT_ABORT_TX);
+	err = intel_pt_get_next_packet(decoder);
+	if (err)
+		return err;
+	if (decoder->packet.type == INTEL_PT_FUP) {
+		decoder->fup_tx_flags = fup_tx_flags;
+		decoder->set_fup_tx_flags = true;
+		if (!(decoder->fup_tx_flags & INTEL_PT_ABORT_TX))
+			*no_tip = true;
+	} else {
+		intel_pt_log_at("ERROR: Missing FUP after MODE.TSX",
+				decoder->pos);
+		intel_pt_update_in_tx(decoder);
+	}
+	return 0;
+}
+
+static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp;
+
+	if (decoder->ref_timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->ref_timestamp & (0xffULL << 56));
+		if (timestamp < decoder->ref_timestamp) {
+			if (decoder->ref_timestamp - timestamp > (1ULL << 55))
+				timestamp += (1ULL << 56);
+		} else {
+			if (timestamp - decoder->ref_timestamp > (1ULL << 55))
+				timestamp -= (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->ref_timestamp = 0;
+		decoder->timestamp_insn_cnt = 0;
+	} else if (decoder->timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->timestamp & (0xffULL << 56));
+		if (timestamp < decoder->timestamp &&
+		    decoder->timestamp - timestamp < 0x100) {
+			intel_pt_log_to("ERROR: Suppressing backwards timestamp",
+					timestamp);
+			timestamp = decoder->timestamp;
+		}
+		while (timestamp < decoder->timestamp) {
+			intel_pt_log_to("Wraparound timestamp", timestamp);
+			timestamp += (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->timestamp_insn_cnt = 0;
+	}
+
+	intel_pt_log_to("Setting timestamp", decoder->timestamp);
+}
+
+static int intel_pt_overflow(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Buffer overflow\n");
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+	decoder->overflow = true;
+	return -EOVERFLOW;
+}
+
+/* Walk PSB+ packets when already in sync. */
+static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_TIP_PGD:
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+		case INTEL_PT_TNT:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSB:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -EAGAIN;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			intel_pt_set_last_ip(decoder);
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	if (decoder->tx_flags & INTEL_PT_ABORT_TX) {
+		decoder->tx_flags = 0;
+		decoder->state.flags &= ~INTEL_PT_IN_TX;
+		decoder->state.flags |= INTEL_PT_ABORT_TX;
+	} else {
+		decoder->state.flags |= INTEL_PT_ASYNC;
+	}
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+		case INTEL_PT_FUP:
+		case INTEL_PT_PSB:
+		case INTEL_PT_TSC:
+		case INTEL_PT_CBR:
+		case INTEL_PT_MODE_TSX:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSBEND:
+			intel_pt_log("ERROR: Missing TIP after FUP\n");
+			decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP_PGD:
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0) {
+				intel_pt_set_ip(decoder);
+				intel_pt_log("Omitting PGD ip " x64_fmt "\n",
+					     decoder->ip);
+			}
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			return 0;
+
+		case INTEL_PT_TIP_PGE:
+			decoder->pge = true;
+			intel_pt_log("Omitting PGE ip " x64_fmt "\n",
+				     decoder->ip);
+			decoder->state.from_ip = 0;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_TIP:
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
+{
+	bool no_tip = false;
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+next:
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+			if (!decoder->packet.count)
+				break;
+			decoder->tnt = decoder->packet;
+			decoder->pkt_state = INTEL_PT_STATE_TNT;
+			err = intel_pt_walk_tnt(decoder);
+			if (err == -EAGAIN)
+				break;
+			return err;
+
+		case INTEL_PT_TIP_PGD:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP_PGD;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_TIP_PGE: {
+			decoder->pge = true;
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero TIP.PGE",
+						decoder->pos);
+				break;
+			}
+			intel_pt_set_ip(decoder);
+			decoder->state.from_ip = 0;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_FUP:
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero FUP",
+						decoder->pos);
+				no_tip = false;
+				break;
+			}
+			intel_pt_set_last_ip(decoder);
+			err = intel_pt_walk_fup(decoder);
+			if (err != -EAGAIN) {
+				if (err)
+					return err;
+				if (no_tip)
+					decoder->pkt_state =
+						INTEL_PT_STATE_FUP_NO_TIP;
+				else
+					decoder->pkt_state = INTEL_PT_STATE_FUP;
+				return 0;
+			}
+			if (no_tip) {
+				no_tip = false;
+				break;
+			}
+			return intel_pt_walk_fup_tip(decoder);
+
+		case INTEL_PT_PSB:
+			intel_pt_clear_stack(&decoder->stack);
+			err = intel_pt_walk_psbend(decoder);
+			if (err == -EAGAIN)
+				goto next;
+			if (err)
+				return err;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			/* MODE_TSX need not be followed by FUP */
+			if (!decoder->pge) {
+				intel_pt_update_in_tx(decoder);
+				break;
+			}
+			err = intel_pt_mode_tsx(decoder, &no_tip);
+			if (err)
+				return err;
+			goto next;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+/* Walk PSB+ packets to get in sync. */
+static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -ENOENT;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0) {
+				uint64_t current_ip = decoder->ip;
+
+				intel_pt_set_ip(decoder);
+				if (current_ip)
+					intel_pt_log_to("Setting IP",
+							decoder->ip);
+			}
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_TNT:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			if (decoder->ip)
+				decoder->pkt_state = INTEL_PT_STATE_ERR4;
+			else
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_PSB:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			decoder->pge = decoder->packet.type != INTEL_PT_TIP_PGD;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0)
+				intel_pt_set_ip(decoder);
+			if (decoder->ip)
+				return 0;
+			break;
+
+		case INTEL_PT_FUP:
+			if (decoder->overflow) {
+				if (decoder->last_ip ||
+				    decoder->packet.count == 6 ||
+				    decoder->packet.count == 0)
+					intel_pt_set_ip(decoder);
+				if (decoder->ip)
+					return 0;
+			}
+			if (decoder->packet.count)
+				intel_pt_set_last_ip(decoder);
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSB:
+			err = intel_pt_walk_psb(decoder);
+			if (err)
+				return err;
+			if (decoder->ip) {
+				/* Do not have a sample */
+				decoder->state.type = 0;
+				return 0;
+			}
+			break;
+
+		case INTEL_PT_TNT:
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_sync_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	intel_pt_log("Scanning for full IP\n");
+	err = intel_pt_walk_to_ip(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	decoder->overflow = false;
+
+	decoder->state.from_ip = 0;
+	decoder->state.to_ip = decoder->ip;
+	intel_pt_log_to("Setting IP", decoder->ip);
+
+	return 0;
+}
+
+static int intel_pt_part_psb(struct intel_pt_decoder *decoder)
+{
+	const unsigned char *end = decoder->buf + decoder->len;
+	size_t i;
+
+	for (i = INTEL_PT_PSB_LEN - 1; i; i--) {
+		if (i > decoder->len)
+			continue;
+		if (!memcmp(end - i, INTEL_PT_PSB_STR, i))
+			return i;
+	}
+	return 0;
+}
+
+static int intel_pt_rest_psb(struct intel_pt_decoder *decoder, int part_psb)
+{
+	size_t rest_psb = INTEL_PT_PSB_LEN - part_psb;
+	const char *psb = INTEL_PT_PSB_STR;
+
+	if (rest_psb > decoder->len ||
+	    memcmp(decoder->buf, psb + part_psb, rest_psb))
+		return 0;
+
+	return rest_psb;
+}
+
+static int intel_pt_get_split_psb(struct intel_pt_decoder *decoder,
+				  int part_psb)
+{
+	int rest_psb, ret;
+
+	decoder->pos += decoder->len;
+	decoder->len = 0;
+
+	ret = intel_pt_get_next_data(decoder);
+	if (ret)
+		return ret;
+
+	rest_psb = intel_pt_rest_psb(decoder, part_psb);
+	if (!rest_psb)
+		return 0;
+
+	decoder->pos -= part_psb;
+	decoder->next_buf = decoder->buf + rest_psb;
+	decoder->next_len = decoder->len - rest_psb;
+	memcpy(decoder->temp_buf, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	decoder->buf = decoder->temp_buf;
+	decoder->len = INTEL_PT_PSB_LEN;
+
+	return 0;
+}
+
+static int intel_pt_scan_for_psb(struct intel_pt_decoder *decoder)
+{
+	unsigned char *next;
+	int ret;
+
+	intel_pt_log("Scanning for PSB\n");
+	while (1) {
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		next = memmem(decoder->buf, decoder->len, INTEL_PT_PSB_STR,
+			      INTEL_PT_PSB_LEN);
+		if (!next) {
+			int part_psb;
+
+			part_psb = intel_pt_part_psb(decoder);
+			if (part_psb) {
+				ret = intel_pt_get_split_psb(decoder, part_psb);
+				if (ret)
+					return ret;
+			} else {
+				decoder->pos += decoder->len;
+				decoder->len = 0;
+			}
+			continue;
+		}
+
+		decoder->pkt_step = next - decoder->buf;
+		return intel_pt_get_next_packet(decoder);
+	}
+}
+
+static int intel_pt_sync(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	decoder->pge = false;
+	decoder->continuous_period = false;
+	decoder->last_ip = 0;
+	decoder->ip = 0;
+	intel_pt_clear_stack(&decoder->stack);
+
+	err = intel_pt_scan_for_psb(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_NO_IP;
+
+	err = intel_pt_walk_psb(decoder);
+	if (err)
+		return err;
+
+	if (decoder->ip) {
+		decoder->state.type = 0; /* Do not have a sample */
+		decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	} else {
+		return intel_pt_sync_ip(decoder);
+	}
+
+	return 0;
+}
+
+static uint64_t intel_pt_est_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t est = decoder->timestamp_insn_cnt << 1;
+
+	if (!decoder->cbr || !decoder->max_non_turbo_ratio)
+		goto out;
+
+	est *= decoder->max_non_turbo_ratio;
+	est /= decoder->cbr;
+out:
+	return decoder->timestamp + est;
+}
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	do {
+		decoder->state.type = INTEL_PT_BRANCH;
+		decoder->state.flags = 0;
+
+		switch (decoder->pkt_state) {
+		case INTEL_PT_STATE_NO_PSB:
+			err = intel_pt_sync(decoder);
+			break;
+		case INTEL_PT_STATE_NO_IP:
+			decoder->last_ip = 0;
+			/* Fall through */
+		case INTEL_PT_STATE_ERR_RESYNC:
+			err = intel_pt_sync_ip(decoder);
+			break;
+		case INTEL_PT_STATE_IN_SYNC:
+			err = intel_pt_walk_trace(decoder);
+			break;
+		case INTEL_PT_STATE_TNT:
+			err = intel_pt_walk_tnt(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_trace(decoder);
+			break;
+		case INTEL_PT_STATE_TIP:
+		case INTEL_PT_STATE_TIP_PGD:
+			err = intel_pt_walk_tip(decoder);
+			break;
+		case INTEL_PT_STATE_FUP:
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			err = intel_pt_walk_fup(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_fup_tip(decoder);
+			else if (!err)
+				decoder->pkt_state = INTEL_PT_STATE_FUP;
+			break;
+		case INTEL_PT_STATE_FUP_NO_TIP:
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			err = intel_pt_walk_fup(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_trace(decoder);
+			break;
+		default:
+			err = intel_pt_bug(decoder);
+			break;
+		}
+	} while (err == -ENOLINK);
+
+	decoder->state.err = err ? intel_pt_ext_err(err) : 0;
+	decoder->state.timestamp = decoder->timestamp;
+	decoder->state.est_timestamp = intel_pt_est_timestamp(decoder);
+	decoder->state.cr3 = decoder->cr3;
+
+	if (err)
+		decoder->state.from_ip = decoder->ip;
+
+	return &decoder->state;
+}
+
+static bool intel_pt_at_psb(unsigned char *buf, size_t len)
+{
+	if (len < INTEL_PT_PSB_LEN)
+		return false;
+	return memmem(buf, INTEL_PT_PSB_LEN, INTEL_PT_PSB_STR,
+		      INTEL_PT_PSB_LEN);
+}
+
+/**
+ * intel_pt_next_psb - move buffer pointer to the start of the next PSB packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the next PSB packet if
+ * there is one, otherwise the buffer pointer is unchanged.  If @buf is updated,
+ * @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_next_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	next = memmem(*buf, *len, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_step_psb - move buffer pointer to the start of the following PSB
+ *                     packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the following PSB packet
+ * (skipping the PSB at @buf itself) if there is one, otherwise the buffer
+ * pointer is unchanged.  If @buf is updated, @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_step_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	if (!*len)
+		return false;
+
+	next = memmem(*buf + 1, *len - 1, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_last_psb - find the last PSB packet in a buffer.
+ * @buf: buffer
+ * @len: size of buffer
+ *
+ * This function finds the last PSB in a buffer.
+ *
+ * Return: A pointer to the last PSB in @buf if found, %NULL otherwise.
+ */
+static unsigned char *intel_pt_last_psb(unsigned char *buf, size_t len)
+{
+	const char *n = INTEL_PT_PSB_STR;
+	unsigned char *p;
+	size_t k;
+
+	if (len < INTEL_PT_PSB_LEN)
+		return NULL;
+
+	k = len - INTEL_PT_PSB_LEN + 1;
+	while (1) {
+		p = memrchr(buf, n[0], k);
+		if (!p)
+			return NULL;
+		if (!memcmp(p + 1, n + 1, INTEL_PT_PSB_LEN - 1))
+			return p;
+		k = p - buf;
+		if (!k)
+			return NULL;
+	}
+}
+
+/**
+ * intel_pt_next_tsc - find and return next TSC.
+ * @buf: buffer
+ * @len: size of buffer
+ * @tsc: TSC value returned
+ *
+ * Find a TSC packet in @buf and return the TSC value.  This function assumes
+ * that @buf starts at a PSB and that PSB+ will contain TSC and so stops if a
+ * PSBEND packet is found.
+ *
+ * Return: %true if TSC is found, false otherwise.
+ */
+static bool intel_pt_next_tsc(unsigned char *buf, size_t len, uint64_t *tsc)
+{
+	struct intel_pt_pkt packet;
+	int ret;
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret <= 0)
+			return false;
+		if (packet.type == INTEL_PT_TSC) {
+			*tsc = packet.payload;
+			return true;
+		}
+		if (packet.type == INTEL_PT_PSBEND)
+			return false;
+		buf += ret;
+		len -= ret;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_tsc_cmp - compare 7-byte TSCs.
+ * @tsc1: first TSC to compare
+ * @tsc2: second TSC to compare
+ *
+ * This function compares 7-byte TSC values allowing for the possibility that
+ * TSC wrapped around.  Generally it is not possible to know if TSC has wrapped
+ * around so for that purpose this function assumes the absolute difference is
+ * less than half the maximum difference.
+ *
+ * Return: %-1 if @tsc1 is before @tsc2, %0 if @tsc1 == @tsc2, %1 if @tsc1 is
+ * after @tsc2.
+ */
+static int intel_pt_tsc_cmp(uint64_t tsc1, uint64_t tsc2)
+{
+	const uint64_t halfway = (1ULL << 55);
+
+	if (tsc1 == tsc2)
+		return 0;
+
+	if (tsc1 < tsc2) {
+		if (tsc2 - tsc1 < halfway)
+			return -1;
+		else
+			return 1;
+	} else {
+		if (tsc1 - tsc2 < halfway)
+			return 1;
+		else
+			return -1;
+	}
+}
+
+/**
+ * intel_pt_find_overlap_tsc - determine start of non-overlapped trace data
+ *                             using TSC.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ *
+ * If the trace contains TSC we can look at the last TSC of @buf_a and the
+ * first TSC of @buf_b in order to determine if the buffers overlap, and then
+ * walk forward in @buf_b until a later TSC is found.  A precondition is that
+ * @buf_a and @buf_b are positioned at a PSB.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+static unsigned char *intel_pt_find_overlap_tsc(unsigned char *buf_a,
+						size_t len_a,
+						unsigned char *buf_b,
+						size_t len_b)
+{
+	uint64_t tsc_a, tsc_b;
+	unsigned char *p;
+	size_t len;
+
+	p = intel_pt_last_psb(buf_a, len_a);
+	if (!p)
+		return buf_b; /* No PSB in buf_a => no overlap */
+
+	len = len_a - (p - buf_a);
+	if (!intel_pt_next_tsc(p, len, &tsc_a)) {
+		/* The last PSB+ in buf_a is incomplete, so go back one more */
+		len_a -= len;
+		p = intel_pt_last_psb(buf_a, len_a);
+		if (!p)
+			return buf_b; /* No full PSB+ => assume no overlap */
+		len = len_a - (p - buf_a);
+		if (!intel_pt_next_tsc(p, len, &tsc_a))
+			return buf_b; /* No TSC in buf_a => assume no overlap */
+	}
+
+	while (1) {
+		/* Ignore PSB+ with no TSC */
+		if (intel_pt_next_tsc(buf_b, len_b, &tsc_b) &&
+		    intel_pt_tsc_cmp(tsc_a, tsc_b) < 0)
+			return buf_b; /* tsc_a < tsc_b => no overlap */
+
+		if (!intel_pt_step_psb(&buf_b, &len_b))
+			return buf_b + len_b; /* No PSB in buf_b => no data */
+	}
+}
+
+/**
+ * intel_pt_find_overlap - determine start of non-overlapped trace data.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ * @have_tsc: can use TSC packets to detect overlap
+ *
+ * When trace samples or snapshots are recorded there is the possibility that
+ * the data overlaps.  Note that, for the purposes of decoding, data is only
+ * useful if it begins with a PSB packet.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc)
+{
+	unsigned char *found;
+
+	/* Buffer 'b' must start at PSB so throw away everything before that */
+	if (!intel_pt_next_psb(&buf_b, &len_b))
+		return buf_b + len_b; /* No PSB */
+
+	if (!intel_pt_next_psb(&buf_a, &len_a))
+		return buf_b; /* No overlap */
+
+	if (have_tsc) {
+		found = intel_pt_find_overlap_tsc(buf_a, len_a, buf_b, len_b);
+		if (found)
+			return found;
+	}
+
+	/*
+	 * Buffer 'b' cannot end within buffer 'a' so, for comparison purposes,
+	 * we can ignore the first part of buffer 'a'.
+	 */
+	while (len_b < len_a) {
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+	}
+
+	/* Now len_b >= len_a */
+	if (len_b > len_a) {
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+
+	while (1) {
+		/* Potential overlap so check the bytes */
+		found = memmem(buf_a, len_a, buf_b, len_a);
+		if (found)
+			return buf_b + len_a;
+
+		/* Try again at next PSB in buffer 'a' */
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
new file mode 100644
index 0000000..4c48802
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -0,0 +1,104 @@
+/*
+ * intel_pt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_DECODER_H__
+#define INCLUDE__INTEL_PT_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+#include "intel-pt-insn-decoder.h"
+
+#define INTEL_PT_IN_TX		(1 << 0)
+#define INTEL_PT_ABORT_TX	(1 << 1)
+#define INTEL_PT_ASYNC		(1 << 2)
+
+enum intel_pt_sample_type {
+	INTEL_PT_BRANCH		= 1 << 0,
+	INTEL_PT_INSTRUCTION	= 1 << 1,
+	INTEL_PT_TRANSACTION	= 1 << 2,
+};
+
+enum intel_pt_period_type {
+	INTEL_PT_PERIOD_NONE,
+	INTEL_PT_PERIOD_INSTRUCTIONS,
+	INTEL_PT_PERIOD_TICKS,
+};
+
+enum {
+	INTEL_PT_ERR_NOMEM = 1,
+	INTEL_PT_ERR_INTERN,
+	INTEL_PT_ERR_BADPKT,
+	INTEL_PT_ERR_NODATA,
+	INTEL_PT_ERR_NOINSN,
+	INTEL_PT_ERR_MISMAT,
+	INTEL_PT_ERR_OVR,
+	INTEL_PT_ERR_LOST,
+	INTEL_PT_ERR_UNK,
+	INTEL_PT_ERR_NELOOP,
+	INTEL_PT_ERR_MAX,
+};
+
+struct intel_pt_state {
+	enum intel_pt_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint64_t est_timestamp;
+	uint64_t trace_nr;
+	uint32_t flags;
+	enum intel_pt_insn_op insn_op;
+	int insn_len;
+};
+
+struct intel_pt_insn;
+
+struct intel_pt_buffer {
+	const unsigned char *buf;
+	size_t len;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct intel_pt_params {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*walk_insn)(struct intel_pt_insn *intel_pt_insn,
+			 uint64_t *insn_cnt_ptr, uint64_t *ip, uint64_t to_ip,
+			 uint64_t max_insn_cnt, void *data);
+	void *data;
+	bool return_compression;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+	unsigned max_non_turbo_ratio;
+};
+
+struct intel_pt_decoder;
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params);
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder);
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder);
+
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc);
+
+int intel_pt__strerror(int code, char *buf, size_t buflen);
+
+#endif

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT support
  2015-07-17 16:33 ` [PATCH V8 06/25] perf tools: Add Intel PT support Adrian Hunter
@ 2015-08-20  9:59   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-20  9:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, adrian.hunter, acme, linux-kernel, jolsa, hpa, tglx

Commit-ID:  90e457f7be0870052724b2d9c2c106e5847f2c19
Gitweb:     http://git.kernel.org/tip/90e457f7be0870052724b2d9c2c106e5847f2c19
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:41 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 17 Aug 2015 11:11:36 -0300

perf tools: Add Intel PT support

Add support for Intel Processor Trace.

Intel PT support fits within the new auxtrace infrastructure.  Recording
is supporting by identifying the Intel PT PMU, parsing options and
setting up events.

Decoding is supported by queuing up trace data by cpu or thread and then
decoding synchronously delivering synthesized event samples into the
session processing for tools to consume.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-7-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/x86/util/Build      |    2 +
 tools/perf/arch/x86/util/intel-pt.c |  752 ++++++++++++++
 tools/perf/util/Build               |    1 +
 tools/perf/util/intel-pt.c          | 1911 +++++++++++++++++++++++++++++++++++
 tools/perf/util/intel-pt.h          |   51 +
 5 files changed, 2717 insertions(+)

diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index cfbccc4..1396088 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -6,3 +6,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
 
 libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
 libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
+
+libperf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
new file mode 100644
index 0000000..da7d2c1
--- /dev/null
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -0,0 +1,752 @@
+/*
+ * intel_pt.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdbool.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "../../perf.h"
+#include "../../util/session.h"
+#include "../../util/event.h"
+#include "../../util/evlist.h"
+#include "../../util/evsel.h"
+#include "../../util/cpumap.h"
+#include "../../util/parse-options.h"
+#include "../../util/parse-events.h"
+#include "../../util/pmu.h"
+#include "../../util/debug.h"
+#include "../../util/auxtrace.h"
+#include "../../util/tsc.h"
+#include "../../util/intel-pt.h"
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define KiB_MASK(x) (KiB(x) - 1)
+#define MiB_MASK(x) (MiB(x) - 1)
+
+#define INTEL_PT_DEFAULT_SAMPLE_SIZE	KiB(4)
+
+#define INTEL_PT_MAX_SAMPLE_SIZE	KiB(60)
+
+#define INTEL_PT_PSB_PERIOD_NEAR	256
+
+struct intel_pt_snapshot_ref {
+	void *ref_buf;
+	size_t ref_offset;
+	bool wrapped;
+};
+
+struct intel_pt_recording {
+	struct auxtrace_record		itr;
+	struct perf_pmu			*intel_pt_pmu;
+	int				have_sched_switch;
+	struct perf_evlist		*evlist;
+	bool				snapshot_mode;
+	bool				snapshot_init_done;
+	size_t				snapshot_size;
+	size_t				snapshot_ref_buf_size;
+	int				snapshot_ref_cnt;
+	struct intel_pt_snapshot_ref	*snapshot_refs;
+};
+
+static int intel_pt_parse_terms_with_default(struct list_head *formats,
+					     const char *str,
+					     u64 *config)
+{
+	struct list_head *terms;
+	struct perf_event_attr attr = { .size = 0, };
+	int err;
+
+	terms = malloc(sizeof(struct list_head));
+	if (!terms)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(terms);
+
+	err = parse_events_terms(terms, str);
+	if (err)
+		goto out_free;
+
+	attr.config = *config;
+	err = perf_pmu__config_terms(formats, &attr, terms, true, NULL);
+	if (err)
+		goto out_free;
+
+	*config = attr.config;
+out_free:
+	parse_events__free_terms(terms);
+	return err;
+}
+
+static int intel_pt_parse_terms(struct list_head *formats, const char *str,
+				u64 *config)
+{
+	*config = 0;
+	return intel_pt_parse_terms_with_default(formats, str, config);
+}
+
+static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu __maybe_unused,
+				  struct perf_evlist *evlist __maybe_unused)
+{
+	return 256;
+}
+
+static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	u64 config;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &config);
+	return config;
+}
+
+static int intel_pt_parse_snapshot_options(struct auxtrace_record *itr,
+					   struct record_opts *opts,
+					   const char *str)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->auxtrace_snapshot_mode = true;
+	opts->auxtrace_snapshot_size = snapshot_size;
+
+	ptr->snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+struct perf_event_attr *
+intel_pt_pmu_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	struct perf_event_attr *attr;
+
+	attr = zalloc(sizeof(struct perf_event_attr));
+	if (!attr)
+		return NULL;
+
+	attr->config = intel_pt_default_config(intel_pt_pmu);
+
+	intel_pt_pmu->selectable = true;
+
+	return attr;
+}
+
+static size_t intel_pt_info_priv_size(struct auxtrace_record *itr __maybe_unused)
+{
+	return INTEL_PT_AUXTRACE_PRIV_SIZE;
+}
+
+static int intel_pt_info_fill(struct auxtrace_record *itr,
+			      struct perf_session *session,
+			      struct auxtrace_info_event *auxtrace_info,
+			      size_t priv_size)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	struct perf_event_mmap_page *pc;
+	struct perf_tsc_conversion tc = { .time_mult = 0, };
+	bool cap_user_time_zero = false, per_cpu_mmaps;
+	u64 tsc_bit, noretcomp_bit;
+	int err;
+
+	if (priv_size != INTEL_PT_AUXTRACE_PRIV_SIZE)
+		return -EINVAL;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+	intel_pt_parse_terms(&intel_pt_pmu->format, "noretcomp",
+			     &noretcomp_bit);
+
+	if (!session->evlist->nr_mmaps)
+		return -EINVAL;
+
+	pc = session->evlist->mmap[0].base;
+	if (pc) {
+		err = perf_read_tsc_conversion(pc, &tc);
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+		} else {
+			cap_user_time_zero = tc.time_mult != 0;
+		}
+		if (!cap_user_time_zero)
+			ui__warning("Intel Processor Trace: TSC not available\n");
+	}
+
+	per_cpu_mmaps = !cpu_map__empty(session->evlist->cpus);
+
+	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
+	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
+	auxtrace_info->priv[INTEL_PT_TIME_SHIFT] = tc.time_shift;
+	auxtrace_info->priv[INTEL_PT_TIME_MULT] = tc.time_mult;
+	auxtrace_info->priv[INTEL_PT_TIME_ZERO] = tc.time_zero;
+	auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO] = cap_user_time_zero;
+	auxtrace_info->priv[INTEL_PT_TSC_BIT] = tsc_bit;
+	auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT] = noretcomp_bit;
+	auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH] = ptr->have_sched_switch;
+	auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE] = ptr->snapshot_mode;
+	auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS] = per_cpu_mmaps;
+
+	return 0;
+}
+
+static int intel_pt_track_switches(struct perf_evlist *evlist)
+{
+	const char *sched_switch = "sched:sched_switch";
+	struct perf_evsel *evsel;
+	int err;
+
+	if (!perf_evlist__can_select_event(evlist, sched_switch))
+		return -EPERM;
+
+	err = parse_events(evlist, sched_switch, NULL);
+	if (err) {
+		pr_debug2("%s: failed to parse %s, error %d\n",
+			  __func__, sched_switch, err);
+		return err;
+	}
+
+	evsel = perf_evlist__last(evlist);
+
+	perf_evsel__set_sample_bit(evsel, CPU);
+	perf_evsel__set_sample_bit(evsel, TIME);
+
+	evsel->system_wide = true;
+	evsel->no_aux_samples = true;
+	evsel->immediate = true;
+
+	return 0;
+}
+
+static int intel_pt_recording_options(struct auxtrace_record *itr,
+				      struct perf_evlist *evlist,
+				      struct record_opts *opts)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	bool have_timing_info;
+	struct perf_evsel *evsel, *intel_pt_evsel = NULL;
+	const struct cpu_map *cpus = evlist->cpus;
+	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+	u64 tsc_bit;
+
+	ptr->evlist = evlist;
+	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_pt_pmu->type) {
+			if (intel_pt_evsel) {
+				pr_err("There may be only one " INTEL_PT_PMU_NAME " event\n");
+				return -EINVAL;
+			}
+			evsel->attr.freq = 0;
+			evsel->attr.sample_period = 1;
+			intel_pt_evsel = evsel;
+			opts->full_auxtrace = true;
+		}
+	}
+
+	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
+		pr_err("Snapshot mode (-S option) requires " INTEL_PT_PMU_NAME " PMU event (-e " INTEL_PT_PMU_NAME ")\n");
+		return -EINVAL;
+	}
+
+	if (opts->use_clockid) {
+		pr_err("Cannot use clockid (-k option) with " INTEL_PT_PMU_NAME "\n");
+		return -EINVAL;
+	}
+
+	if (!opts->full_auxtrace)
+		return 0;
+
+	/* Set default sizes for snapshot mode */
+	if (opts->auxtrace_snapshot_mode) {
+		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
+
+		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
+			if (privileged) {
+				opts->auxtrace_mmap_pages = MiB(4) / page_size;
+			} else {
+				opts->auxtrace_mmap_pages = KiB(128) / page_size;
+				if (opts->mmap_pages == UINT_MAX)
+					opts->mmap_pages = KiB(256) / page_size;
+			}
+		} else if (!opts->auxtrace_mmap_pages && !privileged &&
+			   opts->mmap_pages == UINT_MAX) {
+			opts->mmap_pages = KiB(256) / page_size;
+		}
+		if (!opts->auxtrace_snapshot_size)
+			opts->auxtrace_snapshot_size =
+				opts->auxtrace_mmap_pages * (size_t)page_size;
+		if (!opts->auxtrace_mmap_pages) {
+			size_t sz = opts->auxtrace_snapshot_size;
+
+			sz = round_up(sz, page_size) / page_size;
+			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
+		}
+		if (opts->auxtrace_snapshot_size >
+				opts->auxtrace_mmap_pages * (size_t)page_size) {
+			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
+			       opts->auxtrace_snapshot_size,
+			       opts->auxtrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
+			return -EINVAL;
+		}
+		pr_debug2("Intel PT snapshot size: %zu\n",
+			  opts->auxtrace_snapshot_size);
+		if (psb_period &&
+		    opts->auxtrace_snapshot_size <= psb_period +
+						  INTEL_PT_PSB_PERIOD_NEAR)
+			ui__warning("Intel PT snapshot size (%zu) may be too small for PSB period (%zu)\n",
+				    opts->auxtrace_snapshot_size, psb_period);
+	}
+
+	/* Set default sizes for full trace mode */
+	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
+		if (privileged) {
+			opts->auxtrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->auxtrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	}
+
+	/* Validate auxtrace_mmap_pages */
+	if (opts->auxtrace_mmap_pages) {
+		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
+		size_t min_sz;
+
+		if (opts->auxtrace_snapshot_mode)
+			min_sz = KiB(4);
+		else
+			min_sz = KiB(8);
+
+		if (sz < min_sz || !is_power_of_2(sz)) {
+			pr_err("Invalid mmap size for Intel Processor Trace: must be at least %zuKiB and a power of 2\n",
+			       min_sz / 1024);
+			return -EINVAL;
+		}
+	}
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+
+	if (opts->full_auxtrace && (intel_pt_evsel->attr.config & tsc_bit))
+		have_timing_info = true;
+	else
+		have_timing_info = false;
+
+	/*
+	 * Per-cpu recording needs sched_switch events to distinguish different
+	 * threads.
+	 */
+	if (have_timing_info && !cpu_map__empty(cpus)) {
+		int err;
+
+		err = intel_pt_track_switches(evlist);
+		if (err == -EPERM)
+			pr_debug2("Unable to select sched:sched_switch\n");
+		else if (err)
+			return err;
+		else
+			ptr->have_sched_switch = 1;
+	}
+
+	if (intel_pt_evsel) {
+		/*
+		 * To obtain the auxtrace buffer file descriptor, the auxtrace
+		 * event must come first.
+		 */
+		perf_evlist__to_front(evlist, intel_pt_evsel);
+		/*
+		 * In the case of per-cpu mmaps, we need the CPU on the
+		 * AUX event.
+		 */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(intel_pt_evsel, CPU);
+	}
+
+	/* Add dummy event to keep tracking */
+	if (opts->full_auxtrace) {
+		struct perf_evsel *tracking_evsel;
+		int err;
+
+		err = parse_events(evlist, "dummy:u", NULL);
+		if (err)
+			return err;
+
+		tracking_evsel = perf_evlist__last(evlist);
+
+		perf_evlist__set_tracking_event(evlist, tracking_evsel);
+
+		tracking_evsel->attr.freq = 0;
+		tracking_evsel->attr.sample_period = 1;
+
+		/* In per-cpu case, always need the time of mmap events etc */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(tracking_evsel, TIME);
+	}
+
+	/*
+	 * Warn the user when we do not have enough information to decode i.e.
+	 * per-cpu with no sched_switch (except workload-only).
+	 */
+	if (!ptr->have_sched_switch && !cpu_map__empty(cpus) &&
+	    !target__none(&opts->target))
+		ui__warning("Intel Processor Trace decoding will not be possible except for kernel tracing!\n");
+
+	return 0;
+}
+
+static int intel_pt_snapshot_start(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__disable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_snapshot_finish(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_alloc_snapshot_refs(struct intel_pt_recording *ptr, int idx)
+{
+	const size_t sz = sizeof(struct intel_pt_snapshot_ref);
+	int cnt = ptr->snapshot_ref_cnt, new_cnt = cnt * 2;
+	struct intel_pt_snapshot_ref *refs;
+
+	if (!new_cnt)
+		new_cnt = 16;
+
+	while (new_cnt <= idx)
+		new_cnt *= 2;
+
+	refs = calloc(new_cnt, sz);
+	if (!refs)
+		return -ENOMEM;
+
+	memcpy(refs, ptr->snapshot_refs, cnt * sz);
+
+	ptr->snapshot_refs = refs;
+	ptr->snapshot_ref_cnt = new_cnt;
+
+	return 0;
+}
+
+static void intel_pt_free_snapshot_refs(struct intel_pt_recording *ptr)
+{
+	int i;
+
+	for (i = 0; i < ptr->snapshot_ref_cnt; i++)
+		zfree(&ptr->snapshot_refs[i].ref_buf);
+	zfree(&ptr->snapshot_refs);
+}
+
+static void intel_pt_recording_free(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+
+	intel_pt_free_snapshot_refs(ptr);
+	free(ptr);
+}
+
+static int intel_pt_alloc_snapshot_ref(struct intel_pt_recording *ptr, int idx,
+				       size_t snapshot_buf_size)
+{
+	size_t ref_buf_size = ptr->snapshot_ref_buf_size;
+	void *ref_buf;
+
+	ref_buf = zalloc(ref_buf_size);
+	if (!ref_buf)
+		return -ENOMEM;
+
+	ptr->snapshot_refs[idx].ref_buf = ref_buf;
+	ptr->snapshot_refs[idx].ref_offset = snapshot_buf_size - ref_buf_size;
+
+	return 0;
+}
+
+static size_t intel_pt_snapshot_ref_buf_size(struct intel_pt_recording *ptr,
+					     size_t snapshot_buf_size)
+{
+	const size_t max_size = 256 * 1024;
+	size_t buf_size = 0, psb_period;
+
+	if (ptr->snapshot_size <= 64 * 1024)
+		return 0;
+
+	psb_period = intel_pt_psb_period(ptr->intel_pt_pmu, ptr->evlist);
+	if (psb_period)
+		buf_size = psb_period * 2;
+
+	if (!buf_size || buf_size > max_size)
+		buf_size = max_size;
+
+	if (buf_size >= snapshot_buf_size)
+		return 0;
+
+	if (buf_size >= ptr->snapshot_size / 2)
+		return 0;
+
+	return buf_size;
+}
+
+static int intel_pt_snapshot_init(struct intel_pt_recording *ptr,
+				  size_t snapshot_buf_size)
+{
+	if (ptr->snapshot_init_done)
+		return 0;
+
+	ptr->snapshot_init_done = true;
+
+	ptr->snapshot_ref_buf_size = intel_pt_snapshot_ref_buf_size(ptr,
+							snapshot_buf_size);
+
+	return 0;
+}
+
+/**
+ * intel_pt_compare_buffers - compare bytes in a buffer to a circular buffer.
+ * @buf1: first buffer
+ * @compare_size: number of bytes to compare
+ * @buf2: second buffer (a circular buffer)
+ * @offs2: offset in second buffer
+ * @buf2_size: size of second buffer
+ *
+ * The comparison allows for the possibility that the bytes to compare in the
+ * circular buffer are not contiguous.  It is assumed that @compare_size <=
+ * @buf2_size.  This function returns %false if the bytes are identical, %true
+ * otherwise.
+ */
+static bool intel_pt_compare_buffers(void *buf1, size_t compare_size,
+				     void *buf2, size_t offs2, size_t buf2_size)
+{
+	size_t end2 = offs2 + compare_size, part_size;
+
+	if (end2 <= buf2_size)
+		return memcmp(buf1, buf2 + offs2, compare_size);
+
+	part_size = end2 - buf2_size;
+	if (memcmp(buf1, buf2 + offs2, part_size))
+		return true;
+
+	compare_size -= part_size;
+
+	return memcmp(buf1 + part_size, buf2, compare_size);
+}
+
+static bool intel_pt_compare_ref(void *ref_buf, size_t ref_offset,
+				 size_t ref_size, size_t buf_size,
+				 void *data, size_t head)
+{
+	size_t ref_end = ref_offset + ref_size;
+
+	if (ref_end > buf_size) {
+		if (head > ref_offset || head < ref_end - buf_size)
+			return true;
+	} else if (head > ref_offset && head < ref_end) {
+		return true;
+	}
+
+	return intel_pt_compare_buffers(ref_buf, ref_size, data, ref_offset,
+					buf_size);
+}
+
+static void intel_pt_copy_ref(void *ref_buf, size_t ref_size, size_t buf_size,
+			      void *data, size_t head)
+{
+	if (head >= ref_size) {
+		memcpy(ref_buf, data + head - ref_size, ref_size);
+	} else {
+		memcpy(ref_buf, data, head);
+		ref_size -= head;
+		memcpy(ref_buf + head, data + buf_size - ref_size, ref_size);
+	}
+}
+
+static bool intel_pt_wrapped(struct intel_pt_recording *ptr, int idx,
+			     struct auxtrace_mmap *mm, unsigned char *data,
+			     u64 head)
+{
+	struct intel_pt_snapshot_ref *ref = &ptr->snapshot_refs[idx];
+	bool wrapped;
+
+	wrapped = intel_pt_compare_ref(ref->ref_buf, ref->ref_offset,
+				       ptr->snapshot_ref_buf_size, mm->len,
+				       data, head);
+
+	intel_pt_copy_ref(ref->ref_buf, ptr->snapshot_ref_buf_size, mm->len,
+			  data, head);
+
+	return wrapped;
+}
+
+static bool intel_pt_first_wrap(u64 *data, size_t buf_size)
+{
+	int i, a, b;
+
+	b = buf_size >> 3;
+	a = b - 512;
+	if (a < 0)
+		a = 0;
+
+	for (i = a; i < b; i++) {
+		if (data[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int intel_pt_find_snapshot(struct auxtrace_record *itr, int idx,
+				  struct auxtrace_mmap *mm, unsigned char *data,
+				  u64 *head, u64 *old)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	bool wrapped;
+	int err;
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head);
+
+	err = intel_pt_snapshot_init(ptr, mm->len);
+	if (err)
+		goto out_err;
+
+	if (idx >= ptr->snapshot_ref_cnt) {
+		err = intel_pt_alloc_snapshot_refs(ptr, idx);
+		if (err)
+			goto out_err;
+	}
+
+	if (ptr->snapshot_ref_buf_size) {
+		if (!ptr->snapshot_refs[idx].ref_buf) {
+			err = intel_pt_alloc_snapshot_ref(ptr, idx, mm->len);
+			if (err)
+				goto out_err;
+		}
+		wrapped = intel_pt_wrapped(ptr, idx, mm, data, *head);
+	} else {
+		wrapped = ptr->snapshot_refs[idx].wrapped;
+		if (!wrapped && intel_pt_first_wrap((u64 *)data, mm->len)) {
+			ptr->snapshot_refs[idx].wrapped = true;
+			wrapped = true;
+		}
+	}
+
+	/*
+	 * In full trace mode 'head' continually increases.  However in snapshot
+	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
+	 * are adjusted to match the full trace case which expects that 'old' is
+	 * always less than 'head'.
+	 */
+	if (wrapped) {
+		*old = *head;
+		*head += mm->len;
+	} else {
+		if (mm->mask)
+			*old &= mm->mask;
+		else
+			*old %= mm->len;
+		if (*old > *head)
+			*head += mm->len;
+	}
+
+	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
+		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
+
+	return 0;
+
+out_err:
+	pr_err("%s: failed, error %d\n", __func__, err);
+	return err;
+}
+
+static u64 intel_pt_reference(struct auxtrace_record *itr __maybe_unused)
+{
+	return rdtsc();
+}
+
+static int intel_pt_read_finish(struct auxtrace_record *itr, int idx)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event_idx(ptr->evlist, evsel,
+							     idx);
+	}
+	return -EINVAL;
+}
+
+struct auxtrace_record *intel_pt_recording_init(int *err)
+{
+	struct perf_pmu *intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+	struct intel_pt_recording *ptr;
+
+	if (!intel_pt_pmu)
+		return NULL;
+
+	ptr = zalloc(sizeof(struct intel_pt_recording));
+	if (!ptr) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	ptr->intel_pt_pmu = intel_pt_pmu;
+	ptr->itr.recording_options = intel_pt_recording_options;
+	ptr->itr.info_priv_size = intel_pt_info_priv_size;
+	ptr->itr.info_fill = intel_pt_info_fill;
+	ptr->itr.free = intel_pt_recording_free;
+	ptr->itr.snapshot_start = intel_pt_snapshot_start;
+	ptr->itr.snapshot_finish = intel_pt_snapshot_finish;
+	ptr->itr.find_snapshot = intel_pt_find_snapshot;
+	ptr->itr.parse_snapshot_options = intel_pt_parse_snapshot_options;
+	ptr->itr.reference = intel_pt_reference;
+	ptr->itr.read_finish = intel_pt_read_finish;
+	return &ptr->itr;
+}
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 615ca12..c20473d 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -79,6 +79,7 @@ libperf-y += cloexec.o
 libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
+libperf-$(CONFIG_AUXTRACE) += intel-pt.o
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
new file mode 100644
index 0000000..2a4a412
--- /dev/null
+++ b/tools/perf/util/intel-pt.c
@@ -0,0 +1,1911 @@
+/*
+ * intel_pt.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+#include "../perf.h"
+#include "session.h"
+#include "machine.h"
+#include "tool.h"
+#include "event.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "map.h"
+#include "color.h"
+#include "util.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "symbol.h"
+#include "callchain.h"
+#include "dso.h"
+#include "debug.h"
+#include "auxtrace.h"
+#include "tsc.h"
+#include "intel-pt.h"
+
+#include "intel-pt-decoder/intel-pt-log.h"
+#include "intel-pt-decoder/intel-pt-decoder.h"
+#include "intel-pt-decoder/intel-pt-insn-decoder.h"
+#include "intel-pt-decoder/intel-pt-pkt-decoder.h"
+
+#define MAX_TIMESTAMP (~0ULL)
+
+struct intel_pt {
+	struct auxtrace auxtrace;
+	struct auxtrace_queues queues;
+	struct auxtrace_heap heap;
+	u32 auxtrace_type;
+	struct perf_session *session;
+	struct machine *machine;
+	struct perf_evsel *switch_evsel;
+	struct thread *unknown_thread;
+	bool timeless_decoding;
+	bool sampling_mode;
+	bool snapshot_mode;
+	bool per_cpu_mmaps;
+	bool have_tsc;
+	bool data_queued;
+	bool est_tsc;
+	bool sync_switch;
+	int have_sched_switch;
+	u32 pmu_type;
+	u64 kernel_start;
+	u64 switch_ip;
+	u64 ptss_ip;
+
+	struct perf_tsc_conversion tc;
+	bool cap_user_time_zero;
+
+	struct itrace_synth_opts synth_opts;
+
+	bool sample_instructions;
+	u64 instructions_sample_type;
+	u64 instructions_sample_period;
+	u64 instructions_id;
+
+	bool sample_branches;
+	u32 branches_filter;
+	u64 branches_sample_type;
+	u64 branches_id;
+
+	bool sample_transactions;
+	u64 transactions_sample_type;
+	u64 transactions_id;
+
+	bool synth_needs_swap;
+
+	u64 tsc_bit;
+	u64 noretcomp_bit;
+	unsigned max_non_turbo_ratio;
+};
+
+enum switch_state {
+	INTEL_PT_SS_NOT_TRACING,
+	INTEL_PT_SS_UNKNOWN,
+	INTEL_PT_SS_TRACING,
+	INTEL_PT_SS_EXPECTING_SWITCH_EVENT,
+	INTEL_PT_SS_EXPECTING_SWITCH_IP,
+};
+
+struct intel_pt_queue {
+	struct intel_pt *pt;
+	unsigned int queue_nr;
+	struct auxtrace_buffer *buffer;
+	void *decoder;
+	const struct intel_pt_state *state;
+	struct ip_callchain *chain;
+	union perf_event *event_buf;
+	bool on_heap;
+	bool stop;
+	bool step_through_buffers;
+	bool use_buffer_pid_tid;
+	pid_t pid, tid;
+	int cpu;
+	int switch_state;
+	pid_t next_tid;
+	struct thread *thread;
+	bool exclude_kernel;
+	bool have_sample;
+	u64 time;
+	u64 timestamp;
+	u32 flags;
+	u16 insn_len;
+};
+
+static void intel_pt_dump(struct intel_pt *pt __maybe_unused,
+			  unsigned char *buf, size_t len)
+{
+	struct intel_pt_pkt packet;
+	size_t pos = 0;
+	int ret, pkt_len, i;
+	char desc[INTEL_PT_PKT_DESC_MAX];
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color,
+		      ". ... Intel Processor Trace data: size %zu bytes\n",
+		      len);
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret > 0)
+			pkt_len = ret;
+		else
+			pkt_len = 1;
+		printf(".");
+		color_fprintf(stdout, color, "  %08x: ", pos);
+		for (i = 0; i < pkt_len; i++)
+			color_fprintf(stdout, color, " %02x", buf[i]);
+		for (; i < 16; i++)
+			color_fprintf(stdout, color, "   ");
+		if (ret > 0) {
+			ret = intel_pt_pkt_desc(&packet, desc,
+						INTEL_PT_PKT_DESC_MAX);
+			if (ret > 0)
+				color_fprintf(stdout, color, " %s\n", desc);
+		} else {
+			color_fprintf(stdout, color, " Bad packet!\n");
+		}
+		pos += pkt_len;
+		buf += pkt_len;
+		len -= pkt_len;
+	}
+}
+
+static void intel_pt_dump_event(struct intel_pt *pt, unsigned char *buf,
+				size_t len)
+{
+	printf(".\n");
+	intel_pt_dump(pt, buf, len);
+}
+
+static int intel_pt_do_fix_overlap(struct intel_pt *pt, struct auxtrace_buffer *a,
+				   struct auxtrace_buffer *b)
+{
+	void *start;
+
+	start = intel_pt_find_overlap(a->data, a->size, b->data, b->size,
+				      pt->have_tsc);
+	if (!start)
+		return -EINVAL;
+	b->use_size = b->data + b->size - start;
+	b->use_data = start;
+	return 0;
+}
+
+static void intel_pt_use_buffer_pid_tid(struct intel_pt_queue *ptq,
+					struct auxtrace_queue *queue,
+					struct auxtrace_buffer *buffer)
+{
+	if (queue->cpu == -1 && buffer->cpu != -1)
+		ptq->cpu = buffer->cpu;
+
+	ptq->pid = buffer->pid;
+	ptq->tid = buffer->tid;
+
+	intel_pt_log("queue %u cpu %d pid %d tid %d\n",
+		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+
+	thread__zput(ptq->thread);
+
+	if (ptq->tid != -1) {
+		if (ptq->pid != -1)
+			ptq->thread = machine__findnew_thread(ptq->pt->machine,
+							      ptq->pid,
+							      ptq->tid);
+		else
+			ptq->thread = machine__find_thread(ptq->pt->machine, -1,
+							   ptq->tid);
+	}
+}
+
+/* This function assumes data is processed sequentially only */
+static int intel_pt_get_trace(struct intel_pt_buffer *b, void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct auxtrace_buffer *buffer = ptq->buffer, *old_buffer = buffer;
+	struct auxtrace_queue *queue;
+
+	if (ptq->stop) {
+		b->len = 0;
+		return 0;
+	}
+
+	queue = &ptq->pt->queues.queue_array[ptq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	ptq->buffer = buffer;
+
+	if (!buffer->data) {
+		int fd = perf_data_file__fd(ptq->pt->session->file);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (ptq->pt->snapshot_mode && !buffer->consecutive && old_buffer &&
+	    intel_pt_do_fix_overlap(ptq->pt, old_buffer, buffer))
+		return -ENOMEM;
+
+	if (old_buffer)
+		auxtrace_buffer__drop_data(old_buffer);
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+	b->ref_timestamp = buffer->reference;
+
+	if (!old_buffer || ptq->pt->sampling_mode || (ptq->pt->snapshot_mode &&
+						      !buffer->consecutive)) {
+		b->consecutive = false;
+		b->trace_nr = buffer->buffer_nr + 1;
+	} else {
+		b->consecutive = true;
+	}
+
+	if (ptq->use_buffer_pid_tid && (ptq->pid != buffer->pid ||
+					ptq->tid != buffer->tid))
+		intel_pt_use_buffer_pid_tid(ptq, queue, buffer);
+
+	if (ptq->step_through_buffers)
+		ptq->stop = true;
+
+	if (!b->len)
+		return intel_pt_get_trace(b, data);
+
+	return 0;
+}
+
+struct intel_pt_cache_entry {
+	struct auxtrace_cache_entry	entry;
+	u64				insn_cnt;
+	u64				byte_cnt;
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+};
+
+static int intel_pt_config_div(const char *var, const char *value, void *data)
+{
+	int *d = data;
+	long val;
+
+	if (!strcmp(var, "intel-pt.cache-divisor")) {
+		val = strtol(value, NULL, 0);
+		if (val > 0 && val <= INT_MAX)
+			*d = val;
+	}
+
+	return 0;
+}
+
+static int intel_pt_cache_divisor(void)
+{
+	static int d;
+
+	if (d)
+		return d;
+
+	perf_config(intel_pt_config_div, &d);
+
+	if (!d)
+		d = 64;
+
+	return d;
+}
+
+static unsigned int intel_pt_cache_size(struct dso *dso,
+					struct machine *machine)
+{
+	off_t size;
+
+	size = dso__data_size(dso, machine);
+	size /= intel_pt_cache_divisor();
+	if (size < 1000)
+		return 10;
+	if (size > (1 << 21))
+		return 21;
+	return 32 - __builtin_clz(size);
+}
+
+static struct auxtrace_cache *intel_pt_cache(struct dso *dso,
+					     struct machine *machine)
+{
+	struct auxtrace_cache *c;
+	unsigned int bits;
+
+	if (dso->auxtrace_cache)
+		return dso->auxtrace_cache;
+
+	bits = intel_pt_cache_size(dso, machine);
+
+	/* Ignoring cache creation failure */
+	c = auxtrace_cache__new(bits, sizeof(struct intel_pt_cache_entry), 200);
+
+	dso->auxtrace_cache = c;
+
+	return c;
+}
+
+static int intel_pt_cache_add(struct dso *dso, struct machine *machine,
+			      u64 offset, u64 insn_cnt, u64 byte_cnt,
+			      struct intel_pt_insn *intel_pt_insn)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+	struct intel_pt_cache_entry *e;
+	int err;
+
+	if (!c)
+		return -ENOMEM;
+
+	e = auxtrace_cache__alloc_entry(c);
+	if (!e)
+		return -ENOMEM;
+
+	e->insn_cnt = insn_cnt;
+	e->byte_cnt = byte_cnt;
+	e->op = intel_pt_insn->op;
+	e->branch = intel_pt_insn->branch;
+	e->length = intel_pt_insn->length;
+	e->rel = intel_pt_insn->rel;
+
+	err = auxtrace_cache__add(c, offset, &e->entry);
+	if (err)
+		auxtrace_cache__free_entry(c, e);
+
+	return err;
+}
+
+static struct intel_pt_cache_entry *
+intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+
+	if (!c)
+		return NULL;
+
+	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
+}
+
+static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
+				   uint64_t *insn_cnt_ptr, uint64_t *ip,
+				   uint64_t to_ip, uint64_t max_insn_cnt,
+				   void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct machine *machine = ptq->pt->machine;
+	struct thread *thread;
+	struct addr_location al;
+	unsigned char buf[1024];
+	size_t bufsz;
+	ssize_t len;
+	int x86_64;
+	u8 cpumode;
+	u64 offset, start_offset, start_ip;
+	u64 insn_cnt = 0;
+	bool one_map = true;
+
+	if (to_ip && *ip == to_ip)
+		goto out_no_cache;
+
+	bufsz = intel_pt_insn_max_size();
+
+	if (*ip >= ptq->pt->kernel_start)
+		cpumode = PERF_RECORD_MISC_KERNEL;
+	else
+		cpumode = PERF_RECORD_MISC_USER;
+
+	thread = ptq->thread;
+	if (!thread) {
+		if (cpumode != PERF_RECORD_MISC_KERNEL)
+			return -EINVAL;
+		thread = ptq->pt->unknown_thread;
+	}
+
+	while (1) {
+		thread__find_addr_map(thread, cpumode, MAP__FUNCTION, *ip, &al);
+		if (!al.map || !al.map->dso)
+			return -EINVAL;
+
+		if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
+		    dso__data_status_seen(al.map->dso,
+					  DSO_DATA_STATUS_SEEN_ITRACE))
+			return -ENOENT;
+
+		offset = al.map->map_ip(al.map, *ip);
+
+		if (!to_ip && one_map) {
+			struct intel_pt_cache_entry *e;
+
+			e = intel_pt_cache_lookup(al.map->dso, machine, offset);
+			if (e &&
+			    (!max_insn_cnt || e->insn_cnt <= max_insn_cnt)) {
+				*insn_cnt_ptr = e->insn_cnt;
+				*ip += e->byte_cnt;
+				intel_pt_insn->op = e->op;
+				intel_pt_insn->branch = e->branch;
+				intel_pt_insn->length = e->length;
+				intel_pt_insn->rel = e->rel;
+				intel_pt_log_insn_no_data(intel_pt_insn, *ip);
+				return 0;
+			}
+		}
+
+		start_offset = offset;
+		start_ip = *ip;
+
+		/* Load maps to ensure dso->is_64_bit has been updated */
+		map__load(al.map, machine->symbol_filter);
+
+		x86_64 = al.map->dso->is_64_bit;
+
+		while (1) {
+			len = dso__data_read_offset(al.map->dso, machine,
+						    offset, buf, bufsz);
+			if (len <= 0)
+				return -EINVAL;
+
+			if (intel_pt_get_insn(buf, len, x86_64, intel_pt_insn))
+				return -EINVAL;
+
+			intel_pt_log_insn(intel_pt_insn, *ip);
+
+			insn_cnt += 1;
+
+			if (intel_pt_insn->branch != INTEL_PT_BR_NO_BRANCH)
+				goto out;
+
+			if (max_insn_cnt && insn_cnt >= max_insn_cnt)
+				goto out_no_cache;
+
+			*ip += intel_pt_insn->length;
+
+			if (to_ip && *ip == to_ip)
+				goto out_no_cache;
+
+			if (*ip >= al.map->end)
+				break;
+
+			offset += intel_pt_insn->length;
+		}
+		one_map = false;
+	}
+out:
+	*insn_cnt_ptr = insn_cnt;
+
+	if (!one_map)
+		goto out_no_cache;
+
+	/*
+	 * Didn't lookup in the 'to_ip' case, so do it now to prevent duplicate
+	 * entries.
+	 */
+	if (to_ip) {
+		struct intel_pt_cache_entry *e;
+
+		e = intel_pt_cache_lookup(al.map->dso, machine, start_offset);
+		if (e)
+			return 0;
+	}
+
+	/* Ignore cache errors */
+	intel_pt_cache_add(al.map->dso, machine, start_offset, insn_cnt,
+			   *ip - start_ip, intel_pt_insn);
+
+	return 0;
+
+out_no_cache:
+	*insn_cnt_ptr = insn_cnt;
+	return 0;
+}
+
+static bool intel_pt_get_config(struct intel_pt *pt,
+				struct perf_event_attr *attr, u64 *config)
+{
+	if (attr->type == pt->pmu_type) {
+		if (config)
+			*config = attr->config;
+		return true;
+	}
+
+	return false;
+}
+
+static bool intel_pt_exclude_kernel(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
+		    !evsel->attr.exclude_kernel)
+			return false;
+	}
+	return true;
+}
+
+static bool intel_pt_return_compression(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	u64 config;
+
+	if (!pt->noretcomp_bit)
+		return true;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config) &&
+		    (config & pt->noretcomp_bit))
+			return false;
+	}
+	return true;
+}
+
+static bool intel_pt_timeless_decoding(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	bool timeless_decoding = true;
+	u64 config;
+
+	if (!pt->tsc_bit || !pt->cap_user_time_zero)
+		return true;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (!(evsel->attr.sample_type & PERF_SAMPLE_TIME))
+			return true;
+		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
+			if (config & pt->tsc_bit)
+				timeless_decoding = false;
+			else
+				return true;
+		}
+	}
+	return timeless_decoding;
+}
+
+static bool intel_pt_tracing_kernel(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
+		    !evsel->attr.exclude_kernel)
+			return true;
+	}
+	return false;
+}
+
+static bool intel_pt_have_tsc(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	bool have_tsc = false;
+	u64 config;
+
+	if (!pt->tsc_bit)
+		return false;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
+			if (config & pt->tsc_bit)
+				have_tsc = true;
+			else
+				return false;
+		}
+	}
+	return have_tsc;
+}
+
+static u64 intel_pt_ns_to_ticks(const struct intel_pt *pt, u64 ns)
+{
+	u64 quot, rem;
+
+	quot = ns / pt->tc.time_mult;
+	rem  = ns % pt->tc.time_mult;
+	return (quot << pt->tc.time_shift) + (rem << pt->tc.time_shift) /
+		pt->tc.time_mult;
+}
+
+static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
+						   unsigned int queue_nr)
+{
+	struct intel_pt_params params = { .get_trace = 0, };
+	struct intel_pt_queue *ptq;
+
+	ptq = zalloc(sizeof(struct intel_pt_queue));
+	if (!ptq)
+		return NULL;
+
+	if (pt->synth_opts.callchain) {
+		size_t sz = sizeof(struct ip_callchain);
+
+		sz += pt->synth_opts.callchain_sz * sizeof(u64);
+		ptq->chain = zalloc(sz);
+		if (!ptq->chain)
+			goto out_free;
+	}
+
+	ptq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!ptq->event_buf)
+		goto out_free;
+
+	ptq->pt = pt;
+	ptq->queue_nr = queue_nr;
+	ptq->exclude_kernel = intel_pt_exclude_kernel(pt);
+	ptq->pid = -1;
+	ptq->tid = -1;
+	ptq->cpu = -1;
+	ptq->next_tid = -1;
+
+	params.get_trace = intel_pt_get_trace;
+	params.walk_insn = intel_pt_walk_next_insn;
+	params.data = ptq;
+	params.return_compression = intel_pt_return_compression(pt);
+	params.max_non_turbo_ratio = pt->max_non_turbo_ratio;
+
+	if (pt->synth_opts.instructions) {
+		if (pt->synth_opts.period) {
+			switch (pt->synth_opts.period_type) {
+			case PERF_ITRACE_PERIOD_INSTRUCTIONS:
+				params.period_type =
+						INTEL_PT_PERIOD_INSTRUCTIONS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_TICKS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_NANOSECS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = intel_pt_ns_to_ticks(pt,
+							pt->synth_opts.period);
+				break;
+			default:
+				break;
+			}
+		}
+
+		if (!params.period) {
+			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
+			params.period = 1000;
+		}
+	}
+
+	ptq->decoder = intel_pt_decoder_new(&params);
+	if (!ptq->decoder)
+		goto out_free;
+
+	return ptq;
+
+out_free:
+	zfree(&ptq->event_buf);
+	zfree(&ptq->chain);
+	free(ptq);
+	return NULL;
+}
+
+static void intel_pt_free_queue(void *priv)
+{
+	struct intel_pt_queue *ptq = priv;
+
+	if (!ptq)
+		return;
+	thread__zput(ptq->thread);
+	intel_pt_decoder_free(ptq->decoder);
+	zfree(&ptq->event_buf);
+	zfree(&ptq->chain);
+	free(ptq);
+}
+
+static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
+				     struct auxtrace_queue *queue)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (queue->tid == -1 || pt->have_sched_switch) {
+		ptq->tid = machine__get_current_tid(pt->machine, ptq->cpu);
+		thread__zput(ptq->thread);
+	}
+
+	if (!ptq->thread && ptq->tid != -1)
+		ptq->thread = machine__find_thread(pt->machine, -1, ptq->tid);
+
+	if (ptq->thread) {
+		ptq->pid = ptq->thread->pid_;
+		if (queue->cpu == -1)
+			ptq->cpu = ptq->thread->cpu;
+	}
+}
+
+static void intel_pt_sample_flags(struct intel_pt_queue *ptq)
+{
+	if (ptq->state->flags & INTEL_PT_ABORT_TX) {
+		ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT;
+	} else if (ptq->state->flags & INTEL_PT_ASYNC) {
+		if (ptq->state->to_ip)
+			ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+				     PERF_IP_FLAG_ASYNC |
+				     PERF_IP_FLAG_INTERRUPT;
+		else
+			ptq->flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_END;
+		ptq->insn_len = 0;
+	} else {
+		if (ptq->state->from_ip)
+			ptq->flags = intel_pt_insn_type(ptq->state->insn_op);
+		else
+			ptq->flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_BEGIN;
+		if (ptq->state->flags & INTEL_PT_IN_TX)
+			ptq->flags |= PERF_IP_FLAG_IN_TX;
+		ptq->insn_len = ptq->state->insn_len;
+	}
+}
+
+static int intel_pt_setup_queue(struct intel_pt *pt,
+				struct auxtrace_queue *queue,
+				unsigned int queue_nr)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (list_empty(&queue->head))
+		return 0;
+
+	if (!ptq) {
+		ptq = intel_pt_alloc_queue(pt, queue_nr);
+		if (!ptq)
+			return -ENOMEM;
+		queue->priv = ptq;
+
+		if (queue->cpu != -1)
+			ptq->cpu = queue->cpu;
+		ptq->tid = queue->tid;
+
+		if (pt->sampling_mode) {
+			if (pt->timeless_decoding)
+				ptq->step_through_buffers = true;
+			if (pt->timeless_decoding || !pt->have_sched_switch)
+				ptq->use_buffer_pid_tid = true;
+		}
+	}
+
+	if (!ptq->on_heap &&
+	    (!pt->sync_switch ||
+	     ptq->switch_state != INTEL_PT_SS_EXPECTING_SWITCH_EVENT)) {
+		const struct intel_pt_state *state;
+		int ret;
+
+		if (pt->timeless_decoding)
+			return 0;
+
+		intel_pt_log("queue %u getting timestamp\n", queue_nr);
+		intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+			     queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+		while (1) {
+			state = intel_pt_decode(ptq->decoder);
+			if (state->err) {
+				if (state->err == INTEL_PT_ERR_NODATA) {
+					intel_pt_log("queue %u has no timestamp\n",
+						     queue_nr);
+					return 0;
+				}
+				continue;
+			}
+			if (state->timestamp)
+				break;
+		}
+
+		ptq->timestamp = state->timestamp;
+		intel_pt_log("queue %u timestamp 0x%" PRIx64 "\n",
+			     queue_nr, ptq->timestamp);
+		ptq->state = state;
+		ptq->have_sample = true;
+		intel_pt_sample_flags(ptq);
+		ret = auxtrace_heap__add(&pt->heap, queue_nr, ptq->timestamp);
+		if (ret)
+			return ret;
+		ptq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int intel_pt_setup_queues(struct intel_pt *pt)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < pt->queues.nr_queues; i++) {
+		ret = intel_pt_setup_queue(pt, &pt->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int intel_pt_inject_event(union perf_event *event,
+				 struct perf_sample *sample, u64 type,
+				 bool swapped)
+{
+	event->header.size = perf_event__sample_event_size(sample, type, 0);
+	return perf_event__synthesize_sample(event, type, 0, sample, swapped);
+}
+
+static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->branches_id;
+	sample.stream_id = ptq->pt->branches_id;
+	sample.period = 1;
+	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
+
+	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
+		return 0;
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->branches_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver branch event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->instructions_id;
+	sample.stream_id = ptq->pt->instructions_id;
+	sample.period = ptq->pt->instructions_sample_period;
+	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
+
+	if (pt->synth_opts.callchain) {
+		thread_stack__sample(ptq->thread, ptq->chain,
+				     pt->synth_opts.callchain_sz, sample.ip);
+		sample.callchain = ptq->chain;
+	}
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->instructions_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver instruction event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->transactions_id;
+	sample.stream_id = ptq->pt->transactions_id;
+	sample.period = 1;
+	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
+
+	if (pt->synth_opts.callchain) {
+		thread_stack__sample(ptq->thread, ptq->chain,
+				     pt->synth_opts.callchain_sz, sample.ip);
+		sample.callchain = ptq->chain;
+	}
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->transactions_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver transaction event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
+				pid_t pid, pid_t tid, u64 ip)
+{
+	union perf_event event;
+	char msg[MAX_AUXTRACE_ERROR_MSG];
+	int err;
+
+	intel_pt__strerror(code, msg, MAX_AUXTRACE_ERROR_MSG);
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     code, cpu, pid, tid, ip, msg);
+
+	err = perf_session__deliver_synth_event(pt->session, &event, NULL);
+	if (err)
+		pr_err("Intel Processor Trace: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static int intel_pt_next_tid(struct intel_pt *pt, struct intel_pt_queue *ptq)
+{
+	struct auxtrace_queue *queue;
+	pid_t tid = ptq->next_tid;
+	int err;
+
+	if (tid == -1)
+		return 0;
+
+	intel_pt_log("switch: cpu %d tid %d\n", ptq->cpu, tid);
+
+	err = machine__set_current_tid(pt->machine, ptq->cpu, -1, tid);
+
+	queue = &pt->queues.queue_array[ptq->queue_nr];
+	intel_pt_set_pid_tid_cpu(pt, queue);
+
+	ptq->next_tid = -1;
+
+	return err;
+}
+
+static inline bool intel_pt_is_switch_ip(struct intel_pt_queue *ptq, u64 ip)
+{
+	struct intel_pt *pt = ptq->pt;
+
+	return ip == pt->switch_ip &&
+	       (ptq->flags & PERF_IP_FLAG_BRANCH) &&
+	       !(ptq->flags & (PERF_IP_FLAG_CONDITIONAL | PERF_IP_FLAG_ASYNC |
+			       PERF_IP_FLAG_INTERRUPT | PERF_IP_FLAG_TX_ABORT));
+}
+
+static int intel_pt_sample(struct intel_pt_queue *ptq)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct intel_pt *pt = ptq->pt;
+	int err;
+
+	if (!ptq->have_sample)
+		return 0;
+
+	ptq->have_sample = false;
+
+	if (pt->sample_instructions &&
+	    (state->type & INTEL_PT_INSTRUCTION)) {
+		err = intel_pt_synth_instruction_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (pt->sample_transactions &&
+	    (state->type & INTEL_PT_TRANSACTION)) {
+		err = intel_pt_synth_transaction_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (!(state->type & INTEL_PT_BRANCH))
+		return 0;
+
+	if (pt->synth_opts.callchain)
+		thread_stack__event(ptq->thread, ptq->flags, state->from_ip,
+				    state->to_ip, ptq->insn_len,
+				    state->trace_nr);
+	else
+		thread_stack__set_trace_nr(ptq->thread, state->trace_nr);
+
+	if (pt->sample_branches) {
+		err = intel_pt_synth_branch_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (!pt->sync_switch)
+		return 0;
+
+	if (intel_pt_is_switch_ip(ptq, state->to_ip)) {
+		switch (ptq->switch_state) {
+		case INTEL_PT_SS_UNKNOWN:
+		case INTEL_PT_SS_EXPECTING_SWITCH_IP:
+			err = intel_pt_next_tid(pt, ptq);
+			if (err)
+				return err;
+			ptq->switch_state = INTEL_PT_SS_TRACING;
+			break;
+		default:
+			ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_EVENT;
+			return 1;
+		}
+	} else if (!state->to_ip) {
+		ptq->switch_state = INTEL_PT_SS_NOT_TRACING;
+	} else if (ptq->switch_state == INTEL_PT_SS_NOT_TRACING) {
+		ptq->switch_state = INTEL_PT_SS_UNKNOWN;
+	} else if (ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
+		   state->to_ip == pt->ptss_ip &&
+		   (ptq->flags & PERF_IP_FLAG_CALL)) {
+		ptq->switch_state = INTEL_PT_SS_TRACING;
+	}
+
+	return 0;
+}
+
+static u64 intel_pt_switch_ip(struct machine *machine, u64 *ptss_ip)
+{
+	struct map *map;
+	struct symbol *sym, *start;
+	u64 ip, switch_ip = 0;
+
+	if (ptss_ip)
+		*ptss_ip = 0;
+
+	map = machine__kernel_map(machine, MAP__FUNCTION);
+	if (!map)
+		return 0;
+
+	if (map__load(map, machine->symbol_filter))
+		return 0;
+
+	start = dso__first_symbol(map->dso, MAP__FUNCTION);
+
+	for (sym = start; sym; sym = dso__next_symbol(sym)) {
+		if (sym->binding == STB_GLOBAL &&
+		    !strcmp(sym->name, "__switch_to")) {
+			ip = map->unmap_ip(map, sym->start);
+			if (ip >= map->start && ip < map->end) {
+				switch_ip = ip;
+				break;
+			}
+		}
+	}
+
+	if (!switch_ip || !ptss_ip)
+		return 0;
+
+	for (sym = start; sym; sym = dso__next_symbol(sym)) {
+		if (!strcmp(sym->name, "perf_trace_sched_switch")) {
+			ip = map->unmap_ip(map, sym->start);
+			if (ip >= map->start && ip < map->end) {
+				*ptss_ip = ip;
+				break;
+			}
+		}
+	}
+
+	return switch_ip;
+}
+
+static int intel_pt_run_decoder(struct intel_pt_queue *ptq, u64 *timestamp)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct intel_pt *pt = ptq->pt;
+	int err;
+
+	if (!pt->kernel_start) {
+		pt->kernel_start = machine__kernel_start(pt->machine);
+		if (pt->per_cpu_mmaps && pt->have_sched_switch &&
+		    !pt->timeless_decoding && intel_pt_tracing_kernel(pt) &&
+		    !pt->sampling_mode) {
+			pt->switch_ip = intel_pt_switch_ip(pt->machine,
+							   &pt->ptss_ip);
+			if (pt->switch_ip) {
+				intel_pt_log("switch_ip: %"PRIx64" ptss_ip: %"PRIx64"\n",
+					     pt->switch_ip, pt->ptss_ip);
+				pt->sync_switch = true;
+			}
+		}
+	}
+
+	intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+	while (1) {
+		err = intel_pt_sample(ptq);
+		if (err)
+			return err;
+
+		state = intel_pt_decode(ptq->decoder);
+		if (state->err) {
+			if (state->err == INTEL_PT_ERR_NODATA)
+				return 1;
+			if (pt->sync_switch &&
+			    state->from_ip >= pt->kernel_start) {
+				pt->sync_switch = false;
+				intel_pt_next_tid(pt, ptq);
+			}
+			if (pt->synth_opts.errors) {
+				err = intel_pt_synth_error(pt, state->err,
+							   ptq->cpu, ptq->pid,
+							   ptq->tid,
+							   state->from_ip);
+				if (err)
+					return err;
+			}
+			continue;
+		}
+
+		ptq->state = state;
+		ptq->have_sample = true;
+		intel_pt_sample_flags(ptq);
+
+		/* Use estimated TSC upon return to user space */
+		if (pt->est_tsc &&
+		    (state->from_ip >= pt->kernel_start || !state->from_ip) &&
+		    state->to_ip && state->to_ip < pt->kernel_start) {
+			intel_pt_log("TSC %"PRIx64" est. TSC %"PRIx64"\n",
+				     state->timestamp, state->est_timestamp);
+			ptq->timestamp = state->est_timestamp;
+		/* Use estimated TSC in unknown switch state */
+		} else if (pt->sync_switch &&
+			   ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
+			   intel_pt_is_switch_ip(ptq, state->to_ip) &&
+			   ptq->next_tid == -1) {
+			intel_pt_log("TSC %"PRIx64" est. TSC %"PRIx64"\n",
+				     state->timestamp, state->est_timestamp);
+			ptq->timestamp = state->est_timestamp;
+		} else if (state->timestamp > ptq->timestamp) {
+			ptq->timestamp = state->timestamp;
+		}
+
+		if (!pt->timeless_decoding && ptq->timestamp >= *timestamp) {
+			*timestamp = ptq->timestamp;
+			return 0;
+		}
+	}
+	return 0;
+}
+
+static inline int intel_pt_update_queues(struct intel_pt *pt)
+{
+	if (pt->queues.new_data) {
+		pt->queues.new_data = false;
+		return intel_pt_setup_queues(pt);
+	}
+	return 0;
+}
+
+static int intel_pt_process_queues(struct intel_pt *pt, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct intel_pt_queue *ptq;
+
+		if (!pt->heap.heap_cnt)
+			return 0;
+
+		if (pt->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = pt->heap.heap_array[0].queue_nr;
+		queue = &pt->queues.queue_array[queue_nr];
+		ptq = queue->priv;
+
+		intel_pt_log("queue %u processing 0x%" PRIx64 " to 0x%" PRIx64 "\n",
+			     queue_nr, pt->heap.heap_array[0].ordinal,
+			     timestamp);
+
+		auxtrace_heap__pop(&pt->heap);
+
+		if (pt->heap.heap_cnt) {
+			ts = pt->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		intel_pt_set_pid_tid_cpu(pt, queue);
+
+		ret = intel_pt_run_decoder(ptq, &ts);
+
+		if (ret < 0) {
+			auxtrace_heap__add(&pt->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&pt->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			ptq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_pt_process_timeless_queues(struct intel_pt *pt, pid_t tid,
+					    u64 time_)
+{
+	struct auxtrace_queues *queues = &pt->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &pt->queues.queue_array[i];
+		struct intel_pt_queue *ptq = queue->priv;
+
+		if (ptq && (tid == -1 || ptq->tid == tid)) {
+			ptq->time = time_;
+			intel_pt_set_pid_tid_cpu(pt, queue);
+			intel_pt_run_decoder(ptq, &ts);
+		}
+	}
+	return 0;
+}
+
+static int intel_pt_lost(struct intel_pt *pt, struct perf_sample *sample)
+{
+	return intel_pt_synth_error(pt, INTEL_PT_ERR_LOST, sample->cpu,
+				    sample->pid, sample->tid, 0);
+}
+
+static struct intel_pt_queue *intel_pt_cpu_to_ptq(struct intel_pt *pt, int cpu)
+{
+	unsigned i, j;
+
+	if (cpu < 0 || !pt->queues.nr_queues)
+		return NULL;
+
+	if ((unsigned)cpu >= pt->queues.nr_queues)
+		i = pt->queues.nr_queues - 1;
+	else
+		i = cpu;
+
+	if (pt->queues.queue_array[i].cpu == cpu)
+		return pt->queues.queue_array[i].priv;
+
+	for (j = 0; i > 0; j++) {
+		if (pt->queues.queue_array[--i].cpu == cpu)
+			return pt->queues.queue_array[i].priv;
+	}
+
+	for (; j < pt->queues.nr_queues; j++) {
+		if (pt->queues.queue_array[j].cpu == cpu)
+			return pt->queues.queue_array[j].priv;
+	}
+
+	return NULL;
+}
+
+static int intel_pt_process_switch(struct intel_pt *pt,
+				   struct perf_sample *sample)
+{
+	struct intel_pt_queue *ptq;
+	struct perf_evsel *evsel;
+	pid_t tid;
+	int cpu, err;
+
+	evsel = perf_evlist__id2evsel(pt->session->evlist, sample->id);
+	if (evsel != pt->switch_evsel)
+		return 0;
+
+	tid = perf_evsel__intval(evsel, sample, "next_pid");
+	cpu = sample->cpu;
+
+	intel_pt_log("sched_switch: cpu %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     cpu, tid, sample->time, perf_time_to_tsc(sample->time,
+		     &pt->tc));
+
+	if (!pt->sync_switch)
+		goto out;
+
+	ptq = intel_pt_cpu_to_ptq(pt, cpu);
+	if (!ptq)
+		goto out;
+
+	switch (ptq->switch_state) {
+	case INTEL_PT_SS_NOT_TRACING:
+		ptq->next_tid = -1;
+		break;
+	case INTEL_PT_SS_UNKNOWN:
+	case INTEL_PT_SS_TRACING:
+		ptq->next_tid = tid;
+		ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_IP;
+		return 0;
+	case INTEL_PT_SS_EXPECTING_SWITCH_EVENT:
+		if (!ptq->on_heap) {
+			ptq->timestamp = perf_time_to_tsc(sample->time,
+							  &pt->tc);
+			err = auxtrace_heap__add(&pt->heap, ptq->queue_nr,
+						 ptq->timestamp);
+			if (err)
+				return err;
+			ptq->on_heap = true;
+		}
+		ptq->switch_state = INTEL_PT_SS_TRACING;
+		break;
+	case INTEL_PT_SS_EXPECTING_SWITCH_IP:
+		ptq->next_tid = tid;
+		intel_pt_log("ERROR: cpu %d expecting switch ip\n", cpu);
+		break;
+	default:
+		break;
+	}
+out:
+	return machine__set_current_tid(pt->machine, cpu, -1, tid);
+}
+
+static int intel_pt_process_itrace_start(struct intel_pt *pt,
+					 union perf_event *event,
+					 struct perf_sample *sample)
+{
+	if (!pt->per_cpu_mmaps)
+		return 0;
+
+	intel_pt_log("itrace_start: cpu %d pid %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     sample->cpu, event->itrace_start.pid,
+		     event->itrace_start.tid, sample->time,
+		     perf_time_to_tsc(sample->time, &pt->tc));
+
+	return machine__set_current_tid(pt->machine, sample->cpu,
+					event->itrace_start.pid,
+					event->itrace_start.tid);
+}
+
+static int intel_pt_process_event(struct perf_session *session,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	u64 timestamp;
+	int err = 0;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("Intel Processor Trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time)
+		timestamp = perf_time_to_tsc(sample->time, &pt->tc);
+	else
+		timestamp = 0;
+
+	if (timestamp || pt->timeless_decoding) {
+		err = intel_pt_update_queues(pt);
+		if (err)
+			return err;
+	}
+
+	if (pt->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = intel_pt_process_timeless_queues(pt,
+							       event->comm.tid,
+							       sample->time);
+		}
+	} else if (timestamp) {
+		err = intel_pt_process_queues(pt, timestamp);
+	}
+	if (err)
+		return err;
+
+	if (event->header.type == PERF_RECORD_AUX &&
+	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
+	    pt->synth_opts.errors) {
+		err = intel_pt_lost(pt, sample);
+		if (err)
+			return err;
+	}
+
+	if (pt->switch_evsel && event->header.type == PERF_RECORD_SAMPLE)
+		err = intel_pt_process_switch(pt, sample);
+	else if (event->header.type == PERF_RECORD_ITRACE_START)
+		err = intel_pt_process_itrace_start(pt, event, sample);
+
+	intel_pt_log("event %s (%u): cpu %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     perf_event__name(event->header.type), event->header.type,
+		     sample->cpu, sample->time, timestamp);
+
+	return err;
+}
+
+static int intel_pt_flush(struct perf_session *session, struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = intel_pt_update_queues(pt);
+	if (ret < 0)
+		return ret;
+
+	if (pt->timeless_decoding)
+		return intel_pt_process_timeless_queues(pt, -1,
+							MAX_TIMESTAMP - 1);
+
+	return intel_pt_process_queues(pt, MAX_TIMESTAMP);
+}
+
+static void intel_pt_free_events(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	struct auxtrace_queues *queues = &pt->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		intel_pt_free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = NULL;
+	}
+	intel_pt_log_disable();
+	auxtrace_queues__free(queues);
+}
+
+static void intel_pt_free(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+
+	auxtrace_heap__free(&pt->heap);
+	intel_pt_free_events(session);
+	session->auxtrace = NULL;
+	thread__delete(pt->unknown_thread);
+	free(pt);
+}
+
+static int intel_pt_process_auxtrace_event(struct perf_session *session,
+					   union perf_event *event,
+					   struct perf_tool *tool __maybe_unused)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+
+	if (pt->sampling_mode)
+		return 0;
+
+	if (!pt->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data_file__fd(session->file);
+		int err;
+
+		if (perf_data_file__is_pipe(session->file)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&pt->queues, session, event,
+						 data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				intel_pt_dump_event(pt, buffer->data,
+						    buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
+		}
+	}
+
+	return 0;
+}
+
+struct intel_pt_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int intel_pt_event_synth(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample __maybe_unused,
+				struct machine *machine __maybe_unused)
+{
+	struct intel_pt_synth *intel_pt_synth =
+			container_of(tool, struct intel_pt_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(intel_pt_synth->session, event,
+						 NULL);
+}
+
+static int intel_pt_synth_event(struct perf_session *session,
+				struct perf_event_attr *attr, u64 id)
+{
+	struct intel_pt_synth intel_pt_synth;
+
+	memset(&intel_pt_synth, 0, sizeof(struct intel_pt_synth));
+	intel_pt_synth.session = session;
+
+	return perf_event__synthesize_attr(&intel_pt_synth.dummy_tool, attr, 1,
+					   &id, intel_pt_event_synth);
+}
+
+static int intel_pt_synth_events(struct intel_pt *pt,
+				 struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == pt->pmu_type && evsel->ids) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("There are no selected events with Intel Processor Trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	if (pt->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+	if (!pt->per_cpu_mmaps)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+	if (!id)
+		id = 1;
+
+	if (pt->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		if (pt->synth_opts.period_type == PERF_ITRACE_PERIOD_NANOSECS)
+			attr.sample_period =
+				intel_pt_ns_to_ticks(pt, pt->synth_opts.period);
+		else
+			attr.sample_period = pt->synth_opts.period;
+		pt->instructions_sample_period = attr.sample_period;
+		if (pt->synth_opts.callchain)
+			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'instructions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'instructions' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_instructions = true;
+		pt->instructions_sample_type = attr.sample_type;
+		pt->instructions_id = id;
+		id += 1;
+	}
+
+	if (pt->synth_opts.transactions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = 1;
+		if (pt->synth_opts.callchain)
+			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'transactions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'transactions' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_transactions = true;
+		pt->transactions_id = id;
+		id += 1;
+		evlist__for_each(evlist, evsel) {
+			if (evsel->id && evsel->id[0] == pt->transactions_id) {
+				if (evsel->name)
+					zfree(&evsel->name);
+				evsel->name = strdup("transactions");
+				break;
+			}
+		}
+	}
+
+	if (pt->synth_opts.branches) {
+		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		attr.sample_period = 1;
+		attr.sample_type |= PERF_SAMPLE_ADDR;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'branches' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_branches = true;
+		pt->branches_sample_type = attr.sample_type;
+		pt->branches_id = id;
+	}
+
+	pt->synth_needs_swap = evsel->needs_swap;
+
+	return 0;
+}
+
+static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each_reverse(evlist, evsel) {
+		const char *name = perf_evsel__name(evsel);
+
+		if (!strcmp(name, "sched:sched_switch"))
+			return evsel;
+	}
+
+	return NULL;
+}
+
+static const char * const intel_pt_info_fmts[] = {
+	[INTEL_PT_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
+	[INTEL_PT_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
+	[INTEL_PT_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
+	[INTEL_PT_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
+	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
+	[INTEL_PT_TSC_BIT]		= "  TSC bit            %#"PRIx64"\n",
+	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit      %#"PRIx64"\n",
+	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch  %"PRId64"\n",
+	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
+	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps       %"PRId64"\n",
+};
+
+static void intel_pt_print_info(u64 *arr, int start, int finish)
+{
+	int i;
+
+	if (!dump_trace)
+		return;
+
+	for (i = start; i <= finish; i++)
+		fprintf(stdout, intel_pt_info_fmts[i], arr[i]);
+}
+
+int intel_pt_process_auxtrace_info(union perf_event *event,
+				   struct perf_session *session)
+{
+	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
+	size_t min_sz = sizeof(u64) * INTEL_PT_PER_CPU_MMAPS;
+	struct intel_pt *pt;
+	int err;
+
+	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
+					min_sz)
+		return -EINVAL;
+
+	pt = zalloc(sizeof(struct intel_pt));
+	if (!pt)
+		return -ENOMEM;
+
+	err = auxtrace_queues__init(&pt->queues);
+	if (err)
+		goto err_free;
+
+	intel_pt_log_set_name(INTEL_PT_PMU_NAME);
+
+	pt->session = session;
+	pt->machine = &session->machines.host; /* No kvm support */
+	pt->auxtrace_type = auxtrace_info->type;
+	pt->pmu_type = auxtrace_info->priv[INTEL_PT_PMU_TYPE];
+	pt->tc.time_shift = auxtrace_info->priv[INTEL_PT_TIME_SHIFT];
+	pt->tc.time_mult = auxtrace_info->priv[INTEL_PT_TIME_MULT];
+	pt->tc.time_zero = auxtrace_info->priv[INTEL_PT_TIME_ZERO];
+	pt->cap_user_time_zero = auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO];
+	pt->tsc_bit = auxtrace_info->priv[INTEL_PT_TSC_BIT];
+	pt->noretcomp_bit = auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT];
+	pt->have_sched_switch = auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH];
+	pt->snapshot_mode = auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE];
+	pt->per_cpu_mmaps = auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS];
+	intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_PMU_TYPE,
+			    INTEL_PT_PER_CPU_MMAPS);
+
+	pt->timeless_decoding = intel_pt_timeless_decoding(pt);
+	pt->have_tsc = intel_pt_have_tsc(pt);
+	pt->sampling_mode = false;
+	pt->est_tsc = !pt->timeless_decoding;
+
+	pt->unknown_thread = thread__new(999999999, 999999999);
+	if (!pt->unknown_thread) {
+		err = -ENOMEM;
+		goto err_free_queues;
+	}
+	err = thread__set_comm(pt->unknown_thread, "unknown", 0);
+	if (err)
+		goto err_delete_thread;
+	if (thread__init_map_groups(pt->unknown_thread, pt->machine)) {
+		err = -ENOMEM;
+		goto err_delete_thread;
+	}
+
+	pt->auxtrace.process_event = intel_pt_process_event;
+	pt->auxtrace.process_auxtrace_event = intel_pt_process_auxtrace_event;
+	pt->auxtrace.flush_events = intel_pt_flush;
+	pt->auxtrace.free_events = intel_pt_free_events;
+	pt->auxtrace.free = intel_pt_free;
+	session->auxtrace = &pt->auxtrace;
+
+	if (dump_trace)
+		return 0;
+
+	if (pt->have_sched_switch == 1) {
+		pt->switch_evsel = intel_pt_find_sched_switch(session->evlist);
+		if (!pt->switch_evsel) {
+			pr_err("%s: missing sched_switch event\n", __func__);
+			goto err_delete_thread;
+		}
+	}
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
+		pt->synth_opts = *session->itrace_synth_opts;
+	} else {
+		itrace_synth_opts__set_default(&pt->synth_opts);
+		if (use_browser != -1) {
+			pt->synth_opts.branches = false;
+			pt->synth_opts.callchain = true;
+		}
+	}
+
+	if (pt->synth_opts.log)
+		intel_pt_log_enable();
+
+	/* Maximum non-turbo ratio is TSC freq / 100 MHz */
+	if (pt->tc.time_mult) {
+		u64 tsc_freq = intel_pt_ns_to_ticks(pt, 1000000000);
+
+		pt->max_non_turbo_ratio = (tsc_freq + 50000000) / 100000000;
+		intel_pt_log("TSC frequency %"PRIu64"\n", tsc_freq);
+		intel_pt_log("Maximum non-turbo ratio %u\n",
+			     pt->max_non_turbo_ratio);
+	}
+
+	if (pt->synth_opts.calls)
+		pt->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+				       PERF_IP_FLAG_TRACE_END;
+	if (pt->synth_opts.returns)
+		pt->branches_filter |= PERF_IP_FLAG_RETURN |
+				       PERF_IP_FLAG_TRACE_BEGIN;
+
+	if (pt->synth_opts.callchain && !symbol_conf.use_callchain) {
+		symbol_conf.use_callchain = true;
+		if (callchain_register_param(&callchain_param) < 0) {
+			symbol_conf.use_callchain = false;
+			pt->synth_opts.callchain = false;
+		}
+	}
+
+	err = intel_pt_synth_events(pt, session);
+	if (err)
+		goto err_delete_thread;
+
+	err = auxtrace_queues__process_index(&pt->queues, session);
+	if (err)
+		goto err_delete_thread;
+
+	if (pt->queues.populated)
+		pt->data_queued = true;
+
+	if (pt->timeless_decoding)
+		pr_debug2("Intel PT decoding without timestamps\n");
+
+	return 0;
+
+err_delete_thread:
+	thread__delete(pt->unknown_thread);
+err_free_queues:
+	intel_pt_log_disable();
+	auxtrace_queues__free(&pt->queues);
+	session->auxtrace = NULL;
+err_free:
+	free(pt);
+	return err;
+}
diff --git a/tools/perf/util/intel-pt.h b/tools/perf/util/intel-pt.h
new file mode 100644
index 0000000..a1bfe93
--- /dev/null
+++ b/tools/perf/util/intel-pt.h
@@ -0,0 +1,51 @@
+/*
+ * intel_pt.h: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__PERF_INTEL_PT_H__
+#define INCLUDE__PERF_INTEL_PT_H__
+
+#define INTEL_PT_PMU_NAME "intel_pt"
+
+enum {
+	INTEL_PT_PMU_TYPE,
+	INTEL_PT_TIME_SHIFT,
+	INTEL_PT_TIME_MULT,
+	INTEL_PT_TIME_ZERO,
+	INTEL_PT_CAP_USER_TIME_ZERO,
+	INTEL_PT_TSC_BIT,
+	INTEL_PT_NORETCOMP_BIT,
+	INTEL_PT_HAVE_SCHED_SWITCH,
+	INTEL_PT_SNAPSHOT_MODE,
+	INTEL_PT_PER_CPU_MMAPS,
+	INTEL_PT_AUXTRACE_PRIV_MAX,
+};
+
+#define INTEL_PT_AUXTRACE_PRIV_SIZE (INTEL_PT_AUXTRACE_PRIV_MAX * sizeof(u64))
+
+struct auxtrace_record;
+struct perf_tool;
+union perf_event;
+struct perf_session;
+struct perf_event_attr;
+struct perf_pmu;
+
+struct auxtrace_record *intel_pt_recording_init(int *err);
+
+int intel_pt_process_auxtrace_info(union perf_event *event,
+				   struct perf_session *session);
+
+struct perf_event_attr *intel_pt_pmu_default_config(struct perf_pmu *pmu);
+
+#endif

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Take Intel PT into use
  2015-07-17 16:33 ` [PATCH V8 07/25] perf tools: Take Intel PT into use Adrian Hunter
@ 2015-08-20  9:59   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-20  9:59 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, hpa, adrian.hunter, jolsa, linux-kernel, acme, tglx

Commit-ID:  5efb1d5489520ce72232bbc28e9156f0ebddc44e
Gitweb:     http://git.kernel.org/tip/5efb1d5489520ce72232bbc28e9156f0ebddc44e
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:42 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 17 Aug 2015 11:11:37 -0300

perf tools: Take Intel PT into use

To record an AUX area, the weak function auxtrace_record__init() must be
implemented.

Equally to decode an AUX area, the AUX area tracing type must be added
to the perf_event__process_auxtrace_info() function.

This patch makes those two changes plus hooks up default config for the
intel_pt PMU.  Also some brief documentation is provided for using the
tools with intel_pt.

Commiter note:

E.g:

  [root@perf4 ~]# dmesg
  451 [0.405807] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
  [root@perf4 ~]# perf --version
  perf version 4.1.g53874a
  [root@perf4 ~]#  perf record -e intel_pt//u -a sleep 10
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.383 MB perf.data ]
  [root@perf4 ~]# perf evlist
  intel_pt//u
  sched:sched_switch
  dummy:u
  [root@perf4 ~]# perf report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 0  of event 'intel_pt//u'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......
  #

  # Samples: 393  of event 'sched:sched_switch'
  # Event count (approx.): 393
  #
  # Overhead  Command         Shared Object     Symbol
  # ........  ..............  ................  ..............
    49.62%  swapper         [kernel.vmlinux]  [k] __schedule
    10.69%  rcu_sched       [kernel.vmlinux]  [k] __schedule
     6.62%  rcuos/0         [kernel.vmlinux]  [k] __schedule
     5.60%  kworker/0:1     [kernel.vmlinux]  [k] __schedule
     3.56%  rcuos/3         [kernel.vmlinux]  [k] __schedule
     3.05%  kworker/u384:2  [kernel.vmlinux]  [k] __schedule
     2.54%  kworker/2:0     [kernel.vmlinux]  [k] __schedule
     2.54%  tuned           [kernel.vmlinux]  [k] __schedule
  <SNIP>
  # Samples: 0  of event 'dummy:u'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......

  # Samples: 28  of event 'instructions:u'
  # Event count (approx.): 5030172
  #
  # Overhead  Command     Shared Object        Symbol
  # ........  ..........  ...................  ................................
  #
    21.43%  tuned       libpython2.7.so.1.0  [.] PyEval_EvalFrameEx
                 |
                 ---PyEval_EvalFrameEx
                    |
                    |--83.33%-- PyEval_EvalCodeEx
                    |          PyEval_EvalFrameEx
                    |          |
                    |          |--60.00%-- PyEval_EvalCodeEx
                    |          |          PyEval_EvalFrameEx
                    |          |          PyEval_EvalFrameEx
                    |          |
                    |           --40.00%-- PyEval_EvalFrameEx
                    |
                     --16.67%-- PyEval_EvalFrameEx
                               PyEval_EvalCodeEx
                               PyEval_EvalFrameEx
                               PyEval_EvalCodeEx
                               PyEval_EvalFrameEx
                               PyEval_EvalFrameEx

    14.29%  tuned       libpython2.7.so.1.0  [.] _PyType_Lookup
                 |
                 ---_PyType_Lookup
                    _PyObject_GenericGetAttrWithDict
                    PyEval_EvalFrameEx
                    PyEval_EvalCodeEx
                    PyEval_EvalFrameEx
                    PyEval_EvalCodeEx
                    PyEval_EvalFrameEx
                    |
                    |--75.00%-- PyEval_EvalFrameEx
                    |
                     --25.00%-- PyEval_EvalCodeEx
                               PyEval_EvalFrameEx
                               PyEval_EvalFrameEx

     3.57%  irqbalance  irqbalance           [.] 0x0000000000004038
            |
            ---0x4038
               0x4761
               0x4761
               0x4761
               0x49f1
               0x2295

     3.57%  irqbalance  libc-2.17.so         [.] __GI_____strtoull_l_internal
            |
            ---__GI_____strtoull_l_internal
               0x6f49
               0x229a

     3.57%  irqbalance  libc-2.17.so         [.] __strchrnul
            |
            ---__strchrnul
               vfprintf
               __vsprintf_chk
               __sprintf_chk
               0x2724
               0x4038
               0x2331

     3.57%  irqbalance  libc-2.17.so         [.] __strstr_sse42
            |
            ---__strstr_sse42
               0x71e0
               0x229f

  # And now to some userspace ftrace on uninstrumented binaries 8-) :
  # Hand edited to make it a bit more compact, replacing /home/acme/bin/perf
  # with /bin/perf:

  [root@perf4 ~]# perf script
     perf 8921 [3] 7.310889: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310889: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310889: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310890: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310890: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310890: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310893: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310893: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       4816a8 perf_evlist__enable (/bin/perf) => 4815f8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       4815fe perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310893: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310956: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310956: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310956: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310961: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310961: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310961: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310968: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310968: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       4816a8 perf_evlist__enable (/bin/perf) => 4815f8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       4815fe perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310968: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.311040: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311040: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311040: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.311046: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311046: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311046: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.311050: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311050: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
:

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-8-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/intel-pt.txt | 588 ++++++++++++++++++++++++++++++++++
 tools/perf/arch/x86/util/Build        |   2 +
 tools/perf/arch/x86/util/auxtrace.c   |  38 +++
 tools/perf/arch/x86/util/pmu.c        |  15 +
 tools/perf/util/auxtrace.c            |   5 +-
 tools/perf/util/pmu.c                 |   4 +-
 6 files changed, 649 insertions(+), 3 deletions(-)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
new file mode 100644
index 0000000..2866b62
--- /dev/null
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -0,0 +1,588 @@
+Intel Processor Trace
+=====================
+
+Overview
+========
+
+Intel Processor Trace (Intel PT) is an extension of Intel Architecture that
+collects information about software execution such as control flow, execution
+modes and timings and formats it into highly compressed binary packets.
+Technical details are documented in the Intel 64 and IA-32 Architectures
+Software Developer Manuals, Chapter 36 Intel Processor Trace.
+
+Intel PT is first supported in Intel Core M and 5th generation Intel Core
+processors that are based on the Intel micro-architecture code name Broadwell.
+
+Trace data is collected by 'perf record' and stored within the perf.data file.
+See below for options to 'perf record'.
+
+Trace data must be 'decoded' which involves walking the object code and matching
+the trace data packets. For example a TNT packet only tells whether a
+conditional branch was taken or not taken, so to make use of that packet the
+decoder must know precisely which instruction was being executed.
+
+Decoding is done on-the-fly.  The decoder outputs samples in the same format as
+samples output by perf hardware events, for example as though the "instructions"
+or "branches" events had been recorded.  Presently 3 tools support this:
+'perf script', 'perf report' and 'perf inject'.  See below for more information
+on using those tools.
+
+The main distinguishing feature of Intel PT is that the decoder can determine
+the exact flow of software execution.  Intel PT can be used to understand why
+and how did software get to a certain point, or behave a certain way.  The
+software does not have to be recompiled, so Intel PT works with debug or release
+builds, however the executed images are needed - which makes use in JIT-compiled
+environments, or with self-modified code, a challenge.  Also symbols need to be
+provided to make sense of addresses.
+
+A limitation of Intel PT is that it produces huge amounts of trace data
+(hundreds of megabytes per second per core) which takes a long time to decode,
+for example two or three orders of magnitude longer than it took to collect.
+Another limitation is the performance impact of tracing, something that will
+vary depending on the use-case and architecture.
+
+
+Quickstart
+==========
+
+It is important to start small.  That is because it is easy to capture vastly
+more data than can possibly be processed.
+
+The simplest thing to do with Intel PT is userspace profiling of small programs.
+Data is captured with 'perf record' e.g. to trace 'ls' userspace-only:
+
+	perf record -e intel_pt//u ls
+
+And profiled with 'perf report' e.g.
+
+	perf report
+
+To also trace kernel space presents a problem, namely kernel self-modifying
+code.  A fairly good kernel image is available in /proc/kcore but to get an
+accurate image a copy of /proc/kcore needs to be made under the same conditions
+as the data capture.  A script perf-with-kcore can do that, but beware that the
+script makes use of 'sudo' to copy /proc/kcore.  If you have perf installed
+locally from the source tree you can do:
+
+	~/libexec/perf-core/perf-with-kcore record pt_ls -e intel_pt// -- ls
+
+which will create a directory named 'pt_ls' and put the perf.data file and
+copies of /proc/kcore, /proc/kallsyms and /proc/modules into it.  Then to use
+'perf report' becomes:
+
+	~/libexec/perf-core/perf-with-kcore report pt_ls
+
+Because samples are synthesized after-the-fact, the sampling period can be
+selected for reporting. e.g. sample every microsecond
+
+	~/libexec/perf-core/perf-with-kcore report pt_ls --itrace=i1usge
+
+See the sections below for more information about the --itrace option.
+
+Beware the smaller the period, the more samples that are produced, and the
+longer it takes to process them.
+
+Also note that the coarseness of Intel PT timing information will start to
+distort the statistical value of the sampling as the sampling period becomes
+smaller.
+
+To represent software control flow, "branches" samples are produced.  By default
+a branch sample is synthesized for every single branch.  To get an idea what
+data is available you can use the 'perf script' tool with no parameters, which
+will list all the samples.
+
+	perf record -e intel_pt//u ls
+	perf script
+
+An interesting field that is not printed by default is 'flags' which can be
+displayed as follows:
+
+	perf script -Fcomm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,flags
+
+The flags are "bcrosyiABEx" which stand for branch, call, return, conditional,
+system, asynchronous, interrupt, transaction abort, trace begin, trace end, and
+in transaction, respectively.
+
+While it is possible to create scripts to analyze the data, an alternative
+approach is available to export the data to a postgresql database.  Refer to
+script export-to-postgresql.py for more details, and to script
+call-graph-from-postgresql.py for an example of using the database.
+
+As mentioned above, it is easy to capture too much data.  One way to limit the
+data captured is to use 'snapshot' mode which is explained further below.
+Refer to 'new snapshot option' and 'Intel PT modes of operation' further below.
+
+Another problem that will be experienced is decoder errors.  They can be caused
+by inability to access the executed image, self-modified or JIT-ed code, or the
+inability to match side-band information (such as context switches and mmaps)
+which results in the decoder not knowing what code was executed.
+
+There is also the problem of perf not being able to copy the data fast enough,
+resulting in data lost because the buffer was full.  See 'Buffer handling' below
+for more details.
+
+
+perf record
+===========
+
+new event
+---------
+
+The Intel PT kernel driver creates a new PMU for Intel PT.  PMU events are
+selected by providing the PMU name followed by the "config" separated by slashes.
+An enhancement has been made to allow default "config" e.g. the option
+
+	-e intel_pt//
+
+will use a default config value.  Currently that is the same as
+
+	-e intel_pt/tsc,noretcomp=0/
+
+which is the same as
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+The config terms are listed in /sys/devices/intel_pt/format.  They are bit
+fields within the config member of the struct perf_event_attr which is
+passed to the kernel by the perf_event_open system call.  They correspond to bit
+fields in the IA32_RTIT_CTL MSR.  Here is a list of them and their definitions:
+
+	$ for f in `ls /sys/devices/intel_pt/format`;do
+	> echo $f
+	> cat /sys/devices/intel_pt/format/$f
+	> done
+	noretcomp
+	config:11
+	tsc
+	config:10
+
+Note that the default config must be overridden for each term i.e.
+
+	-e intel_pt/noretcomp=0/
+
+is the same as:
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+So, to disable TSC packets use:
+
+	-e intel_pt/tsc=0/
+
+It is also possible to specify the config value explicitly:
+
+	-e intel_pt/config=0x400/
+
+Note that, as with all events, the event is suffixed with event modifiers:
+
+	u	userspace
+	k	kernel
+	h	hypervisor
+	G	guest
+	H	host
+	p	precise ip
+
+'h', 'G' and 'H' are for virtualization which is not supported by Intel PT.
+'p' is also not relevant to Intel PT.  So only options 'u' and 'k' are
+meaningful for Intel PT.
+
+perf_event_attr is displayed if the -vv option is used e.g.
+
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             6
+	size                             112
+	config                           0x400
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	enable_on_exec                   1
+	sample_id_all                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+
+
+new snapshot option
+-------------------
+
+To select snapshot mode a new option has been added:
+
+	-S
+
+Optionally it can be followed by the snapshot size e.g.
+
+	-S0x100000
+
+The default snapshot size is the auxtrace mmap size.  If neither auxtrace mmap size
+nor snapshot size is specified, then the default is 4MiB for privileged users
+(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
+If an unprivileged user does not specify mmap pages, the mmap pages will be
+reduced as described in the 'new auxtrace mmap size option' section below.
+
+The snapshot size is displayed if the option -vv is used e.g.
+
+	Intel PT snapshot size: %zu
+
+
+new auxtrace mmap size option
+---------------------------
+
+Intel PT buffer size is specified by an addition to the -m option e.g.
+
+	-m,16
+
+selects a buffer size of 16 pages i.e. 64KiB.
+
+Note that the existing functionality of -m is unchanged.  The auxtrace mmap size
+is specified by the optional addition of a comma and the value.
+
+The default auxtrace mmap size for Intel PT is 4MiB/page_size for privileged users
+(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
+If an unprivileged user does not specify mmap pages, the mmap pages will be
+reduced from the default 512KiB/page_size to 256KiB/page_size, otherwise the
+user is likely to get an error as they exceed their mlock limit (Max locked
+memory as shown in /proc/self/limits).  Note that perf does not count the first
+512KiB (actually /proc/sys/kernel/perf_event_mlock_kb minus 1 page) per cpu
+against the mlock limit so an unprivileged user is allowed 512KiB per cpu plus
+their mlock limit (which defaults to 64KiB but is not multiplied by the number
+of cpus).
+
+In full-trace mode, powers of two are allowed for buffer size, with a minimum
+size of 2 pages.  In snapshot mode, it is the same but the minimum size is
+1 page.
+
+The mmap size and auxtrace mmap size are displayed if the -vv option is used e.g.
+
+	mmap length 528384
+	auxtrace mmap length 4198400
+
+
+Intel PT modes of operation
+---------------------------
+
+Intel PT can be used in 2 modes:
+	full-trace mode
+	snapshot mode
+
+Full-trace mode traces continuously e.g.
+
+	perf record -e intel_pt//u uname
+
+Snapshot mode captures the available data when a signal is sent e.g.
+
+	perf record -v -e intel_pt//u -S ./loopy 1000000000 &
+	[1] 11435
+	kill -USR2 11435
+	Recording AUX area tracing snapshot
+
+Note that the signal sent is SIGUSR2.
+Note that "Recording AUX area tracing snapshot" is displayed because the -v
+option is used.
+
+The 2 modes cannot be used together.
+
+
+Buffer handling
+---------------
+
+There may be buffer limitations (i.e. single ToPa entry) which means that actual
+buffer sizes are limited to powers of 2 up to 4MiB (MAX_ORDER).  In order to
+provide other sizes, and in particular an arbitrarily large size, multiple
+buffers are logically concatenated.  However an interrupt must be used to switch
+between buffers.  That has two potential problems:
+	a) the interrupt may not be handled in time so that the current buffer
+	becomes full and some trace data is lost.
+	b) the interrupts may slow the system and affect the performance
+	results.
+
+If trace data is lost, the driver sets 'truncated' in the PERF_RECORD_AUX event
+which the tools report as an error.
+
+In full-trace mode, the driver waits for data to be copied out before allowing
+the (logical) buffer to wrap-around.  If data is not copied out quickly enough,
+again 'truncated' is set in the PERF_RECORD_AUX event.  If the driver has to
+wait, the intel_pt event gets disabled.  Because it is difficult to know when
+that happens, perf tools always re-enable the intel_pt event after copying out
+data.
+
+
+Intel PT and build ids
+----------------------
+
+By default "perf record" post-processes the event stream to find all build ids
+for executables for all addresses sampled.  Deliberately, Intel PT is not
+decoded for that purpose (it would take too long).  Instead the build ids for
+all executables encountered (due to mmap, comm or task events) are included
+in the perf.data file.
+
+To see buildids included in the perf.data file use the command:
+
+	perf buildid-list
+
+If the perf.data file contains Intel PT data, that is the same as:
+
+	perf buildid-list --with-hits
+
+
+Snapshot mode and event disabling
+---------------------------------
+
+In order to make a snapshot, the intel_pt event is disabled using an IOCTL,
+namely PERF_EVENT_IOC_DISABLE.  However doing that can also disable the
+collection of side-band information.  In order to prevent that,  a dummy
+software event has been introduced that permits tracking events (like mmaps) to
+continue to be recorded while intel_pt is disabled.  That is important to ensure
+there is complete side-band information to allow the decoding of subsequent
+snapshots.
+
+A test has been created for that.  To find the test:
+
+	perf test list
+	...
+	23: Test using a dummy software event to keep tracking
+
+To run the test:
+
+	perf test 23
+	23: Test using a dummy software event to keep tracking     : Ok
+
+
+perf record modes (nothing new here)
+------------------------------------
+
+perf record essentially operates in one of three modes:
+	per thread
+	per cpu
+	workload only
+
+"per thread" mode is selected by -t or by --per-thread (with -p or -u or just a
+workload).
+"per cpu" is selected by -C or -a.
+"workload only" mode is selected by not using the other options but providing a
+command to run (i.e. the workload).
+
+In per-thread mode an exact list of threads is traced.  There is no inheritance.
+Each thread has its own event buffer.
+
+In per-cpu mode all processes (or processes from the selected cgroup i.e. -G
+option, or processes selected with -p or -u) are traced.  Each cpu has its own
+buffer. Inheritance is allowed.
+
+In workload-only mode, the workload is traced but with per-cpu buffers.
+Inheritance is allowed.  Note that you can now trace a workload in per-thread
+mode by using the --per-thread option.
+
+
+Privileged vs non-privileged users
+----------------------------------
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users
+have memory limits imposed upon them.  That affects what buffer sizes they can
+have as outlined above.
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users are
+not permitted to use tracepoints which means there is insufficient side-band
+information to decode Intel PT in per-cpu mode, and potentially workload-only
+mode too if the workload creates new processes.
+
+Note also, that to use tracepoints, read-access to debugfs is required.  So if
+debugfs is not mounted or the user does not have read-access, it will again not
+be possible to decode Intel PT in per-cpu mode.
+
+
+sched_switch tracepoint
+-----------------------
+
+The sched_switch tracepoint is used to provide side-band data for Intel PT
+decoding.  sched_switch events are automatically added. e.g. the second event
+shown below
+
+	$ perf record -vv -e intel_pt//u uname
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             6
+	size                             112
+	config                           0x400
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	enable_on_exec                   1
+	sample_id_all                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             2
+	size                             112
+	config                           0x108
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
+	read_format                      ID
+	inherit                          1
+	sample_id_all                    1
+	exclude_guest                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             1
+	size                             112
+	config                           0x9
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	mmap                             1
+	comm                             1
+	enable_on_exec                   1
+	task                             1
+	sample_id_all                    1
+	mmap2                            1
+	comm_exec                        1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	mmap size 528384B
+	AUX area mmap length 4194304
+	perf event ring buffer mmapped per cpu
+	Synthesizing auxtrace information
+	Linux
+	[ perf record: Woken up 1 times to write data ]
+	[ perf record: Captured and wrote 0.042 MB perf.data ]
+
+Note, the sched_switch event is only added if the user is permitted to use it
+and only in per-cpu mode.
+
+Note also, the sched_switch event is only added if TSC packets are requested.
+That is because, in the absence of timing information, the sched_switch events
+cannot be matched against the Intel PT trace.
+
+
+perf script
+===========
+
+By default, perf script will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace.
+
+
+New --itrace option
+-------------------
+
+Having no option is the same as
+
+	--itrace
+
+which, in turn, is the same as
+
+	--itrace=ibxe
+
+The letters are:
+
+	i	synthesize "instructions" events
+	b	synthesize "branches" events
+	x	synthesize "transactions" events
+	c	synthesize branches events (calls only)
+	r	synthesize branches events (returns only)
+	e	synthesize tracing error events
+	d	create a debug log
+	g	synthesize a call chain (use with i or x)
+
+"Instructions" events look like they were recorded by "perf record -e
+instructions".
+
+"Branches" events look like they were recorded by "perf record -e branches". "c"
+and "r" can be combined to get calls and returns.
+
+"Transactions" events correspond to the start or end of transactions. The
+'flags' field can be used in perf script to determine whether the event is a
+tranasaction start, commit or abort.
+
+Error events are new.  They show where the decoder lost the trace.  Error events
+are quite important.  Users must know if what they are seeing is a complete
+picture or not.
+
+The "d" option will cause the creation of a file "intel_pt.log" containing all
+decoded packets and instructions.  Note that this option slows down the decoder
+and that the resulting file may be very large.
+
+In addition, the period of the "instructions" event can be specified. e.g.
+
+	--itrace=i10us
+
+sets the period to 10us i.e. one  instruction sample is synthesized for each 10
+microseconds of trace.  Alternatives to "us" are "ms" (milliseconds),
+"ns" (nanoseconds), "t" (TSC ticks) or "i" (instructions).
+
+"ms", "us" and "ns" are converted to TSC ticks.
+
+The timing information included with Intel PT does not give the time of every
+instruction.  Consequently, for the purpose of sampling, the decoder estimates
+the time since the last timing packet based on 1 tick per instruction.  The time
+on the sample is *not* adjusted and reflects the last known value of TSC.
+
+For Intel PT, the default period is 100us.
+
+Also the call chain size (default 16, max. 1024) for instructions or
+transactions events can be specified. e.g.
+
+	--itrace=ig32
+	--itrace=xg32
+
+To disable trace decoding entirely, use the option --no-itrace.
+
+
+dump option
+-----------
+
+perf script has an option (-D) to "dump" the events i.e. display the binary
+data.
+
+When -D is used, Intel PT packets are displayed.  The packet decoder does not
+pay attention to PSB packets, but just decodes the bytes - so the packets seen
+by the actual decoder may not be identical in places where the data is corrupt.
+One example of that would be when the buffer-switching interrupt has been too
+slow, and the buffer has been filled completely.  In that case, the last packet
+in the buffer might be truncated and immediately followed by a PSB as the trace
+continues in the next buffer.
+
+To disable the display of Intel PT packets, combine the -D option with
+--no-itrace.
+
+
+perf report
+===========
+
+By default, perf report will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace exactly the same as
+perf script, with the exception that the default is --itrace=igxe.
+
+
+perf inject
+===========
+
+perf inject also accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+	perf inject --itrace -i perf.data -o perf.data.new
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 1396088..a8be9f9 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -1,5 +1,6 @@
 libperf-y += header.o
 libperf-y += tsc.o
+libperf-y += pmu.o
 libperf-y += kvm-stat.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
@@ -7,4 +8,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
 libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 
+libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
new file mode 100644
index 0000000..e7654b5
--- /dev/null
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -0,0 +1,38 @@
+/*
+ * auxtrace.c: AUX area tracing support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include "../../util/header.h"
+#include "../../util/auxtrace.h"
+#include "../../util/intel-pt.h"
+
+struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe_unused,
+					      int *err)
+{
+	char buffer[64];
+	int ret;
+
+	*err = 0;
+
+	ret = get_cpuid(buffer, sizeof(buffer));
+	if (ret) {
+		*err = ret;
+		return NULL;
+	}
+
+	if (!strncmp(buffer, "GenuineIntel,", 13))
+		return intel_pt_recording_init(err);
+
+	return NULL;
+}
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
new file mode 100644
index 0000000..fd11cc3
--- /dev/null
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -0,0 +1,15 @@
+#include <string.h>
+
+#include <linux/perf_event.h>
+
+#include "../../util/intel-pt.h"
+#include "../../util/pmu.h"
+
+struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
+{
+#ifdef HAVE_AUXTRACE_SUPPORT
+	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
+		return intel_pt_pmu_default_config(pmu);
+#endif
+	return NULL;
+}
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 49dbfbe..0f0b7e1 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -47,6 +47,8 @@
 #include "debug.h"
 #include "parse-options.h"
 
+#include "intel-pt.h"
+
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 			struct auxtrace_mmap_params *mp,
 			void *userpg, int fd)
@@ -876,7 +878,7 @@ static bool auxtrace__dont_decode(struct perf_session *session)
 
 int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 				      union perf_event *event,
-				      struct perf_session *session __maybe_unused)
+				      struct perf_session *session)
 {
 	enum auxtrace_type type = event->auxtrace_info.type;
 
@@ -885,6 +887,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 
 	switch (type) {
 	case PERF_AUXTRACE_INTEL_PT:
+		return intel_pt_process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 84cad05..3c71138 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -462,8 +462,8 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	LIST_HEAD(aliases);
 	__u32 type;
 
-	/* No support for intel_bts or intel_pt so disallow them */
-	if (!strcmp(name, "intel_bts") || !strcmp(name, "intel_pt"))
+	/* No support for intel_bts so disallow it */
+	if (!strcmp(name, "intel_bts"))
 		return NULL;
 
 	/*

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 08/25] perf tools: Add Intel BTS support
  2015-08-20  8:53                 ` Adrian Hunter
@ 2015-08-21 14:18                   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-21 14:18 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Thu, Aug 20, 2015 at 11:53:45AM +0300, Adrian Hunter escreveu:
> On 18/08/15 19:10, Arnaldo Carvalho de Melo wrote:
> > Works for you if you test in lockstep the tooling with the kernel
> > sources, right? It works for me as well, if I use the latest tip/master
> > tooling _and_ kernel, my report was for using the 4.2.0-rc latest for
> > the kernel and the tooling in tip/master.
> > 
> > I think this would be just a matter of having a better message, or
> > perhaps some fix that we need to send to the stable@kernel.org guys, so
> > that the new tooling works with a slightly older kernel.
> > 
> > Anyway, consider looking at that perf.data file that I fixed the
> > permissions on vger, and I will look at the fix you mentioned in another
> > message.
> 
> With your perf.data file I was able to find the problem.  I have sent patch
> "perf tools: Fix Intel BTS/PT timestamp handling" to fix it.

Great, I just tested, without the patch it works in a machine with a new
kernel, with an older kernel, where it was failing, after I apply this
patch it works.

Thanks,

- Arnaldo

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 10/25] perf tools: Add example call-graph script
  2015-07-17 16:33 ` [PATCH V8 10/25] perf tools: Add example call-graph script Adrian Hunter
@ 2015-08-21 15:00   ` Arnaldo Carvalho de Melo
  2015-08-21 15:11     ` Arnaldo Carvalho de Melo
  2015-08-22  6:53   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 1 reply; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-21 15:00 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jul 17, 2015 at 07:33:45PM +0300, Adrian Hunter escreveu:
> Add a script to produce a call-graph from data exported
> to a postgresql database and derived from a processor trace
> event like intel_pt or intel_bts. Refer to comments in the
> scripts call-graph-from-postgresql.py and export-to-postgresql.py
> for more details.

Testing this, you forgot to add instructions to install the qt stuff:

  [acme@felicio ~]$ perf script -s
  ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example
  branches calls

  Traceback (most recent call last):
    File
  "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py",
  line 65, in <module>
      from PySide.QtSql import *
  ImportError: No module named PySide.QtSql
  Error running python script
  /home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py
  [acme@felicio ~]$

I'll add it.

- Arnaldo
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  .../scripts/python/call-graph-from-postgresql.py   | 327 +++++++++++++++++++++
>  tools/perf/scripts/python/export-to-postgresql.py  |  47 +++
>  2 files changed, 374 insertions(+)
>  create mode 100644 tools/perf/scripts/python/call-graph-from-postgresql.py
> 
> diff --git a/tools/perf/scripts/python/call-graph-from-postgresql.py b/tools/perf/scripts/python/call-graph-from-postgresql.py
> new file mode 100644
> index 000000000000..e78fdc2a5a9d
> --- /dev/null
> +++ b/tools/perf/scripts/python/call-graph-from-postgresql.py
> @@ -0,0 +1,327 @@
> +#!/usr/bin/python2
> +# call-graph-from-postgresql.py: create call-graph from postgresql database
> +# Copyright (c) 2014, Intel Corporation.
> +#
> +# This program is free software; you can redistribute it and/or modify it
> +# under the terms and conditions of the GNU General Public License,
> +# version 2, as published by the Free Software Foundation.
> +#
> +# This program is distributed in the hope it will be useful, but WITHOUT
> +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> +# more details.
> +
> +# To use this script you will need to have exported data using the
> +# export-to-postgresql.py script.  Refer to that script for details.
> +#
> +# Following on from the example in the export-to-postgresql.py script, a
> +# call-graph can be displayed for the pt_example database like this:
> +#
> +#	python tools/perf/scripts/python/call-graph-from-postgresql.py pt_example
> +#
> +# Note this script supports connecting to remote databases by setting hostname,
> +# port, username, password, and dbname e.g.
> +#
> +#	python tools/perf/scripts/python/call-graph-from-postgresql.py "hostname=myhost username=myuser password=mypassword dbname=pt_example"
> +#
> +# The result is a GUI window with a tree representing a context-sensitive
> +# call-graph.  Expanding a couple of levels of the tree and adjusting column
> +# widths to suit will display something like:
> +#
> +#                                         Call Graph: pt_example
> +# Call Path                          Object      Count   Time(ns)  Time(%)  Branch Count   Branch Count(%)
> +# v- ls
> +#     v- 2638:2638
> +#         v- _start                  ld-2.19.so    1     10074071   100.0         211135            100.0
> +#           |- unknown               unknown       1        13198     0.1              1              0.0
> +#           >- _dl_start             ld-2.19.so    1      1400980    13.9          19637              9.3
> +#           >- _d_linit_internal     ld-2.19.so    1       448152     4.4          11094              5.3
> +#           v-__libc_start_main@plt  ls            1      8211741    81.5         180397             85.4
> +#              >- _dl_fixup          ld-2.19.so    1         7607     0.1            108              0.1
> +#              >- __cxa_atexit       libc-2.19.so  1        11737     0.1             10              0.0
> +#              >- __libc_csu_init    ls            1        10354     0.1             10              0.0
> +#              |- _setjmp            libc-2.19.so  1            0     0.0              4              0.0
> +#              v- main               ls            1      8182043    99.6         180254             99.9
> +#
> +# Points to note:
> +#	The top level is a command name (comm)
> +#	The next level is a thread (pid:tid)
> +#	Subsequent levels are functions
> +#	'Count' is the number of calls
> +#	'Time' is the elapsed time until the function returns
> +#	Percentages are relative to the level above
> +#	'Branch Count' is the total number of branches for that function and all
> +#       functions that it calls
> +
> +import sys
> +from PySide.QtCore import *
> +from PySide.QtGui import *
> +from PySide.QtSql import *
> +from decimal import *
> +
> +class TreeItem():
> +
> +	def __init__(self, db, row, parent_item):
> +		self.db = db
> +		self.row = row
> +		self.parent_item = parent_item
> +		self.query_done = False;
> +		self.child_count = 0
> +		self.child_items = []
> +		self.data = ["", "", "", "", "", "", ""]
> +		self.comm_id = 0
> +		self.thread_id = 0
> +		self.call_path_id = 1
> +		self.branch_count = 0
> +		self.time = 0
> +		if not parent_item:
> +			self.setUpRoot()
> +
> +	def setUpRoot(self):
> +		self.query_done = True
> +		query = QSqlQuery(self.db)
> +		ret = query.exec_('SELECT id, comm FROM comms')
> +		if not ret:
> +			raise Exception("Query failed: " + query.lastError().text())
> +		while query.next():
> +			if not query.value(0):
> +				continue
> +			child_item = TreeItem(self.db, self.child_count, self)
> +			self.child_items.append(child_item)
> +			self.child_count += 1
> +			child_item.setUpLevel1(query.value(0), query.value(1))
> +
> +	def setUpLevel1(self, comm_id, comm):
> +		self.query_done = True;
> +		self.comm_id = comm_id
> +		self.data[0] = comm
> +		self.child_items = []
> +		self.child_count = 0
> +		query = QSqlQuery(self.db)
> +		ret = query.exec_('SELECT thread_id, ( SELECT pid FROM threads WHERE id = thread_id ), ( SELECT tid FROM threads WHERE id = thread_id ) FROM comm_threads WHERE comm_id = ' + str(comm_id))
> +		if not ret:
> +			raise Exception("Query failed: " + query.lastError().text())
> +		while query.next():
> +			child_item = TreeItem(self.db, self.child_count, self)
> +			self.child_items.append(child_item)
> +			self.child_count += 1
> +			child_item.setUpLevel2(comm_id, query.value(0), query.value(1), query.value(2))
> +
> +	def setUpLevel2(self, comm_id, thread_id, pid, tid):
> +		self.comm_id = comm_id
> +		self.thread_id = thread_id
> +		self.data[0] = str(pid) + ":" + str(tid)
> +
> +	def getChildItem(self, row):
> +		return self.child_items[row]
> +
> +	def getParentItem(self):
> +		return self.parent_item
> +
> +	def getRow(self):
> +		return self.row
> +
> +	def timePercent(self, b):
> +		if not self.time:
> +			return "0.0"
> +		x = (b * Decimal(100)) / self.time
> +		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
> +
> +	def branchPercent(self, b):
> +		if not self.branch_count:
> +			return "0.0"
> +		x = (b * Decimal(100)) / self.branch_count
> +		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
> +
> +	def addChild(self, call_path_id, name, dso, count, time, branch_count):
> +		child_item = TreeItem(self.db, self.child_count, self)
> +		child_item.comm_id = self.comm_id
> +		child_item.thread_id = self.thread_id
> +		child_item.call_path_id = call_path_id
> +		child_item.branch_count = branch_count
> +		child_item.time = time
> +		child_item.data[0] = name
> +		if dso == "[kernel.kallsyms]":
> +			dso = "[kernel]"
> +		child_item.data[1] = dso
> +		child_item.data[2] = str(count)
> +		child_item.data[3] = str(time)
> +		child_item.data[4] = self.timePercent(time)
> +		child_item.data[5] = str(branch_count)
> +		child_item.data[6] = self.branchPercent(branch_count)
> +		self.child_items.append(child_item)
> +		self.child_count += 1
> +
> +	def selectCalls(self):
> +		self.query_done = True;
> +		query = QSqlQuery(self.db)
> +		ret = query.exec_('SELECT id, call_path_id, branch_count, call_time, return_time, '
> +				  '( SELECT name FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ), '
> +				  '( SELECT short_name FROM dsos WHERE id = ( SELECT dso_id FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ) ), '
> +				  '( SELECT ip FROM call_paths where id = call_path_id ) '
> +				  'FROM calls WHERE parent_call_path_id = ' + str(self.call_path_id) + ' AND comm_id = ' + str(self.comm_id) + ' AND thread_id = ' + str(self.thread_id) +
> +				  'ORDER BY call_path_id')
> +		if not ret:
> +			raise Exception("Query failed: " + query.lastError().text())
> +		last_call_path_id = 0
> +		name = ""
> +		dso = ""
> +		count = 0
> +		branch_count = 0
> +		total_branch_count = 0
> +		time = 0
> +		total_time = 0
> +		while query.next():
> +			if query.value(1) == last_call_path_id:
> +				count += 1
> +				branch_count += query.value(2)
> +				time += query.value(4) - query.value(3)
> +			else:
> +				if count:
> +					self.addChild(last_call_path_id, name, dso, count, time, branch_count)
> +				last_call_path_id = query.value(1)
> +				name = query.value(5)
> +				dso = query.value(6)
> +				count = 1
> +				total_branch_count += branch_count
> +				total_time += time
> +				branch_count = query.value(2)
> +				time = query.value(4) - query.value(3)
> +		if count:
> +			self.addChild(last_call_path_id, name, dso, count, time, branch_count)
> +		total_branch_count += branch_count
> +		total_time += time
> +		# Top level does not have time or branch count, so fix that here
> +		if total_branch_count > self.branch_count:
> +			self.branch_count = total_branch_count
> +			if self.branch_count:
> +				for child_item in self.child_items:
> +					child_item.data[6] = self.branchPercent(child_item.branch_count)
> +		if total_time > self.time:
> +			self.time = total_time
> +			if self.time:
> +				for child_item in self.child_items:
> +					child_item.data[4] = self.timePercent(child_item.time)
> +
> +	def childCount(self):
> +		if not self.query_done:
> +			self.selectCalls()
> +		return self.child_count
> +
> +	def columnCount(self):
> +		return 7
> +
> +	def columnHeader(self, column):
> +		headers = ["Call Path", "Object", "Count ", "Time (ns) ", "Time (%) ", "Branch Count ", "Branch Count (%) "]
> +		return headers[column]
> +
> +	def getData(self, column):
> +		return self.data[column]
> +
> +class TreeModel(QAbstractItemModel):
> +
> +	def __init__(self, db, parent=None):
> +		super(TreeModel, self).__init__(parent)
> +		self.db = db
> +		self.root = TreeItem(db, 0, None)
> +
> +	def columnCount(self, parent):
> +		return self.root.columnCount()
> +
> +	def rowCount(self, parent):
> +		if parent.isValid():
> +			parent_item = parent.internalPointer()
> +		else:
> +			parent_item = self.root
> +		return parent_item.childCount()
> +
> +	def headerData(self, section, orientation, role):
> +		if role == Qt.TextAlignmentRole:
> +			if section > 1:
> +				return Qt.AlignRight
> +		if role != Qt.DisplayRole:
> +			return None
> +		if orientation != Qt.Horizontal:
> +			return None
> +		return self.root.columnHeader(section)
> +
> +	def parent(self, child):
> +		child_item = child.internalPointer()
> +		if child_item is self.root:
> +			return QModelIndex()
> +		parent_item = child_item.getParentItem()
> +		return self.createIndex(parent_item.getRow(), 0, parent_item)
> +
> +	def index(self, row, column, parent):
> +		if parent.isValid():
> +			parent_item = parent.internalPointer()
> +		else:
> +			parent_item = self.root
> +		child_item = parent_item.getChildItem(row)
> +		return self.createIndex(row, column, child_item)
> +
> +	def data(self, index, role):
> +		if role == Qt.TextAlignmentRole:
> +			if index.column() > 1:
> +				return Qt.AlignRight
> +		if role != Qt.DisplayRole:
> +			return None
> +		index_item = index.internalPointer()
> +		return index_item.getData(index.column())
> +
> +class MainWindow(QMainWindow):
> +
> +	def __init__(self, db, dbname, parent=None):
> +		super(MainWindow, self).__init__(parent)
> +
> +		self.setObjectName("MainWindow")
> +		self.setWindowTitle("Call Graph: " + dbname)
> +		self.move(100, 100)
> +		self.resize(800, 600)
> +		style = self.style()
> +		icon = style.standardIcon(QStyle.SP_MessageBoxInformation)
> +		self.setWindowIcon(icon);
> +
> +		self.model = TreeModel(db)
> +
> +		self.view = QTreeView()
> +		self.view.setModel(self.model)
> +
> +		self.setCentralWidget(self.view)
> +
> +if __name__ == '__main__':
> +	if (len(sys.argv) < 2):
> +		print >> sys.stderr, "Usage is: call-graph-from-postgresql.py <database name>"
> +		raise Exception("Too few arguments")
> +
> +	dbname = sys.argv[1]
> +
> +	db = QSqlDatabase.addDatabase('QPSQL')
> +
> +	opts = dbname.split()
> +	for opt in opts:
> +		if '=' in opt:
> +			opt = opt.split('=')
> +			if opt[0] == 'hostname':
> +				db.setHostName(opt[1])
> +			elif opt[0] == 'port':
> +				db.setPort(int(opt[1]))
> +			elif opt[0] == 'username':
> +				db.setUserName(opt[1])
> +			elif opt[0] == 'password':
> +				db.setPassword(opt[1])
> +			elif opt[0] == 'dbname':
> +				dbname = opt[1]
> +		else:
> +			dbname = opt
> +
> +	db.setDatabaseName(dbname)
> +	if not db.open():
> +		raise Exception("Failed to open database " + dbname + " error: " + db.lastError().text())
> +
> +	app = QApplication(sys.argv)
> +	window = MainWindow(db, dbname)
> +	window.show()
> +	err = app.exec_()
> +	db.close()
> +	sys.exit(err)
> diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
> index 4cdafd880074..5e939eadacd2 100644
> --- a/tools/perf/scripts/python/export-to-postgresql.py
> +++ b/tools/perf/scripts/python/export-to-postgresql.py
> @@ -15,6 +15,53 @@ import sys
>  import struct
>  import datetime
>  
> +# To use this script you will need to have installed package python-pyside which
> +# provides LGPL-licensed Python bindings for Qt.  You will also need the package
> +# libqt4-sql-psql for Qt postgresql support.
> +#
> +# The script assumes postgresql is running on the local machine and that the
> +# user has postgresql permissions to create databases. Examples of installing
> +# postgresql and adding such a user are:
> +#
> +# fedora:
> +#
> +#	$ sudo yum install postgresql postgresql-server
> +#	$ sudo su - postgres -c initdb
> +#	$ sudo service postgresql start
> +#	$ sudo su - postgres
> +#	$ createuser <your user id here>
> +#	Shall the new role be a superuser? (y/n) y
> +#
> +# ubuntu:
> +#
> +#	$ sudo apt-get install postgresql
> +#	$ sudo su - postgres
> +#	$ createuser <your user id here>
> +#	Shall the new role be a superuser? (y/n) y
> +#
> +# An example of using this script with Intel PT:
> +#
> +#	$ perf record -e intel_pt//u ls
> +#	$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py pt_example branches calls
> +#	2015-05-29 12:49:23.464364 Creating database...
> +#	2015-05-29 12:49:26.281717 Writing to intermediate files...
> +#	2015-05-29 12:49:27.190383 Copying to database...
> +#	2015-05-29 12:49:28.140451 Removing intermediate files...
> +#	2015-05-29 12:49:28.147451 Adding primary keys
> +#	2015-05-29 12:49:28.655683 Adding foreign keys
> +#	2015-05-29 12:49:29.365350 Done
> +#
> +# To browse the database, psql can be used e.g.
> +#
> +#	$ psql pt_example
> +#	pt_example=# select * from samples_view where id < 100;
> +#	pt_example=# \d+
> +#	pt_example=# \d+ samples_view
> +#	pt_example=# \q
> +#
> +# An example of using the database is provided by the script
> +# call-graph-from-postgresql.py.  Refer to that script for details.
> +
>  from PySide.QtSql import *
>  
>  # Need to access PostgreSQL C library directly to use COPY FROM STDIN
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 10/25] perf tools: Add example call-graph script
  2015-08-21 15:00   ` Arnaldo Carvalho de Melo
@ 2015-08-21 15:11     ` Arnaldo Carvalho de Melo
  2015-08-21 15:21       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-21 15:11 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Aug 21, 2015 at 12:00:13PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Jul 17, 2015 at 07:33:45PM +0300, Adrian Hunter escreveu:
> > Add a script to produce a call-graph from data exported
> > to a postgresql database and derived from a processor trace
> > event like intel_pt or intel_bts. Refer to comments in the
> > scripts call-graph-from-postgresql.py and export-to-postgresql.py
> > for more details.
> 
> Testing this, you forgot to add instructions to install the qt stuff:
> 
>   [acme@felicio ~]$ perf script -s
>   ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example
>   branches calls
> 
>   Traceback (most recent call last):
>     File
>   "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py",
>   line 65, in <module>
>       from PySide.QtSql import *
>   ImportError: No module named PySide.QtSql
>   Error running python script
>   /home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py
>   [acme@felicio ~]$
> 
> I'll add it.

[acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
Traceback (most recent call last):
  File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 18, in <module>
    from PySide.QtSql import *
ImportError: No module named PySide.QtSql
Error running python script /home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py


Installed python-pyside, fedora21, but still something is missing:

[acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
2015-08-21 12:10:00.504126 Creating database...
QSqlDatabase: QPSQL driver not loaded
QSqlDatabase: available drivers: QSQLITE
QSqlDatabase: an instance of QCoreApplication is required for loading driver plugins
QSqlQuery::exec: database not open
Traceback (most recent call last):
  File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 87, in <module>
    do_query(query, 'CREATE DATABASE ' + dbname)
  File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 78, in do_query
    raise Exception("Query failed: " + q.lastError().text())
Exception: Query failed: Driver not loaded Driver not loaded
Error running python script /home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py
[acme@zoo ~]$ 

> - Arnaldo
>  
> > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> > ---
> >  .../scripts/python/call-graph-from-postgresql.py   | 327 +++++++++++++++++++++
> >  tools/perf/scripts/python/export-to-postgresql.py  |  47 +++
> >  2 files changed, 374 insertions(+)
> >  create mode 100644 tools/perf/scripts/python/call-graph-from-postgresql.py
> > 
> > diff --git a/tools/perf/scripts/python/call-graph-from-postgresql.py b/tools/perf/scripts/python/call-graph-from-postgresql.py
> > new file mode 100644
> > index 000000000000..e78fdc2a5a9d
> > --- /dev/null
> > +++ b/tools/perf/scripts/python/call-graph-from-postgresql.py
> > @@ -0,0 +1,327 @@
> > +#!/usr/bin/python2
> > +# call-graph-from-postgresql.py: create call-graph from postgresql database
> > +# Copyright (c) 2014, Intel Corporation.
> > +#
> > +# This program is free software; you can redistribute it and/or modify it
> > +# under the terms and conditions of the GNU General Public License,
> > +# version 2, as published by the Free Software Foundation.
> > +#
> > +# This program is distributed in the hope it will be useful, but WITHOUT
> > +# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > +# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > +# more details.
> > +
> > +# To use this script you will need to have exported data using the
> > +# export-to-postgresql.py script.  Refer to that script for details.
> > +#
> > +# Following on from the example in the export-to-postgresql.py script, a
> > +# call-graph can be displayed for the pt_example database like this:
> > +#
> > +#	python tools/perf/scripts/python/call-graph-from-postgresql.py pt_example
> > +#
> > +# Note this script supports connecting to remote databases by setting hostname,
> > +# port, username, password, and dbname e.g.
> > +#
> > +#	python tools/perf/scripts/python/call-graph-from-postgresql.py "hostname=myhost username=myuser password=mypassword dbname=pt_example"
> > +#
> > +# The result is a GUI window with a tree representing a context-sensitive
> > +# call-graph.  Expanding a couple of levels of the tree and adjusting column
> > +# widths to suit will display something like:
> > +#
> > +#                                         Call Graph: pt_example
> > +# Call Path                          Object      Count   Time(ns)  Time(%)  Branch Count   Branch Count(%)
> > +# v- ls
> > +#     v- 2638:2638
> > +#         v- _start                  ld-2.19.so    1     10074071   100.0         211135            100.0
> > +#           |- unknown               unknown       1        13198     0.1              1              0.0
> > +#           >- _dl_start             ld-2.19.so    1      1400980    13.9          19637              9.3
> > +#           >- _d_linit_internal     ld-2.19.so    1       448152     4.4          11094              5.3
> > +#           v-__libc_start_main@plt  ls            1      8211741    81.5         180397             85.4
> > +#              >- _dl_fixup          ld-2.19.so    1         7607     0.1            108              0.1
> > +#              >- __cxa_atexit       libc-2.19.so  1        11737     0.1             10              0.0
> > +#              >- __libc_csu_init    ls            1        10354     0.1             10              0.0
> > +#              |- _setjmp            libc-2.19.so  1            0     0.0              4              0.0
> > +#              v- main               ls            1      8182043    99.6         180254             99.9
> > +#
> > +# Points to note:
> > +#	The top level is a command name (comm)
> > +#	The next level is a thread (pid:tid)
> > +#	Subsequent levels are functions
> > +#	'Count' is the number of calls
> > +#	'Time' is the elapsed time until the function returns
> > +#	Percentages are relative to the level above
> > +#	'Branch Count' is the total number of branches for that function and all
> > +#       functions that it calls
> > +
> > +import sys
> > +from PySide.QtCore import *
> > +from PySide.QtGui import *
> > +from PySide.QtSql import *
> > +from decimal import *
> > +
> > +class TreeItem():
> > +
> > +	def __init__(self, db, row, parent_item):
> > +		self.db = db
> > +		self.row = row
> > +		self.parent_item = parent_item
> > +		self.query_done = False;
> > +		self.child_count = 0
> > +		self.child_items = []
> > +		self.data = ["", "", "", "", "", "", ""]
> > +		self.comm_id = 0
> > +		self.thread_id = 0
> > +		self.call_path_id = 1
> > +		self.branch_count = 0
> > +		self.time = 0
> > +		if not parent_item:
> > +			self.setUpRoot()
> > +
> > +	def setUpRoot(self):
> > +		self.query_done = True
> > +		query = QSqlQuery(self.db)
> > +		ret = query.exec_('SELECT id, comm FROM comms')
> > +		if not ret:
> > +			raise Exception("Query failed: " + query.lastError().text())
> > +		while query.next():
> > +			if not query.value(0):
> > +				continue
> > +			child_item = TreeItem(self.db, self.child_count, self)
> > +			self.child_items.append(child_item)
> > +			self.child_count += 1
> > +			child_item.setUpLevel1(query.value(0), query.value(1))
> > +
> > +	def setUpLevel1(self, comm_id, comm):
> > +		self.query_done = True;
> > +		self.comm_id = comm_id
> > +		self.data[0] = comm
> > +		self.child_items = []
> > +		self.child_count = 0
> > +		query = QSqlQuery(self.db)
> > +		ret = query.exec_('SELECT thread_id, ( SELECT pid FROM threads WHERE id = thread_id ), ( SELECT tid FROM threads WHERE id = thread_id ) FROM comm_threads WHERE comm_id = ' + str(comm_id))
> > +		if not ret:
> > +			raise Exception("Query failed: " + query.lastError().text())
> > +		while query.next():
> > +			child_item = TreeItem(self.db, self.child_count, self)
> > +			self.child_items.append(child_item)
> > +			self.child_count += 1
> > +			child_item.setUpLevel2(comm_id, query.value(0), query.value(1), query.value(2))
> > +
> > +	def setUpLevel2(self, comm_id, thread_id, pid, tid):
> > +		self.comm_id = comm_id
> > +		self.thread_id = thread_id
> > +		self.data[0] = str(pid) + ":" + str(tid)
> > +
> > +	def getChildItem(self, row):
> > +		return self.child_items[row]
> > +
> > +	def getParentItem(self):
> > +		return self.parent_item
> > +
> > +	def getRow(self):
> > +		return self.row
> > +
> > +	def timePercent(self, b):
> > +		if not self.time:
> > +			return "0.0"
> > +		x = (b * Decimal(100)) / self.time
> > +		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
> > +
> > +	def branchPercent(self, b):
> > +		if not self.branch_count:
> > +			return "0.0"
> > +		x = (b * Decimal(100)) / self.branch_count
> > +		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
> > +
> > +	def addChild(self, call_path_id, name, dso, count, time, branch_count):
> > +		child_item = TreeItem(self.db, self.child_count, self)
> > +		child_item.comm_id = self.comm_id
> > +		child_item.thread_id = self.thread_id
> > +		child_item.call_path_id = call_path_id
> > +		child_item.branch_count = branch_count
> > +		child_item.time = time
> > +		child_item.data[0] = name
> > +		if dso == "[kernel.kallsyms]":
> > +			dso = "[kernel]"
> > +		child_item.data[1] = dso
> > +		child_item.data[2] = str(count)
> > +		child_item.data[3] = str(time)
> > +		child_item.data[4] = self.timePercent(time)
> > +		child_item.data[5] = str(branch_count)
> > +		child_item.data[6] = self.branchPercent(branch_count)
> > +		self.child_items.append(child_item)
> > +		self.child_count += 1
> > +
> > +	def selectCalls(self):
> > +		self.query_done = True;
> > +		query = QSqlQuery(self.db)
> > +		ret = query.exec_('SELECT id, call_path_id, branch_count, call_time, return_time, '
> > +				  '( SELECT name FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ), '
> > +				  '( SELECT short_name FROM dsos WHERE id = ( SELECT dso_id FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ) ), '
> > +				  '( SELECT ip FROM call_paths where id = call_path_id ) '
> > +				  'FROM calls WHERE parent_call_path_id = ' + str(self.call_path_id) + ' AND comm_id = ' + str(self.comm_id) + ' AND thread_id = ' + str(self.thread_id) +
> > +				  'ORDER BY call_path_id')
> > +		if not ret:
> > +			raise Exception("Query failed: " + query.lastError().text())
> > +		last_call_path_id = 0
> > +		name = ""
> > +		dso = ""
> > +		count = 0
> > +		branch_count = 0
> > +		total_branch_count = 0
> > +		time = 0
> > +		total_time = 0
> > +		while query.next():
> > +			if query.value(1) == last_call_path_id:
> > +				count += 1
> > +				branch_count += query.value(2)
> > +				time += query.value(4) - query.value(3)
> > +			else:
> > +				if count:
> > +					self.addChild(last_call_path_id, name, dso, count, time, branch_count)
> > +				last_call_path_id = query.value(1)
> > +				name = query.value(5)
> > +				dso = query.value(6)
> > +				count = 1
> > +				total_branch_count += branch_count
> > +				total_time += time
> > +				branch_count = query.value(2)
> > +				time = query.value(4) - query.value(3)
> > +		if count:
> > +			self.addChild(last_call_path_id, name, dso, count, time, branch_count)
> > +		total_branch_count += branch_count
> > +		total_time += time
> > +		# Top level does not have time or branch count, so fix that here
> > +		if total_branch_count > self.branch_count:
> > +			self.branch_count = total_branch_count
> > +			if self.branch_count:
> > +				for child_item in self.child_items:
> > +					child_item.data[6] = self.branchPercent(child_item.branch_count)
> > +		if total_time > self.time:
> > +			self.time = total_time
> > +			if self.time:
> > +				for child_item in self.child_items:
> > +					child_item.data[4] = self.timePercent(child_item.time)
> > +
> > +	def childCount(self):
> > +		if not self.query_done:
> > +			self.selectCalls()
> > +		return self.child_count
> > +
> > +	def columnCount(self):
> > +		return 7
> > +
> > +	def columnHeader(self, column):
> > +		headers = ["Call Path", "Object", "Count ", "Time (ns) ", "Time (%) ", "Branch Count ", "Branch Count (%) "]
> > +		return headers[column]
> > +
> > +	def getData(self, column):
> > +		return self.data[column]
> > +
> > +class TreeModel(QAbstractItemModel):
> > +
> > +	def __init__(self, db, parent=None):
> > +		super(TreeModel, self).__init__(parent)
> > +		self.db = db
> > +		self.root = TreeItem(db, 0, None)
> > +
> > +	def columnCount(self, parent):
> > +		return self.root.columnCount()
> > +
> > +	def rowCount(self, parent):
> > +		if parent.isValid():
> > +			parent_item = parent.internalPointer()
> > +		else:
> > +			parent_item = self.root
> > +		return parent_item.childCount()
> > +
> > +	def headerData(self, section, orientation, role):
> > +		if role == Qt.TextAlignmentRole:
> > +			if section > 1:
> > +				return Qt.AlignRight
> > +		if role != Qt.DisplayRole:
> > +			return None
> > +		if orientation != Qt.Horizontal:
> > +			return None
> > +		return self.root.columnHeader(section)
> > +
> > +	def parent(self, child):
> > +		child_item = child.internalPointer()
> > +		if child_item is self.root:
> > +			return QModelIndex()
> > +		parent_item = child_item.getParentItem()
> > +		return self.createIndex(parent_item.getRow(), 0, parent_item)
> > +
> > +	def index(self, row, column, parent):
> > +		if parent.isValid():
> > +			parent_item = parent.internalPointer()
> > +		else:
> > +			parent_item = self.root
> > +		child_item = parent_item.getChildItem(row)
> > +		return self.createIndex(row, column, child_item)
> > +
> > +	def data(self, index, role):
> > +		if role == Qt.TextAlignmentRole:
> > +			if index.column() > 1:
> > +				return Qt.AlignRight
> > +		if role != Qt.DisplayRole:
> > +			return None
> > +		index_item = index.internalPointer()
> > +		return index_item.getData(index.column())
> > +
> > +class MainWindow(QMainWindow):
> > +
> > +	def __init__(self, db, dbname, parent=None):
> > +		super(MainWindow, self).__init__(parent)
> > +
> > +		self.setObjectName("MainWindow")
> > +		self.setWindowTitle("Call Graph: " + dbname)
> > +		self.move(100, 100)
> > +		self.resize(800, 600)
> > +		style = self.style()
> > +		icon = style.standardIcon(QStyle.SP_MessageBoxInformation)
> > +		self.setWindowIcon(icon);
> > +
> > +		self.model = TreeModel(db)
> > +
> > +		self.view = QTreeView()
> > +		self.view.setModel(self.model)
> > +
> > +		self.setCentralWidget(self.view)
> > +
> > +if __name__ == '__main__':
> > +	if (len(sys.argv) < 2):
> > +		print >> sys.stderr, "Usage is: call-graph-from-postgresql.py <database name>"
> > +		raise Exception("Too few arguments")
> > +
> > +	dbname = sys.argv[1]
> > +
> > +	db = QSqlDatabase.addDatabase('QPSQL')
> > +
> > +	opts = dbname.split()
> > +	for opt in opts:
> > +		if '=' in opt:
> > +			opt = opt.split('=')
> > +			if opt[0] == 'hostname':
> > +				db.setHostName(opt[1])
> > +			elif opt[0] == 'port':
> > +				db.setPort(int(opt[1]))
> > +			elif opt[0] == 'username':
> > +				db.setUserName(opt[1])
> > +			elif opt[0] == 'password':
> > +				db.setPassword(opt[1])
> > +			elif opt[0] == 'dbname':
> > +				dbname = opt[1]
> > +		else:
> > +			dbname = opt
> > +
> > +	db.setDatabaseName(dbname)
> > +	if not db.open():
> > +		raise Exception("Failed to open database " + dbname + " error: " + db.lastError().text())
> > +
> > +	app = QApplication(sys.argv)
> > +	window = MainWindow(db, dbname)
> > +	window.show()
> > +	err = app.exec_()
> > +	db.close()
> > +	sys.exit(err)
> > diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
> > index 4cdafd880074..5e939eadacd2 100644
> > --- a/tools/perf/scripts/python/export-to-postgresql.py
> > +++ b/tools/perf/scripts/python/export-to-postgresql.py
> > @@ -15,6 +15,53 @@ import sys
> >  import struct
> >  import datetime
> >  
> > +# To use this script you will need to have installed package python-pyside which
> > +# provides LGPL-licensed Python bindings for Qt.  You will also need the package
> > +# libqt4-sql-psql for Qt postgresql support.
> > +#
> > +# The script assumes postgresql is running on the local machine and that the
> > +# user has postgresql permissions to create databases. Examples of installing
> > +# postgresql and adding such a user are:
> > +#
> > +# fedora:
> > +#
> > +#	$ sudo yum install postgresql postgresql-server
> > +#	$ sudo su - postgres -c initdb
> > +#	$ sudo service postgresql start
> > +#	$ sudo su - postgres
> > +#	$ createuser <your user id here>
> > +#	Shall the new role be a superuser? (y/n) y
> > +#
> > +# ubuntu:
> > +#
> > +#	$ sudo apt-get install postgresql
> > +#	$ sudo su - postgres
> > +#	$ createuser <your user id here>
> > +#	Shall the new role be a superuser? (y/n) y
> > +#
> > +# An example of using this script with Intel PT:
> > +#
> > +#	$ perf record -e intel_pt//u ls
> > +#	$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py pt_example branches calls
> > +#	2015-05-29 12:49:23.464364 Creating database...
> > +#	2015-05-29 12:49:26.281717 Writing to intermediate files...
> > +#	2015-05-29 12:49:27.190383 Copying to database...
> > +#	2015-05-29 12:49:28.140451 Removing intermediate files...
> > +#	2015-05-29 12:49:28.147451 Adding primary keys
> > +#	2015-05-29 12:49:28.655683 Adding foreign keys
> > +#	2015-05-29 12:49:29.365350 Done
> > +#
> > +# To browse the database, psql can be used e.g.
> > +#
> > +#	$ psql pt_example
> > +#	pt_example=# select * from samples_view where id < 100;
> > +#	pt_example=# \d+
> > +#	pt_example=# \d+ samples_view
> > +#	pt_example=# \q
> > +#
> > +# An example of using the database is provided by the script
> > +# call-graph-from-postgresql.py.  Refer to that script for details.
> > +
> >  from PySide.QtSql import *
> >  
> >  # Need to access PostgreSQL C library directly to use COPY FROM STDIN
> > -- 
> > 1.9.1

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 10/25] perf tools: Add example call-graph script
  2015-08-21 15:11     ` Arnaldo Carvalho de Melo
@ 2015-08-21 15:21       ` Arnaldo Carvalho de Melo
  2015-08-21 15:28         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-21 15:21 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Aug 21, 2015 at 12:11:33PM -0300, Arnaldo Carvalho de Melo escreveu:
> [acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
> 2015-08-21 12:10:00.504126 Creating database...
> QSqlDatabase: QPSQL driver not loaded
> QSqlDatabase: available drivers: QSQLITE
> QSqlDatabase: an instance of QCoreApplication is required for loading driver plugins
> QSqlQuery::exec: database not open
> Traceback (most recent call last):
>   File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 87, in <module>
>     do_query(query, 'CREATE DATABASE ' + dbname)
>   File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 78, in do_query
>     raise Exception("Query failed: " + q.lastError().text())
> Exception: Query failed: Driver not loaded Driver not loaded
> Error running python script /home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py
> [acme@zoo ~]$ 

A-ha, this was missing:

[root@zoo ~]# rpm -ql qt-postgresql 
/usr/lib64/qt4/plugins/sqldrivers/libqsqlpsql.so
[root@zoo ~]#

[acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
2015-08-21 12:20:01.841677 Creating database...
2015-08-21 12:20:02.556853 Writing to intermediate files...
2015-08-21 12:20:03.262814 Copying to database...
2015-08-21 12:20:03.783109 Removing intermediate files...
2015-08-21 12:20:03.790282 Adding primary keys
2015-08-21 12:20:04.294342 Adding foreign keys
2015-08-21 12:20:04.718238 Done
[acme@zoo ~]$ 

Now to the next steps... Lets see how this looks like with BTS...

- Arnaldo

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 10/25] perf tools: Add example call-graph script
  2015-08-21 15:21       ` Arnaldo Carvalho de Melo
@ 2015-08-21 15:28         ` Arnaldo Carvalho de Melo
  2015-08-21 15:32           ` Arnaldo Carvalho de Melo
  2015-08-24  7:00           ` Adrian Hunter
  0 siblings, 2 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-21 15:28 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Aug 21, 2015 at 12:21:25PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Aug 21, 2015 at 12:11:33PM -0300, Arnaldo Carvalho de Melo escreveu:
> > [acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
> > 2015-08-21 12:10:00.504126 Creating database...
> > QSqlDatabase: QPSQL driver not loaded
> > QSqlDatabase: available drivers: QSQLITE
> > QSqlDatabase: an instance of QCoreApplication is required for loading driver plugins
> > QSqlQuery::exec: database not open
> > Traceback (most recent call last):
> >   File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 87, in <module>
> >     do_query(query, 'CREATE DATABASE ' + dbname)
> >   File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 78, in do_query
> >     raise Exception("Query failed: " + q.lastError().text())
> > Exception: Query failed: Driver not loaded Driver not loaded
> > Error running python script /home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py
> > [acme@zoo ~]$ 
> 
> A-ha, this was missing:
> 
> [root@zoo ~]# rpm -ql qt-postgresql 
> /usr/lib64/qt4/plugins/sqldrivers/libqsqlpsql.so
> [root@zoo ~]#
> 
> [acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
> 2015-08-21 12:20:01.841677 Creating database...
> 2015-08-21 12:20:02.556853 Writing to intermediate files...
> 2015-08-21 12:20:03.262814 Copying to database...
> 2015-08-21 12:20:03.783109 Removing intermediate files...
> 2015-08-21 12:20:03.790282 Adding primary keys
> 2015-08-21 12:20:04.294342 Adding foreign keys
> 2015-08-21 12:20:04.718238 Done
> [acme@zoo ~]$ 
> 
> Now to the next steps... Lets see how this looks like with BTS...

So, after running:

  [acme@zoo linux]$ python
  tools/perf/scripts/python/call-graph-from-postgresql.py bts_example

I got a GUI and after expanding some callchains I took this screenshot:

  http://vger.kernel.org/~acme/perf/call_graph_example_intel_bts.png

Cool stuff! But it seems the COMM got messed up? I.e. that "1380:1380"
first level after "ls".

- Arnaldo

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 10/25] perf tools: Add example call-graph script
  2015-08-21 15:28         ` Arnaldo Carvalho de Melo
@ 2015-08-21 15:32           ` Arnaldo Carvalho de Melo
  2015-08-24  7:00           ` Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-21 15:32 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Aug 21, 2015 at 12:28:13PM -0300, Arnaldo Carvalho de Melo escreveu:
> > A-ha, this was missing:
> > 
> > [root@zoo ~]# rpm -ql qt-postgresql 
> > /usr/lib64/qt4/plugins/sqldrivers/libqsqlpsql.so

Adding this to your patch:

diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
index 5e939eadacd2..84a32037a80f 100644
--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -25,7 +25,7 @@ import datetime
 #
 # fedora:
 #
-#	$ sudo yum install postgresql postgresql-server
+#	$ sudo yum install postgresql postgresql-server python-pyside qt-postgresql
 #	$ sudo su - postgres -c initdb
 #	$ sudo service postgresql start
 #	$ sudo su - postgres

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel BTS support
  2015-07-17 16:33 ` [PATCH V8 08/25] perf tools: Add Intel BTS support Adrian Hunter
  2015-08-17 15:52   ` Arnaldo Carvalho de Melo
@ 2015-08-22  6:52   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-22  6:52 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adrian.hunter, linux-kernel, jolsa, hpa, acme, mingo, tglx

Commit-ID:  d0170af7004dce9cd90b749842c37e379476cbc8
Gitweb:     http://git.kernel.org/tip/d0170af7004dce9cd90b749842c37e379476cbc8
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:43 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 21 Aug 2015 11:34:10 -0300

perf tools: Add Intel BTS support

Intel BTS support fits within the new auxtrace infrastructure.  Recording is
supporting by identifying the Intel BTS PMU, parsing options and setting up
events.

Decoding is supported by queuing up trace data by thread and then decoding
synchronously delivering synthesized event samples into the session processing
for tools to consume.

Committer note:

E.g:

  [root@felicio ~]# perf record --per-thread -e intel_bts// ls
  anaconda-ks.cfg  apctest.output  bin  kernel-rt-3.10.0-298.rt56.171.el7.x86_64.rpm  libexec  lock_page.bpf.c  perf.data  perf.data.old
  [ perf record: Woken up 3 times to write data ]
  [ perf record: Captured and wrote 4.367 MB perf.data ]
  [root@felicio ~]# perf evlist -v
  intel_bts//: type: 6, size: 112, { sample_period, sample_freq }: 1, sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1, enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1
  dummy:u: type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 1, sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1, exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1
  [root@felicio ~]# perf script # the navigate in the pager to some interesting place:
    ls 1843 1 branches: ffffffff810a60cb flush_signal_handlers ([kernel.kallsyms]) => ffffffff8121a522 setup_new_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8121a529 setup_new_exec ([kernel.kallsyms]) => ffffffff8122fa30 do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122fa5d do_close_on_exec ([kernel.kallsyms]) => ffffffff81767ae0 _raw_spin_lock ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff81767af4 _raw_spin_lock ([kernel.kallsyms]) => ffffffff8122fa62 do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122fa8e do_close_on_exec ([kernel.kallsyms]) => ffffffff8122faf0 do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122faf7 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fa8b do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122fac9 do_close_on_exec ([kernel.kallsyms]) => ffffffff8122fad2 do_close_on_exec ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8122fadd do_close_on_exec ([kernel.kallsyms]) => ffffffff8120fc80 filp_close ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8120fcaf filp_close ([kernel.kallsyms]) => ffffffff8120fcb6 filp_close ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8120fcc2 filp_close ([kernel.kallsyms]) => ffffffff812547f0 dnotify_flush ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff81254823 dnotify_flush ([kernel.kallsyms]) => ffffffff8120fcc7 filp_close ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8120fccd filp_close ([kernel.kallsyms]) => ffffffff81261790 locks_remove_posix ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff812617a3 locks_remove_posix ([kernel.kallsyms]) => ffffffff812617b9 locks_remove_posix ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff812617b9 locks_remove_posix ([kernel.kallsyms]) => ffffffff8120fcd2 filp_close ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8120fcd5 filp_close ([kernel.kallsyms]) => ffffffff812142c0 fput ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff812142d6 fput ([kernel.kallsyms]) => ffffffff812142df fput ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff8121430c fput ([kernel.kallsyms]) => ffffffff810b6580 task_work_add ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff810b65ad task_work_add ([kernel.kallsyms]) => ffffffff810b65b1 task_work_add ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff810b65c1 task_work_add ([kernel.kallsyms]) => ffffffff810bc710 kick_process ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff810bc725 kick_process ([kernel.kallsyms]) => ffffffff810bc742 kick_process ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff810bc742 kick_process ([kernel.kallsyms]) => ffffffff810b65c6 task_work_add ([kernel.kallsyms])
    ls 1843 1 branches: ffffffff810b65c9 task_work_add ([kernel.kallsyms]) => ffffffff81214311 fput ([kernel.kallsyms])

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-9-git-send-email-adrian.hunter@intel.com
[ Merged sample->time fix for bug found after first round of testing on slightly older kernel ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/intel-bts.txt |  86 +++
 tools/perf/arch/x86/util/Build         |   1 +
 tools/perf/arch/x86/util/auxtrace.c    |  49 +-
 tools/perf/arch/x86/util/intel-bts.c   | 458 ++++++++++++++++
 tools/perf/arch/x86/util/pmu.c         |   3 +
 tools/perf/util/Build                  |   1 +
 tools/perf/util/auxtrace.c             |   3 +
 tools/perf/util/auxtrace.h             |   1 +
 tools/perf/util/intel-bts.c            | 933 +++++++++++++++++++++++++++++++++
 tools/perf/util/intel-bts.h            |  43 ++
 tools/perf/util/pmu.c                  |   4 -
 11 files changed, 1576 insertions(+), 6 deletions(-)

diff --git a/tools/perf/Documentation/intel-bts.txt b/tools/perf/Documentation/intel-bts.txt
new file mode 100644
index 0000000..8bdc93b
--- /dev/null
+++ b/tools/perf/Documentation/intel-bts.txt
@@ -0,0 +1,86 @@
+Intel Branch Trace Store
+========================
+
+Overview
+========
+
+Intel BTS could be regarded as a predecessor to Intel PT and has some
+similarities because it can also identify every branch a program takes.  A
+notable difference is that Intel BTS has no timing information and as a
+consequence the present implementation is limited to per-thread recording.
+
+While decoding Intel BTS does not require walking the object code, the object
+code is still needed to pair up calls and returns correctly, consequently much
+of the Intel PT documentation applies also to Intel BTS.  Refer to the Intel PT
+documentation and consider that the PMU 'intel_bts' can usually be used in
+place of 'intel_pt' in the examples provided, with the proviso that per-thread
+recording must also be stipulated i.e. the --per-thread option for
+'perf record'.
+
+
+perf record
+===========
+
+new event
+---------
+
+The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
+option is:
+
+	-e intel_bts//
+
+Currently Intel BTS is limited to per-thread tracing so the --per-thread option
+is also needed.
+
+
+snapshot option
+---------------
+
+The snapshot option is the same as Intel PT (refer Intel PT documentation).
+
+
+auxtrace mmap size option
+-----------------------
+
+The mmap size option is the same as Intel PT (refer Intel PT documentation).
+
+
+perf script
+===========
+
+By default, perf script will decode trace data found in the perf.data file.
+This can be further controlled by option --itrace.  The --itrace option is
+the same as Intel PT (refer Intel PT documentation) except that neither
+"instructions" events nor "transactions" events (and consequently call
+chains) are supported.
+
+To disable trace decoding entirely, use the option --no-itrace.
+
+
+dump option
+-----------
+
+perf script has an option (-D) to "dump" the events i.e. display the binary
+data.
+
+When -D is used, Intel BTS packets are displayed.
+
+To disable the display of Intel BTS packets, combine the -D option with
+--no-itrace.
+
+
+perf report
+===========
+
+By default, perf report will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace exactly the same as
+perf script.
+
+
+perf inject
+===========
+
+perf inject also accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+	perf inject --itrace -i perf.data -o perf.data.new
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index a8be9f9..2c55e1b 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -10,3 +10,4 @@ libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
+libperf-$(CONFIG_AUXTRACE) += intel-bts.o
diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
index e7654b5..7a78055 100644
--- a/tools/perf/arch/x86/util/auxtrace.c
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -13,11 +13,56 @@
  *
  */
 
+#include <stdbool.h>
+
 #include "../../util/header.h"
+#include "../../util/debug.h"
+#include "../../util/pmu.h"
 #include "../../util/auxtrace.h"
 #include "../../util/intel-pt.h"
+#include "../../util/intel-bts.h"
+#include "../../util/evlist.h"
+
+static
+struct auxtrace_record *auxtrace_record__init_intel(struct perf_evlist *evlist,
+						    int *err)
+{
+	struct perf_pmu *intel_pt_pmu;
+	struct perf_pmu *intel_bts_pmu;
+	struct perf_evsel *evsel;
+	bool found_pt = false;
+	bool found_bts = false;
+
+	intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+	intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+
+	if (evlist) {
+		evlist__for_each(evlist, evsel) {
+			if (intel_pt_pmu &&
+			    evsel->attr.type == intel_pt_pmu->type)
+				found_pt = true;
+			if (intel_bts_pmu &&
+			    evsel->attr.type == intel_bts_pmu->type)
+				found_bts = true;
+		}
+	}
+
+	if (found_pt && found_bts) {
+		pr_err("intel_pt and intel_bts may not be used together\n");
+		*err = -EINVAL;
+		return NULL;
+	}
+
+	if (found_pt)
+		return intel_pt_recording_init(err);
+
+	if (found_bts)
+		return intel_bts_recording_init(err);
 
-struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe_unused,
+	return NULL;
+}
+
+struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist,
 					      int *err)
 {
 	char buffer[64];
@@ -32,7 +77,7 @@ struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe
 	}
 
 	if (!strncmp(buffer, "GenuineIntel,", 13))
-		return intel_pt_recording_init(err);
+		return auxtrace_record__init_intel(evlist, err);
 
 	return NULL;
 }
diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
new file mode 100644
index 0000000..9b94ce5
--- /dev/null
+++ b/tools/perf/arch/x86/util/intel-bts.c
@@ -0,0 +1,458 @@
+/*
+ * intel-bts.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "../../util/cpumap.h"
+#include "../../util/evsel.h"
+#include "../../util/evlist.h"
+#include "../../util/session.h"
+#include "../../util/util.h"
+#include "../../util/pmu.h"
+#include "../../util/debug.h"
+#include "../../util/tsc.h"
+#include "../../util/auxtrace.h"
+#include "../../util/intel-bts.h"
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define KiB_MASK(x) (KiB(x) - 1)
+#define MiB_MASK(x) (MiB(x) - 1)
+
+#define INTEL_BTS_DFLT_SAMPLE_SIZE	KiB(4)
+
+#define INTEL_BTS_MAX_SAMPLE_SIZE	KiB(60)
+
+struct intel_bts_snapshot_ref {
+	void	*ref_buf;
+	size_t	ref_offset;
+	bool	wrapped;
+};
+
+struct intel_bts_recording {
+	struct auxtrace_record		itr;
+	struct perf_pmu			*intel_bts_pmu;
+	struct perf_evlist		*evlist;
+	bool				snapshot_mode;
+	size_t				snapshot_size;
+	int				snapshot_ref_cnt;
+	struct intel_bts_snapshot_ref	*snapshot_refs;
+};
+
+struct branch {
+	u64 from;
+	u64 to;
+	u64 misc;
+};
+
+static size_t intel_bts_info_priv_size(struct auxtrace_record *itr __maybe_unused)
+{
+	return INTEL_BTS_AUXTRACE_PRIV_SIZE;
+}
+
+static int intel_bts_info_fill(struct auxtrace_record *itr,
+			       struct perf_session *session,
+			       struct auxtrace_info_event *auxtrace_info,
+			       size_t priv_size)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
+	struct perf_event_mmap_page *pc;
+	struct perf_tsc_conversion tc = { .time_mult = 0, };
+	bool cap_user_time_zero = false;
+	int err;
+
+	if (priv_size != INTEL_BTS_AUXTRACE_PRIV_SIZE)
+		return -EINVAL;
+
+	if (!session->evlist->nr_mmaps)
+		return -EINVAL;
+
+	pc = session->evlist->mmap[0].base;
+	if (pc) {
+		err = perf_read_tsc_conversion(pc, &tc);
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+		} else {
+			cap_user_time_zero = tc.time_mult != 0;
+		}
+		if (!cap_user_time_zero)
+			ui__warning("Intel BTS: TSC not available\n");
+	}
+
+	auxtrace_info->type = PERF_AUXTRACE_INTEL_BTS;
+	auxtrace_info->priv[INTEL_BTS_PMU_TYPE] = intel_bts_pmu->type;
+	auxtrace_info->priv[INTEL_BTS_TIME_SHIFT] = tc.time_shift;
+	auxtrace_info->priv[INTEL_BTS_TIME_MULT] = tc.time_mult;
+	auxtrace_info->priv[INTEL_BTS_TIME_ZERO] = tc.time_zero;
+	auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO] = cap_user_time_zero;
+	auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE] = btsr->snapshot_mode;
+
+	return 0;
+}
+
+static int intel_bts_recording_options(struct auxtrace_record *itr,
+				       struct perf_evlist *evlist,
+				       struct record_opts *opts)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
+	struct perf_evsel *evsel, *intel_bts_evsel = NULL;
+	const struct cpu_map *cpus = evlist->cpus;
+	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+
+	btsr->evlist = evlist;
+	btsr->snapshot_mode = opts->auxtrace_snapshot_mode;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_bts_pmu->type) {
+			if (intel_bts_evsel) {
+				pr_err("There may be only one " INTEL_BTS_PMU_NAME " event\n");
+				return -EINVAL;
+			}
+			evsel->attr.freq = 0;
+			evsel->attr.sample_period = 1;
+			intel_bts_evsel = evsel;
+			opts->full_auxtrace = true;
+		}
+	}
+
+	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
+		pr_err("Snapshot mode (-S option) requires " INTEL_BTS_PMU_NAME " PMU event (-e " INTEL_BTS_PMU_NAME ")\n");
+		return -EINVAL;
+	}
+
+	if (!opts->full_auxtrace)
+		return 0;
+
+	if (opts->full_auxtrace && !cpu_map__empty(cpus)) {
+		pr_err(INTEL_BTS_PMU_NAME " does not support per-cpu recording\n");
+		return -EINVAL;
+	}
+
+	/* Set default sizes for snapshot mode */
+	if (opts->auxtrace_snapshot_mode) {
+		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
+			if (privileged) {
+				opts->auxtrace_mmap_pages = MiB(4) / page_size;
+			} else {
+				opts->auxtrace_mmap_pages = KiB(128) / page_size;
+				if (opts->mmap_pages == UINT_MAX)
+					opts->mmap_pages = KiB(256) / page_size;
+			}
+		} else if (!opts->auxtrace_mmap_pages && !privileged &&
+			   opts->mmap_pages == UINT_MAX) {
+			opts->mmap_pages = KiB(256) / page_size;
+		}
+		if (!opts->auxtrace_snapshot_size)
+			opts->auxtrace_snapshot_size =
+				opts->auxtrace_mmap_pages * (size_t)page_size;
+		if (!opts->auxtrace_mmap_pages) {
+			size_t sz = opts->auxtrace_snapshot_size;
+
+			sz = round_up(sz, page_size) / page_size;
+			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
+		}
+		if (opts->auxtrace_snapshot_size >
+				opts->auxtrace_mmap_pages * (size_t)page_size) {
+			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
+			       opts->auxtrace_snapshot_size,
+			       opts->auxtrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
+			return -EINVAL;
+		}
+		pr_debug2("Intel BTS snapshot size: %zu\n",
+			  opts->auxtrace_snapshot_size);
+	}
+
+	/* Set default sizes for full trace mode */
+	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
+		if (privileged) {
+			opts->auxtrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->auxtrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	}
+
+	/* Validate auxtrace_mmap_pages */
+	if (opts->auxtrace_mmap_pages) {
+		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
+		size_t min_sz;
+
+		if (opts->auxtrace_snapshot_mode)
+			min_sz = KiB(4);
+		else
+			min_sz = KiB(8);
+
+		if (sz < min_sz || !is_power_of_2(sz)) {
+			pr_err("Invalid mmap size for Intel BTS: must be at least %zuKiB and a power of 2\n",
+			       min_sz / 1024);
+			return -EINVAL;
+		}
+	}
+
+	if (intel_bts_evsel) {
+		/*
+		 * To obtain the auxtrace buffer file descriptor, the auxtrace event
+		 * must come first.
+		 */
+		perf_evlist__to_front(evlist, intel_bts_evsel);
+		/*
+		 * In the case of per-cpu mmaps, we need the CPU on the
+		 * AUX event.
+		 */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(intel_bts_evsel, CPU);
+	}
+
+	/* Add dummy event to keep tracking */
+	if (opts->full_auxtrace) {
+		struct perf_evsel *tracking_evsel;
+		int err;
+
+		err = parse_events(evlist, "dummy:u", NULL);
+		if (err)
+			return err;
+
+		tracking_evsel = perf_evlist__last(evlist);
+
+		perf_evlist__set_tracking_event(evlist, tracking_evsel);
+
+		tracking_evsel->attr.freq = 0;
+		tracking_evsel->attr.sample_period = 1;
+	}
+
+	return 0;
+}
+
+static int intel_bts_parse_snapshot_options(struct auxtrace_record *itr,
+					    struct record_opts *opts,
+					    const char *str)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->auxtrace_snapshot_mode = true;
+	opts->auxtrace_snapshot_size = snapshot_size;
+
+	btsr->snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+static u64 intel_bts_reference(struct auxtrace_record *itr __maybe_unused)
+{
+	return rdtsc();
+}
+
+static int intel_bts_alloc_snapshot_refs(struct intel_bts_recording *btsr,
+					 int idx)
+{
+	const size_t sz = sizeof(struct intel_bts_snapshot_ref);
+	int cnt = btsr->snapshot_ref_cnt, new_cnt = cnt * 2;
+	struct intel_bts_snapshot_ref *refs;
+
+	if (!new_cnt)
+		new_cnt = 16;
+
+	while (new_cnt <= idx)
+		new_cnt *= 2;
+
+	refs = calloc(new_cnt, sz);
+	if (!refs)
+		return -ENOMEM;
+
+	memcpy(refs, btsr->snapshot_refs, cnt * sz);
+
+	btsr->snapshot_refs = refs;
+	btsr->snapshot_ref_cnt = new_cnt;
+
+	return 0;
+}
+
+static void intel_bts_free_snapshot_refs(struct intel_bts_recording *btsr)
+{
+	int i;
+
+	for (i = 0; i < btsr->snapshot_ref_cnt; i++)
+		zfree(&btsr->snapshot_refs[i].ref_buf);
+	zfree(&btsr->snapshot_refs);
+}
+
+static void intel_bts_recording_free(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+
+	intel_bts_free_snapshot_refs(btsr);
+	free(btsr);
+}
+
+static int intel_bts_snapshot_start(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__disable_event(btsr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_bts_snapshot_finish(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__enable_event(btsr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static bool intel_bts_first_wrap(u64 *data, size_t buf_size)
+{
+	int i, a, b;
+
+	b = buf_size >> 3;
+	a = b - 512;
+	if (a < 0)
+		a = 0;
+
+	for (i = a; i < b; i++) {
+		if (data[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int intel_bts_find_snapshot(struct auxtrace_record *itr, int idx,
+				   struct auxtrace_mmap *mm, unsigned char *data,
+				   u64 *head, u64 *old)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	bool wrapped;
+	int err;
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head);
+
+	if (idx >= btsr->snapshot_ref_cnt) {
+		err = intel_bts_alloc_snapshot_refs(btsr, idx);
+		if (err)
+			goto out_err;
+	}
+
+	wrapped = btsr->snapshot_refs[idx].wrapped;
+	if (!wrapped && intel_bts_first_wrap((u64 *)data, mm->len)) {
+		btsr->snapshot_refs[idx].wrapped = true;
+		wrapped = true;
+	}
+
+	/*
+	 * In full trace mode 'head' continually increases.  However in snapshot
+	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
+	 * are adjusted to match the full trace case which expects that 'old' is
+	 * always less than 'head'.
+	 */
+	if (wrapped) {
+		*old = *head;
+		*head += mm->len;
+	} else {
+		if (mm->mask)
+			*old &= mm->mask;
+		else
+			*old %= mm->len;
+		if (*old > *head)
+			*head += mm->len;
+	}
+
+	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
+		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
+
+	return 0;
+
+out_err:
+	pr_err("%s: failed, error %d\n", __func__, err);
+	return err;
+}
+
+static int intel_bts_read_finish(struct auxtrace_record *itr, int idx)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__enable_event_idx(btsr->evlist,
+							     evsel, idx);
+	}
+	return -EINVAL;
+}
+
+struct auxtrace_record *intel_bts_recording_init(int *err)
+{
+	struct perf_pmu *intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+	struct intel_bts_recording *btsr;
+
+	if (!intel_bts_pmu)
+		return NULL;
+
+	btsr = zalloc(sizeof(struct intel_bts_recording));
+	if (!btsr) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	btsr->intel_bts_pmu = intel_bts_pmu;
+	btsr->itr.recording_options = intel_bts_recording_options;
+	btsr->itr.info_priv_size = intel_bts_info_priv_size;
+	btsr->itr.info_fill = intel_bts_info_fill;
+	btsr->itr.free = intel_bts_recording_free;
+	btsr->itr.snapshot_start = intel_bts_snapshot_start;
+	btsr->itr.snapshot_finish = intel_bts_snapshot_finish;
+	btsr->itr.find_snapshot = intel_bts_find_snapshot;
+	btsr->itr.parse_snapshot_options = intel_bts_parse_snapshot_options;
+	btsr->itr.reference = intel_bts_reference;
+	btsr->itr.read_finish = intel_bts_read_finish;
+	btsr->itr.alignment = sizeof(struct branch);
+	return &btsr->itr;
+}
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
index fd11cc3..79fe071 100644
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -3,6 +3,7 @@
 #include <linux/perf_event.h>
 
 #include "../../util/intel-pt.h"
+#include "../../util/intel-bts.h"
 #include "../../util/pmu.h"
 
 struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
@@ -10,6 +11,8 @@ struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __mayb
 #ifdef HAVE_AUXTRACE_SUPPORT
 	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
 		return intel_pt_pmu_default_config(pmu);
+	if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME))
+		pmu->selectable = true;
 #endif
 	return NULL;
 }
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index c20473d..e912856 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -80,6 +80,7 @@ libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
+libperf-$(CONFIG_AUXTRACE) += intel-bts.o
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 0f0b7e1..a980e7c 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -48,6 +48,7 @@
 #include "parse-options.h"
 
 #include "intel-pt.h"
+#include "intel-bts.h"
 
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 			struct auxtrace_mmap_params *mp,
@@ -888,6 +889,8 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 	switch (type) {
 	case PERF_AUXTRACE_INTEL_PT:
 		return intel_pt_process_auxtrace_info(event, session);
+	case PERF_AUXTRACE_INTEL_BTS:
+		return intel_bts_process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 7d12f33..bf72b77 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -40,6 +40,7 @@ struct events_stats;
 enum auxtrace_type {
 	PERF_AUXTRACE_UNKNOWN,
 	PERF_AUXTRACE_INTEL_PT,
+	PERF_AUXTRACE_INTEL_BTS,
 };
 
 enum itrace_period_type {
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
new file mode 100644
index 0000000..ea76862
--- /dev/null
+++ b/tools/perf/util/intel-bts.c
@@ -0,0 +1,933 @@
+/*
+ * intel-bts.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <endian.h>
+#include <byteswap.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "cpumap.h"
+#include "color.h"
+#include "evsel.h"
+#include "evlist.h"
+#include "machine.h"
+#include "session.h"
+#include "util.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "debug.h"
+#include "tsc.h"
+#include "auxtrace.h"
+#include "intel-pt-decoder/intel-pt-insn-decoder.h"
+#include "intel-bts.h"
+
+#define MAX_TIMESTAMP (~0ULL)
+
+#define INTEL_BTS_ERR_NOINSN  5
+#define INTEL_BTS_ERR_LOST    9
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le64_to_cpu bswap_64
+#else
+#define le64_to_cpu
+#endif
+
+struct intel_bts {
+	struct auxtrace			auxtrace;
+	struct auxtrace_queues		queues;
+	struct auxtrace_heap		heap;
+	u32				auxtrace_type;
+	struct perf_session		*session;
+	struct machine			*machine;
+	bool				sampling_mode;
+	bool				snapshot_mode;
+	bool				data_queued;
+	u32				pmu_type;
+	struct perf_tsc_conversion	tc;
+	bool				cap_user_time_zero;
+	struct itrace_synth_opts	synth_opts;
+	bool				sample_branches;
+	u32				branches_filter;
+	u64				branches_sample_type;
+	u64				branches_id;
+	size_t				branches_event_size;
+	bool				synth_needs_swap;
+};
+
+struct intel_bts_queue {
+	struct intel_bts	*bts;
+	unsigned int		queue_nr;
+	struct auxtrace_buffer	*buffer;
+	bool			on_heap;
+	bool			done;
+	pid_t			pid;
+	pid_t			tid;
+	int			cpu;
+	u64			time;
+	struct intel_pt_insn	intel_pt_insn;
+	u32			sample_flags;
+};
+
+struct branch {
+	u64 from;
+	u64 to;
+	u64 misc;
+};
+
+static void intel_bts_dump(struct intel_bts *bts __maybe_unused,
+			   unsigned char *buf, size_t len)
+{
+	struct branch *branch;
+	size_t i, pos = 0, br_sz = sizeof(struct branch), sz;
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color,
+		      ". ... Intel BTS data: size %zu bytes\n",
+		      len);
+
+	while (len) {
+		if (len >= br_sz)
+			sz = br_sz;
+		else
+			sz = len;
+		printf(".");
+		color_fprintf(stdout, color, "  %08x: ", pos);
+		for (i = 0; i < sz; i++)
+			color_fprintf(stdout, color, " %02x", buf[i]);
+		for (; i < br_sz; i++)
+			color_fprintf(stdout, color, "   ");
+		if (len >= br_sz) {
+			branch = (struct branch *)buf;
+			color_fprintf(stdout, color, " %"PRIx64" -> %"PRIx64" %s\n",
+				      le64_to_cpu(branch->from),
+				      le64_to_cpu(branch->to),
+				      le64_to_cpu(branch->misc) & 0x10 ?
+							"pred" : "miss");
+		} else {
+			color_fprintf(stdout, color, " Bad record!\n");
+		}
+		pos += sz;
+		buf += sz;
+		len -= sz;
+	}
+}
+
+static void intel_bts_dump_event(struct intel_bts *bts, unsigned char *buf,
+				 size_t len)
+{
+	printf(".\n");
+	intel_bts_dump(bts, buf, len);
+}
+
+static int intel_bts_lost(struct intel_bts *bts, struct perf_sample *sample)
+{
+	union perf_event event;
+	int err;
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     INTEL_BTS_ERR_LOST, sample->cpu, sample->pid,
+			     sample->tid, 0, "Lost trace data");
+
+	err = perf_session__deliver_synth_event(bts->session, &event, NULL);
+	if (err)
+		pr_err("Intel BTS: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static struct intel_bts_queue *intel_bts_alloc_queue(struct intel_bts *bts,
+						     unsigned int queue_nr)
+{
+	struct intel_bts_queue *btsq;
+
+	btsq = zalloc(sizeof(struct intel_bts_queue));
+	if (!btsq)
+		return NULL;
+
+	btsq->bts = bts;
+	btsq->queue_nr = queue_nr;
+	btsq->pid = -1;
+	btsq->tid = -1;
+	btsq->cpu = -1;
+
+	return btsq;
+}
+
+static int intel_bts_setup_queue(struct intel_bts *bts,
+				 struct auxtrace_queue *queue,
+				 unsigned int queue_nr)
+{
+	struct intel_bts_queue *btsq = queue->priv;
+
+	if (list_empty(&queue->head))
+		return 0;
+
+	if (!btsq) {
+		btsq = intel_bts_alloc_queue(bts, queue_nr);
+		if (!btsq)
+			return -ENOMEM;
+		queue->priv = btsq;
+
+		if (queue->cpu != -1)
+			btsq->cpu = queue->cpu;
+		btsq->tid = queue->tid;
+	}
+
+	if (bts->sampling_mode)
+		return 0;
+
+	if (!btsq->on_heap && !btsq->buffer) {
+		int ret;
+
+		btsq->buffer = auxtrace_buffer__next(queue, NULL);
+		if (!btsq->buffer)
+			return 0;
+
+		ret = auxtrace_heap__add(&bts->heap, queue_nr,
+					 btsq->buffer->reference);
+		if (ret)
+			return ret;
+		btsq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int intel_bts_setup_queues(struct intel_bts *bts)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < bts->queues.nr_queues; i++) {
+		ret = intel_bts_setup_queue(bts, &bts->queues.queue_array[i],
+					    i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static inline int intel_bts_update_queues(struct intel_bts *bts)
+{
+	if (bts->queues.new_data) {
+		bts->queues.new_data = false;
+		return intel_bts_setup_queues(bts);
+	}
+	return 0;
+}
+
+static unsigned char *intel_bts_find_overlap(unsigned char *buf_a, size_t len_a,
+					     unsigned char *buf_b, size_t len_b)
+{
+	size_t offs, len;
+
+	if (len_a > len_b)
+		offs = len_a - len_b;
+	else
+		offs = 0;
+
+	for (; offs < len_a; offs += sizeof(struct branch)) {
+		len = len_a - offs;
+		if (!memcmp(buf_a + offs, buf_b, len))
+			return buf_b + len;
+	}
+
+	return buf_b;
+}
+
+static int intel_bts_do_fix_overlap(struct auxtrace_queue *queue,
+				    struct auxtrace_buffer *b)
+{
+	struct auxtrace_buffer *a;
+	void *start;
+
+	if (b->list.prev == &queue->head)
+		return 0;
+	a = list_entry(b->list.prev, struct auxtrace_buffer, list);
+	start = intel_bts_find_overlap(a->data, a->size, b->data, b->size);
+	if (!start)
+		return -EINVAL;
+	b->use_size = b->data + b->size - start;
+	b->use_data = start;
+	return 0;
+}
+
+static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
+					 struct branch *branch)
+{
+	int ret;
+	struct intel_bts *bts = btsq->bts;
+	union perf_event event;
+	struct perf_sample sample = { .ip = 0, };
+
+	event.sample.header.type = PERF_RECORD_SAMPLE;
+	event.sample.header.misc = PERF_RECORD_MISC_USER;
+	event.sample.header.size = sizeof(struct perf_event_header);
+
+	sample.ip = le64_to_cpu(branch->from);
+	sample.pid = btsq->pid;
+	sample.tid = btsq->tid;
+	sample.addr = le64_to_cpu(branch->to);
+	sample.id = btsq->bts->branches_id;
+	sample.stream_id = btsq->bts->branches_id;
+	sample.period = 1;
+	sample.cpu = btsq->cpu;
+	sample.flags = btsq->sample_flags;
+	sample.insn_len = btsq->intel_pt_insn.length;
+
+	if (bts->synth_opts.inject) {
+		event.sample.header.size = bts->branches_event_size;
+		ret = perf_event__synthesize_sample(&event,
+						    bts->branches_sample_type,
+						    0, &sample,
+						    bts->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(bts->session, &event, &sample);
+	if (ret)
+		pr_err("Intel BTS: failed to deliver branch event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_bts_get_next_insn(struct intel_bts_queue *btsq, u64 ip)
+{
+	struct machine *machine = btsq->bts->machine;
+	struct thread *thread;
+	struct addr_location al;
+	unsigned char buf[1024];
+	size_t bufsz;
+	ssize_t len;
+	int x86_64;
+	uint8_t cpumode;
+	int err = -1;
+
+	bufsz = intel_pt_insn_max_size();
+
+	if (machine__kernel_ip(machine, ip))
+		cpumode = PERF_RECORD_MISC_KERNEL;
+	else
+		cpumode = PERF_RECORD_MISC_USER;
+
+	thread = machine__find_thread(machine, -1, btsq->tid);
+	if (!thread)
+		return -1;
+
+	thread__find_addr_map(thread, cpumode, MAP__FUNCTION, ip, &al);
+	if (!al.map || !al.map->dso)
+		goto out_put;
+
+	len = dso__data_read_addr(al.map->dso, al.map, machine, ip, buf, bufsz);
+	if (len <= 0)
+		goto out_put;
+
+	/* Load maps to ensure dso->is_64_bit has been updated */
+	map__load(al.map, machine->symbol_filter);
+
+	x86_64 = al.map->dso->is_64_bit;
+
+	if (intel_pt_get_insn(buf, len, x86_64, &btsq->intel_pt_insn))
+		goto out_put;
+
+	err = 0;
+out_put:
+	thread__put(thread);
+	return err;
+}
+
+static int intel_bts_synth_error(struct intel_bts *bts, int cpu, pid_t pid,
+				 pid_t tid, u64 ip)
+{
+	union perf_event event;
+	int err;
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     INTEL_BTS_ERR_NOINSN, cpu, pid, tid, ip,
+			     "Failed to get instruction");
+
+	err = perf_session__deliver_synth_event(bts->session, &event, NULL);
+	if (err)
+		pr_err("Intel BTS: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static int intel_bts_get_branch_type(struct intel_bts_queue *btsq,
+				     struct branch *branch)
+{
+	int err;
+
+	if (!branch->from) {
+		if (branch->to)
+			btsq->sample_flags = PERF_IP_FLAG_BRANCH |
+					     PERF_IP_FLAG_TRACE_BEGIN;
+		else
+			btsq->sample_flags = 0;
+		btsq->intel_pt_insn.length = 0;
+	} else if (!branch->to) {
+		btsq->sample_flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_END;
+		btsq->intel_pt_insn.length = 0;
+	} else {
+		err = intel_bts_get_next_insn(btsq, branch->from);
+		if (err) {
+			btsq->sample_flags = 0;
+			btsq->intel_pt_insn.length = 0;
+			if (!btsq->bts->synth_opts.errors)
+				return 0;
+			err = intel_bts_synth_error(btsq->bts, btsq->cpu,
+						    btsq->pid, btsq->tid,
+						    branch->from);
+			return err;
+		}
+		btsq->sample_flags = intel_pt_insn_type(btsq->intel_pt_insn.op);
+		/* Check for an async branch into the kernel */
+		if (!machine__kernel_ip(btsq->bts->machine, branch->from) &&
+		    machine__kernel_ip(btsq->bts->machine, branch->to) &&
+		    btsq->sample_flags != (PERF_IP_FLAG_BRANCH |
+					   PERF_IP_FLAG_CALL |
+					   PERF_IP_FLAG_SYSCALLRET))
+			btsq->sample_flags = PERF_IP_FLAG_BRANCH |
+					     PERF_IP_FLAG_CALL |
+					     PERF_IP_FLAG_ASYNC |
+					     PERF_IP_FLAG_INTERRUPT;
+	}
+
+	return 0;
+}
+
+static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
+				    struct auxtrace_buffer *buffer)
+{
+	struct branch *branch;
+	size_t sz, bsz = sizeof(struct branch);
+	u32 filter = btsq->bts->branches_filter;
+	int err = 0;
+
+	if (buffer->use_data) {
+		sz = buffer->use_size;
+		branch = buffer->use_data;
+	} else {
+		sz = buffer->size;
+		branch = buffer->data;
+	}
+
+	if (!btsq->bts->sample_branches)
+		return 0;
+
+	for (; sz > bsz; branch += 1, sz -= bsz) {
+		if (!branch->from && !branch->to)
+			continue;
+		intel_bts_get_branch_type(btsq, branch);
+		if (filter && !(filter & btsq->sample_flags))
+			continue;
+		err = intel_bts_synth_branch_sample(btsq, branch);
+		if (err)
+			break;
+	}
+	return err;
+}
+
+static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
+{
+	struct auxtrace_buffer *buffer = btsq->buffer, *old_buffer = buffer;
+	struct auxtrace_queue *queue;
+	struct thread *thread;
+	int err;
+
+	if (btsq->done)
+		return 1;
+
+	if (btsq->pid == -1) {
+		thread = machine__find_thread(btsq->bts->machine, -1,
+					      btsq->tid);
+		if (thread)
+			btsq->pid = thread->pid_;
+	} else {
+		thread = machine__findnew_thread(btsq->bts->machine, btsq->pid,
+						 btsq->tid);
+	}
+
+	queue = &btsq->bts->queues.queue_array[btsq->queue_nr];
+
+	if (!buffer)
+		buffer = auxtrace_buffer__next(queue, NULL);
+
+	if (!buffer) {
+		if (!btsq->bts->sampling_mode)
+			btsq->done = 1;
+		err = 1;
+		goto out_put;
+	}
+
+	/* Currently there is no support for split buffers */
+	if (buffer->consecutive) {
+		err = -EINVAL;
+		goto out_put;
+	}
+
+	if (!buffer->data) {
+		int fd = perf_data_file__fd(btsq->bts->session->file);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data) {
+			err = -ENOMEM;
+			goto out_put;
+		}
+	}
+
+	if (btsq->bts->snapshot_mode && !buffer->consecutive &&
+	    intel_bts_do_fix_overlap(queue, buffer)) {
+		err = -ENOMEM;
+		goto out_put;
+	}
+
+	if (!btsq->bts->synth_opts.callchain && thread &&
+	    (!old_buffer || btsq->bts->sampling_mode ||
+	     (btsq->bts->snapshot_mode && !buffer->consecutive)))
+		thread_stack__set_trace_nr(thread, buffer->buffer_nr + 1);
+
+	err = intel_bts_process_buffer(btsq, buffer);
+
+	auxtrace_buffer__drop_data(buffer);
+
+	btsq->buffer = auxtrace_buffer__next(queue, buffer);
+	if (btsq->buffer) {
+		if (timestamp)
+			*timestamp = btsq->buffer->reference;
+	} else {
+		if (!btsq->bts->sampling_mode)
+			btsq->done = 1;
+	}
+out_put:
+	thread__put(thread);
+	return err;
+}
+
+static int intel_bts_flush_queue(struct intel_bts_queue *btsq)
+{
+	u64 ts = 0;
+	int ret;
+
+	while (1) {
+		ret = intel_bts_process_queue(btsq, &ts);
+		if (ret < 0)
+			return ret;
+		if (ret)
+			break;
+	}
+	return 0;
+}
+
+static int intel_bts_process_tid_exit(struct intel_bts *bts, pid_t tid)
+{
+	struct auxtrace_queues *queues = &bts->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &bts->queues.queue_array[i];
+		struct intel_bts_queue *btsq = queue->priv;
+
+		if (btsq && btsq->tid == tid)
+			return intel_bts_flush_queue(btsq);
+	}
+	return 0;
+}
+
+static int intel_bts_process_queues(struct intel_bts *bts, u64 timestamp)
+{
+	while (1) {
+		unsigned int queue_nr;
+		struct auxtrace_queue *queue;
+		struct intel_bts_queue *btsq;
+		u64 ts = 0;
+		int ret;
+
+		if (!bts->heap.heap_cnt)
+			return 0;
+
+		if (bts->heap.heap_array[0].ordinal > timestamp)
+			return 0;
+
+		queue_nr = bts->heap.heap_array[0].queue_nr;
+		queue = &bts->queues.queue_array[queue_nr];
+		btsq = queue->priv;
+
+		auxtrace_heap__pop(&bts->heap);
+
+		ret = intel_bts_process_queue(btsq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&bts->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&bts->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			btsq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_bts_process_event(struct perf_session *session,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct perf_tool *tool)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	u64 timestamp;
+	int err;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("Intel BTS requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time && sample->time != (u64)-1)
+		timestamp = perf_time_to_tsc(sample->time, &bts->tc);
+	else
+		timestamp = 0;
+
+	err = intel_bts_update_queues(bts);
+	if (err)
+		return err;
+
+	err = intel_bts_process_queues(bts, timestamp);
+	if (err)
+		return err;
+	if (event->header.type == PERF_RECORD_EXIT) {
+		err = intel_bts_process_tid_exit(bts, event->comm.tid);
+		if (err)
+			return err;
+	}
+
+	if (event->header.type == PERF_RECORD_AUX &&
+	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
+	    bts->synth_opts.errors)
+		err = intel_bts_lost(bts, sample);
+
+	return err;
+}
+
+static int intel_bts_process_auxtrace_event(struct perf_session *session,
+					    union perf_event *event,
+					    struct perf_tool *tool __maybe_unused)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+
+	if (bts->sampling_mode)
+		return 0;
+
+	if (!bts->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data_file__fd(session->file);
+		int err;
+
+		if (perf_data_file__is_pipe(session->file)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&bts->queues, session, event,
+						 data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				intel_bts_dump_event(bts, buffer->data,
+						     buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int intel_bts_flush(struct perf_session *session __maybe_unused,
+			   struct perf_tool *tool __maybe_unused)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	int ret;
+
+	if (dump_trace || bts->sampling_mode)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = intel_bts_update_queues(bts);
+	if (ret < 0)
+		return ret;
+
+	return intel_bts_process_queues(bts, MAX_TIMESTAMP);
+}
+
+static void intel_bts_free_queue(void *priv)
+{
+	struct intel_bts_queue *btsq = priv;
+
+	if (!btsq)
+		return;
+	free(btsq);
+}
+
+static void intel_bts_free_events(struct perf_session *session)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	struct auxtrace_queues *queues = &bts->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		intel_bts_free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = NULL;
+	}
+	auxtrace_queues__free(queues);
+}
+
+static void intel_bts_free(struct perf_session *session)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+
+	auxtrace_heap__free(&bts->heap);
+	intel_bts_free_events(session);
+	session->auxtrace = NULL;
+	free(bts);
+}
+
+struct intel_bts_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int intel_bts_event_synth(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_sample *sample __maybe_unused,
+				 struct machine *machine __maybe_unused)
+{
+	struct intel_bts_synth *intel_bts_synth =
+			container_of(tool, struct intel_bts_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(intel_bts_synth->session,
+						 event, NULL);
+}
+
+static int intel_bts_synth_event(struct perf_session *session,
+				 struct perf_event_attr *attr, u64 id)
+{
+	struct intel_bts_synth intel_bts_synth;
+
+	memset(&intel_bts_synth, 0, sizeof(struct intel_bts_synth));
+	intel_bts_synth.session = session;
+
+	return perf_event__synthesize_attr(&intel_bts_synth.dummy_tool, attr, 1,
+					   &id, intel_bts_event_synth);
+}
+
+static int intel_bts_synth_events(struct intel_bts *bts,
+				  struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == bts->pmu_type && evsel->ids) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("There are no selected events with Intel BTS data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+	if (!id)
+		id = 1;
+
+	if (bts->synth_opts.branches) {
+		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		attr.sample_period = 1;
+		attr.sample_type |= PERF_SAMPLE_ADDR;
+		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_bts_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'branches' event type\n",
+			       __func__);
+			return err;
+		}
+		bts->sample_branches = true;
+		bts->branches_sample_type = attr.sample_type;
+		bts->branches_id = id;
+		/*
+		 * We only use sample types from PERF_SAMPLE_MASK so we can use
+		 * __perf_evsel__sample_size() here.
+		 */
+		bts->branches_event_size = sizeof(struct sample_event) +
+				__perf_evsel__sample_size(attr.sample_type);
+	}
+
+	bts->synth_needs_swap = evsel->needs_swap;
+
+	return 0;
+}
+
+static const char * const intel_bts_info_fmts[] = {
+	[INTEL_BTS_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
+	[INTEL_BTS_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
+	[INTEL_BTS_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
+	[INTEL_BTS_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
+	[INTEL_BTS_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
+	[INTEL_BTS_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
+};
+
+static void intel_bts_print_info(u64 *arr, int start, int finish)
+{
+	int i;
+
+	if (!dump_trace)
+		return;
+
+	for (i = start; i <= finish; i++)
+		fprintf(stdout, intel_bts_info_fmts[i], arr[i]);
+}
+
+u64 intel_bts_auxtrace_info_priv[INTEL_BTS_AUXTRACE_PRIV_SIZE];
+
+int intel_bts_process_auxtrace_info(union perf_event *event,
+				    struct perf_session *session)
+{
+	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
+	size_t min_sz = sizeof(u64) * INTEL_BTS_SNAPSHOT_MODE;
+	struct intel_bts *bts;
+	int err;
+
+	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
+					min_sz)
+		return -EINVAL;
+
+	bts = zalloc(sizeof(struct intel_bts));
+	if (!bts)
+		return -ENOMEM;
+
+	err = auxtrace_queues__init(&bts->queues);
+	if (err)
+		goto err_free;
+
+	bts->session = session;
+	bts->machine = &session->machines.host; /* No kvm support */
+	bts->auxtrace_type = auxtrace_info->type;
+	bts->pmu_type = auxtrace_info->priv[INTEL_BTS_PMU_TYPE];
+	bts->tc.time_shift = auxtrace_info->priv[INTEL_BTS_TIME_SHIFT];
+	bts->tc.time_mult = auxtrace_info->priv[INTEL_BTS_TIME_MULT];
+	bts->tc.time_zero = auxtrace_info->priv[INTEL_BTS_TIME_ZERO];
+	bts->cap_user_time_zero =
+			auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO];
+	bts->snapshot_mode = auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE];
+
+	bts->sampling_mode = false;
+
+	bts->auxtrace.process_event = intel_bts_process_event;
+	bts->auxtrace.process_auxtrace_event = intel_bts_process_auxtrace_event;
+	bts->auxtrace.flush_events = intel_bts_flush;
+	bts->auxtrace.free_events = intel_bts_free_events;
+	bts->auxtrace.free = intel_bts_free;
+	session->auxtrace = &bts->auxtrace;
+
+	intel_bts_print_info(&auxtrace_info->priv[0], INTEL_BTS_PMU_TYPE,
+			     INTEL_BTS_SNAPSHOT_MODE);
+
+	if (dump_trace)
+		return 0;
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		bts->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&bts->synth_opts);
+
+	if (bts->synth_opts.calls)
+		bts->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+					PERF_IP_FLAG_TRACE_END;
+	if (bts->synth_opts.returns)
+		bts->branches_filter |= PERF_IP_FLAG_RETURN |
+					PERF_IP_FLAG_TRACE_BEGIN;
+
+	err = intel_bts_synth_events(bts, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&bts->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (bts->queues.populated)
+		bts->data_queued = true;
+
+	return 0;
+
+err_free_queues:
+	auxtrace_queues__free(&bts->queues);
+	session->auxtrace = NULL;
+err_free:
+	free(bts);
+	return err;
+}
diff --git a/tools/perf/util/intel-bts.h b/tools/perf/util/intel-bts.h
new file mode 100644
index 0000000..ca65e21
--- /dev/null
+++ b/tools/perf/util/intel-bts.h
@@ -0,0 +1,43 @@
+/*
+ * intel-bts.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__PERF_INTEL_BTS_H__
+#define INCLUDE__PERF_INTEL_BTS_H__
+
+#define INTEL_BTS_PMU_NAME "intel_bts"
+
+enum {
+	INTEL_BTS_PMU_TYPE,
+	INTEL_BTS_TIME_SHIFT,
+	INTEL_BTS_TIME_MULT,
+	INTEL_BTS_TIME_ZERO,
+	INTEL_BTS_CAP_USER_TIME_ZERO,
+	INTEL_BTS_SNAPSHOT_MODE,
+	INTEL_BTS_AUXTRACE_PRIV_MAX,
+};
+
+#define INTEL_BTS_AUXTRACE_PRIV_SIZE (INTEL_BTS_AUXTRACE_PRIV_MAX * sizeof(u64))
+
+struct auxtrace_record;
+struct perf_tool;
+union perf_event;
+struct perf_session;
+
+struct auxtrace_record *intel_bts_recording_init(int *err);
+
+int intel_bts_process_auxtrace_info(union perf_event *event,
+				    struct perf_session *session);
+
+#endif
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 3c71138..89c91a1 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -462,10 +462,6 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	LIST_HEAD(aliases);
 	__u32 type;
 
-	/* No support for intel_bts so disallow it */
-	if (!strcmp(name, "intel_bts"))
-		return NULL;
-
 	/*
 	 * The pmu data we store & need consists of the pmu
 	 * type value and format definitions. Load both right

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Put itrace options into an asciidoc include
  2015-07-17 16:33 ` [PATCH V8 09/25] perf tools: Put itrace options into an asciidoc include Adrian Hunter
@ 2015-08-22  6:53   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-22  6:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, jolsa, tglx, adrian.hunter, mingo, acme, hpa

Commit-ID:  60b88d8743892218f82048a3df624f5fc5460843
Gitweb:     http://git.kernel.org/tip/60b88d8743892218f82048a3df624f5fc5460843
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:44 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 21 Aug 2015 11:40:44 -0300

perf tools: Put itrace options into an asciidoc include

perf script, report and inject all have the same itrace options. Put
them into an asciidoc include file.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-10-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/itrace.txt      | 22 ++++++++++++++++++++++
 tools/perf/Documentation/perf-inject.txt | 23 +----------------------
 tools/perf/Documentation/perf-report.txt | 23 +----------------------
 tools/perf/Documentation/perf-script.txt | 23 +----------------------
 4 files changed, 25 insertions(+), 66 deletions(-)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
new file mode 100644
index 0000000..2ff9466
--- /dev/null
+++ b/tools/perf/Documentation/itrace.txt
@@ -0,0 +1,22 @@
+		i	synthesize instructions events
+		b	synthesize branches events
+		c	synthesize branches events (calls only)
+		r	synthesize branches events (returns only)
+		x	synthesize transactions events
+		e	synthesize error events
+		d	create a debug log
+		g	synthesize a call chain (use with i or x)
+
+	The default is all events i.e. the same as --itrace=ibxe
+
+	In addition, the period (default 100000) for instructions events
+	can be specified in units of:
+
+		i	instructions
+		t	ticks
+		ms	milliseconds
+		us	microseconds
+		ns	nanoseconds (default)
+
+	Also the call chain size (default 16, max. 1024) for instructions or
+	transactions events can be specified.
diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
index b876ae3..0c721c3 100644
--- a/tools/perf/Documentation/perf-inject.txt
+++ b/tools/perf/Documentation/perf-inject.txt
@@ -48,28 +48,7 @@ OPTIONS
 	Decode Instruction Tracing data, replacing it with synthesized events.
 	Options are:
 
-		i	synthesize instructions events
-		b	synthesize branches events
-		c	synthesize branches events (calls only)
-		r	synthesize branches events (returns only)
-		x	synthesize transactions events
-		e	synthesize error events
-		d	create a debug log
-		g	synthesize a call chain (use with i or x)
-
-	The default is all events i.e. the same as --itrace=ibxe
-
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
-
-		i	instructions
-		t	ticks
-		ms	milliseconds
-		us	microseconds
-		ns	nanoseconds (default)
-
-	Also the call chain size (default 16, max. 1024) for instructions or
-	transactions events can be specified.
+include::itrace.txt[]
 
 SEE ALSO
 --------
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index a18ba75..9c7981b 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -331,28 +331,7 @@ OPTIONS
 --itrace::
 	Options for decoding instruction tracing data. The options are:
 
-		i	synthesize instructions events
-		b	synthesize branches events
-		c	synthesize branches events (calls only)
-		r	synthesize branches events (returns only)
-		x	synthesize transactions events
-		e	synthesize error events
-		d	create a debug log
-		g	synthesize a call chain (use with i or x)
-
-	The default is all events i.e. the same as --itrace=ibxe
-
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
-
-		i	instructions
-		t	ticks
-		ms	milliseconds
-		us	microseconds
-		ns	nanoseconds (default)
-
-	Also the call chain size (default 16, max. 1024) for instructions or
-	transactions events can be specified.
+include::itrace.txt[]
 
 	To disable decoding entirely, use --no-itrace.
 
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index 8e9be1f..c0d2479 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -235,28 +235,7 @@ OPTIONS
 --itrace::
 	Options for decoding instruction tracing data. The options are:
 
-		i	synthesize instructions events
-		b	synthesize branches events
-		c	synthesize branches events (calls only)
-		r	synthesize branches events (returns only)
-		x	synthesize transactions events
-		e	synthesize error events
-		d	create a debug log
-		g	synthesize a call chain (use with i or x)
-
-	The default is all events i.e. the same as --itrace=ibxe
-
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
-
-		i	instructions
-		t	ticks
-		ms	milliseconds
-		us	microseconds
-		ns	nanoseconds (default)
-
-	Also the call chain size (default 16, max. 1024) for instructions or
-	transactions events can be specified.
+include::itrace.txt[]
 
 	To disable decoding entirely, use --no-itrace.
 

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add example call-graph script
  2015-07-17 16:33 ` [PATCH V8 10/25] perf tools: Add example call-graph script Adrian Hunter
  2015-08-21 15:00   ` Arnaldo Carvalho de Melo
@ 2015-08-22  6:53   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-22  6:53 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, hpa, acme, linux-kernel, jolsa, mingo, adrian.hunter

Commit-ID:  4b715d24f4f14731c7b553cbb8604fe865cb8d3c
Gitweb:     http://git.kernel.org/tip/4b715d24f4f14731c7b553cbb8604fe865cb8d3c
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:45 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 21 Aug 2015 12:32:40 -0300

perf tools: Add example call-graph script

Add a script to produce a call-graph from data exported to a postgresql
database and derived from a processor trace event like intel_pt or intel_bts.

Refer to comments in the scripts call-graph-from-postgresql.py and
export-to-postgresql.py for more details on how to set up the environment,
install the required packages, etc.

Committer note:

>From the scripts, for convenience while reading 'git log':

  An example of using this script with Intel PT:

  $ perf record -e intel_pt//u ls
  $ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py pt_example branches calls
  2015-05-29 12:49:23.464364 Creating database...
  2015-05-29 12:49:26.281717 Writing to intermediate files...
  2015-05-29 12:49:27.190383 Copying to database...
  2015-05-29 12:49:28.140451 Removing intermediate files...
  2015-05-29 12:49:28.147451 Adding primary keys
  2015-05-29 12:49:28.655683 Adding foreign keys
  2015-05-29 12:49:29.365350 Done
  $ python tools/perf/scripts/python/call-graph-from-postgresql.py pt_example
  # The result is a GUI window with a tree representing a context-sensitive
  # call-graph.  Expanding a couple of levels of the tree and adjusting column
  # widths to suit will display something like:

                                         Call Graph: pt_example
  Call Path                        |Object     |Count|Time(ns)|Time(%)|Branch Count|Branch Count(%)
  v- ls
     v- 2638:2638
         v- _start                  ld-2.19.so    1   10074071  100.0        211135          100.0
           |- unknown               unknown       1      13198    0.1             1            0.0
           >- _dl_start             ld-2.19.so    1    1400980   13.9         19637            9.3
           >- _d_linit_internal     ld-2.19.so    1     448152    4.4         11094            5.3
           v-__libc_start_main@plt  ls            1    8211741   81.5        180397           85.4
              >- _dl_fixup          ld-2.19.so    1       7607    0.1           108            0.1
              >- __cxa_atexit       libc-2.19.so  1      11737    0.1            10            0.0
              >- __libc_csu_init    ls            1      10354    0.1            10            0.0
              |- _setjmp            libc-2.19.so  1          0    0.0             4            0.0
              v- main               ls            1    8182043   99.6        180254           99.9

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-11-git-send-email-adrian.hunter@intel.com
[ Added 'python-pyside qt-postgresql' to the yum cmdline installing required packages ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 .../scripts/python/call-graph-from-postgresql.py   | 327 +++++++++++++++++++++
 tools/perf/scripts/python/export-to-postgresql.py  |  47 +++
 2 files changed, 374 insertions(+)

diff --git a/tools/perf/scripts/python/call-graph-from-postgresql.py b/tools/perf/scripts/python/call-graph-from-postgresql.py
new file mode 100644
index 0000000..e78fdc2
--- /dev/null
+++ b/tools/perf/scripts/python/call-graph-from-postgresql.py
@@ -0,0 +1,327 @@
+#!/usr/bin/python2
+# call-graph-from-postgresql.py: create call-graph from postgresql database
+# Copyright (c) 2014, Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+
+# To use this script you will need to have exported data using the
+# export-to-postgresql.py script.  Refer to that script for details.
+#
+# Following on from the example in the export-to-postgresql.py script, a
+# call-graph can be displayed for the pt_example database like this:
+#
+#	python tools/perf/scripts/python/call-graph-from-postgresql.py pt_example
+#
+# Note this script supports connecting to remote databases by setting hostname,
+# port, username, password, and dbname e.g.
+#
+#	python tools/perf/scripts/python/call-graph-from-postgresql.py "hostname=myhost username=myuser password=mypassword dbname=pt_example"
+#
+# The result is a GUI window with a tree representing a context-sensitive
+# call-graph.  Expanding a couple of levels of the tree and adjusting column
+# widths to suit will display something like:
+#
+#                                         Call Graph: pt_example
+# Call Path                          Object      Count   Time(ns)  Time(%)  Branch Count   Branch Count(%)
+# v- ls
+#     v- 2638:2638
+#         v- _start                  ld-2.19.so    1     10074071   100.0         211135            100.0
+#           |- unknown               unknown       1        13198     0.1              1              0.0
+#           >- _dl_start             ld-2.19.so    1      1400980    13.9          19637              9.3
+#           >- _d_linit_internal     ld-2.19.so    1       448152     4.4          11094              5.3
+#           v-__libc_start_main@plt  ls            1      8211741    81.5         180397             85.4
+#              >- _dl_fixup          ld-2.19.so    1         7607     0.1            108              0.1
+#              >- __cxa_atexit       libc-2.19.so  1        11737     0.1             10              0.0
+#              >- __libc_csu_init    ls            1        10354     0.1             10              0.0
+#              |- _setjmp            libc-2.19.so  1            0     0.0              4              0.0
+#              v- main               ls            1      8182043    99.6         180254             99.9
+#
+# Points to note:
+#	The top level is a command name (comm)
+#	The next level is a thread (pid:tid)
+#	Subsequent levels are functions
+#	'Count' is the number of calls
+#	'Time' is the elapsed time until the function returns
+#	Percentages are relative to the level above
+#	'Branch Count' is the total number of branches for that function and all
+#       functions that it calls
+
+import sys
+from PySide.QtCore import *
+from PySide.QtGui import *
+from PySide.QtSql import *
+from decimal import *
+
+class TreeItem():
+
+	def __init__(self, db, row, parent_item):
+		self.db = db
+		self.row = row
+		self.parent_item = parent_item
+		self.query_done = False;
+		self.child_count = 0
+		self.child_items = []
+		self.data = ["", "", "", "", "", "", ""]
+		self.comm_id = 0
+		self.thread_id = 0
+		self.call_path_id = 1
+		self.branch_count = 0
+		self.time = 0
+		if not parent_item:
+			self.setUpRoot()
+
+	def setUpRoot(self):
+		self.query_done = True
+		query = QSqlQuery(self.db)
+		ret = query.exec_('SELECT id, comm FROM comms')
+		if not ret:
+			raise Exception("Query failed: " + query.lastError().text())
+		while query.next():
+			if not query.value(0):
+				continue
+			child_item = TreeItem(self.db, self.child_count, self)
+			self.child_items.append(child_item)
+			self.child_count += 1
+			child_item.setUpLevel1(query.value(0), query.value(1))
+
+	def setUpLevel1(self, comm_id, comm):
+		self.query_done = True;
+		self.comm_id = comm_id
+		self.data[0] = comm
+		self.child_items = []
+		self.child_count = 0
+		query = QSqlQuery(self.db)
+		ret = query.exec_('SELECT thread_id, ( SELECT pid FROM threads WHERE id = thread_id ), ( SELECT tid FROM threads WHERE id = thread_id ) FROM comm_threads WHERE comm_id = ' + str(comm_id))
+		if not ret:
+			raise Exception("Query failed: " + query.lastError().text())
+		while query.next():
+			child_item = TreeItem(self.db, self.child_count, self)
+			self.child_items.append(child_item)
+			self.child_count += 1
+			child_item.setUpLevel2(comm_id, query.value(0), query.value(1), query.value(2))
+
+	def setUpLevel2(self, comm_id, thread_id, pid, tid):
+		self.comm_id = comm_id
+		self.thread_id = thread_id
+		self.data[0] = str(pid) + ":" + str(tid)
+
+	def getChildItem(self, row):
+		return self.child_items[row]
+
+	def getParentItem(self):
+		return self.parent_item
+
+	def getRow(self):
+		return self.row
+
+	def timePercent(self, b):
+		if not self.time:
+			return "0.0"
+		x = (b * Decimal(100)) / self.time
+		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
+
+	def branchPercent(self, b):
+		if not self.branch_count:
+			return "0.0"
+		x = (b * Decimal(100)) / self.branch_count
+		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
+
+	def addChild(self, call_path_id, name, dso, count, time, branch_count):
+		child_item = TreeItem(self.db, self.child_count, self)
+		child_item.comm_id = self.comm_id
+		child_item.thread_id = self.thread_id
+		child_item.call_path_id = call_path_id
+		child_item.branch_count = branch_count
+		child_item.time = time
+		child_item.data[0] = name
+		if dso == "[kernel.kallsyms]":
+			dso = "[kernel]"
+		child_item.data[1] = dso
+		child_item.data[2] = str(count)
+		child_item.data[3] = str(time)
+		child_item.data[4] = self.timePercent(time)
+		child_item.data[5] = str(branch_count)
+		child_item.data[6] = self.branchPercent(branch_count)
+		self.child_items.append(child_item)
+		self.child_count += 1
+
+	def selectCalls(self):
+		self.query_done = True;
+		query = QSqlQuery(self.db)
+		ret = query.exec_('SELECT id, call_path_id, branch_count, call_time, return_time, '
+				  '( SELECT name FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ), '
+				  '( SELECT short_name FROM dsos WHERE id = ( SELECT dso_id FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ) ), '
+				  '( SELECT ip FROM call_paths where id = call_path_id ) '
+				  'FROM calls WHERE parent_call_path_id = ' + str(self.call_path_id) + ' AND comm_id = ' + str(self.comm_id) + ' AND thread_id = ' + str(self.thread_id) +
+				  'ORDER BY call_path_id')
+		if not ret:
+			raise Exception("Query failed: " + query.lastError().text())
+		last_call_path_id = 0
+		name = ""
+		dso = ""
+		count = 0
+		branch_count = 0
+		total_branch_count = 0
+		time = 0
+		total_time = 0
+		while query.next():
+			if query.value(1) == last_call_path_id:
+				count += 1
+				branch_count += query.value(2)
+				time += query.value(4) - query.value(3)
+			else:
+				if count:
+					self.addChild(last_call_path_id, name, dso, count, time, branch_count)
+				last_call_path_id = query.value(1)
+				name = query.value(5)
+				dso = query.value(6)
+				count = 1
+				total_branch_count += branch_count
+				total_time += time
+				branch_count = query.value(2)
+				time = query.value(4) - query.value(3)
+		if count:
+			self.addChild(last_call_path_id, name, dso, count, time, branch_count)
+		total_branch_count += branch_count
+		total_time += time
+		# Top level does not have time or branch count, so fix that here
+		if total_branch_count > self.branch_count:
+			self.branch_count = total_branch_count
+			if self.branch_count:
+				for child_item in self.child_items:
+					child_item.data[6] = self.branchPercent(child_item.branch_count)
+		if total_time > self.time:
+			self.time = total_time
+			if self.time:
+				for child_item in self.child_items:
+					child_item.data[4] = self.timePercent(child_item.time)
+
+	def childCount(self):
+		if not self.query_done:
+			self.selectCalls()
+		return self.child_count
+
+	def columnCount(self):
+		return 7
+
+	def columnHeader(self, column):
+		headers = ["Call Path", "Object", "Count ", "Time (ns) ", "Time (%) ", "Branch Count ", "Branch Count (%) "]
+		return headers[column]
+
+	def getData(self, column):
+		return self.data[column]
+
+class TreeModel(QAbstractItemModel):
+
+	def __init__(self, db, parent=None):
+		super(TreeModel, self).__init__(parent)
+		self.db = db
+		self.root = TreeItem(db, 0, None)
+
+	def columnCount(self, parent):
+		return self.root.columnCount()
+
+	def rowCount(self, parent):
+		if parent.isValid():
+			parent_item = parent.internalPointer()
+		else:
+			parent_item = self.root
+		return parent_item.childCount()
+
+	def headerData(self, section, orientation, role):
+		if role == Qt.TextAlignmentRole:
+			if section > 1:
+				return Qt.AlignRight
+		if role != Qt.DisplayRole:
+			return None
+		if orientation != Qt.Horizontal:
+			return None
+		return self.root.columnHeader(section)
+
+	def parent(self, child):
+		child_item = child.internalPointer()
+		if child_item is self.root:
+			return QModelIndex()
+		parent_item = child_item.getParentItem()
+		return self.createIndex(parent_item.getRow(), 0, parent_item)
+
+	def index(self, row, column, parent):
+		if parent.isValid():
+			parent_item = parent.internalPointer()
+		else:
+			parent_item = self.root
+		child_item = parent_item.getChildItem(row)
+		return self.createIndex(row, column, child_item)
+
+	def data(self, index, role):
+		if role == Qt.TextAlignmentRole:
+			if index.column() > 1:
+				return Qt.AlignRight
+		if role != Qt.DisplayRole:
+			return None
+		index_item = index.internalPointer()
+		return index_item.getData(index.column())
+
+class MainWindow(QMainWindow):
+
+	def __init__(self, db, dbname, parent=None):
+		super(MainWindow, self).__init__(parent)
+
+		self.setObjectName("MainWindow")
+		self.setWindowTitle("Call Graph: " + dbname)
+		self.move(100, 100)
+		self.resize(800, 600)
+		style = self.style()
+		icon = style.standardIcon(QStyle.SP_MessageBoxInformation)
+		self.setWindowIcon(icon);
+
+		self.model = TreeModel(db)
+
+		self.view = QTreeView()
+		self.view.setModel(self.model)
+
+		self.setCentralWidget(self.view)
+
+if __name__ == '__main__':
+	if (len(sys.argv) < 2):
+		print >> sys.stderr, "Usage is: call-graph-from-postgresql.py <database name>"
+		raise Exception("Too few arguments")
+
+	dbname = sys.argv[1]
+
+	db = QSqlDatabase.addDatabase('QPSQL')
+
+	opts = dbname.split()
+	for opt in opts:
+		if '=' in opt:
+			opt = opt.split('=')
+			if opt[0] == 'hostname':
+				db.setHostName(opt[1])
+			elif opt[0] == 'port':
+				db.setPort(int(opt[1]))
+			elif opt[0] == 'username':
+				db.setUserName(opt[1])
+			elif opt[0] == 'password':
+				db.setPassword(opt[1])
+			elif opt[0] == 'dbname':
+				dbname = opt[1]
+		else:
+			dbname = opt
+
+	db.setDatabaseName(dbname)
+	if not db.open():
+		raise Exception("Failed to open database " + dbname + " error: " + db.lastError().text())
+
+	app = QApplication(sys.argv)
+	window = MainWindow(db, dbname)
+	window.show()
+	err = app.exec_()
+	db.close()
+	sys.exit(err)
diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
index 4cdafd8..84a3203 100644
--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -15,6 +15,53 @@ import sys
 import struct
 import datetime
 
+# To use this script you will need to have installed package python-pyside which
+# provides LGPL-licensed Python bindings for Qt.  You will also need the package
+# libqt4-sql-psql for Qt postgresql support.
+#
+# The script assumes postgresql is running on the local machine and that the
+# user has postgresql permissions to create databases. Examples of installing
+# postgresql and adding such a user are:
+#
+# fedora:
+#
+#	$ sudo yum install postgresql postgresql-server python-pyside qt-postgresql
+#	$ sudo su - postgres -c initdb
+#	$ sudo service postgresql start
+#	$ sudo su - postgres
+#	$ createuser <your user id here>
+#	Shall the new role be a superuser? (y/n) y
+#
+# ubuntu:
+#
+#	$ sudo apt-get install postgresql
+#	$ sudo su - postgres
+#	$ createuser <your user id here>
+#	Shall the new role be a superuser? (y/n) y
+#
+# An example of using this script with Intel PT:
+#
+#	$ perf record -e intel_pt//u ls
+#	$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py pt_example branches calls
+#	2015-05-29 12:49:23.464364 Creating database...
+#	2015-05-29 12:49:26.281717 Writing to intermediate files...
+#	2015-05-29 12:49:27.190383 Copying to database...
+#	2015-05-29 12:49:28.140451 Removing intermediate files...
+#	2015-05-29 12:49:28.147451 Adding primary keys
+#	2015-05-29 12:49:28.655683 Adding foreign keys
+#	2015-05-29 12:49:29.365350 Done
+#
+# To browse the database, psql can be used e.g.
+#
+#	$ psql pt_example
+#	pt_example=# select * from samples_view where id < 100;
+#	pt_example=# \d+
+#	pt_example=# \d+ samples_view
+#	pt_example=# \q
+#
+# An example of using the database is provided by the script
+# call-graph-from-postgresql.py.  Refer to that script for details.
+
 from PySide.QtSql import *
 
 # Need to access PostgreSQL C library directly to use COPY FROM STDIN

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 10/25] perf tools: Add example call-graph script
  2015-08-21 15:28         ` Arnaldo Carvalho de Melo
  2015-08-21 15:32           ` Arnaldo Carvalho de Melo
@ 2015-08-24  7:00           ` Adrian Hunter
  2015-08-24 20:20             ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 77+ messages in thread
From: Adrian Hunter @ 2015-08-24  7:00 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 21/08/15 18:28, Arnaldo Carvalho de Melo wrote:
> Em Fri, Aug 21, 2015 at 12:21:25PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Fri, Aug 21, 2015 at 12:11:33PM -0300, Arnaldo Carvalho de Melo escreveu:
>>> [acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
>>> 2015-08-21 12:10:00.504126 Creating database...
>>> QSqlDatabase: QPSQL driver not loaded
>>> QSqlDatabase: available drivers: QSQLITE
>>> QSqlDatabase: an instance of QCoreApplication is required for loading driver plugins
>>> QSqlQuery::exec: database not open
>>> Traceback (most recent call last):
>>>   File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 87, in <module>
>>>     do_query(query, 'CREATE DATABASE ' + dbname)
>>>   File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 78, in do_query
>>>     raise Exception("Query failed: " + q.lastError().text())
>>> Exception: Query failed: Driver not loaded Driver not loaded
>>> Error running python script /home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py
>>> [acme@zoo ~]$ 
>>
>> A-ha, this was missing:
>>
>> [root@zoo ~]# rpm -ql qt-postgresql 
>> /usr/lib64/qt4/plugins/sqldrivers/libqsqlpsql.so
>> [root@zoo ~]#
>>
>> [acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
>> 2015-08-21 12:20:01.841677 Creating database...
>> 2015-08-21 12:20:02.556853 Writing to intermediate files...
>> 2015-08-21 12:20:03.262814 Copying to database...
>> 2015-08-21 12:20:03.783109 Removing intermediate files...
>> 2015-08-21 12:20:03.790282 Adding primary keys
>> 2015-08-21 12:20:04.294342 Adding foreign keys
>> 2015-08-21 12:20:04.718238 Done
>> [acme@zoo ~]$ 
>>
>> Now to the next steps... Lets see how this looks like with BTS...
> 
> So, after running:
> 
>   [acme@zoo linux]$ python
>   tools/perf/scripts/python/call-graph-from-postgresql.py bts_example
> 
> I got a GUI and after expanding some callchains I took this screenshot:
> 
>   http://vger.kernel.org/~acme/perf/call_graph_example_intel_bts.png
> 
> Cool stuff! But it seems the COMM got messed up? I.e. that "1380:1380"
> first level after "ls".

That is, in fact, correct.  The top-level is the process (comm).  The next
level is the threads (pid/tid) for that process.  Currently, each thread is
reported separately.

For the future, there ought to be an option to combine the data for all
threads of a process, or all processes with the same comm.  Also presently
it is not tracking if a thread sets it's own comm (e.g. some multi-threaded
applications name each thread).


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: [PATCH V8 10/25] perf tools: Add example call-graph script
  2015-08-24  7:00           ` Adrian Hunter
@ 2015-08-24 20:20             ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 77+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-08-24 20:20 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Mon, Aug 24, 2015 at 10:00:17AM +0300, Adrian Hunter escreveu:
> On 21/08/15 18:28, Arnaldo Carvalho de Melo wrote:
> > Em Fri, Aug 21, 2015 at 12:21:25PM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Em Fri, Aug 21, 2015 at 12:11:33PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>> [acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
> >>> 2015-08-21 12:10:00.504126 Creating database...
> >>> QSqlDatabase: QPSQL driver not loaded
> >>> QSqlDatabase: available drivers: QSQLITE
> >>> QSqlDatabase: an instance of QCoreApplication is required for loading driver plugins
> >>> QSqlQuery::exec: database not open
> >>> Traceback (most recent call last):
> >>>   File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 87, in <module>
> >>>     do_query(query, 'CREATE DATABASE ' + dbname)
> >>>   File "/home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py", line 78, in do_query
> >>>     raise Exception("Query failed: " + q.lastError().text())
> >>> Exception: Query failed: Driver not loaded Driver not loaded
> >>> Error running python script /home/acme/libexec/perf-core/scripts/python/export-to-postgresql.py
> >>> [acme@zoo ~]$ 
> >>
> >> A-ha, this was missing:
> >>
> >> [root@zoo ~]# rpm -ql qt-postgresql 
> >> /usr/lib64/qt4/plugins/sqldrivers/libqsqlpsql.so
> >> [root@zoo ~]#
> >>
> >> [acme@zoo ~]$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py bts_example branches calls
> >> 2015-08-21 12:20:01.841677 Creating database...
> >> 2015-08-21 12:20:02.556853 Writing to intermediate files...
> >> 2015-08-21 12:20:03.262814 Copying to database...
> >> 2015-08-21 12:20:03.783109 Removing intermediate files...
> >> 2015-08-21 12:20:03.790282 Adding primary keys
> >> 2015-08-21 12:20:04.294342 Adding foreign keys
> >> 2015-08-21 12:20:04.718238 Done
> >> [acme@zoo ~]$ 
> >>
> >> Now to the next steps... Lets see how this looks like with BTS...
> > 
> > So, after running:
> > 
> >   [acme@zoo linux]$ python
> >   tools/perf/scripts/python/call-graph-from-postgresql.py bts_example
> > 
> > I got a GUI and after expanding some callchains I took this screenshot:
> > 
> >   http://vger.kernel.org/~acme/perf/call_graph_example_intel_bts.png
> > 
> > Cool stuff! But it seems the COMM got messed up? I.e. that "1380:1380"
> > first level after "ls".
> 
> That is, in fact, correct.  The top-level is the process (comm).  The next
> level is the threads (pid/tid) for that process.  Currently, each thread is
> reported separately.

Ok, understood.

> For the future, there ought to be an option to combine the data for all
> threads of a process, or all processes with the same comm.  Also presently
> it is not tracking if a thread sets it's own comm (e.g. some multi-threaded
> applications name each thread).

And in that case it would be nice to have both the comms and pid/tid.

- Arnaldo

^ permalink raw reply	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Fix Intel PT 'instructions' sample period
  2015-07-17 16:33 ` [PATCH V8 13/25] perf tools: Fix Intel PT 'instructions' sample period Adrian Hunter
@ 2015-08-28  6:37   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:37 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, jolsa, mingo, tglx, hpa, adrian.hunter, linux-kernel

Commit-ID:  2a21d03686881331b0af0471588674e7e896eeb2
Gitweb:     http://git.kernel.org/tip/2a21d03686881331b0af0471588674e7e896eeb2
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:48 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:42:26 -0300

perf tools: Fix Intel PT 'instructions' sample period

The period on synthesized 'instructions' samples was being set to a
fixed value, whereas the correct value is the number of instructions
since the last sample, which is a value that the decoder can provide.
So do it that way.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-14-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 3 +++
 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h | 1 +
 tools/perf/util/intel-pt.c                          | 5 ++++-
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index f8ac462..56790ea 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -108,6 +108,7 @@ struct intel_pt_decoder {
 	uint64_t sign_bits;
 	uint64_t period;
 	enum intel_pt_period_type period_type;
+	uint64_t tot_insn_cnt;
 	uint64_t period_insn_cnt;
 	uint64_t period_mask;
 	uint64_t period_ticks;
@@ -559,6 +560,7 @@ static int intel_pt_walk_insn(struct intel_pt_decoder *decoder,
 	err = decoder->walk_insn(intel_pt_insn, &insn_cnt, &decoder->ip, ip,
 				 max_insn_cnt, decoder->data);
 
+	decoder->tot_insn_cnt += insn_cnt;
 	decoder->timestamp_insn_cnt += insn_cnt;
 	decoder->period_insn_cnt += insn_cnt;
 
@@ -1529,6 +1531,7 @@ const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder)
 	decoder->state.timestamp = decoder->timestamp;
 	decoder->state.est_timestamp = intel_pt_est_timestamp(decoder);
 	decoder->state.cr3 = decoder->cr3;
+	decoder->state.tot_insn_cnt = decoder->tot_insn_cnt;
 
 	if (err)
 		decoder->state.from_ip = decoder->ip;
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
index 4c48802..cbf5704 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -58,6 +58,7 @@ struct intel_pt_state {
 	uint64_t from_ip;
 	uint64_t to_ip;
 	uint64_t cr3;
+	uint64_t tot_insn_cnt;
 	uint64_t timestamp;
 	uint64_t est_timestamp;
 	uint64_t trace_nr;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index a5acd2f..3b34a64 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -126,6 +126,7 @@ struct intel_pt_queue {
 	u64 timestamp;
 	u32 flags;
 	u16 insn_len;
+	u64 last_insn_cnt;
 };
 
 static void intel_pt_dump(struct intel_pt *pt __maybe_unused,
@@ -920,11 +921,13 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 	sample.addr = ptq->state->to_ip;
 	sample.id = ptq->pt->instructions_id;
 	sample.stream_id = ptq->pt->instructions_id;
-	sample.period = ptq->pt->instructions_sample_period;
+	sample.period = ptq->state->tot_insn_cnt - ptq->last_insn_cnt;
 	sample.cpu = ptq->cpu;
 	sample.flags = ptq->flags;
 	sample.insn_len = ptq->insn_len;
 
+	ptq->last_insn_cnt = ptq->state->tot_insn_cnt;
+
 	if (pt->synth_opts.callchain) {
 		thread_stack__sample(ptq->thread, ptq->chain,
 				     pt->synth_opts.callchain_sz, sample.ip);

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT support for PSB periods
  2015-07-17 16:33 ` [PATCH V8 17/25] perf tools: Add Intel PT support for PSB periods Adrian Hunter
@ 2015-08-28  6:38   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, hpa, adrian.hunter, linux-kernel, jolsa, acme, tglx

Commit-ID:  bc9b6bf07c8b3f4e85509f9b3a552c86e567b4ae
Gitweb:     http://git.kernel.org/tip/bc9b6bf07c8b3f4e85509f9b3a552c86e567b4ae
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:52 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:45:08 -0300

perf tools: Add Intel PT support for PSB periods

The PSB packet is a synchronization packet that provides a starting
point for decoding or recovery from errors.

This patch adds support for a new Intel PT feature that allows the
frequency of PSB packets to be specified.

Support for this feature is indicated by
/sys/bus/event_source/devices/intel_pt/caps/psb_cyc which contains "1"
if the feature is supported and "0" otherwise.

The PSB period can be specified as a PMU config term e.g. perf record -e
intel_pt/psb_period=2/u sleep 1

The default value is 3 or the nearest lower value that is supported.  0
is always supported.

Valid values are given by:

/sys/bus/event_source/devices/intel_pt/caps/psb_periods

which contains a hexadecimal value, the bits of which represent valid
values e.g. bit 2 set means value 2 is valid.

The value is converted to the approximate number of trace bytes between
PSB packets as:

	2 ^ (value + 11)

e.g. value 3 means 16KiB bytes between PSBs

If an invalid value is entered, the error message will give a list of
valid values e.g.

	$ perf record -e intel_pt/psb_period=15/u uname
	Invalid psb_period for intel_pt. Valid values are: 0-5

tools/perf/Documentation/intel-pt.txt is updated in a later patch as
there are a number of new features being added.

For more information about PSB periods refer to the Intel 64 and IA-32
Architectures SDM Chapter 36 Intel Processor Trace from June 2015 or
later.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-18-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 217 ++++++++++++++++++++++++++++++++++--
 1 file changed, 210 insertions(+), 7 deletions(-)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index da7d2c1..145975b 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -99,17 +99,121 @@ static int intel_pt_parse_terms(struct list_head *formats, const char *str,
 	return intel_pt_parse_terms_with_default(formats, str, config);
 }
 
-static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu __maybe_unused,
-				  struct perf_evlist *evlist __maybe_unused)
+static u64 intel_pt_masked_bits(u64 mask, u64 bits)
 {
-	return 256;
+	const u64 top_bit = 1ULL << 63;
+	u64 res = 0;
+	int i;
+
+	for (i = 0; i < 64; i++) {
+		if (mask & top_bit) {
+			res <<= 1;
+			if (bits & top_bit)
+				res |= 1;
+		}
+		mask <<= 1;
+		bits <<= 1;
+	}
+
+	return res;
+}
+
+static int intel_pt_read_config(struct perf_pmu *intel_pt_pmu, const char *str,
+				struct perf_evlist *evlist, u64 *res)
+{
+	struct perf_evsel *evsel;
+	u64 mask;
+
+	*res = 0;
+
+	mask = perf_pmu__format_bits(&intel_pt_pmu->format, str);
+	if (!mask)
+		return -EINVAL;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_pt_pmu->type) {
+			*res = intel_pt_masked_bits(mask, evsel->attr.config);
+			return 0;
+		}
+	}
+
+	return -EINVAL;
+}
+
+static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu,
+				  struct perf_evlist *evlist)
+{
+	u64 val;
+	int err, topa_multiple_entries;
+	size_t psb_period;
+
+	if (perf_pmu__scan_file(intel_pt_pmu, "caps/topa_multiple_entries",
+				"%d", &topa_multiple_entries) != 1)
+		topa_multiple_entries = 0;
+
+	/*
+	 * Use caps/topa_multiple_entries to indicate early hardware that had
+	 * extra frequent PSBs.
+	 */
+	if (!topa_multiple_entries) {
+		psb_period = 256;
+		goto out;
+	}
+
+	err = intel_pt_read_config(intel_pt_pmu, "psb_period", evlist, &val);
+	if (err)
+		val = 0;
+
+	psb_period = 1 << (val + 11);
+out:
+	pr_debug2("%s psb_period %zu\n", intel_pt_pmu->name, psb_period);
+	return psb_period;
+}
+
+static int intel_pt_pick_bit(int bits, int target)
+{
+	int pos, pick = -1;
+
+	for (pos = 0; bits; bits >>= 1, pos++) {
+		if (bits & 1) {
+			if (pos <= target || pick < 0)
+				pick = pos;
+			if (pos >= target)
+				break;
+		}
+	}
+
+	return pick;
 }
 
 static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
 {
+	char buf[256];
+	int psb_cyc, psb_periods, psb_period;
+	int pos = 0;
 	u64 config;
 
-	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &config);
+	pos += scnprintf(buf + pos, sizeof(buf) - pos, "tsc");
+
+	if (perf_pmu__scan_file(intel_pt_pmu, "caps/psb_cyc", "%d",
+				&psb_cyc) != 1)
+		psb_cyc = 1;
+
+	if (psb_cyc) {
+		if (perf_pmu__scan_file(intel_pt_pmu, "caps/psb_periods", "%x",
+					&psb_periods) != 1)
+			psb_periods = 0;
+		if (psb_periods) {
+			psb_period = intel_pt_pick_bit(psb_periods, 3);
+			pos += scnprintf(buf + pos, sizeof(buf) - pos,
+					 ",psb_period=%d", psb_period);
+		}
+	}
+
+	pr_debug2("%s default config: %s\n", intel_pt_pmu->name, buf);
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, buf, &config);
+
 	return config;
 }
 
@@ -239,6 +343,103 @@ static int intel_pt_track_switches(struct perf_evlist *evlist)
 	return 0;
 }
 
+static void intel_pt_valid_str(char *str, size_t len, u64 valid)
+{
+	unsigned int val, last = 0, state = 1;
+	int p = 0;
+
+	str[0] = '\0';
+
+	for (val = 0; val <= 64; val++, valid >>= 1) {
+		if (valid & 1) {
+			last = val;
+			switch (state) {
+			case 0:
+				p += scnprintf(str + p, len - p, ",");
+				/* Fall through */
+			case 1:
+				p += scnprintf(str + p, len - p, "%u", val);
+				state = 2;
+				break;
+			case 2:
+				state = 3;
+				break;
+			case 3:
+				state = 4;
+				break;
+			default:
+				break;
+			}
+		} else {
+			switch (state) {
+			case 3:
+				p += scnprintf(str + p, len - p, ",%u", last);
+				state = 0;
+				break;
+			case 4:
+				p += scnprintf(str + p, len - p, "-%u", last);
+				state = 0;
+				break;
+			default:
+				break;
+			}
+			if (state != 1)
+				state = 0;
+		}
+	}
+}
+
+static int intel_pt_val_config_term(struct perf_pmu *intel_pt_pmu,
+				    const char *caps, const char *name,
+				    const char *supported, u64 config)
+{
+	char valid_str[256];
+	unsigned int shift;
+	unsigned long long valid;
+	u64 bits;
+	int ok;
+
+	if (perf_pmu__scan_file(intel_pt_pmu, caps, "%llx", &valid) != 1)
+		valid = 0;
+
+	if (supported &&
+	    perf_pmu__scan_file(intel_pt_pmu, supported, "%d", &ok) == 1 && !ok)
+		valid = 0;
+
+	valid |= 1;
+
+	bits = perf_pmu__format_bits(&intel_pt_pmu->format, name);
+
+	config &= bits;
+
+	for (shift = 0; bits && !(bits & 1); shift++)
+		bits >>= 1;
+
+	config >>= shift;
+
+	if (config > 63)
+		goto out_err;
+
+	if (valid & (1 << config))
+		return 0;
+out_err:
+	intel_pt_valid_str(valid_str, sizeof(valid_str), valid);
+	pr_err("Invalid %s for %s. Valid values are: %s\n",
+	       name, INTEL_PT_PMU_NAME, valid_str);
+	return -EINVAL;
+}
+
+static int intel_pt_validate_config(struct perf_pmu *intel_pt_pmu,
+				    struct perf_evsel *evsel)
+{
+	if (!evsel)
+		return 0;
+
+	return intel_pt_val_config_term(intel_pt_pmu, "caps/psb_periods",
+					"psb_period", "caps/psb_cyc",
+					evsel->attr.config);
+}
+
 static int intel_pt_recording_options(struct auxtrace_record *itr,
 				      struct perf_evlist *evlist,
 				      struct record_opts *opts)
@@ -251,6 +452,7 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	const struct cpu_map *cpus = evlist->cpus;
 	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
 	u64 tsc_bit;
+	int err;
 
 	ptr->evlist = evlist;
 	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
@@ -281,6 +483,10 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	if (!opts->full_auxtrace)
 		return 0;
 
+	err = intel_pt_validate_config(intel_pt_pmu, intel_pt_evsel);
+	if (err)
+		return err;
+
 	/* Set default sizes for snapshot mode */
 	if (opts->auxtrace_snapshot_mode) {
 		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
@@ -366,8 +572,6 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	 * threads.
 	 */
 	if (have_timing_info && !cpu_map__empty(cpus)) {
-		int err;
-
 		err = intel_pt_track_switches(evlist);
 		if (err == -EPERM)
 			pr_debug2("Unable to select sched:sched_switch\n");
@@ -394,7 +598,6 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 	/* Add dummy event to keep tracking */
 	if (opts->full_auxtrace) {
 		struct perf_evsel *tracking_evsel;
-		int err;
 
 		err = parse_events(evlist, "dummy:u", NULL);
 		if (err)

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add new Intel PT packet definitions
  2015-07-17 16:33 ` [PATCH V8 18/25] perf tools: Add new Intel PT packet definitions Adrian Hunter
@ 2015-08-28  6:38   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: tglx, mingo, linux-kernel, adrian.hunter, jolsa, hpa, acme

Commit-ID:  3d49807870f08d6f3406b77efd94bb3788372162
Gitweb:     http://git.kernel.org/tip/3d49807870f08d6f3406b77efd94bb3788372162
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:53 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:46:06 -0300

perf tools: Add new Intel PT packet definitions

New features have been added to Intel PT which include a number of new
packet definitions.

This patch adds packet definitions for new packets: TMA, MTC, CYC, VMCS,
TRACESTOP and MNT.  Also another bit in PIP is defined.

This patch only adds support for the definitions. Later patches add
support for decoding TMA, MTC, CYC and TRACESTOP which is where those
packets are explained.

VMCS and the newly defined bit in PIP are used with virtualization which
is not supported yet.  MNT is a maintenance packet which the decoder
should ignore.

For details, refer to the June 2015 or later Intel 64 and IA-32
Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-19-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  |  70 +++++++++-
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   | 142 +++++++++++++++++++--
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |   6 +
 3 files changed, 201 insertions(+), 17 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 56790ea..4a0e9fb 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -923,6 +923,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 		case INTEL_PT_TIP_PGE:
 		case INTEL_PT_TIP:
 		case INTEL_PT_TNT:
+		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_BAD:
 		case INTEL_PT_PSB:
 			intel_pt_log("ERROR: Unexpected packet\n");
@@ -935,6 +936,9 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
+		case INTEL_PT_TMA:
+			break;
+
 		case INTEL_PT_CBR:
 			decoder->cbr = decoder->packet.payload;
 			break;
@@ -944,7 +948,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
 			break;
 
 		case INTEL_PT_FUP:
@@ -956,6 +960,12 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			intel_pt_update_in_tx(decoder);
 			break;
 
+		case INTEL_PT_MTC:
+			break;
+
+		case INTEL_PT_CYC:
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 		default:
 			break;
@@ -983,8 +993,10 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
 		switch (decoder->packet.type) {
 		case INTEL_PT_TNT:
 		case INTEL_PT_FUP:
+		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_PSB:
 		case INTEL_PT_TSC:
+		case INTEL_PT_TMA:
 		case INTEL_PT_CBR:
 		case INTEL_PT_MODE_TSX:
 		case INTEL_PT_BAD:
@@ -1032,13 +1044,21 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
 			return 0;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
+			break;
+
+		case INTEL_PT_MTC:
+			break;
+
+		case INTEL_PT_CYC:
 			break;
 
 		case INTEL_PT_MODE_EXEC:
 			decoder->exec_mode = decoder->packet.payload;
 			break;
 
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 			break;
 
@@ -1122,6 +1142,9 @@ next:
 			}
 			return intel_pt_walk_fup_tip(decoder);
 
+		case INTEL_PT_TRACESTOP:
+			break;
+
 		case INTEL_PT_PSB:
 			intel_pt_clear_stack(&decoder->stack);
 			err = intel_pt_walk_psbend(decoder);
@@ -1132,13 +1155,22 @@ next:
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
+			break;
+
+		case INTEL_PT_MTC:
 			break;
 
 		case INTEL_PT_TSC:
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
+		case INTEL_PT_TMA:
+			break;
+
+		case INTEL_PT_CYC:
+			break;
+
 		case INTEL_PT_CBR:
 			decoder->cbr = decoder->packet.payload;
 			break;
@@ -1162,6 +1194,8 @@ next:
 			return intel_pt_bug(decoder);
 
 		case INTEL_PT_PSBEND:
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 			break;
 
@@ -1202,16 +1236,25 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			}
 			break;
 
+		case INTEL_PT_MTC:
+			break;
+
 		case INTEL_PT_TSC:
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
+		case INTEL_PT_TMA:
+			break;
+
+		case INTEL_PT_CYC:
+			break;
+
 		case INTEL_PT_CBR:
 			decoder->cbr = decoder->packet.payload;
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1222,6 +1265,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			intel_pt_update_in_tx(decoder);
 			break;
 
+		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_TNT:
 			intel_pt_log("ERROR: Unexpected packet\n");
 			if (decoder->ip)
@@ -1240,6 +1284,8 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			return 0;
 
 		case INTEL_PT_PSB:
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 		default:
 			break;
@@ -1282,16 +1328,25 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 				intel_pt_set_last_ip(decoder);
 			break;
 
+		case INTEL_PT_MTC:
+			break;
+
 		case INTEL_PT_TSC:
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
+		case INTEL_PT_TMA:
+			break;
+
+		case INTEL_PT_CYC:
+			break;
+
 		case INTEL_PT_CBR:
 			decoder->cbr = decoder->packet.payload;
 			break;
 
 		case INTEL_PT_PIP:
-			decoder->cr3 = decoder->packet.payload;
+			decoder->cr3 = decoder->packet.payload & (BIT63 - 1);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1308,6 +1363,9 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 		case INTEL_PT_BAD: /* Does not happen */
 			return intel_pt_bug(decoder);
 
+		case INTEL_PT_TRACESTOP:
+			break;
+
 		case INTEL_PT_PSB:
 			err = intel_pt_walk_psb(decoder);
 			if (err)
@@ -1321,6 +1379,8 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 
 		case INTEL_PT_TNT:
 		case INTEL_PT_PSBEND:
+		case INTEL_PT_VMCS:
+		case INTEL_PT_MNT:
 		case INTEL_PT_PAD:
 		default:
 			break;
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
index 988c82c..b1257c8 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
@@ -24,6 +24,8 @@
 
 #define BIT63		((uint64_t)1 << 63)
 
+#define NR_FLAG		BIT63
+
 #if __BYTE_ORDER == __BIG_ENDIAN
 #define le16_to_cpu bswap_16
 #define le32_to_cpu bswap_32
@@ -46,15 +48,21 @@ static const char * const packet_name[] = {
 	[INTEL_PT_TIP_PGD]	= "TIP.PGD",
 	[INTEL_PT_TIP_PGE]	= "TIP.PGE",
 	[INTEL_PT_TSC]		= "TSC",
+	[INTEL_PT_TMA]		= "TMA",
 	[INTEL_PT_MODE_EXEC]	= "MODE.Exec",
 	[INTEL_PT_MODE_TSX]	= "MODE.TSX",
+	[INTEL_PT_MTC]		= "MTC",
 	[INTEL_PT_TIP]		= "TIP",
 	[INTEL_PT_FUP]		= "FUP",
+	[INTEL_PT_CYC]		= "CYC",
+	[INTEL_PT_VMCS]		= "VMCS",
 	[INTEL_PT_PSB]		= "PSB",
 	[INTEL_PT_PSBEND]	= "PSBEND",
 	[INTEL_PT_CBR]		= "CBR",
+	[INTEL_PT_TRACESTOP]	= "TraceSTOP",
 	[INTEL_PT_PIP]		= "PIP",
 	[INTEL_PT_OVF]		= "OVF",
+	[INTEL_PT_MNT]		= "MNT",
 };
 
 const char *intel_pt_pkt_name(enum intel_pt_pkt_type type)
@@ -96,10 +104,18 @@ static int intel_pt_get_pip(const unsigned char *buf, size_t len,
 	packet->type = INTEL_PT_PIP;
 	memcpy_le64(&payload, buf + 2, 6);
 	packet->payload = payload >> 1;
+	if (payload & 1)
+		packet->payload |= NR_FLAG;
 
 	return 8;
 }
 
+static int intel_pt_get_tracestop(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_TRACESTOP;
+	return 2;
+}
+
 static int intel_pt_get_cbr(const unsigned char *buf, size_t len,
 			    struct intel_pt_pkt *packet)
 {
@@ -110,6 +126,24 @@ static int intel_pt_get_cbr(const unsigned char *buf, size_t len,
 	return 4;
 }
 
+static int intel_pt_get_vmcs(const unsigned char *buf, size_t len,
+			     struct intel_pt_pkt *packet)
+{
+	unsigned int count = (52 - 5) >> 3;
+
+	if (count < 1 || count > 7)
+		return INTEL_PT_BAD_PACKET;
+
+	if (len < count + 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	packet->type = INTEL_PT_VMCS;
+	packet->count = count;
+	memcpy_le64(&packet->payload, buf + 2, count);
+
+	return count + 2;
+}
+
 static int intel_pt_get_ovf(struct intel_pt_pkt *packet)
 {
 	packet->type = INTEL_PT_OVF;
@@ -139,12 +173,49 @@ static int intel_pt_get_psbend(struct intel_pt_pkt *packet)
 	return 2;
 }
 
+static int intel_pt_get_tma(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 7)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	packet->type = INTEL_PT_TMA;
+	packet->payload = buf[2] | (buf[3] << 8);
+	packet->count = buf[5] | ((buf[6] & BIT(0)) << 8);
+	return 7;
+}
+
 static int intel_pt_get_pad(struct intel_pt_pkt *packet)
 {
 	packet->type = INTEL_PT_PAD;
 	return 1;
 }
 
+static int intel_pt_get_mnt(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 11)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_MNT;
+	memcpy_le64(&packet->payload, buf + 3, 8);
+	return 11
+;
+}
+
+static int intel_pt_get_3byte(const unsigned char *buf, size_t len,
+			      struct intel_pt_pkt *packet)
+{
+	if (len < 3)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[2]) {
+	case 0x88: /* MNT */
+		return intel_pt_get_mnt(buf, len, packet);
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
 static int intel_pt_get_ext(const unsigned char *buf, size_t len,
 			    struct intel_pt_pkt *packet)
 {
@@ -156,14 +227,22 @@ static int intel_pt_get_ext(const unsigned char *buf, size_t len,
 		return intel_pt_get_long_tnt(buf, len, packet);
 	case 0x43: /* PIP */
 		return intel_pt_get_pip(buf, len, packet);
+	case 0x83: /* TraceStop */
+		return intel_pt_get_tracestop(packet);
 	case 0x03: /* CBR */
 		return intel_pt_get_cbr(buf, len, packet);
+	case 0xc8: /* VMCS */
+		return intel_pt_get_vmcs(buf, len, packet);
 	case 0xf3: /* OVF */
 		return intel_pt_get_ovf(packet);
 	case 0x82: /* PSB */
 		return intel_pt_get_psb(buf, len, packet);
 	case 0x23: /* PSBEND */
 		return intel_pt_get_psbend(packet);
+	case 0x73: /* TMA */
+		return intel_pt_get_tma(buf, len, packet);
+	case 0xC3: /* 3-byte header */
+		return intel_pt_get_3byte(buf, len, packet);
 	default:
 		return INTEL_PT_BAD_PACKET;
 	}
@@ -187,6 +266,28 @@ static int intel_pt_get_short_tnt(unsigned int byte,
 	return 1;
 }
 
+static int intel_pt_get_cyc(unsigned int byte, const unsigned char *buf,
+			    size_t len, struct intel_pt_pkt *packet)
+{
+	unsigned int offs = 1, shift;
+	uint64_t payload = byte >> 3;
+
+	byte >>= 2;
+	len -= 1;
+	for (shift = 5; byte & 1; shift += 7) {
+		if (offs > 9)
+			return INTEL_PT_BAD_PACKET;
+		if (len < offs)
+			return INTEL_PT_NEED_MORE_BYTES;
+		byte = buf[offs++];
+		payload |= (byte >> 1) << shift;
+	}
+
+	packet->type = INTEL_PT_CYC;
+	packet->payload = payload;
+	return offs;
+}
+
 static int intel_pt_get_ip(enum intel_pt_pkt_type type, unsigned int byte,
 			   const unsigned char *buf, size_t len,
 			   struct intel_pt_pkt *packet)
@@ -269,6 +370,16 @@ static int intel_pt_get_tsc(const unsigned char *buf, size_t len,
 	return 8;
 }
 
+static int intel_pt_get_mtc(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_MTC;
+	packet->payload = buf[1];
+	return 2;
+}
+
 static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
 				  struct intel_pt_pkt *packet)
 {
@@ -288,6 +399,9 @@ static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
 		return intel_pt_get_short_tnt(byte, packet);
 	}
 
+	if ((byte & 2))
+		return intel_pt_get_cyc(byte, buf, len, packet);
+
 	switch (byte & 0x1f) {
 	case 0x0D:
 		return intel_pt_get_ip(INTEL_PT_TIP, byte, buf, len, packet);
@@ -305,6 +419,8 @@ static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
 			return intel_pt_get_mode(buf, len, packet);
 		case 0x19:
 			return intel_pt_get_tsc(buf, len, packet);
+		case 0x59:
+			return intel_pt_get_mtc(buf, len, packet);
 		default:
 			return INTEL_PT_BAD_PACKET;
 		}
@@ -329,7 +445,7 @@ int intel_pt_get_packet(const unsigned char *buf, size_t len,
 int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
 		      size_t buf_len)
 {
-	int ret, i;
+	int ret, i, nr;
 	unsigned long long payload = packet->payload;
 	const char *name = intel_pt_pkt_name(packet->type);
 
@@ -338,6 +454,7 @@ int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
 	case INTEL_PT_PAD:
 	case INTEL_PT_PSB:
 	case INTEL_PT_PSBEND:
+	case INTEL_PT_TRACESTOP:
 	case INTEL_PT_OVF:
 		return snprintf(buf, buf_len, "%s", name);
 	case INTEL_PT_TNT: {
@@ -371,17 +488,16 @@ int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
 	case INTEL_PT_FUP:
 		if (!(packet->count))
 			return snprintf(buf, buf_len, "%s no ip", name);
+	case INTEL_PT_CYC:
+	case INTEL_PT_VMCS:
+	case INTEL_PT_MTC:
+	case INTEL_PT_MNT:
 	case INTEL_PT_CBR:
-		return snprintf(buf, buf_len, "%s 0x%llx", name, payload);
 	case INTEL_PT_TSC:
-		if (packet->count)
-			return snprintf(buf, buf_len,
-					"%s 0x%llx CTC 0x%x FC 0x%x",
-					name, payload, packet->count & 0xffff,
-					(packet->count >> 16) & 0x1ff);
-		else
-			return snprintf(buf, buf_len, "%s 0x%llx",
-					name, payload);
+		return snprintf(buf, buf_len, "%s 0x%llx", name, payload);
+	case INTEL_PT_TMA:
+		return snprintf(buf, buf_len, "%s CTC 0x%x FC 0x%x", name,
+				(unsigned)payload, packet->count);
 	case INTEL_PT_MODE_EXEC:
 		return snprintf(buf, buf_len, "%s %lld", name, payload);
 	case INTEL_PT_MODE_TSX:
@@ -389,8 +505,10 @@ int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
 				name, (unsigned)(payload >> 1) & 1,
 				(unsigned)payload & 1);
 	case INTEL_PT_PIP:
-		ret = snprintf(buf, buf_len, "%s 0x%llx",
-			       name, payload);
+		nr = packet->payload & NR_FLAG ? 1 : 0;
+		payload &= ~NR_FLAG;
+		ret = snprintf(buf, buf_len, "%s 0x%llx (NR=%d)",
+			       name, payload, nr);
 		return ret;
 	default:
 		break;
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
index 53404fa..781bb79 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
@@ -37,15 +37,21 @@ enum intel_pt_pkt_type {
 	INTEL_PT_TIP_PGD,
 	INTEL_PT_TIP_PGE,
 	INTEL_PT_TSC,
+	INTEL_PT_TMA,
 	INTEL_PT_MODE_EXEC,
 	INTEL_PT_MODE_TSX,
+	INTEL_PT_MTC,
 	INTEL_PT_TIP,
 	INTEL_PT_FUP,
+	INTEL_PT_CYC,
+	INTEL_PT_VMCS,
 	INTEL_PT_PSB,
 	INTEL_PT_PSBEND,
 	INTEL_PT_CBR,
+	INTEL_PT_TRACESTOP,
 	INTEL_PT_PIP,
 	INTEL_PT_OVF,
+	INTEL_PT_MNT,
 };
 
 struct intel_pt_pkt {

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Pass Intel PT information for decoding MTC and CYC
  2015-07-17 16:33 ` [PATCH V8 19/25] perf tools: Pass Intel PT information for decoding MTC and CYC Adrian Hunter
@ 2015-08-28  6:38   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:38 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, jolsa, linux-kernel, adrian.hunter, acme, hpa, tglx

Commit-ID:  11fa7cb86b56d3610043ba2ac6cbd81feab4b7c4
Gitweb:     http://git.kernel.org/tip/11fa7cb86b56d3610043ba2ac6cbd81feab4b7c4
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:54 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:46:43 -0300

perf tools: Pass Intel PT information for decoding MTC and CYC

Record additional information in the AUXTRACE_INFO event in preparation
for decoding MTC and CYC packets.  Pass the information to the decoder.

The AUXTRACE_INFO record can be extended by using the size to indicate
the presence of new members.

The additional information includes PMU config bit positions and the TSC
to CTC (hardware crystal clock) ratio needed to decode MTC packets.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-20-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/x86/util/intel-pt.c                | 24 ++++++++-
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  3 ++
 tools/perf/util/intel-pt.c                         | 62 ++++++++++++++++++----
 tools/perf/util/intel-pt.h                         |  5 ++
 4 files changed, 83 insertions(+), 11 deletions(-)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 145975b..faae928 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -18,6 +18,7 @@
 #include <linux/types.h>
 #include <linux/bitops.h>
 #include <linux/log2.h>
+#include <cpuid.h>
 
 #include "../../perf.h"
 #include "../../util/session.h"
@@ -261,6 +262,15 @@ static size_t intel_pt_info_priv_size(struct auxtrace_record *itr __maybe_unused
 	return INTEL_PT_AUXTRACE_PRIV_SIZE;
 }
 
+static void intel_pt_tsc_ctc_ratio(u32 *n, u32 *d)
+{
+	unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
+
+	__get_cpuid(0x15, &eax, &ebx, &ecx, &edx);
+	*n = ebx;
+	*d = eax;
+}
+
 static int intel_pt_info_fill(struct auxtrace_record *itr,
 			      struct perf_session *session,
 			      struct auxtrace_info_event *auxtrace_info,
@@ -272,7 +282,8 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
 	struct perf_event_mmap_page *pc;
 	struct perf_tsc_conversion tc = { .time_mult = 0, };
 	bool cap_user_time_zero = false, per_cpu_mmaps;
-	u64 tsc_bit, noretcomp_bit;
+	u64 tsc_bit, mtc_bit, mtc_freq_bits, cyc_bit, noretcomp_bit;
+	u32 tsc_ctc_ratio_n, tsc_ctc_ratio_d;
 	int err;
 
 	if (priv_size != INTEL_PT_AUXTRACE_PRIV_SIZE)
@@ -281,6 +292,12 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
 	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
 	intel_pt_parse_terms(&intel_pt_pmu->format, "noretcomp",
 			     &noretcomp_bit);
+	intel_pt_parse_terms(&intel_pt_pmu->format, "mtc", &mtc_bit);
+	mtc_freq_bits = perf_pmu__format_bits(&intel_pt_pmu->format,
+					      "mtc_period");
+	intel_pt_parse_terms(&intel_pt_pmu->format, "cyc", &cyc_bit);
+
+	intel_pt_tsc_ctc_ratio(&tsc_ctc_ratio_n, &tsc_ctc_ratio_d);
 
 	if (!session->evlist->nr_mmaps)
 		return -EINVAL;
@@ -311,6 +328,11 @@ static int intel_pt_info_fill(struct auxtrace_record *itr,
 	auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH] = ptr->have_sched_switch;
 	auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE] = ptr->snapshot_mode;
 	auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS] = per_cpu_mmaps;
+	auxtrace_info->priv[INTEL_PT_MTC_BIT] = mtc_bit;
+	auxtrace_info->priv[INTEL_PT_MTC_FREQ_BITS] = mtc_freq_bits;
+	auxtrace_info->priv[INTEL_PT_TSC_CTC_N] = tsc_ctc_ratio_n;
+	auxtrace_info->priv[INTEL_PT_TSC_CTC_D] = tsc_ctc_ratio_d;
+	auxtrace_info->priv[INTEL_PT_CYC_BIT] = cyc_bit;
 
 	return 0;
 }
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
index cbf5704..56cc47b 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -87,6 +87,9 @@ struct intel_pt_params {
 	uint64_t period;
 	enum intel_pt_period_type period_type;
 	unsigned max_non_turbo_ratio;
+	unsigned int mtc_period;
+	uint32_t tsc_ctc_ratio_n;
+	uint32_t tsc_ctc_ratio_d;
 };
 
 struct intel_pt_decoder;
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 3b34a64..bb41c20 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -91,6 +91,11 @@ struct intel_pt {
 	bool synth_needs_swap;
 
 	u64 tsc_bit;
+	u64 mtc_bit;
+	u64 mtc_freq_bits;
+	u32 tsc_ctc_ratio_n;
+	u32 tsc_ctc_ratio_d;
+	u64 cyc_bit;
 	u64 noretcomp_bit;
 	unsigned max_non_turbo_ratio;
 };
@@ -568,6 +573,25 @@ static bool intel_pt_return_compression(struct intel_pt *pt)
 	return true;
 }
 
+static unsigned int intel_pt_mtc_period(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	unsigned int shift;
+	u64 config;
+
+	if (!pt->mtc_freq_bits)
+		return 0;
+
+	for (shift = 0, config = pt->mtc_freq_bits; !(config & 1); shift++)
+		config >>= 1;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config))
+			return (config & pt->mtc_freq_bits) >> shift;
+	}
+	return 0;
+}
+
 static bool intel_pt_timeless_decoding(struct intel_pt *pt)
 {
 	struct perf_evsel *evsel;
@@ -668,6 +692,9 @@ static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
 	params.data = ptq;
 	params.return_compression = intel_pt_return_compression(pt);
 	params.max_non_turbo_ratio = pt->max_non_turbo_ratio;
+	params.mtc_period = intel_pt_mtc_period(pt);
+	params.tsc_ctc_ratio_n = pt->tsc_ctc_ratio_n;
+	params.tsc_ctc_ratio_d = pt->tsc_ctc_ratio_d;
 
 	if (pt->synth_opts.instructions) {
 		if (pt->synth_opts.period) {
@@ -1751,16 +1778,20 @@ static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
 }
 
 static const char * const intel_pt_info_fmts[] = {
-	[INTEL_PT_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
-	[INTEL_PT_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
-	[INTEL_PT_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
-	[INTEL_PT_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
-	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
-	[INTEL_PT_TSC_BIT]		= "  TSC bit            %#"PRIx64"\n",
-	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit      %#"PRIx64"\n",
-	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch  %"PRId64"\n",
-	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
-	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps       %"PRId64"\n",
+	[INTEL_PT_PMU_TYPE]		= "  PMU Type            %"PRId64"\n",
+	[INTEL_PT_TIME_SHIFT]		= "  Time Shift          %"PRIu64"\n",
+	[INTEL_PT_TIME_MULT]		= "  Time Muliplier      %"PRIu64"\n",
+	[INTEL_PT_TIME_ZERO]		= "  Time Zero           %"PRIu64"\n",
+	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero       %"PRId64"\n",
+	[INTEL_PT_TSC_BIT]		= "  TSC bit             %#"PRIx64"\n",
+	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit       %#"PRIx64"\n",
+	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch   %"PRId64"\n",
+	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode       %"PRId64"\n",
+	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps        %"PRId64"\n",
+	[INTEL_PT_MTC_BIT]		= "  MTC bit             %#"PRIx64"\n",
+	[INTEL_PT_TSC_CTC_N]		= "  TSC:CTC numerator   %"PRIu64"\n",
+	[INTEL_PT_TSC_CTC_D]		= "  TSC:CTC denominator %"PRIu64"\n",
+	[INTEL_PT_CYC_BIT]		= "  CYC bit             %#"PRIx64"\n",
 };
 
 static void intel_pt_print_info(u64 *arr, int start, int finish)
@@ -1812,6 +1843,17 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_PMU_TYPE,
 			    INTEL_PT_PER_CPU_MMAPS);
 
+	if (auxtrace_info->header.size >= sizeof(struct auxtrace_info_event) +
+					(sizeof(u64) * INTEL_PT_CYC_BIT)) {
+		pt->mtc_bit = auxtrace_info->priv[INTEL_PT_MTC_BIT];
+		pt->mtc_freq_bits = auxtrace_info->priv[INTEL_PT_MTC_FREQ_BITS];
+		pt->tsc_ctc_ratio_n = auxtrace_info->priv[INTEL_PT_TSC_CTC_N];
+		pt->tsc_ctc_ratio_d = auxtrace_info->priv[INTEL_PT_TSC_CTC_D];
+		pt->cyc_bit = auxtrace_info->priv[INTEL_PT_CYC_BIT];
+		intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_MTC_BIT,
+				    INTEL_PT_CYC_BIT);
+	}
+
 	pt->timeless_decoding = intel_pt_timeless_decoding(pt);
 	pt->have_tsc = intel_pt_have_tsc(pt);
 	pt->sampling_mode = false;
diff --git a/tools/perf/util/intel-pt.h b/tools/perf/util/intel-pt.h
index a1bfe93..0065949 100644
--- a/tools/perf/util/intel-pt.h
+++ b/tools/perf/util/intel-pt.h
@@ -29,6 +29,11 @@ enum {
 	INTEL_PT_HAVE_SCHED_SWITCH,
 	INTEL_PT_SNAPSHOT_MODE,
 	INTEL_PT_PER_CPU_MMAPS,
+	INTEL_PT_MTC_BIT,
+	INTEL_PT_MTC_FREQ_BITS,
+	INTEL_PT_TSC_CTC_N,
+	INTEL_PT_TSC_CTC_D,
+	INTEL_PT_CYC_BIT,
 	INTEL_PT_AUXTRACE_PRIV_MAX,
 };
 

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT support for decoding MTC packets
  2015-07-17 16:33 ` [PATCH V8 20/25] perf tools: Add Intel PT support for decoding MTC packets Adrian Hunter
@ 2015-08-28  6:39   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: adrian.hunter, jolsa, acme, tglx, linux-kernel, hpa, mingo

Commit-ID:  79b58424b821c651a4b4df9018a14684e3670f42
Gitweb:     http://git.kernel.org/tip/79b58424b821c651a4b4df9018a14684e3670f42
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:55 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:46:56 -0300

perf tools: Add Intel PT support for decoding MTC packets

MTC packets provide finer grain timestamp information than TSC packets.
MTC packets record time using the hardware crystal clock (CTC) which is
related to TSC packets using a TMA packet.

This patch just adds decoder support.

Support for a default value and validation of values is provided by a
later patch. Also documentation is updated in a separate patch.

For details refer to the June 2015 or later Intel 64 and IA-32
Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-21-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 162 ++++++++++++++++++++-
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |   1 +
 2 files changed, 159 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 4a0e9fb..f7119a1 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -85,7 +85,9 @@ struct intel_pt_decoder {
 	const unsigned char *buf;
 	size_t len;
 	bool return_compression;
+	bool mtc_insn;
 	bool pge;
+	bool have_tma;
 	uint64_t pos;
 	uint64_t last_ip;
 	uint64_t ip;
@@ -94,6 +96,15 @@ struct intel_pt_decoder {
 	uint64_t tsc_timestamp;
 	uint64_t ref_timestamp;
 	uint64_t ret_addr;
+	uint64_t ctc_timestamp;
+	uint64_t ctc_delta;
+	uint32_t last_mtc;
+	uint32_t tsc_ctc_ratio_n;
+	uint32_t tsc_ctc_ratio_d;
+	uint32_t tsc_ctc_mult;
+	uint32_t tsc_slip;
+	uint32_t ctc_rem_mask;
+	int mtc_shift;
 	struct intel_pt_stack stack;
 	enum intel_pt_pkt_state pkt_state;
 	struct intel_pt_pkt packet;
@@ -149,6 +160,13 @@ static void intel_pt_setup_period(struct intel_pt_decoder *decoder)
 	}
 }
 
+static uint64_t multdiv(uint64_t t, uint32_t n, uint32_t d)
+{
+	if (!d)
+		return 0;
+	return (t / d) * n + ((t % d) * n) / d;
+}
+
 struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
 {
 	struct intel_pt_decoder *decoder;
@@ -175,6 +193,39 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
 
 	intel_pt_setup_period(decoder);
 
+	decoder->mtc_shift = params->mtc_period;
+	decoder->ctc_rem_mask = (1 << decoder->mtc_shift) - 1;
+
+	decoder->tsc_ctc_ratio_n = params->tsc_ctc_ratio_n;
+	decoder->tsc_ctc_ratio_d = params->tsc_ctc_ratio_d;
+
+	if (!decoder->tsc_ctc_ratio_n)
+		decoder->tsc_ctc_ratio_d = 0;
+
+	if (decoder->tsc_ctc_ratio_d) {
+		if (!(decoder->tsc_ctc_ratio_n % decoder->tsc_ctc_ratio_d))
+			decoder->tsc_ctc_mult = decoder->tsc_ctc_ratio_n /
+						decoder->tsc_ctc_ratio_d;
+
+		/*
+		 * Allow for timestamps appearing to backwards because a TSC
+		 * packet has slipped past a MTC packet, so allow 2 MTC ticks
+		 * or ...
+		 */
+		decoder->tsc_slip = multdiv(2 << decoder->mtc_shift,
+					decoder->tsc_ctc_ratio_n,
+					decoder->tsc_ctc_ratio_d);
+	}
+	/* ... or 0x100 paranoia */
+	if (decoder->tsc_slip < 0x100)
+		decoder->tsc_slip = 0x100;
+
+	intel_pt_log("timestamp: mtc_shift %u\n", decoder->mtc_shift);
+	intel_pt_log("timestamp: tsc_ctc_ratio_n %u\n", decoder->tsc_ctc_ratio_n);
+	intel_pt_log("timestamp: tsc_ctc_ratio_d %u\n", decoder->tsc_ctc_ratio_d);
+	intel_pt_log("timestamp: tsc_ctc_mult %u\n", decoder->tsc_ctc_mult);
+	intel_pt_log("timestamp: tsc_slip %#x\n", decoder->tsc_slip);
+
 	return decoder;
 }
 
@@ -368,6 +419,7 @@ static inline void intel_pt_update_in_tx(struct intel_pt_decoder *decoder)
 static int intel_pt_bad_packet(struct intel_pt_decoder *decoder)
 {
 	intel_pt_clear_tx_flags(decoder);
+	decoder->have_tma = false;
 	decoder->pkt_len = 1;
 	decoder->pkt_step = 1;
 	intel_pt_decoder_log_packet(decoder);
@@ -400,6 +452,7 @@ static int intel_pt_get_data(struct intel_pt_decoder *decoder)
 		decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
 		decoder->ref_timestamp = buffer.ref_timestamp;
 		decoder->timestamp = 0;
+		decoder->have_tma = false;
 		decoder->state.trace_nr = buffer.trace_nr;
 		intel_pt_log("Reference timestamp 0x%" PRIx64 "\n",
 			     decoder->ref_timestamp);
@@ -523,6 +576,7 @@ static uint64_t intel_pt_next_sample(struct intel_pt_decoder *decoder)
 	case INTEL_PT_PERIOD_TICKS:
 		return intel_pt_next_period(decoder);
 	case INTEL_PT_PERIOD_NONE:
+	case INTEL_PT_PERIOD_MTC:
 	default:
 		return 0;
 	}
@@ -542,6 +596,7 @@ static void intel_pt_sample_insn(struct intel_pt_decoder *decoder)
 		decoder->last_masked_timestamp = masked_timestamp;
 		break;
 	case INTEL_PT_PERIOD_NONE:
+	case INTEL_PT_PERIOD_MTC:
 	default:
 		break;
 	}
@@ -555,6 +610,9 @@ static int intel_pt_walk_insn(struct intel_pt_decoder *decoder,
 	uint64_t max_insn_cnt, insn_cnt = 0;
 	int err;
 
+	if (!decoder->mtc_insn)
+		decoder->mtc_insn = true;
+
 	max_insn_cnt = intel_pt_next_sample(decoder);
 
 	err = decoder->walk_insn(intel_pt_insn, &insn_cnt, &decoder->ip, ip,
@@ -861,6 +919,8 @@ static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
 {
 	uint64_t timestamp;
 
+	decoder->have_tma = false;
+
 	if (decoder->ref_timestamp) {
 		timestamp = decoder->packet.payload |
 			    (decoder->ref_timestamp & (0xffULL << 56));
@@ -878,17 +938,18 @@ static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
 	} else if (decoder->timestamp) {
 		timestamp = decoder->packet.payload |
 			    (decoder->timestamp & (0xffULL << 56));
+		decoder->tsc_timestamp = timestamp;
 		if (timestamp < decoder->timestamp &&
-		    decoder->timestamp - timestamp < 0x100) {
-			intel_pt_log_to("ERROR: Suppressing backwards timestamp",
+		    decoder->timestamp - timestamp < decoder->tsc_slip) {
+			intel_pt_log_to("Suppressing backwards timestamp",
 					timestamp);
 			timestamp = decoder->timestamp;
 		}
 		while (timestamp < decoder->timestamp) {
 			intel_pt_log_to("Wraparound timestamp", timestamp);
 			timestamp += (1ULL << 56);
+			decoder->tsc_timestamp = timestamp;
 		}
-		decoder->tsc_timestamp = timestamp;
 		decoder->timestamp = timestamp;
 		decoder->timestamp_insn_cnt = 0;
 	}
@@ -900,11 +961,73 @@ static int intel_pt_overflow(struct intel_pt_decoder *decoder)
 {
 	intel_pt_log("ERROR: Buffer overflow\n");
 	intel_pt_clear_tx_flags(decoder);
+	decoder->have_tma = false;
 	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
 	decoder->overflow = true;
 	return -EOVERFLOW;
 }
 
+static void intel_pt_calc_tma(struct intel_pt_decoder *decoder)
+{
+	uint32_t ctc = decoder->packet.payload;
+	uint32_t fc = decoder->packet.count;
+	uint32_t ctc_rem = ctc & decoder->ctc_rem_mask;
+
+	if (!decoder->tsc_ctc_ratio_d)
+		return;
+
+	decoder->last_mtc = (ctc >> decoder->mtc_shift) & 0xff;
+	decoder->ctc_timestamp = decoder->tsc_timestamp - fc;
+	if (decoder->tsc_ctc_mult) {
+		decoder->ctc_timestamp -= ctc_rem * decoder->tsc_ctc_mult;
+	} else {
+		decoder->ctc_timestamp -= multdiv(ctc_rem,
+						  decoder->tsc_ctc_ratio_n,
+						  decoder->tsc_ctc_ratio_d);
+	}
+	decoder->ctc_delta = 0;
+	decoder->have_tma = true;
+	intel_pt_log("CTC timestamp " x64_fmt " last MTC %#x  CTC rem %#x\n",
+		     decoder->ctc_timestamp, decoder->last_mtc, ctc_rem);
+}
+
+static void intel_pt_calc_mtc_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp;
+	uint32_t mtc, mtc_delta;
+
+	if (!decoder->have_tma)
+		return;
+
+	mtc = decoder->packet.payload;
+
+	if (mtc > decoder->last_mtc)
+		mtc_delta = mtc - decoder->last_mtc;
+	else
+		mtc_delta = mtc + 256 - decoder->last_mtc;
+
+	decoder->ctc_delta += mtc_delta << decoder->mtc_shift;
+
+	if (decoder->tsc_ctc_mult) {
+		timestamp = decoder->ctc_timestamp +
+			    decoder->ctc_delta * decoder->tsc_ctc_mult;
+	} else {
+		timestamp = decoder->ctc_timestamp +
+			    multdiv(decoder->ctc_delta,
+				    decoder->tsc_ctc_ratio_n,
+				    decoder->tsc_ctc_ratio_d);
+	}
+
+	if (timestamp < decoder->timestamp)
+		intel_pt_log("Suppressing MTC timestamp " x64_fmt " less than current timestamp " x64_fmt "\n",
+			     timestamp, decoder->timestamp);
+	else
+		decoder->timestamp = timestamp;
+
+	decoder->timestamp_insn_cnt = 0;
+	decoder->last_mtc = mtc;
+}
+
 /* Walk PSB+ packets when already in sync. */
 static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 {
@@ -926,6 +1049,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_BAD:
 		case INTEL_PT_PSB:
+			decoder->have_tma = false;
 			intel_pt_log("ERROR: Unexpected packet\n");
 			return -EAGAIN;
 
@@ -937,6 +1061,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_TMA:
+			intel_pt_calc_tma(decoder);
 			break;
 
 		case INTEL_PT_CBR:
@@ -961,6 +1086,9 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_MTC:
+			intel_pt_calc_mtc_timestamp(decoder);
+			if (decoder->period_type == INTEL_PT_PERIOD_MTC)
+				decoder->state.type |= INTEL_PT_INSTRUCTION;
 			break;
 
 		case INTEL_PT_CYC:
@@ -1048,6 +1176,9 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_MTC:
+			intel_pt_calc_mtc_timestamp(decoder);
+			if (decoder->period_type == INTEL_PT_PERIOD_MTC)
+				decoder->state.type |= INTEL_PT_INSTRUCTION;
 			break;
 
 		case INTEL_PT_CYC:
@@ -1159,13 +1290,31 @@ next:
 			break;
 
 		case INTEL_PT_MTC:
-			break;
+			intel_pt_calc_mtc_timestamp(decoder);
+			if (decoder->period_type != INTEL_PT_PERIOD_MTC)
+				break;
+			/*
+			 * Ensure that there has been an instruction since the
+			 * last MTC.
+			 */
+			if (!decoder->mtc_insn)
+				break;
+			decoder->mtc_insn = false;
+			/* Ensure that there is a timestamp */
+			if (!decoder->timestamp)
+				break;
+			decoder->state.type = INTEL_PT_INSTRUCTION;
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			decoder->mtc_insn = false;
+			return 0;
 
 		case INTEL_PT_TSC:
 			intel_pt_calc_tsc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_TMA:
+			intel_pt_calc_tma(decoder);
 			break;
 
 		case INTEL_PT_CYC:
@@ -1237,6 +1386,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_MTC:
+			intel_pt_calc_mtc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_TSC:
@@ -1244,6 +1394,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_TMA:
+			intel_pt_calc_tma(decoder);
 			break;
 
 		case INTEL_PT_CYC:
@@ -1267,6 +1418,7 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 
 		case INTEL_PT_TRACESTOP:
 		case INTEL_PT_TNT:
+			decoder->have_tma = false;
 			intel_pt_log("ERROR: Unexpected packet\n");
 			if (decoder->ip)
 				decoder->pkt_state = INTEL_PT_STATE_ERR4;
@@ -1329,6 +1481,7 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_MTC:
+			intel_pt_calc_mtc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_TSC:
@@ -1336,6 +1489,7 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_TMA:
+			intel_pt_calc_tma(decoder);
 			break;
 
 		case INTEL_PT_CYC:
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
index 56cc47b..02c38fe 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -36,6 +36,7 @@ enum intel_pt_period_type {
 	INTEL_PT_PERIOD_NONE,
 	INTEL_PT_PERIOD_INSTRUCTIONS,
 	INTEL_PT_PERIOD_TICKS,
+	INTEL_PT_PERIOD_MTC,
 };
 
 enum {

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT support for using MTC packets
  2015-07-17 16:33 ` [PATCH V8 21/25] perf tools: Add Intel PT support for using " Adrian Hunter
@ 2015-08-28  6:39   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, tglx, linux-kernel, adrian.hunter, acme, mingo, jolsa

Commit-ID:  b45fc0bfaf4a0b60ce2deda222f8ef2a23b89a5f
Gitweb:     http://git.kernel.org/tip/b45fc0bfaf4a0b60ce2deda222f8ef2a23b89a5f
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:56 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:48:06 -0300

perf tools: Add Intel PT support for using MTC packets

MTC packets are a new Intel PT feature.

MTC packets provide finer grain timestamp information than TSC packets.

Support for this feature is indicated by:

  /sys/bus/event_source/devices/intel_pt/caps/mtc

which contains "1" if the feature is supported and "0" otherwise.

MTC packets can be requested using a PMU config term e.g. perf record -e
intel_pt/mtc/u sleep 1

The frequency of MTC packets can also be specified.  e.g. perf record -e
intel_pt/mtc,mtc_period=2/u sleep 1

The default value is 3 or the nearest lower value that is supported.  0
is always supported.

Valid values are given by:

/sys/bus/event_source/devices/intel_pt/caps/mtc_periods

which contains a hexadecimal value, the bits of which represent valid
values e.g. bit 2 set means value 2 is valid.

The value is converted to the MTC frequency as:

	CTC-frequency / (2 ^ value)

e.g. value 3 means one eighth of CTC-frequency

Where CTC is the hardware crystal clock, the frequency of which can be
related to TSC via values provided in cpuid leaf 0x15.

If an invalid value is entered, the error message will give a list of
valid values e.g.

	$ perf record -e intel_pt/mtc_period=15/u uname
	Invalid mtc_period for intel_pt. Valid values are: 0,3,6,9

tools/perf/Documentation/intel-pt.txt is updated in a later patch as
there are a number of new features being added.

For more information refer to the June 2015 or later Intel 64 and IA-32
Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-22-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index faae928..a5de01d 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -190,17 +190,33 @@ static int intel_pt_pick_bit(int bits, int target)
 static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
 {
 	char buf[256];
+	int mtc, mtc_periods = 0, mtc_period;
 	int psb_cyc, psb_periods, psb_period;
 	int pos = 0;
 	u64 config;
 
 	pos += scnprintf(buf + pos, sizeof(buf) - pos, "tsc");
 
+	if (perf_pmu__scan_file(intel_pt_pmu, "caps/mtc", "%d",
+				&mtc) != 1)
+		mtc = 1;
+
+	if (mtc) {
+		if (perf_pmu__scan_file(intel_pt_pmu, "caps/mtc_periods", "%x",
+					&mtc_periods) != 1)
+			mtc_periods = 0;
+		if (mtc_periods) {
+			mtc_period = intel_pt_pick_bit(mtc_periods, 3);
+			pos += scnprintf(buf + pos, sizeof(buf) - pos,
+					 ",mtc,mtc_period=%d", mtc_period);
+		}
+	}
+
 	if (perf_pmu__scan_file(intel_pt_pmu, "caps/psb_cyc", "%d",
 				&psb_cyc) != 1)
 		psb_cyc = 1;
 
-	if (psb_cyc) {
+	if (psb_cyc && mtc_periods) {
 		if (perf_pmu__scan_file(intel_pt_pmu, "caps/psb_periods", "%x",
 					&psb_periods) != 1)
 			psb_periods = 0;
@@ -454,9 +470,17 @@ out_err:
 static int intel_pt_validate_config(struct perf_pmu *intel_pt_pmu,
 				    struct perf_evsel *evsel)
 {
+	int err;
+
 	if (!evsel)
 		return 0;
 
+	err = intel_pt_val_config_term(intel_pt_pmu, "caps/mtc_periods",
+				       "mtc_period", "caps/mtc",
+				       evsel->attr.config);
+	if (err)
+		return err;
+
 	return intel_pt_val_config_term(intel_pt_pmu, "caps/psb_periods",
 					"psb_period", "caps/psb_cyc",
 					evsel->attr.config);

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT support for decoding CYC packets
  2015-07-17 16:33 ` [PATCH V8 22/25] perf tools: Add Intel PT support for decoding CYC packets Adrian Hunter
@ 2015-08-28  6:39   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:39 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, tglx, hpa, linux-kernel, acme, adrian.hunter, mingo

Commit-ID:  cc33618619cefc6d730cca3bb8e15311016a4da7
Gitweb:     http://git.kernel.org/tip/cc33618619cefc6d730cca3bb8e15311016a4da7
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:57 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:49:04 -0300

perf tools: Add Intel PT support for decoding CYC packets

CYC packets provide even finer grain timestamp information than MTC and
TSC packets.  A CYC packet contains the number of CPU cycles since the
last CYC packet.

This patch just adds decoder support.  The CPU frequency can be related
to TSC using the Maximum Non-Turbo Ratio in combination with the CBR
(core-to-bus ratio) packet.  However more accuracy is achieved by simply
interpolating the number of cycles between other timing packets like MTC
or TSC.  This patch takes the latter approach.

Support for a default value and validation of values is provided by a
later patch. Also documentation is updated in a separate patch.

For details refer to the June 2015 or later Intel 64 and IA-32
Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-23-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 311 ++++++++++++++++++++-
 1 file changed, 306 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index f7119a1..0845c5e 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -88,6 +88,7 @@ struct intel_pt_decoder {
 	bool mtc_insn;
 	bool pge;
 	bool have_tma;
+	bool have_cyc;
 	uint64_t pos;
 	uint64_t last_ip;
 	uint64_t ip;
@@ -98,6 +99,8 @@ struct intel_pt_decoder {
 	uint64_t ret_addr;
 	uint64_t ctc_timestamp;
 	uint64_t ctc_delta;
+	uint64_t cycle_cnt;
+	uint64_t cyc_ref_timestamp;
 	uint32_t last_mtc;
 	uint32_t tsc_ctc_ratio_n;
 	uint32_t tsc_ctc_ratio_d;
@@ -111,8 +114,13 @@ struct intel_pt_decoder {
 	struct intel_pt_pkt tnt;
 	int pkt_step;
 	int pkt_len;
+	int last_packet_type;
 	unsigned int cbr;
 	unsigned int max_non_turbo_ratio;
+	double max_non_turbo_ratio_fp;
+	double cbr_cyc_to_tsc;
+	double calc_cyc_to_tsc;
+	bool have_calc_cyc_to_tsc;
 	int exec_mode;
 	unsigned int insn_bytes;
 	uint64_t sign_bit;
@@ -189,7 +197,8 @@ struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
 	decoder->period             = params->period;
 	decoder->period_type        = params->period_type;
 
-	decoder->max_non_turbo_ratio = params->max_non_turbo_ratio;
+	decoder->max_non_turbo_ratio    = params->max_non_turbo_ratio;
+	decoder->max_non_turbo_ratio_fp = params->max_non_turbo_ratio;
 
 	intel_pt_setup_period(decoder);
 
@@ -514,10 +523,247 @@ static int intel_pt_get_split_packet(struct intel_pt_decoder *decoder)
 	return ret;
 }
 
+struct intel_pt_pkt_info {
+	struct intel_pt_decoder	  *decoder;
+	struct intel_pt_pkt       packet;
+	uint64_t                  pos;
+	int                       pkt_len;
+	int                       last_packet_type;
+	void                      *data;
+};
+
+typedef int (*intel_pt_pkt_cb_t)(struct intel_pt_pkt_info *pkt_info);
+
+/* Lookahead packets in current buffer */
+static int intel_pt_pkt_lookahead(struct intel_pt_decoder *decoder,
+				  intel_pt_pkt_cb_t cb, void *data)
+{
+	struct intel_pt_pkt_info pkt_info;
+	const unsigned char *buf = decoder->buf;
+	size_t len = decoder->len;
+	int ret;
+
+	pkt_info.decoder          = decoder;
+	pkt_info.pos              = decoder->pos;
+	pkt_info.pkt_len          = decoder->pkt_step;
+	pkt_info.last_packet_type = decoder->last_packet_type;
+	pkt_info.data             = data;
+
+	while (1) {
+		do {
+			pkt_info.pos += pkt_info.pkt_len;
+			buf          += pkt_info.pkt_len;
+			len          -= pkt_info.pkt_len;
+
+			if (!len)
+				return INTEL_PT_NEED_MORE_BYTES;
+
+			ret = intel_pt_get_packet(buf, len, &pkt_info.packet);
+			if (!ret)
+				return INTEL_PT_NEED_MORE_BYTES;
+			if (ret < 0)
+				return ret;
+
+			pkt_info.pkt_len = ret;
+		} while (pkt_info.packet.type == INTEL_PT_PAD);
+
+		ret = cb(&pkt_info);
+		if (ret)
+			return 0;
+
+		pkt_info.last_packet_type = pkt_info.packet.type;
+	}
+}
+
+struct intel_pt_calc_cyc_to_tsc_info {
+	uint64_t        cycle_cnt;
+	unsigned int    cbr;
+	uint32_t        last_mtc;
+	uint64_t        ctc_timestamp;
+	uint64_t        ctc_delta;
+	uint64_t        tsc_timestamp;
+	uint64_t        timestamp;
+	bool            have_tma;
+	bool            from_mtc;
+	double          cbr_cyc_to_tsc;
+};
+
+static int intel_pt_calc_cyc_cb(struct intel_pt_pkt_info *pkt_info)
+{
+	struct intel_pt_decoder *decoder = pkt_info->decoder;
+	struct intel_pt_calc_cyc_to_tsc_info *data = pkt_info->data;
+	uint64_t timestamp;
+	double cyc_to_tsc;
+	unsigned int cbr;
+	uint32_t mtc, mtc_delta, ctc, fc, ctc_rem;
+
+	switch (pkt_info->packet.type) {
+	case INTEL_PT_TNT:
+	case INTEL_PT_TIP_PGE:
+	case INTEL_PT_TIP:
+	case INTEL_PT_FUP:
+	case INTEL_PT_PSB:
+	case INTEL_PT_PIP:
+	case INTEL_PT_MODE_EXEC:
+	case INTEL_PT_MODE_TSX:
+	case INTEL_PT_PSBEND:
+	case INTEL_PT_PAD:
+	case INTEL_PT_VMCS:
+	case INTEL_PT_MNT:
+		return 0;
+
+	case INTEL_PT_MTC:
+		if (!data->have_tma)
+			return 0;
+
+		mtc = pkt_info->packet.payload;
+		if (mtc > data->last_mtc)
+			mtc_delta = mtc - data->last_mtc;
+		else
+			mtc_delta = mtc + 256 - data->last_mtc;
+		data->ctc_delta += mtc_delta << decoder->mtc_shift;
+		data->last_mtc = mtc;
+
+		if (decoder->tsc_ctc_mult) {
+			timestamp = data->ctc_timestamp +
+				data->ctc_delta * decoder->tsc_ctc_mult;
+		} else {
+			timestamp = data->ctc_timestamp +
+				multdiv(data->ctc_delta,
+					decoder->tsc_ctc_ratio_n,
+					decoder->tsc_ctc_ratio_d);
+		}
+
+		if (timestamp < data->timestamp)
+			return 1;
+
+		if (pkt_info->last_packet_type != INTEL_PT_CYC) {
+			data->timestamp = timestamp;
+			return 0;
+		}
+
+		break;
+
+	case INTEL_PT_TSC:
+		timestamp = pkt_info->packet.payload |
+			    (data->timestamp & (0xffULL << 56));
+		if (data->from_mtc && timestamp < data->timestamp &&
+		    data->timestamp - timestamp < decoder->tsc_slip)
+			return 1;
+		while (timestamp < data->timestamp)
+			timestamp += (1ULL << 56);
+		if (pkt_info->last_packet_type != INTEL_PT_CYC) {
+			if (data->from_mtc)
+				return 1;
+			data->tsc_timestamp = timestamp;
+			data->timestamp = timestamp;
+			return 0;
+		}
+		break;
+
+	case INTEL_PT_TMA:
+		if (data->from_mtc)
+			return 1;
+
+		if (!decoder->tsc_ctc_ratio_d)
+			return 0;
+
+		ctc = pkt_info->packet.payload;
+		fc = pkt_info->packet.count;
+		ctc_rem = ctc & decoder->ctc_rem_mask;
+
+		data->last_mtc = (ctc >> decoder->mtc_shift) & 0xff;
+
+		data->ctc_timestamp = data->tsc_timestamp - fc;
+		if (decoder->tsc_ctc_mult) {
+			data->ctc_timestamp -= ctc_rem * decoder->tsc_ctc_mult;
+		} else {
+			data->ctc_timestamp -=
+				multdiv(ctc_rem, decoder->tsc_ctc_ratio_n,
+					decoder->tsc_ctc_ratio_d);
+		}
+
+		data->ctc_delta = 0;
+		data->have_tma = true;
+
+		return 0;
+
+	case INTEL_PT_CYC:
+		data->cycle_cnt += pkt_info->packet.payload;
+		return 0;
+
+	case INTEL_PT_CBR:
+		cbr = pkt_info->packet.payload;
+		if (data->cbr && data->cbr != cbr)
+			return 1;
+		data->cbr = cbr;
+		data->cbr_cyc_to_tsc = decoder->max_non_turbo_ratio_fp / cbr;
+		return 0;
+
+	case INTEL_PT_TIP_PGD:
+	case INTEL_PT_TRACESTOP:
+	case INTEL_PT_OVF:
+	case INTEL_PT_BAD: /* Does not happen */
+	default:
+		return 1;
+	}
+
+	if (!data->cbr && decoder->cbr) {
+		data->cbr = decoder->cbr;
+		data->cbr_cyc_to_tsc = decoder->cbr_cyc_to_tsc;
+	}
+
+	if (!data->cycle_cnt)
+		return 1;
+
+	cyc_to_tsc = (double)(timestamp - decoder->timestamp) / data->cycle_cnt;
+
+	if (data->cbr && cyc_to_tsc > data->cbr_cyc_to_tsc &&
+	    cyc_to_tsc / data->cbr_cyc_to_tsc > 1.25) {
+		intel_pt_log("Timestamp: calculated %g TSC ticks per cycle too big (c.f. CBR-based value %g), pos " x64_fmt "\n",
+			     cyc_to_tsc, data->cbr_cyc_to_tsc, pkt_info->pos);
+		return 1;
+	}
+
+	decoder->calc_cyc_to_tsc = cyc_to_tsc;
+	decoder->have_calc_cyc_to_tsc = true;
+
+	if (data->cbr) {
+		intel_pt_log("Timestamp: calculated %g TSC ticks per cycle c.f. CBR-based value %g, pos " x64_fmt "\n",
+			     cyc_to_tsc, data->cbr_cyc_to_tsc, pkt_info->pos);
+	} else {
+		intel_pt_log("Timestamp: calculated %g TSC ticks per cycle c.f. unknown CBR-based value, pos " x64_fmt "\n",
+			     cyc_to_tsc, pkt_info->pos);
+	}
+
+	return 1;
+}
+
+static void intel_pt_calc_cyc_to_tsc(struct intel_pt_decoder *decoder,
+				     bool from_mtc)
+{
+	struct intel_pt_calc_cyc_to_tsc_info data = {
+		.cycle_cnt      = 0,
+		.cbr            = 0,
+		.last_mtc       = decoder->last_mtc,
+		.ctc_timestamp  = decoder->ctc_timestamp,
+		.ctc_delta      = decoder->ctc_delta,
+		.tsc_timestamp  = decoder->tsc_timestamp,
+		.timestamp      = decoder->timestamp,
+		.have_tma       = decoder->have_tma,
+		.from_mtc       = from_mtc,
+		.cbr_cyc_to_tsc = 0,
+	};
+
+	intel_pt_pkt_lookahead(decoder, intel_pt_calc_cyc_cb, &data);
+}
+
 static int intel_pt_get_next_packet(struct intel_pt_decoder *decoder)
 {
 	int ret;
 
+	decoder->last_packet_type = decoder->packet.type;
+
 	do {
 		decoder->pos += decoder->pkt_step;
 		decoder->buf += decoder->pkt_step;
@@ -954,6 +1200,13 @@ static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
 		decoder->timestamp_insn_cnt = 0;
 	}
 
+	if (decoder->last_packet_type == INTEL_PT_CYC) {
+		decoder->cyc_ref_timestamp = decoder->timestamp;
+		decoder->cycle_cnt = 0;
+		decoder->have_calc_cyc_to_tsc = false;
+		intel_pt_calc_cyc_to_tsc(decoder, false);
+	}
+
 	intel_pt_log_to("Setting timestamp", decoder->timestamp);
 }
 
@@ -962,6 +1215,7 @@ static int intel_pt_overflow(struct intel_pt_decoder *decoder)
 	intel_pt_log("ERROR: Buffer overflow\n");
 	intel_pt_clear_tx_flags(decoder);
 	decoder->have_tma = false;
+	decoder->cbr = 0;
 	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
 	decoder->overflow = true;
 	return -EOVERFLOW;
@@ -1026,6 +1280,49 @@ static void intel_pt_calc_mtc_timestamp(struct intel_pt_decoder *decoder)
 
 	decoder->timestamp_insn_cnt = 0;
 	decoder->last_mtc = mtc;
+
+	if (decoder->last_packet_type == INTEL_PT_CYC) {
+		decoder->cyc_ref_timestamp = decoder->timestamp;
+		decoder->cycle_cnt = 0;
+		decoder->have_calc_cyc_to_tsc = false;
+		intel_pt_calc_cyc_to_tsc(decoder, true);
+	}
+}
+
+static void intel_pt_calc_cbr(struct intel_pt_decoder *decoder)
+{
+	unsigned int cbr = decoder->packet.payload;
+
+	if (decoder->cbr == cbr)
+		return;
+
+	decoder->cbr = cbr;
+	decoder->cbr_cyc_to_tsc = decoder->max_non_turbo_ratio_fp / cbr;
+}
+
+static void intel_pt_calc_cyc_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp = decoder->cyc_ref_timestamp;
+
+	decoder->have_cyc = true;
+
+	decoder->cycle_cnt += decoder->packet.payload;
+
+	if (!decoder->cyc_ref_timestamp)
+		return;
+
+	if (decoder->have_calc_cyc_to_tsc)
+		timestamp += decoder->cycle_cnt * decoder->calc_cyc_to_tsc;
+	else if (decoder->cbr)
+		timestamp += decoder->cycle_cnt * decoder->cbr_cyc_to_tsc;
+	else
+		return;
+
+	if (timestamp < decoder->timestamp)
+		intel_pt_log("Suppressing CYC timestamp " x64_fmt " less than current timestamp " x64_fmt "\n",
+			     timestamp, decoder->timestamp);
+	else
+		decoder->timestamp = timestamp;
 }
 
 /* Walk PSB+ packets when already in sync. */
@@ -1065,7 +1362,7 @@ static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_CBR:
-			decoder->cbr = decoder->packet.payload;
+			intel_pt_calc_cbr(decoder);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1182,6 +1479,7 @@ static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_CYC:
+			intel_pt_calc_cyc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1318,10 +1616,11 @@ next:
 			break;
 
 		case INTEL_PT_CYC:
+			intel_pt_calc_cyc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_CBR:
-			decoder->cbr = decoder->packet.payload;
+			intel_pt_calc_cbr(decoder);
 			break;
 
 		case INTEL_PT_MODE_EXEC:
@@ -1398,10 +1697,11 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_CYC:
+			intel_pt_calc_cyc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_CBR:
-			decoder->cbr = decoder->packet.payload;
+			intel_pt_calc_cbr(decoder);
 			break;
 
 		case INTEL_PT_PIP:
@@ -1493,10 +1793,11 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_CYC:
+			intel_pt_calc_cyc_timestamp(decoder);
 			break;
 
 		case INTEL_PT_CBR:
-			decoder->cbr = decoder->packet.payload;
+			intel_pt_calc_cbr(decoder);
 			break;
 
 		case INTEL_PT_PIP:

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT support for using CYC packets
  2015-07-17 16:33 ` [PATCH V8 23/25] perf tools: Add Intel PT support for using " Adrian Hunter
@ 2015-08-28  6:40   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, tglx, acme, adrian.hunter, mingo, jolsa, hpa

Commit-ID:  0de802abd14abdf8cbbba28b421a1a00fa0939d5
Gitweb:     http://git.kernel.org/tip/0de802abd14abdf8cbbba28b421a1a00fa0939d5
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:58 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:49:43 -0300

perf tools: Add Intel PT support for using CYC packets

CYC packets are a new Intel PT feature.

CYC packets provide even finer grain timestamp information than MTC and
TSC packets.  A CYC packet contains the number of CPU cycles since the
last CYC packet. Unlike MTC and TSC packets, CYC packets are only sent
when another packet is also sent.

Support for this feature is indicated by:

/sys/bus/event_source/devices/intel_pt/caps/psb_cyc

which contains "1" if the feature is supported and "0" otherwise.

CYC packets can be requested using a PMU config term e.g. perf record -e
intel_pt/cyc/u sleep 1

The frequency of CYC packets can also be specified.  e.g. perf record -e
intel_pt/cyc,cyc_thresh=2/u sleep 1

CYC packets are not requested by default.

Valid cyc_thresh values are given by:

/sys/bus/event_source/devices/intel_pt/caps/cycle_thresholds

which contains a hexadecimal value, the bits of which represent valid
values e.g. bit 2 set means value 2 is valid.

The value represents the minimum number of CPU cycles that must have
passed before a CYC packet can be sent.  The number of CPU cycles is:

    2 ^ (value - 1)

e.g. value 4 means 8 CPU cycles must pass before a CYC packet can be
sent.  Note a CYC packet is still only sent when another packet is sent,
not at, e.g. every 8 CPU cycles.

If an invalid value is entered, the error message will give a list of
valid values e.g.

    $ perf record -e intel_pt/cyc,cyc_thresh=15/u uname
    Invalid cyc_thresh for intel_pt. Valid values are: 0-12

tools/perf/Documentation/intel-pt.txt is updated in a later patch as
there are a number of new features being added.

For more information refer to the June 2015 or later Intel 64 and IA-32
Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-24-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index a5de01d..2ca10d7 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -475,6 +475,12 @@ static int intel_pt_validate_config(struct perf_pmu *intel_pt_pmu,
 	if (!evsel)
 		return 0;
 
+	err = intel_pt_val_config_term(intel_pt_pmu, "caps/cycle_thresholds",
+				       "cyc_thresh", "caps/psb_cyc",
+				       evsel->attr.config);
+	if (err)
+		return err;
+
 	err = intel_pt_val_config_term(intel_pt_pmu, "caps/mtc_periods",
 				       "mtc_period", "caps/mtc",
 				       evsel->attr.config);

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Add Intel PT support for decoding TRACESTOP packets
  2015-07-17 16:33 ` [PATCH V8 24/25] perf tools: Add Intel PT support for decoding TRACESTOP packets Adrian Hunter
@ 2015-08-28  6:40   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, acme, mingo, adrian.hunter, linux-kernel, jolsa, tglx

Commit-ID:  7eacca3ebb03a4ee7bb41284aafeb19a54242621
Gitweb:     http://git.kernel.org/tip/7eacca3ebb03a4ee7bb41284aafeb19a54242621
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:33:59 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:50:23 -0300

perf tools: Add Intel PT support for decoding TRACESTOP packets

A TRACESTOP packet is produced when an Intel PT trace enters a defined
region of the address space at which point the tracing stops.

This patch just adds decoder support.

Support for specifying TRACESTOP regions is left until later.

For details refer to the June 2015 or later Intel 64 and IA-32
Architectures SDM Chapter 36 Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-25-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
index 0845c5e..22ba502 100644
--- a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -1572,6 +1572,10 @@ next:
 			return intel_pt_walk_fup_tip(decoder);
 
 		case INTEL_PT_TRACESTOP:
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			intel_pt_clear_tx_flags(decoder);
+			decoder->have_tma = false;
 			break;
 
 		case INTEL_PT_PSB:
@@ -1717,6 +1721,9 @@ static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
 			break;
 
 		case INTEL_PT_TRACESTOP:
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			intel_pt_clear_tx_flags(decoder);
 		case INTEL_PT_TNT:
 			decoder->have_tma = false;
 			intel_pt_log("ERROR: Unexpected packet\n");
@@ -1819,6 +1826,10 @@ static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
 			return intel_pt_bug(decoder);
 
 		case INTEL_PT_TRACESTOP:
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			intel_pt_clear_tx_flags(decoder);
+			decoder->have_tma = false;
 			break;
 
 		case INTEL_PT_PSB:

^ permalink raw reply related	[flat|nested] 77+ messages in thread

* [tip:perf/core] perf tools: Update Intel PT documentation
  2015-07-17 16:34 ` [PATCH V8 25/25] perf tools: Update Intel PT documentation Adrian Hunter
@ 2015-08-28  6:40   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 77+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-08-28  6:40 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, linux-kernel, tglx, hpa, jolsa, adrian.hunter, mingo

Commit-ID:  9d1bf02ac3d41367896b38793db6f8f30bb9a295
Gitweb:     http://git.kernel.org/tip/9d1bf02ac3d41367896b38793db6f8f30bb9a295
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 17 Jul 2015 19:34:00 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Mon, 24 Aug 2015 17:51:09 -0300

perf tools: Update Intel PT documentation

Update Intel PT documentation to describe new features.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1437150840-31811-26-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/intel-pt.txt | 194 ++++++++++++++++++++++++++++++++--
 1 file changed, 186 insertions(+), 8 deletions(-)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index 2866b62..4a0501d 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -142,19 +142,21 @@ which is the same as
 
 	-e intel_pt/tsc=1,noretcomp=0/
 
+Note there are now new config terms - see section 'config terms' further below.
+
 The config terms are listed in /sys/devices/intel_pt/format.  They are bit
 fields within the config member of the struct perf_event_attr which is
 passed to the kernel by the perf_event_open system call.  They correspond to bit
 fields in the IA32_RTIT_CTL MSR.  Here is a list of them and their definitions:
 
-	$ for f in `ls /sys/devices/intel_pt/format`;do
-	> echo $f
-	> cat /sys/devices/intel_pt/format/$f
-	> done
-	noretcomp
-	config:11
-	tsc
-	config:10
+	$ grep -H . /sys/bus/event_source/devices/intel_pt/format/*
+	/sys/bus/event_source/devices/intel_pt/format/cyc:config:1
+	/sys/bus/event_source/devices/intel_pt/format/cyc_thresh:config:19-22
+	/sys/bus/event_source/devices/intel_pt/format/mtc:config:9
+	/sys/bus/event_source/devices/intel_pt/format/mtc_period:config:14-17
+	/sys/bus/event_source/devices/intel_pt/format/noretcomp:config:11
+	/sys/bus/event_source/devices/intel_pt/format/psb_period:config:24-27
+	/sys/bus/event_source/devices/intel_pt/format/tsc:config:10
 
 Note that the default config must be overridden for each term i.e.
 
@@ -209,9 +211,185 @@ perf_event_attr is displayed if the -vv option is used e.g.
 	------------------------------------------------------------
 
 
+config terms
+------------
+
+The June 2015 version of Intel 64 and IA-32 Architectures Software Developer
+Manuals, Chapter 36 Intel Processor Trace, defined new Intel PT features.
+Some of the features are reflect in new config terms.  All the config terms are
+described below.
+
+tsc		Always supported.  Produces TSC timestamp packets to provide
+		timing information.  In some cases it is possible to decode
+		without timing information, for example a per-thread context
+		that does not overlap executable memory maps.
+
+		The default config selects tsc (i.e. tsc=1).
+
+noretcomp	Always supported.  Disables "return compression" so a TIP packet
+		is produced when a function returns.  Causes more packets to be
+		produced but might make decoding more reliable.
+
+		The default config does not select noretcomp (i.e. noretcomp=0).
+
+psb_period	Allows the frequency of PSB packets to be specified.
+
+		The PSB packet is a synchronization packet that provides a
+		starting point for decoding or recovery from errors.
+
+		Support for psb_period is indicated by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/psb_cyc
+
+		which contains "1" if the feature is supported and "0"
+		otherwise.
+
+		Valid values are given by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/psb_periods
+
+		which contains a hexadecimal value, the bits of which represent
+		valid values e.g. bit 2 set means value 2 is valid.
+
+		The psb_period value is converted to the approximate number of
+		trace bytes between PSB packets as:
+
+			2 ^ (value + 11)
+
+		e.g. value 3 means 16KiB bytes between PSBs
+
+		If an invalid value is entered, the error message
+		will give a list of valid values e.g.
+
+			$ perf record -e intel_pt/psb_period=15/u uname
+			Invalid psb_period for intel_pt. Valid values are: 0-5
+
+		If MTC packets are selected, the default config selects a value
+		of 3 (i.e. psb_period=3) or the nearest lower value that is
+		supported (0 is always supported).  Otherwise the default is 0.
+
+		If decoding is expected to be reliable and the buffer is large
+		then a large PSB period can be used.
+
+		Because a TSC packet is produced with PSB, the PSB period can
+		also affect the granularity to timing information in the absence
+		of MTC or CYC.
+
+mtc		Produces MTC timing packets.
+
+		MTC packets provide finer grain timestamp information than TSC
+		packets.  MTC packets record time using the hardware crystal
+		clock (CTC) which is related to TSC packets using a TMA packet.
+
+		Support for this feature is indicated by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/mtc
+
+		which contains "1" if the feature is supported and
+		"0" otherwise.
+
+		The frequency of MTC packets can also be specified - see
+		mtc_period below.
+
+mtc_period	Specifies how frequently MTC packets are produced - see mtc
+		above for how to determine if MTC packets are supported.
+
+		Valid values are given by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/mtc_periods
+
+		which contains a hexadecimal value, the bits of which represent
+		valid values e.g. bit 2 set means value 2 is valid.
+
+		The mtc_period value is converted to the MTC frequency as:
+
+			CTC-frequency / (2 ^ value)
+
+		e.g. value 3 means one eighth of CTC-frequency
+
+		Where CTC is the hardware crystal clock, the frequency of which
+		can be related to TSC via values provided in cpuid leaf 0x15.
+
+		If an invalid value is entered, the error message
+		will give a list of valid values e.g.
+
+			$ perf record -e intel_pt/mtc_period=15/u uname
+			Invalid mtc_period for intel_pt. Valid values are: 0,3,6,9
+
+		The default value is 3 or the nearest lower value
+		that is supported (0 is always supported).
+
+cyc		Produces CYC timing packets.
+
+		CYC packets provide even finer grain timestamp information than
+		MTC and TSC packets.  A CYC packet contains the number of CPU
+		cycles since the last CYC packet. Unlike MTC and TSC packets,
+		CYC packets are only sent when another packet is also sent.
+
+		Support for this feature is indicated by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/psb_cyc
+
+		which contains "1" if the feature is supported and
+		"0" otherwise.
+
+		The number of CYC packets produced can be reduced by specifying
+		a threshold - see cyc_thresh below.
+
+cyc_thresh	Specifies how frequently CYC packets are produced - see cyc
+		above for how to determine if CYC packets are supported.
+
+		Valid cyc_thresh values are given by:
+
+			/sys/bus/event_source/devices/intel_pt/caps/cycle_thresholds
+
+		which contains a hexadecimal value, the bits of which represent
+		valid values e.g. bit 2 set means value 2 is valid.
+
+		The cyc_thresh value represents the minimum number of CPU cycles
+		that must have passed before a CYC packet can be sent.  The
+		number of CPU cycles is:
+
+			2 ^ (value - 1)
+
+		e.g. value 4 means 8 CPU cycles must pass before a CYC packet
+		can be sent.  Note a CYC packet is still only sent when another
+		packet is sent, not at, e.g. every 8 CPU cycles.
+
+		If an invalid value is entered, the error message
+		will give a list of valid values e.g.
+
+			$ perf record -e intel_pt/cyc,cyc_thresh=15/u uname
+			Invalid cyc_thresh for intel_pt. Valid values are: 0-12
+
+		CYC packets are not requested by default.
+
+no_force_psb	This is a driver option and is not in the IA32_RTIT_CTL MSR.
+
+		It stops the driver resetting the byte count to zero whenever
+		enabling the trace (for example on context switches) which in
+		turn results in no PSB being forced.  However some processors
+		will produce a PSB anyway.
+
+		In any case, there is still a PSB when the trace is enabled for
+		the first time.
+
+		no_force_psb can be used to slightly decrease the trace size but
+		may make it harder for the decoder to recover from errors.
+
+		no_force_psb is not selected by default.
+
+
 new snapshot option
 -------------------
 
+The difference between full trace and snapshot from the kernel's perspective is
+that in full trace we don't overwrite trace data that the user hasn't collected
+yet (and indicated that by advancing aux_tail), whereas in snapshot mode we let
+the trace run and overwrite older data in the buffer so that whenever something
+interesting happens, we can stop it and grab a snapshot of what was going on
+around that interesting moment.
+
 To select snapshot mode a new option has been added:
 
 	-S

^ permalink raw reply related	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2015-08-28  6:40 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-17 16:33 [PATCH V8 00/25] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 01/25] perf auxtrace: Add Intel PT as an AUX area tracing type Adrian Hunter
2015-08-20  9:57   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 02/25] perf tools: Add Intel PT packet decoder Adrian Hunter
2015-08-20  9:57   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 03/25] perf tools: Add Intel PT instruction decoder Adrian Hunter
2015-08-12 20:55   ` Arnaldo Carvalho de Melo
2015-08-13  6:48     ` Adrian Hunter
2015-08-13  7:14       ` [PATCH V9 " Adrian Hunter
2015-08-13 12:37         ` Arnaldo Carvalho de Melo
2015-08-20  9:57         ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 04/25] perf tools: Add Intel PT log Adrian Hunter
2015-08-20  9:58   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 05/25] perf tools: Add Intel PT decoder Adrian Hunter
2015-08-20  9:58   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 06/25] perf tools: Add Intel PT support Adrian Hunter
2015-08-20  9:59   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 07/25] perf tools: Take Intel PT into use Adrian Hunter
2015-08-20  9:59   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 08/25] perf tools: Add Intel BTS support Adrian Hunter
2015-08-17 15:52   ` Arnaldo Carvalho de Melo
2015-08-17 17:43     ` Adrian Hunter
2015-08-17 17:58       ` Arnaldo Carvalho de Melo
2015-08-17 19:09         ` Adrian Hunter
2015-08-17 19:58           ` Arnaldo Carvalho de Melo
2015-08-18  6:39             ` Adrian Hunter
2015-08-18  9:09               ` Adrian Hunter
2015-08-18 16:10               ` Arnaldo Carvalho de Melo
2015-08-20  8:53                 ` Adrian Hunter
2015-08-21 14:18                   ` Arnaldo Carvalho de Melo
2015-08-22  6:52   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 09/25] perf tools: Put itrace options into an asciidoc include Adrian Hunter
2015-08-22  6:53   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 10/25] perf tools: Add example call-graph script Adrian Hunter
2015-08-21 15:00   ` Arnaldo Carvalho de Melo
2015-08-21 15:11     ` Arnaldo Carvalho de Melo
2015-08-21 15:21       ` Arnaldo Carvalho de Melo
2015-08-21 15:28         ` Arnaldo Carvalho de Melo
2015-08-21 15:32           ` Arnaldo Carvalho de Melo
2015-08-24  7:00           ` Adrian Hunter
2015-08-24 20:20             ` Arnaldo Carvalho de Melo
2015-08-22  6:53   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 11/25] perf auxtrace: Fix period type 'i' not working Adrian Hunter
2015-08-06 19:50   ` Arnaldo Carvalho de Melo
2015-08-07  7:22   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 12/25] perf tools: Fix perf-with-kcore handling of arguments containing spaces Adrian Hunter
2015-08-06 19:50   ` Arnaldo Carvalho de Melo
2015-08-07  7:22   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 13/25] perf tools: Fix Intel PT 'instructions' sample period Adrian Hunter
2015-08-28  6:37   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 14/25] perf tools: Add perf_pmu__format_bits() Adrian Hunter
2015-08-06 19:50   ` Arnaldo Carvalho de Melo
2015-08-07  7:23   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 15/25] perf tools: Validate config term maximum value Adrian Hunter
2015-08-06 19:50   ` Arnaldo Carvalho de Melo
2015-08-07  7:23   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 16/25] perf tools: Extend the event parser maximum error index Adrian Hunter
2015-08-06 19:50   ` Arnaldo Carvalho de Melo
2015-08-07  7:23   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 17/25] perf tools: Add Intel PT support for PSB periods Adrian Hunter
2015-08-28  6:38   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 18/25] perf tools: Add new Intel PT packet definitions Adrian Hunter
2015-08-28  6:38   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 19/25] perf tools: Pass Intel PT information for decoding MTC and CYC Adrian Hunter
2015-08-28  6:38   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 20/25] perf tools: Add Intel PT support for decoding MTC packets Adrian Hunter
2015-08-28  6:39   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 21/25] perf tools: Add Intel PT support for using " Adrian Hunter
2015-08-28  6:39   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 22/25] perf tools: Add Intel PT support for decoding CYC packets Adrian Hunter
2015-08-28  6:39   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 23/25] perf tools: Add Intel PT support for using " Adrian Hunter
2015-08-28  6:40   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:33 ` [PATCH V8 24/25] perf tools: Add Intel PT support for decoding TRACESTOP packets Adrian Hunter
2015-08-28  6:40   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-07-17 16:34 ` [PATCH V8 25/25] perf tools: Update Intel PT documentation Adrian Hunter
2015-08-28  6:40   ` [tip:perf/core] " tip-bot for Adrian Hunter

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).