All of lore.kernel.org
 help / color / mirror / Atom feed
* [GIT PULL 0/8] perf/pt -> Intel PT/BTS
@ 2015-06-26 22:02 Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 1/8] perf auxtrace: Add Intel PT as an AUX area tracing type Arnaldo Carvalho de Melo
                   ` (8 more replies)
  0 siblings, 9 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 22:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Arnaldo Carvalho de Melo, Adrian Hunter, Jiri Olsa,
	David Ahern, Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo

Hi Ingo,

	Please consider pulling, there are several other patches after this,
but I think that this may be acceptable to showcase the capabilities already
present, look at the output of 'perf script' and callchains for userspace
without using any extra debugging info (no need for DWARF, CFI, nothing),
really cool capabilities... :-)

	Adrian wrote some docs and I tested it both on a Ivy Bridge machine
where there is only BTS and on a Broadwell machine with the whole shebang,
adding the output of the commands to the csets, to further showcase what is
there already.

	This is on top of my last perf-core-for-mingo tag.

	Up to you, please let us know what you think and we'll continue working
on having this in an acceptable form for merging,

Regards,

- Arnaldo

P.S. Kudos for Adrian for the patience with this process, way more is needed to
polish this, but the promise is there, cool stuff indeed! :-)

The following changes since commit 36c8bb56a9f718a9a5f35d1834ca9dcec95deb4a:

  perf symbols: Check access permission when reading symbol files (2015-06-26 12:11:53 -0300)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-pt-for-mingo

for you to fetch changes up to 04759f172270afb28c8004f5cad62ed55710a499:

  perf tools: Add Intel BTS support (2015-06-26 18:36:11 -0300)

----------------------------------------------------------------
Put Intel PT and BTS into initial use (Adrian Hunter)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

----------------------------------------------------------------
Adrian Hunter (8):
      perf auxtrace: Add Intel PT as an AUX area tracing type
      perf tools: Add Intel PT packet decoder
      perf tools: Add Intel PT instruction decoder
      perf tools: Add Intel PT log
      perf tools: Add Intel PT decoder
      perf tools: Add Intel PT support
      perf tools: Take Intel PT into use
      perf tools: Add Intel BTS support

 tools/build/Makefile.build                         |    2 +
 tools/perf/.gitignore                              |    2 +
 tools/perf/Documentation/intel-bts.txt             |   86 +
 tools/perf/Documentation/intel-pt.txt              |  588 ++++++
 tools/perf/MANIFEST                                |    7 +
 tools/perf/Makefile.perf                           |   12 +-
 tools/perf/arch/x86/util/Build                     |    5 +
 tools/perf/arch/x86/util/auxtrace.c                |   83 +
 tools/perf/arch/x86/util/intel-bts.c               |  458 +++++
 tools/perf/arch/x86/util/intel-pt.c                |  752 ++++++++
 tools/perf/arch/x86/util/pmu.c                     |   18 +
 tools/perf/util/Build                              |    3 +
 tools/perf/util/auxtrace.c                         |    9 +-
 tools/perf/util/auxtrace.h                         |    2 +
 tools/perf/util/intel-bts.c                        |  791 ++++++++
 tools/perf/util/intel-bts.h                        |   43 +
 tools/perf/util/intel-pt-decoder/Build             |   14 +
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1759 ++++++++++++++++++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  102 ++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  |  246 +++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |   65 +
 tools/perf/util/intel-pt-decoder/intel-pt-log.c    |  155 ++
 tools/perf/util/intel-pt-decoder/intel-pt-log.h    |   52 +
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   |  400 +++++
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |   64 +
 tools/perf/util/intel-pt.c                         | 1889 ++++++++++++++++++++
 tools/perf/util/intel-pt.h                         |   51 +
 tools/perf/util/pmu.c                              |    4 -
 28 files changed, 7655 insertions(+), 7 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-bts.txt
 create mode 100644 tools/perf/Documentation/intel-pt.txt
 create mode 100644 tools/perf/arch/x86/util/auxtrace.c
 create mode 100644 tools/perf/arch/x86/util/intel-bts.c
 create mode 100644 tools/perf/arch/x86/util/intel-pt.c
 create mode 100644 tools/perf/arch/x86/util/pmu.c
 create mode 100644 tools/perf/util/intel-bts.c
 create mode 100644 tools/perf/util/intel-bts.h
 create mode 100644 tools/perf/util/intel-pt-decoder/Build
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
 create mode 100644 tools/perf/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.h

^ permalink raw reply	[flat|nested] 23+ messages in thread

* [PATCH 1/8] perf auxtrace: Add Intel PT as an AUX area tracing type
  2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
@ 2015-06-26 22:02 ` Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 2/8] perf tools: Add Intel PT packet decoder Arnaldo Carvalho de Melo
                   ` (7 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 22:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Adrian Hunter, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

Add the Intel Processor Trace type constant PERF_AUXTRACE_INTEL_PT.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-4-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/auxtrace.c | 1 +
 tools/perf/util/auxtrace.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 7e7405c9b936..ea29022a4f52 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -884,6 +884,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 		fprintf(stdout, " type: %u\n", type);
 
 	switch (type) {
+	case PERF_AUXTRACE_INTEL_PT:
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 471aecbc4d68..7d12f33a3a06 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -39,6 +39,7 @@ struct events_stats;
 
 enum auxtrace_type {
 	PERF_AUXTRACE_UNKNOWN,
+	PERF_AUXTRACE_INTEL_PT,
 };
 
 enum itrace_period_type {
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 2/8] perf tools: Add Intel PT packet decoder
  2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 1/8] perf auxtrace: Add Intel PT as an AUX area tracing type Arnaldo Carvalho de Melo
@ 2015-06-26 22:02 ` Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 3/8] perf tools: Add Intel PT instruction decoder Arnaldo Carvalho de Melo
                   ` (6 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 22:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Adrian Hunter, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for decoding Intel Processor Trace packets.

This essentially provides intel_pt_get_packet() which takes a buffer of
binary data and returns the decoded packet.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-5-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/Build                              |   1 +
 tools/perf/util/intel-pt-decoder/Build             |   1 +
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   | 400 +++++++++++++++++++++
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |  64 ++++
 4 files changed, 466 insertions(+)
 create mode 100644 tools/perf/util/intel-pt-decoder/Build
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 586a59d46022..e1b505a8055b 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -76,6 +76,7 @@ libperf-$(CONFIG_X86) += tsc.o
 libperf-y += cloexec.o
 libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
new file mode 100644
index 000000000000..9d67381a9bd3
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -0,0 +1 @@
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
new file mode 100644
index 000000000000..988c82c6652d
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
@@ -0,0 +1,400 @@
+/*
+ * intel_pt_pkt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "intel-pt-pkt-decoder.h"
+
+#define BIT(n)		(1 << (n))
+
+#define BIT63		((uint64_t)1 << 63)
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le16_to_cpu bswap_16
+#define le32_to_cpu bswap_32
+#define le64_to_cpu bswap_64
+#define memcpy_le64(d, s, n) do { \
+	memcpy((d), (s), (n));    \
+	*(d) = le64_to_cpu(*(d)); \
+} while (0)
+#else
+#define le16_to_cpu
+#define le32_to_cpu
+#define le64_to_cpu
+#define memcpy_le64 memcpy
+#endif
+
+static const char * const packet_name[] = {
+	[INTEL_PT_BAD]		= "Bad Packet!",
+	[INTEL_PT_PAD]		= "PAD",
+	[INTEL_PT_TNT]		= "TNT",
+	[INTEL_PT_TIP_PGD]	= "TIP.PGD",
+	[INTEL_PT_TIP_PGE]	= "TIP.PGE",
+	[INTEL_PT_TSC]		= "TSC",
+	[INTEL_PT_MODE_EXEC]	= "MODE.Exec",
+	[INTEL_PT_MODE_TSX]	= "MODE.TSX",
+	[INTEL_PT_TIP]		= "TIP",
+	[INTEL_PT_FUP]		= "FUP",
+	[INTEL_PT_PSB]		= "PSB",
+	[INTEL_PT_PSBEND]	= "PSBEND",
+	[INTEL_PT_CBR]		= "CBR",
+	[INTEL_PT_PIP]		= "PIP",
+	[INTEL_PT_OVF]		= "OVF",
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type type)
+{
+	return packet_name[type];
+}
+
+static int intel_pt_get_long_tnt(const unsigned char *buf, size_t len,
+				 struct intel_pt_pkt *packet)
+{
+	uint64_t payload;
+	int count;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	payload = le64_to_cpu(*(uint64_t *)buf);
+
+	for (count = 47; count; count--) {
+		if (payload & BIT63)
+			break;
+		payload <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = payload << 1;
+	return 8;
+}
+
+static int intel_pt_get_pip(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	uint64_t payload = 0;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	packet->type = INTEL_PT_PIP;
+	memcpy_le64(&payload, buf + 2, 6);
+	packet->payload = payload >> 1;
+
+	return 8;
+}
+
+static int intel_pt_get_cbr(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 4)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_CBR;
+	packet->payload = buf[2];
+	return 4;
+}
+
+static int intel_pt_get_ovf(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_OVF;
+	return 2;
+}
+
+static int intel_pt_get_psb(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	int i;
+
+	if (len < 16)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	for (i = 2; i < 16; i += 2) {
+		if (buf[i] != 2 || buf[i + 1] != 0x82)
+			return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = INTEL_PT_PSB;
+	return 16;
+}
+
+static int intel_pt_get_psbend(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PSBEND;
+	return 2;
+}
+
+static int intel_pt_get_pad(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PAD;
+	return 1;
+}
+
+static int intel_pt_get_ext(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1]) {
+	case 0xa3: /* Long TNT */
+		return intel_pt_get_long_tnt(buf, len, packet);
+	case 0x43: /* PIP */
+		return intel_pt_get_pip(buf, len, packet);
+	case 0x03: /* CBR */
+		return intel_pt_get_cbr(buf, len, packet);
+	case 0xf3: /* OVF */
+		return intel_pt_get_ovf(packet);
+	case 0x82: /* PSB */
+		return intel_pt_get_psb(buf, len, packet);
+	case 0x23: /* PSBEND */
+		return intel_pt_get_psbend(packet);
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+static int intel_pt_get_short_tnt(unsigned int byte,
+				  struct intel_pt_pkt *packet)
+{
+	int count;
+
+	for (count = 6; count; count--) {
+		if (byte & BIT(7))
+			break;
+		byte <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = (uint64_t)byte << 57;
+
+	return 1;
+}
+
+static int intel_pt_get_ip(enum intel_pt_pkt_type type, unsigned int byte,
+			   const unsigned char *buf, size_t len,
+			   struct intel_pt_pkt *packet)
+{
+	switch (byte >> 5) {
+	case 0:
+		packet->count = 0;
+		break;
+	case 1:
+		if (len < 3)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 2;
+		packet->payload = le16_to_cpu(*(uint16_t *)(buf + 1));
+		break;
+	case 2:
+		if (len < 5)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 4;
+		packet->payload = le32_to_cpu(*(uint32_t *)(buf + 1));
+		break;
+	case 3:
+	case 6:
+		if (len < 7)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 6;
+		memcpy_le64(&packet->payload, buf + 1, 6);
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = type;
+
+	return packet->count + 1;
+}
+
+static int intel_pt_get_mode(const unsigned char *buf, size_t len,
+			     struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1] >> 5) {
+	case 0:
+		packet->type = INTEL_PT_MODE_EXEC;
+		switch (buf[1] & 3) {
+		case 0:
+			packet->payload = 16;
+			break;
+		case 1:
+			packet->payload = 64;
+			break;
+		case 2:
+			packet->payload = 32;
+			break;
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+		break;
+	case 1:
+		packet->type = INTEL_PT_MODE_TSX;
+		if ((buf[1] & 3) == 3)
+			return INTEL_PT_BAD_PACKET;
+		packet->payload = buf[1] & 3;
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	return 2;
+}
+
+static int intel_pt_get_tsc(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_TSC;
+	memcpy_le64(&packet->payload, buf + 1, 7);
+	return 8;
+}
+
+static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
+				  struct intel_pt_pkt *packet)
+{
+	unsigned int byte;
+
+	memset(packet, 0, sizeof(struct intel_pt_pkt));
+
+	if (!len)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	byte = buf[0];
+	if (!(byte & BIT(0))) {
+		if (byte == 0)
+			return intel_pt_get_pad(packet);
+		if (byte == 2)
+			return intel_pt_get_ext(buf, len, packet);
+		return intel_pt_get_short_tnt(byte, packet);
+	}
+
+	switch (byte & 0x1f) {
+	case 0x0D:
+		return intel_pt_get_ip(INTEL_PT_TIP, byte, buf, len, packet);
+	case 0x11:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGE, byte, buf, len,
+				       packet);
+	case 0x01:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGD, byte, buf, len,
+				       packet);
+	case 0x1D:
+		return intel_pt_get_ip(INTEL_PT_FUP, byte, buf, len, packet);
+	case 0x19:
+		switch (byte) {
+		case 0x99:
+			return intel_pt_get_mode(buf, len, packet);
+		case 0x19:
+			return intel_pt_get_tsc(buf, len, packet);
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet)
+{
+	int ret;
+
+	ret = intel_pt_do_get_packet(buf, len, packet);
+	if (ret > 0) {
+		while (ret < 8 && len > (size_t)ret && !buf[ret])
+			ret += 1;
+	}
+	return ret;
+}
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
+		      size_t buf_len)
+{
+	int ret, i;
+	unsigned long long payload = packet->payload;
+	const char *name = intel_pt_pkt_name(packet->type);
+
+	switch (packet->type) {
+	case INTEL_PT_BAD:
+	case INTEL_PT_PAD:
+	case INTEL_PT_PSB:
+	case INTEL_PT_PSBEND:
+	case INTEL_PT_OVF:
+		return snprintf(buf, buf_len, "%s", name);
+	case INTEL_PT_TNT: {
+		size_t blen = buf_len;
+
+		ret = snprintf(buf, blen, "%s ", name);
+		if (ret < 0)
+			return ret;
+		buf += ret;
+		blen -= ret;
+		for (i = 0; i < packet->count; i++) {
+			if (payload & BIT63)
+				ret = snprintf(buf, blen, "T");
+			else
+				ret = snprintf(buf, blen, "N");
+			if (ret < 0)
+				return ret;
+			buf += ret;
+			blen -= ret;
+			payload <<= 1;
+		}
+		ret = snprintf(buf, blen, " (%d)", packet->count);
+		if (ret < 0)
+			return ret;
+		blen -= ret;
+		return buf_len - blen;
+	}
+	case INTEL_PT_TIP_PGD:
+	case INTEL_PT_TIP_PGE:
+	case INTEL_PT_TIP:
+	case INTEL_PT_FUP:
+		if (!(packet->count))
+			return snprintf(buf, buf_len, "%s no ip", name);
+	case INTEL_PT_CBR:
+		return snprintf(buf, buf_len, "%s 0x%llx", name, payload);
+	case INTEL_PT_TSC:
+		if (packet->count)
+			return snprintf(buf, buf_len,
+					"%s 0x%llx CTC 0x%x FC 0x%x",
+					name, payload, packet->count & 0xffff,
+					(packet->count >> 16) & 0x1ff);
+		else
+			return snprintf(buf, buf_len, "%s 0x%llx",
+					name, payload);
+	case INTEL_PT_MODE_EXEC:
+		return snprintf(buf, buf_len, "%s %lld", name, payload);
+	case INTEL_PT_MODE_TSX:
+		return snprintf(buf, buf_len, "%s TXAbort:%u InTX:%u",
+				name, (unsigned)(payload >> 1) & 1,
+				(unsigned)payload & 1);
+	case INTEL_PT_PIP:
+		ret = snprintf(buf, buf_len, "%s 0x%llx",
+			       name, payload);
+		return ret;
+	default:
+		break;
+	}
+	return snprintf(buf, buf_len, "%s 0x%llx (%d)",
+			name, payload, packet->count);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
new file mode 100644
index 000000000000..53404fa942b3
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
@@ -0,0 +1,64 @@
+/*
+ * intel_pt_pkt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_PKT_DECODER_H__
+#define INCLUDE__INTEL_PT_PKT_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_PKT_DESC_MAX	256
+
+#define INTEL_PT_NEED_MORE_BYTES	-1
+#define INTEL_PT_BAD_PACKET		-2
+
+#define INTEL_PT_PSB_STR		"\002\202\002\202\002\202\002\202" \
+					"\002\202\002\202\002\202\002\202"
+#define INTEL_PT_PSB_LEN		16
+
+#define INTEL_PT_PKT_MAX_SZ		16
+
+enum intel_pt_pkt_type {
+	INTEL_PT_BAD,
+	INTEL_PT_PAD,
+	INTEL_PT_TNT,
+	INTEL_PT_TIP_PGD,
+	INTEL_PT_TIP_PGE,
+	INTEL_PT_TSC,
+	INTEL_PT_MODE_EXEC,
+	INTEL_PT_MODE_TSX,
+	INTEL_PT_TIP,
+	INTEL_PT_FUP,
+	INTEL_PT_PSB,
+	INTEL_PT_PSBEND,
+	INTEL_PT_CBR,
+	INTEL_PT_PIP,
+	INTEL_PT_OVF,
+};
+
+struct intel_pt_pkt {
+	enum intel_pt_pkt_type	type;
+	int			count;
+	uint64_t		payload;
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type);
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet);
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf, size_t len);
+
+#endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 3/8] perf tools: Add Intel PT instruction decoder
  2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 1/8] perf auxtrace: Add Intel PT as an AUX area tracing type Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 2/8] perf tools: Add Intel PT packet decoder Arnaldo Carvalho de Melo
@ 2015-06-26 22:02 ` Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 4/8] perf tools: Add Intel PT log Arnaldo Carvalho de Melo
                   ` (5 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 22:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Adrian Hunter, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for decoding instructions for Intel Processor Trace.  The
kernel x86 instruction decoder is used for this.

This essentially provides intel_pt_get_insn() which takes a binary
buffer, uses the kernel's x86 instruction decoder to get details of the
instruction and then categorizes it for consumption by an Intel PT
decoder.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-6-git-send-email-adrian.hunter@intel.com
[ Fixed .gitignore conflict, added missing arch/x86/ used to tools/perf/MANIFEST ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/build/Makefile.build                         |   2 +
 tools/perf/.gitignore                              |   2 +
 tools/perf/MANIFEST                                |   7 +
 tools/perf/Makefile.perf                           |  12 +-
 tools/perf/util/intel-pt-decoder/Build             |  15 +-
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  | 246 +++++++++++++++++++++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |  65 ++++++
 7 files changed, 346 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h

diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index a51244a8022f..9f23562a99bb 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -57,6 +57,8 @@ quiet_cmd_cc_i_c = CPP      $@
 quiet_cmd_cc_s_c = AS       $@
       cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
 
+quiet_cmd_gen = GEN      $@
+
 # Link agregate command
 # If there's nothing to link, create empty $@ object.
 quiet_cmd_ld_multi = LD       $@
diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore
index 09db62ba5786..97f460a5f477 100644
--- a/tools/perf/.gitignore
+++ b/tools/perf/.gitignore
@@ -29,3 +29,5 @@ config.mak.autogen
 *.pyc
 *.pyo
 .config-detected
+util/intel-pt-decoder/inat-tables.c
+util/intel-pt-decoder/inat.c
diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index fe50a1b34aa0..4e5662d8c274 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -58,6 +58,13 @@ include/linux/stringify.h
 lib/hweight.c
 lib/rbtree.c
 include/linux/swab.h
+arch/x86/lib/insn.c
+arch/x86/lib/inat.c
+arch/x86/include/asm/insn.h
+arch/x86/include/asm/inat.h
+arch/x86/include/asm/inat_types.h
+arch/x86/tools/gen-insn-attr-x86.awk
+arch/x86/lib/x86-opcode-map.txt
 arch/*/include/asm/unistd*.h
 arch/*/include/uapi/asm/unistd*.h
 arch/*/include/uapi/asm/perf_regs.h
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 1af0cfeb7a57..20e5d001f05b 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -76,6 +76,12 @@ include config/utilities.mak
 #
 # Define NO_AUXTRACE if you do not want AUX area tracing support
 
+# As per kernel Makefile, avoid funny character set dependencies
+unexport LC_ALL
+LC_COLLATE=C
+LC_NUMERIC=C
+export LC_COLLATE LC_NUMERIC
+
 ifeq ($(srctree),)
 srctree := $(patsubst %/,%,$(dir $(shell pwd)))
 srctree := $(patsubst %/,%,$(dir $(srctree)))
@@ -122,6 +128,7 @@ INSTALL = install
 FLEX    = flex
 BISON   = bison
 STRIP   = strip
+AWK     = awk
 
 LIB_DIR          = $(srctree)/tools/lib/api/
 TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
@@ -276,7 +283,7 @@ strip: $(PROGRAMS) $(OUTPUT)perf
 
 PERF_IN := $(OUTPUT)perf-in.o
 
-export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX
+export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
 build := -f $(srctree)/tools/build/Makefile.build dir=. obj
 
 $(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
@@ -547,7 +554,8 @@ clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
 	$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
 	$(Q)$(RM) .config-detected
 	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
-	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
+	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
+		$(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
 	$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
 	$(python-clean)
 
diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 9d67381a9bd3..f5f7f87eb718 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1 +1,14 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
+
+inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
+inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
+
+$(OUTPUT)util/intel-pt-decoder/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
+	@$(call echo-cmd,gen)$(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
+
+$(OUTPUT)util/intel-pt-decoder/inat.c:
+	@$(call echo-cmd,gen)cp ../../arch/x86/lib/inat.c $(OUTPUT)util/intel-pt-decoder/inat.c
+
+$(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o: $(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
+
+CFLAGS_intel-pt-insn-decoder.o += -I../../arch/x86/include -I$(OUTPUT)util/intel-pt-decoder -I../../arch/x86/lib -Wno-override-init
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
new file mode 100644
index 000000000000..2fa82c567446
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
@@ -0,0 +1,246 @@
+/*
+ * intel_pt_insn_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "event.h"
+
+#include <asm/insn.h>
+
+#include "inat.c"
+#include <insn.c>
+
+#include "intel-pt-insn-decoder.h"
+
+/* Based on branch_type() from perf_event_intel_lbr.c */
+static void intel_pt_insn_decoder(struct insn *insn,
+				  struct intel_pt_insn *intel_pt_insn)
+{
+	enum intel_pt_insn_op op = INTEL_PT_OP_OTHER;
+	enum intel_pt_insn_branch branch = INTEL_PT_BR_NO_BRANCH;
+	int ext;
+
+	if (insn_is_avx(insn)) {
+		intel_pt_insn->op = INTEL_PT_OP_OTHER;
+		intel_pt_insn->branch = INTEL_PT_BR_NO_BRANCH;
+		intel_pt_insn->length = insn->length;
+		return;
+	}
+
+	switch (insn->opcode.bytes[0]) {
+	case 0xf:
+		switch (insn->opcode.bytes[1]) {
+		case 0x05: /* syscall */
+		case 0x34: /* sysenter */
+			op = INTEL_PT_OP_SYSCALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x07: /* sysret */
+		case 0x35: /* sysexit */
+			op = INTEL_PT_OP_SYSRET;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x80 ... 0x8f: /* jcc */
+			op = INTEL_PT_OP_JCC;
+			branch = INTEL_PT_BR_CONDITIONAL;
+			break;
+		default:
+			break;
+		}
+		break;
+	case 0x70 ... 0x7f: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xc2: /* near ret */
+	case 0xc3: /* near ret */
+	case 0xca: /* far ret */
+	case 0xcb: /* far ret */
+		op = INTEL_PT_OP_RET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcf: /* iret */
+		op = INTEL_PT_OP_IRET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcc ... 0xce: /* int */
+		op = INTEL_PT_OP_INT;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe8: /* call near rel */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0x9a: /* call far absolute */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe0 ... 0xe2: /* loop */
+		op = INTEL_PT_OP_LOOP;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe3: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe9: /* jmp */
+	case 0xeb: /* jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0xea: /* far jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xff: /* call near absolute, call far absolute ind */
+		ext = (insn->modrm.bytes[0] >> 3) & 0x7;
+		switch (ext) {
+		case 2: /* near ind call */
+		case 3: /* far ind call */
+			op = INTEL_PT_OP_CALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 4:
+		case 5:
+			op = INTEL_PT_OP_JMP;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+
+	intel_pt_insn->op = op;
+	intel_pt_insn->branch = branch;
+	intel_pt_insn->length = insn->length;
+
+	if (branch == INTEL_PT_BR_CONDITIONAL ||
+	    branch == INTEL_PT_BR_UNCONDITIONAL) {
+#if __BYTE_ORDER == __BIG_ENDIAN
+		switch (insn->immediate.nbytes) {
+		case 1:
+			intel_pt_insn->rel = insn->immediate.value;
+			break;
+		case 2:
+			intel_pt_insn->rel =
+					bswap_16((short)insn->immediate.value);
+			break;
+		case 4:
+			intel_pt_insn->rel = bswap_32(insn->immediate.value);
+			break;
+		}
+#else
+		intel_pt_insn->rel = insn->immediate.value;
+#endif
+	}
+}
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn)
+{
+	struct insn insn;
+
+	insn_init(&insn, buf, len, x86_64);
+	insn_get_length(&insn);
+	if (!insn_complete(&insn) || insn.length > len)
+		return -1;
+	intel_pt_insn_decoder(&insn, intel_pt_insn);
+	if (insn.length < INTEL_PT_INSN_DBG_BUF_SZ)
+		memcpy(intel_pt_insn->buf, buf, insn.length);
+	else
+		memcpy(intel_pt_insn->buf, buf, INTEL_PT_INSN_DBG_BUF_SZ);
+	return 0;
+}
+
+const char *branch_name[] = {
+	[INTEL_PT_OP_OTHER]	= "Other",
+	[INTEL_PT_OP_CALL]	= "Call",
+	[INTEL_PT_OP_RET]	= "Ret",
+	[INTEL_PT_OP_JCC]	= "Jcc",
+	[INTEL_PT_OP_JMP]	= "Jmp",
+	[INTEL_PT_OP_LOOP]	= "Loop",
+	[INTEL_PT_OP_IRET]	= "IRet",
+	[INTEL_PT_OP_INT]	= "Int",
+	[INTEL_PT_OP_SYSCALL]	= "Syscall",
+	[INTEL_PT_OP_SYSRET]	= "Sysret",
+};
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op)
+{
+	return branch_name[op];
+}
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len)
+{
+	switch (intel_pt_insn->branch) {
+	case INTEL_PT_BR_CONDITIONAL:
+	case INTEL_PT_BR_UNCONDITIONAL:
+		return snprintf(buf, buf_len, "%s %s%d",
+				intel_pt_insn_name(intel_pt_insn->op),
+				intel_pt_insn->rel > 0 ? "+" : "",
+				intel_pt_insn->rel);
+	case INTEL_PT_BR_NO_BRANCH:
+	case INTEL_PT_BR_INDIRECT:
+		return snprintf(buf, buf_len, "%s",
+				intel_pt_insn_name(intel_pt_insn->op));
+	default:
+		break;
+	}
+	return 0;
+}
+
+size_t intel_pt_insn_max_size(void)
+{
+	return MAX_INSN_SIZE;
+}
+
+int intel_pt_insn_type(enum intel_pt_insn_op op)
+{
+	switch (op) {
+	case INTEL_PT_OP_OTHER:
+		return 0;
+	case INTEL_PT_OP_CALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL;
+	case INTEL_PT_OP_RET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN;
+	case INTEL_PT_OP_JCC:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_JMP:
+		return PERF_IP_FLAG_BRANCH;
+	case INTEL_PT_OP_LOOP:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_IRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_INT:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_SYSCALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_SYSCALLRET;
+	case INTEL_PT_OP_SYSRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_SYSCALLRET;
+	default:
+		return 0;
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
new file mode 100644
index 000000000000..b0adbf37323e
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
@@ -0,0 +1,65 @@
+/*
+ * intel_pt_insn_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_INSN_DECODER_H__
+#define INCLUDE__INTEL_PT_INSN_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_INSN_DESC_MAX		32
+#define INTEL_PT_INSN_DBG_BUF_SZ	16
+
+enum intel_pt_insn_op {
+	INTEL_PT_OP_OTHER,
+	INTEL_PT_OP_CALL,
+	INTEL_PT_OP_RET,
+	INTEL_PT_OP_JCC,
+	INTEL_PT_OP_JMP,
+	INTEL_PT_OP_LOOP,
+	INTEL_PT_OP_IRET,
+	INTEL_PT_OP_INT,
+	INTEL_PT_OP_SYSCALL,
+	INTEL_PT_OP_SYSRET,
+};
+
+enum intel_pt_insn_branch {
+	INTEL_PT_BR_NO_BRANCH,
+	INTEL_PT_BR_INDIRECT,
+	INTEL_PT_BR_CONDITIONAL,
+	INTEL_PT_BR_UNCONDITIONAL,
+};
+
+struct intel_pt_insn {
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+	unsigned char			buf[INTEL_PT_INSN_DBG_BUF_SZ];
+};
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn);
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op);
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len);
+
+size_t intel_pt_insn_max_size(void);
+
+int intel_pt_insn_type(enum intel_pt_insn_op op);
+
+#endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 4/8] perf tools: Add Intel PT log
  2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
                   ` (2 preceding siblings ...)
  2015-06-26 22:02 ` [PATCH 3/8] perf tools: Add Intel PT instruction decoder Arnaldo Carvalho de Melo
@ 2015-06-26 22:02 ` Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 5/8] perf tools: Add Intel PT decoder Arnaldo Carvalho de Melo
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 22:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Adrian Hunter, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

Add a facility to log Intel Processor Trace decoding.  The log is
intended for debugging purposes only.

The log file name is "intel_pt.log" and is opened in the current
directory.  The log contains a record of all packets and instructions
decoded and can get very large (10 MB would be a small one).

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-7-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt-decoder/Build          |   2 +-
 tools/perf/util/intel-pt-decoder/intel-pt-log.c | 155 ++++++++++++++++++++++++
 tools/perf/util/intel-pt-decoder/intel-pt-log.h |  52 ++++++++
 3 files changed, 208 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h

diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index f5f7f87eb718..587321a67d95 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1,4 +1,4 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o
 
 inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
 inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.c b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
new file mode 100644
index 000000000000..d09c7d9f9050
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
@@ -0,0 +1,155 @@
+/*
+ * intel_pt_log.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <string.h>
+
+#include "intel-pt-log.h"
+#include "intel-pt-insn-decoder.h"
+
+#include "intel-pt-pkt-decoder.h"
+
+#define MAX_LOG_NAME 256
+
+static FILE *f;
+static char log_name[MAX_LOG_NAME];
+static bool enable_logging;
+
+void intel_pt_log_enable(void)
+{
+	enable_logging = true;
+}
+
+void intel_pt_log_disable(void)
+{
+	if (f)
+		fflush(f);
+	enable_logging = false;
+}
+
+void intel_pt_log_set_name(const char *name)
+{
+	strncpy(log_name, name, MAX_LOG_NAME - 5);
+	strcat(log_name, ".log");
+}
+
+static void intel_pt_print_data(const unsigned char *buf, int len, uint64_t pos,
+				int indent)
+{
+	int i;
+
+	for (i = 0; i < indent; i++)
+		fprintf(f, " ");
+
+	fprintf(f, "  %08" PRIx64 ": ", pos);
+	for (i = 0; i < len; i++)
+		fprintf(f, " %02x", buf[i]);
+	for (; i < 16; i++)
+		fprintf(f, "   ");
+	fprintf(f, " ");
+}
+
+static void intel_pt_print_no_data(uint64_t pos, int indent)
+{
+	int i;
+
+	for (i = 0; i < indent; i++)
+		fprintf(f, " ");
+
+	fprintf(f, "  %08" PRIx64 ": ", pos);
+	for (i = 0; i < 16; i++)
+		fprintf(f, "   ");
+	fprintf(f, " ");
+}
+
+static int intel_pt_log_open(void)
+{
+	if (!enable_logging)
+		return -1;
+
+	if (f)
+		return 0;
+
+	if (!log_name[0])
+		return -1;
+
+	f = fopen(log_name, "w+");
+	if (!f) {
+		enable_logging = false;
+		return -1;
+	}
+
+	return 0;
+}
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf)
+{
+	char desc[INTEL_PT_PKT_DESC_MAX];
+
+	if (intel_pt_log_open())
+		return;
+
+	intel_pt_print_data(buf, pkt_len, pos, 0);
+	intel_pt_pkt_desc(packet, desc, INTEL_PT_PKT_DESC_MAX);
+	fprintf(f, "%s\n", desc);
+}
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	char desc[INTEL_PT_INSN_DESC_MAX];
+	size_t len = intel_pt_insn->length;
+
+	if (intel_pt_log_open())
+		return;
+
+	if (len > INTEL_PT_INSN_DBG_BUF_SZ)
+		len = INTEL_PT_INSN_DBG_BUF_SZ;
+	intel_pt_print_data(intel_pt_insn->buf, len, ip, 8);
+	if (intel_pt_insn_desc(intel_pt_insn, desc, INTEL_PT_INSN_DESC_MAX) > 0)
+		fprintf(f, "%s\n", desc);
+	else
+		fprintf(f, "Bad instruction!\n");
+}
+
+void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	char desc[INTEL_PT_INSN_DESC_MAX];
+
+	if (intel_pt_log_open())
+		return;
+
+	intel_pt_print_no_data(ip, 8);
+	if (intel_pt_insn_desc(intel_pt_insn, desc, INTEL_PT_INSN_DESC_MAX) > 0)
+		fprintf(f, "%s\n", desc);
+	else
+		fprintf(f, "Bad instruction!\n");
+}
+
+void intel_pt_log(const char *fmt, ...)
+{
+	va_list args;
+
+	if (intel_pt_log_open())
+		return;
+
+	va_start(args, fmt);
+	vfprintf(f, fmt, args);
+	va_end(args);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.h b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
new file mode 100644
index 000000000000..db3942f83677
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
@@ -0,0 +1,52 @@
+/*
+ * intel_pt_log.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_LOG_H__
+#define INCLUDE__INTEL_PT_LOG_H__
+
+#include <stdint.h>
+#include <inttypes.h>
+
+struct intel_pt_pkt;
+
+void intel_pt_log_enable(void);
+void intel_pt_log_disable(void);
+void intel_pt_log_set_name(const char *name);
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf);
+
+struct intel_pt_insn;
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip);
+void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
+			       uint64_t ip);
+
+__attribute__((format(printf, 1, 2)))
+void intel_pt_log(const char *fmt, ...);
+
+#define x64_fmt "0x%" PRIx64
+
+static inline void intel_pt_log_at(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s at " x64_fmt "\n", msg, u);
+}
+
+static inline void intel_pt_log_to(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s to " x64_fmt "\n", msg, u);
+}
+
+#endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 5/8] perf tools: Add Intel PT decoder
  2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
                   ` (3 preceding siblings ...)
  2015-06-26 22:02 ` [PATCH 4/8] perf tools: Add Intel PT log Arnaldo Carvalho de Melo
@ 2015-06-26 22:02 ` Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 6/8] perf tools: Add Intel PT support Arnaldo Carvalho de Melo
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 22:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Adrian Hunter, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for decoding an Intel Processor Trace.

Intel PT trace data must be 'decoded' which involves walking the object
code and matching the trace data packets.

The decoder requests a buffer of binary data via a get_trace()
call-back, which it decodes using instruction information which it gets
via another call-back walk_insn().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-8-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/intel-pt-decoder/Build             |    2 +-
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1759 ++++++++++++++++++++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  102 ++
 3 files changed, 1862 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h

diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 587321a67d95..fa12eac57a22 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1,4 +1,4 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o intel-pt-decoder.o
 
 inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
 inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
new file mode 100644
index 000000000000..748a7a078313
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -0,0 +1,1759 @@
+/*
+ * intel_pt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include "../cache.h"
+#include "../util.h"
+
+#include "intel-pt-insn-decoder.h"
+#include "intel-pt-pkt-decoder.h"
+#include "intel-pt-decoder.h"
+#include "intel-pt-log.h"
+
+#define INTEL_PT_BLK_SIZE 1024
+
+#define BIT63 (((uint64_t)1 << 63))
+
+#define INTEL_PT_RETURN 1
+
+struct intel_pt_blk {
+	struct intel_pt_blk *prev;
+	uint64_t ip[INTEL_PT_BLK_SIZE];
+};
+
+struct intel_pt_stack {
+	struct intel_pt_blk *blk;
+	struct intel_pt_blk *spare;
+	int pos;
+};
+
+enum intel_pt_pkt_state {
+	INTEL_PT_STATE_NO_PSB,
+	INTEL_PT_STATE_NO_IP,
+	INTEL_PT_STATE_ERR_RESYNC,
+	INTEL_PT_STATE_IN_SYNC,
+	INTEL_PT_STATE_TNT,
+	INTEL_PT_STATE_TIP,
+	INTEL_PT_STATE_TIP_PGD,
+	INTEL_PT_STATE_FUP,
+	INTEL_PT_STATE_FUP_NO_TIP,
+};
+
+#ifdef INTEL_PT_STRICT
+#define INTEL_PT_STATE_ERR1	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_NO_PSB
+#else
+#define INTEL_PT_STATE_ERR1	(decoder->pkt_state)
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_IP
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_ERR_RESYNC
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_IN_SYNC
+#endif
+
+struct intel_pt_decoder {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*walk_insn)(struct intel_pt_insn *intel_pt_insn,
+			 uint64_t *insn_cnt_ptr, uint64_t *ip, uint64_t to_ip,
+			 uint64_t max_insn_cnt, void *data);
+	void *data;
+	struct intel_pt_state state;
+	const unsigned char *buf;
+	size_t len;
+	bool return_compression;
+	bool pge;
+	uint64_t pos;
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint64_t tsc_timestamp;
+	uint64_t ref_timestamp;
+	uint64_t ret_addr;
+	struct intel_pt_stack stack;
+	enum intel_pt_pkt_state pkt_state;
+	struct intel_pt_pkt packet;
+	struct intel_pt_pkt tnt;
+	int pkt_step;
+	int pkt_len;
+	unsigned int cbr;
+	int exec_mode;
+	unsigned int insn_bytes;
+	uint64_t sign_bit;
+	uint64_t sign_bits;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+	uint64_t period_insn_cnt;
+	uint64_t period_mask;
+	uint64_t period_ticks;
+	uint64_t last_masked_timestamp;
+	bool continuous_period;
+	bool overflow;
+	bool set_fup_tx_flags;
+	unsigned int fup_tx_flags;
+	unsigned int tx_flags;
+	uint64_t timestamp_insn_cnt;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[INTEL_PT_PKT_MAX_SZ];
+};
+
+static uint64_t intel_pt_lower_power_of_2(uint64_t x)
+{
+	int i;
+
+	for (i = 0; x != 1; i++)
+		x >>= 1;
+
+	return x << i;
+}
+
+static void intel_pt_setup_period(struct intel_pt_decoder *decoder)
+{
+	if (decoder->period_type == INTEL_PT_PERIOD_TICKS) {
+		uint64_t period;
+
+		period = intel_pt_lower_power_of_2(decoder->period);
+		decoder->period_mask  = ~(period - 1);
+		decoder->period_ticks = period;
+	}
+}
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
+{
+	struct intel_pt_decoder *decoder;
+
+	if (!params->get_trace || !params->walk_insn)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct intel_pt_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->walk_insn          = params->walk_insn;
+	decoder->data               = params->data;
+	decoder->return_compression = params->return_compression;
+
+	decoder->sign_bit           = (uint64_t)1 << 47;
+	decoder->sign_bits          = ~(((uint64_t)1 << 48) - 1);
+
+	decoder->period             = params->period;
+	decoder->period_type        = params->period_type;
+
+	intel_pt_setup_period(decoder);
+
+	return decoder;
+}
+
+static void intel_pt_pop_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk = stack->blk;
+
+	stack->blk = blk->prev;
+	if (!stack->spare)
+		stack->spare = blk;
+	else
+		free(blk);
+}
+
+static uint64_t intel_pt_pop(struct intel_pt_stack *stack)
+{
+	if (!stack->pos) {
+		if (!stack->blk)
+			return 0;
+		intel_pt_pop_blk(stack);
+		if (!stack->blk)
+			return 0;
+		stack->pos = INTEL_PT_BLK_SIZE;
+	}
+	return stack->blk->ip[--stack->pos];
+}
+
+static int intel_pt_alloc_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk;
+
+	if (stack->spare) {
+		blk = stack->spare;
+		stack->spare = NULL;
+	} else {
+		blk = malloc(sizeof(struct intel_pt_blk));
+		if (!blk)
+			return -ENOMEM;
+	}
+
+	blk->prev = stack->blk;
+	stack->blk = blk;
+	stack->pos = 0;
+	return 0;
+}
+
+static int intel_pt_push(struct intel_pt_stack *stack, uint64_t ip)
+{
+	int err;
+
+	if (!stack->blk || stack->pos == INTEL_PT_BLK_SIZE) {
+		err = intel_pt_alloc_blk(stack);
+		if (err)
+			return err;
+	}
+
+	stack->blk->ip[stack->pos++] = ip;
+	return 0;
+}
+
+static void intel_pt_clear_stack(struct intel_pt_stack *stack)
+{
+	while (stack->blk)
+		intel_pt_pop_blk(stack);
+	stack->pos = 0;
+}
+
+static void intel_pt_free_stack(struct intel_pt_stack *stack)
+{
+	intel_pt_clear_stack(stack);
+	zfree(&stack->blk);
+	zfree(&stack->spare);
+}
+
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder)
+{
+	intel_pt_free_stack(&decoder->stack);
+	free(decoder);
+}
+
+static int intel_pt_ext_err(int code)
+{
+	switch (code) {
+	case -ENOMEM:
+		return INTEL_PT_ERR_NOMEM;
+	case -ENOSYS:
+		return INTEL_PT_ERR_INTERN;
+	case -EBADMSG:
+		return INTEL_PT_ERR_BADPKT;
+	case -ENODATA:
+		return INTEL_PT_ERR_NODATA;
+	case -EILSEQ:
+		return INTEL_PT_ERR_NOINSN;
+	case -ENOENT:
+		return INTEL_PT_ERR_MISMAT;
+	case -EOVERFLOW:
+		return INTEL_PT_ERR_OVR;
+	case -ENOSPC:
+		return INTEL_PT_ERR_LOST;
+	default:
+		return INTEL_PT_ERR_UNK;
+	}
+}
+
+static const char *intel_pt_err_msgs[] = {
+	[INTEL_PT_ERR_NOMEM]  = "Memory allocation failed",
+	[INTEL_PT_ERR_INTERN] = "Internal error",
+	[INTEL_PT_ERR_BADPKT] = "Bad packet",
+	[INTEL_PT_ERR_NODATA] = "No more data",
+	[INTEL_PT_ERR_NOINSN] = "Failed to get instruction",
+	[INTEL_PT_ERR_MISMAT] = "Trace doesn't match instruction",
+	[INTEL_PT_ERR_OVR]    = "Overflow packet",
+	[INTEL_PT_ERR_LOST]   = "Lost trace data",
+	[INTEL_PT_ERR_UNK]    = "Unknown error!",
+};
+
+int intel_pt__strerror(int code, char *buf, size_t buflen)
+{
+	if (code < 1 || code > INTEL_PT_ERR_MAX)
+		code = INTEL_PT_ERR_UNK;
+	strlcpy(buf, intel_pt_err_msgs[code], buflen);
+	return 0;
+}
+
+static uint64_t intel_pt_calc_ip(struct intel_pt_decoder *decoder,
+				 const struct intel_pt_pkt *packet,
+				 uint64_t last_ip)
+{
+	uint64_t ip;
+
+	switch (packet->count) {
+	case 2:
+		ip = (last_ip & (uint64_t)0xffffffffffff0000ULL) |
+		     packet->payload;
+		break;
+	case 4:
+		ip = (last_ip & (uint64_t)0xffffffff00000000ULL) |
+		     packet->payload;
+		break;
+	case 6:
+		ip = packet->payload;
+		break;
+	default:
+		return 0;
+	}
+
+	if (ip & decoder->sign_bit)
+		return ip | decoder->sign_bits;
+
+	return ip;
+}
+
+static inline void intel_pt_set_last_ip(struct intel_pt_decoder *decoder)
+{
+	decoder->last_ip = intel_pt_calc_ip(decoder, &decoder->packet,
+					    decoder->last_ip);
+}
+
+static inline void intel_pt_set_ip(struct intel_pt_decoder *decoder)
+{
+	intel_pt_set_last_ip(decoder);
+	decoder->ip = decoder->last_ip;
+}
+
+static void intel_pt_decoder_log_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log_packet(&decoder->packet, decoder->pkt_len, decoder->pos,
+			    decoder->buf);
+}
+
+static int intel_pt_bug(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Internal error\n");
+	decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+	return -ENOSYS;
+}
+
+static inline void intel_pt_clear_tx_flags(struct intel_pt_decoder *decoder)
+{
+	decoder->tx_flags = 0;
+}
+
+static inline void intel_pt_update_in_tx(struct intel_pt_decoder *decoder)
+{
+	decoder->tx_flags = decoder->packet.payload & INTEL_PT_IN_TX;
+}
+
+static int intel_pt_bad_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	intel_pt_decoder_log_packet(decoder);
+	if (decoder->pkt_state != INTEL_PT_STATE_NO_PSB) {
+		intel_pt_log("ERROR: Bad packet\n");
+		decoder->pkt_state = INTEL_PT_STATE_ERR1;
+	}
+	return -EBADMSG;
+}
+
+static int intel_pt_get_data(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	intel_pt_log("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		intel_pt_log("No more data\n");
+		return -ENODATA;
+	}
+	if (!buffer.consecutive) {
+		decoder->ip = 0;
+		decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+		decoder->ref_timestamp = buffer.ref_timestamp;
+		decoder->timestamp = 0;
+		decoder->state.trace_nr = buffer.trace_nr;
+		intel_pt_log("Reference timestamp 0x%" PRIx64 "\n",
+			     decoder->ref_timestamp);
+		return -ENOLINK;
+	}
+
+	return 0;
+}
+
+static int intel_pt_get_next_data(struct intel_pt_decoder *decoder)
+{
+	if (!decoder->next_buf)
+		return intel_pt_get_data(decoder);
+
+	decoder->buf = decoder->next_buf;
+	decoder->len = decoder->next_len;
+	decoder->next_buf = 0;
+	decoder->next_len = 0;
+	return 0;
+}
+
+static int intel_pt_get_split_packet(struct intel_pt_decoder *decoder)
+{
+	unsigned char *buf = decoder->temp_buf;
+	size_t old_len, len, n;
+	int ret;
+
+	old_len = decoder->len;
+	len = decoder->len;
+	memcpy(buf, decoder->buf, len);
+
+	ret = intel_pt_get_data(decoder);
+	if (ret) {
+		decoder->pos += old_len;
+		return ret < 0 ? ret : -EINVAL;
+	}
+
+	n = INTEL_PT_PKT_MAX_SZ - len;
+	if (n > decoder->len)
+		n = decoder->len;
+	memcpy(buf + len, decoder->buf, n);
+	len += n;
+
+	ret = intel_pt_get_packet(buf, len, &decoder->packet);
+	if (ret < (int)old_len) {
+		decoder->next_buf = decoder->buf;
+		decoder->next_len = decoder->len;
+		decoder->buf = buf;
+		decoder->len = old_len;
+		return intel_pt_bad_packet(decoder);
+	}
+
+	decoder->next_buf = decoder->buf + (ret - old_len);
+	decoder->next_len = decoder->len - (ret - old_len);
+
+	decoder->buf = buf;
+	decoder->len = ret;
+
+	return ret;
+}
+
+static int intel_pt_get_next_packet(struct intel_pt_decoder *decoder)
+{
+	int ret;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = intel_pt_get_packet(decoder->buf, decoder->len,
+					  &decoder->packet);
+		if (ret == INTEL_PT_NEED_MORE_BYTES &&
+		    decoder->len < INTEL_PT_PKT_MAX_SZ && !decoder->next_buf) {
+			ret = intel_pt_get_split_packet(decoder);
+			if (ret < 0)
+				return ret;
+		}
+		if (ret <= 0)
+			return intel_pt_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+		intel_pt_decoder_log_packet(decoder);
+	} while (decoder->packet.type == INTEL_PT_PAD);
+
+	return 0;
+}
+
+static uint64_t intel_pt_next_period(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp, masked_timestamp;
+
+	timestamp = decoder->timestamp + decoder->timestamp_insn_cnt;
+	masked_timestamp = timestamp & decoder->period_mask;
+	if (decoder->continuous_period) {
+		if (masked_timestamp != decoder->last_masked_timestamp)
+			return 1;
+	} else {
+		timestamp += 1;
+		masked_timestamp = timestamp & decoder->period_mask;
+		if (masked_timestamp != decoder->last_masked_timestamp) {
+			decoder->last_masked_timestamp = masked_timestamp;
+			decoder->continuous_period = true;
+		}
+	}
+	return decoder->period_ticks - (timestamp - masked_timestamp);
+}
+
+static uint64_t intel_pt_next_sample(struct intel_pt_decoder *decoder)
+{
+	switch (decoder->period_type) {
+	case INTEL_PT_PERIOD_INSTRUCTIONS:
+		return decoder->period - decoder->period_insn_cnt;
+	case INTEL_PT_PERIOD_TICKS:
+		return intel_pt_next_period(decoder);
+	case INTEL_PT_PERIOD_NONE:
+	default:
+		return 0;
+	}
+}
+
+static void intel_pt_sample_insn(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp, masked_timestamp;
+
+	switch (decoder->period_type) {
+	case INTEL_PT_PERIOD_INSTRUCTIONS:
+		decoder->period_insn_cnt = 0;
+		break;
+	case INTEL_PT_PERIOD_TICKS:
+		timestamp = decoder->timestamp + decoder->timestamp_insn_cnt;
+		masked_timestamp = timestamp & decoder->period_mask;
+		decoder->last_masked_timestamp = masked_timestamp;
+		break;
+	case INTEL_PT_PERIOD_NONE:
+	default:
+		break;
+	}
+
+	decoder->state.type |= INTEL_PT_INSTRUCTION;
+}
+
+static int intel_pt_walk_insn(struct intel_pt_decoder *decoder,
+			      struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	uint64_t max_insn_cnt, insn_cnt = 0;
+	int err;
+
+	max_insn_cnt = intel_pt_next_sample(decoder);
+
+	err = decoder->walk_insn(intel_pt_insn, &insn_cnt, &decoder->ip, ip,
+				 max_insn_cnt, decoder->data);
+
+	decoder->timestamp_insn_cnt += insn_cnt;
+	decoder->period_insn_cnt += insn_cnt;
+
+	if (err) {
+		decoder->pkt_state = INTEL_PT_STATE_ERR2;
+		intel_pt_log_at("ERROR: Failed to get instruction",
+				decoder->ip);
+		if (err == -ENOENT)
+			return -ENOLINK;
+		return -EILSEQ;
+	}
+
+	if (ip && decoder->ip == ip) {
+		err = -EAGAIN;
+		goto out;
+	}
+
+	if (max_insn_cnt && insn_cnt >= max_insn_cnt)
+		intel_pt_sample_insn(decoder);
+
+	if (intel_pt_insn->branch == INTEL_PT_BR_NO_BRANCH) {
+		decoder->state.type = INTEL_PT_INSTRUCTION;
+		decoder->state.from_ip = decoder->ip;
+		decoder->state.to_ip = 0;
+		decoder->ip += intel_pt_insn->length;
+		err = INTEL_PT_RETURN;
+		goto out;
+	}
+
+	if (intel_pt_insn->op == INTEL_PT_OP_CALL) {
+		/* Zero-length calls are excluded */
+		if (intel_pt_insn->branch != INTEL_PT_BR_UNCONDITIONAL ||
+		    intel_pt_insn->rel) {
+			err = intel_pt_push(&decoder->stack, decoder->ip +
+					    intel_pt_insn->length);
+			if (err)
+				goto out;
+		}
+	} else if (intel_pt_insn->op == INTEL_PT_OP_RET) {
+		decoder->ret_addr = intel_pt_pop(&decoder->stack);
+	}
+
+	if (intel_pt_insn->branch == INTEL_PT_BR_UNCONDITIONAL) {
+		decoder->state.from_ip = decoder->ip;
+		decoder->ip += intel_pt_insn->length +
+				intel_pt_insn->rel;
+		decoder->state.to_ip = decoder->ip;
+		err = INTEL_PT_RETURN;
+	}
+out:
+	decoder->state.insn_op = intel_pt_insn->op;
+	decoder->state.insn_len = intel_pt_insn->length;
+
+	if (decoder->tx_flags & INTEL_PT_IN_TX)
+		decoder->state.flags |= INTEL_PT_IN_TX;
+
+	return err;
+}
+
+static int intel_pt_walk_fup(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	uint64_t ip;
+	int err;
+
+	ip = decoder->last_ip;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, ip);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err == -EAGAIN) {
+			if (decoder->set_fup_tx_flags) {
+				decoder->set_fup_tx_flags = false;
+				decoder->tx_flags = decoder->fup_tx_flags;
+				decoder->state.type = INTEL_PT_TRANSACTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->state.flags = decoder->fup_tx_flags;
+				return 0;
+			}
+			return err;
+		}
+		decoder->set_fup_tx_flags = false;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			intel_pt_log_at("ERROR: Unexpected indirect branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			intel_pt_log_at("ERROR: Unexpected conditional branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		intel_pt_bug(decoder);
+	}
+}
+
+static int intel_pt_walk_tip(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+	if (err == INTEL_PT_RETURN)
+		return 0;
+	if (err)
+		return err;
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+		if (decoder->pkt_state == INTEL_PT_STATE_TIP_PGD) {
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0)
+				decoder->ip = decoder->last_ip;
+		} else {
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				decoder->state.to_ip = decoder->last_ip;
+				decoder->ip = decoder->last_ip;
+			}
+		}
+		return 0;
+	}
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+		intel_pt_log_at("ERROR: Conditional branch when expecting indirect branch",
+				decoder->ip);
+		decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+		return -ENOENT;
+	}
+
+	return intel_pt_bug(decoder);
+}
+
+static int intel_pt_walk_tnt(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.op == INTEL_PT_OP_RET) {
+			if (!decoder->return_compression) {
+				intel_pt_log_at("ERROR: RET when expecting conditional branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!decoder->ret_addr) {
+				intel_pt_log_at("ERROR: Bad RET compression (stack empty)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!(decoder->tnt.payload & BIT63)) {
+				intel_pt_log_at("ERROR: Bad RET compression (TNT=N)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->tnt.payload <<= 1;
+			decoder->state.from_ip = decoder->ip;
+			decoder->ip = decoder->ret_addr;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			/* Handle deferred TIPs */
+			err = intel_pt_get_next_packet(decoder);
+			if (err)
+				return err;
+			if (decoder->packet.type != INTEL_PT_TIP ||
+			    decoder->packet.count == 0) {
+				intel_pt_log_at("ERROR: Missing deferred TIP for indirect branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				decoder->pkt_step = 0;
+				return -ENOENT;
+			}
+			intel_pt_set_last_ip(decoder);
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = decoder->last_ip;
+			decoder->ip = decoder->last_ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			if (decoder->tnt.payload & BIT63) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.from_ip = decoder->ip;
+				decoder->ip += intel_pt_insn.length +
+					       intel_pt_insn.rel;
+				decoder->state.to_ip = decoder->ip;
+				return 0;
+			}
+			/* Instruction sample for a non-taken branch */
+			if (decoder->state.type & INTEL_PT_INSTRUCTION) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.type = INTEL_PT_INSTRUCTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->ip += intel_pt_insn.length;
+				return 0;
+			}
+			decoder->ip += intel_pt_insn.length;
+			if (!decoder->tnt.count)
+				return -EAGAIN;
+			decoder->tnt.payload <<= 1;
+			continue;
+		}
+
+		return intel_pt_bug(decoder);
+	}
+}
+
+static int intel_pt_mode_tsx(struct intel_pt_decoder *decoder, bool *no_tip)
+{
+	unsigned int fup_tx_flags;
+	int err;
+
+	fup_tx_flags = decoder->packet.payload &
+		       (INTEL_PT_IN_TX | INTEL_PT_ABORT_TX);
+	err = intel_pt_get_next_packet(decoder);
+	if (err)
+		return err;
+	if (decoder->packet.type == INTEL_PT_FUP) {
+		decoder->fup_tx_flags = fup_tx_flags;
+		decoder->set_fup_tx_flags = true;
+		if (!(decoder->fup_tx_flags & INTEL_PT_ABORT_TX))
+			*no_tip = true;
+	} else {
+		intel_pt_log_at("ERROR: Missing FUP after MODE.TSX",
+				decoder->pos);
+		intel_pt_update_in_tx(decoder);
+	}
+	return 0;
+}
+
+static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp;
+
+	if (decoder->ref_timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->ref_timestamp & (0xffULL << 56));
+		if (timestamp < decoder->ref_timestamp) {
+			if (decoder->ref_timestamp - timestamp > (1ULL << 55))
+				timestamp += (1ULL << 56);
+		} else {
+			if (timestamp - decoder->ref_timestamp > (1ULL << 55))
+				timestamp -= (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->ref_timestamp = 0;
+		decoder->timestamp_insn_cnt = 0;
+	} else if (decoder->timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->timestamp & (0xffULL << 56));
+		if (timestamp < decoder->timestamp &&
+		    decoder->timestamp - timestamp < 0x100) {
+			intel_pt_log_to("ERROR: Suppressing backwards timestamp",
+					timestamp);
+			timestamp = decoder->timestamp;
+		}
+		while (timestamp < decoder->timestamp) {
+			intel_pt_log_to("Wraparound timestamp", timestamp);
+			timestamp += (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->timestamp_insn_cnt = 0;
+	}
+
+	intel_pt_log_to("Setting timestamp", decoder->timestamp);
+}
+
+static int intel_pt_overflow(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Buffer overflow\n");
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+	decoder->overflow = true;
+	return -EOVERFLOW;
+}
+
+/* Walk PSB+ packets when already in sync. */
+static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_TIP_PGD:
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+		case INTEL_PT_TNT:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSB:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -EAGAIN;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			intel_pt_set_last_ip(decoder);
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	if (decoder->tx_flags & INTEL_PT_ABORT_TX) {
+		decoder->tx_flags = 0;
+		decoder->state.flags &= ~INTEL_PT_IN_TX;
+		decoder->state.flags |= INTEL_PT_ABORT_TX;
+	} else {
+		decoder->state.flags |= INTEL_PT_ASYNC;
+	}
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+		case INTEL_PT_FUP:
+		case INTEL_PT_PSB:
+		case INTEL_PT_TSC:
+		case INTEL_PT_CBR:
+		case INTEL_PT_MODE_TSX:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSBEND:
+			intel_pt_log("ERROR: Missing TIP after FUP\n");
+			decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP_PGD:
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0) {
+				intel_pt_set_ip(decoder);
+				intel_pt_log("Omitting PGD ip " x64_fmt "\n",
+					     decoder->ip);
+			}
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			return 0;
+
+		case INTEL_PT_TIP_PGE:
+			decoder->pge = true;
+			intel_pt_log("Omitting PGE ip " x64_fmt "\n",
+				     decoder->ip);
+			decoder->state.from_ip = 0;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_TIP:
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
+{
+	bool no_tip = false;
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+next:
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+			if (!decoder->packet.count)
+				break;
+			decoder->tnt = decoder->packet;
+			decoder->pkt_state = INTEL_PT_STATE_TNT;
+			err = intel_pt_walk_tnt(decoder);
+			if (err == -EAGAIN)
+				break;
+			return err;
+
+		case INTEL_PT_TIP_PGD:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP_PGD;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_TIP_PGE: {
+			decoder->pge = true;
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero TIP.PGE",
+						decoder->pos);
+				break;
+			}
+			intel_pt_set_ip(decoder);
+			decoder->state.from_ip = 0;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_FUP:
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero FUP",
+						decoder->pos);
+				no_tip = false;
+				break;
+			}
+			intel_pt_set_last_ip(decoder);
+			err = intel_pt_walk_fup(decoder);
+			if (err != -EAGAIN) {
+				if (err)
+					return err;
+				if (no_tip)
+					decoder->pkt_state =
+						INTEL_PT_STATE_FUP_NO_TIP;
+				else
+					decoder->pkt_state = INTEL_PT_STATE_FUP;
+				return 0;
+			}
+			if (no_tip) {
+				no_tip = false;
+				break;
+			}
+			return intel_pt_walk_fup_tip(decoder);
+
+		case INTEL_PT_PSB:
+			intel_pt_clear_stack(&decoder->stack);
+			err = intel_pt_walk_psbend(decoder);
+			if (err == -EAGAIN)
+				goto next;
+			if (err)
+				return err;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			/* MODE_TSX need not be followed by FUP */
+			if (!decoder->pge) {
+				intel_pt_update_in_tx(decoder);
+				break;
+			}
+			err = intel_pt_mode_tsx(decoder, &no_tip);
+			if (err)
+				return err;
+			goto next;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+/* Walk PSB+ packets to get in sync. */
+static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -ENOENT;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0) {
+				uint64_t current_ip = decoder->ip;
+
+				intel_pt_set_ip(decoder);
+				if (current_ip)
+					intel_pt_log_to("Setting IP",
+							decoder->ip);
+			}
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_TNT:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			if (decoder->ip)
+				decoder->pkt_state = INTEL_PT_STATE_ERR4;
+			else
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_PSB:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			decoder->pge = decoder->packet.type != INTEL_PT_TIP_PGD;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0)
+				intel_pt_set_ip(decoder);
+			if (decoder->ip)
+				return 0;
+			break;
+
+		case INTEL_PT_FUP:
+			if (decoder->overflow) {
+				if (decoder->last_ip ||
+				    decoder->packet.count == 6 ||
+				    decoder->packet.count == 0)
+					intel_pt_set_ip(decoder);
+				if (decoder->ip)
+					return 0;
+			}
+			if (decoder->packet.count)
+				intel_pt_set_last_ip(decoder);
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSB:
+			err = intel_pt_walk_psb(decoder);
+			if (err)
+				return err;
+			if (decoder->ip) {
+				/* Do not have a sample */
+				decoder->state.type = 0;
+				return 0;
+			}
+			break;
+
+		case INTEL_PT_TNT:
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_sync_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	intel_pt_log("Scanning for full IP\n");
+	err = intel_pt_walk_to_ip(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	decoder->overflow = false;
+
+	decoder->state.from_ip = 0;
+	decoder->state.to_ip = decoder->ip;
+	intel_pt_log_to("Setting IP", decoder->ip);
+
+	return 0;
+}
+
+static int intel_pt_part_psb(struct intel_pt_decoder *decoder)
+{
+	const unsigned char *end = decoder->buf + decoder->len;
+	size_t i;
+
+	for (i = INTEL_PT_PSB_LEN - 1; i; i--) {
+		if (i > decoder->len)
+			continue;
+		if (!memcmp(end - i, INTEL_PT_PSB_STR, i))
+			return i;
+	}
+	return 0;
+}
+
+static int intel_pt_rest_psb(struct intel_pt_decoder *decoder, int part_psb)
+{
+	size_t rest_psb = INTEL_PT_PSB_LEN - part_psb;
+	const char *psb = INTEL_PT_PSB_STR;
+
+	if (rest_psb > decoder->len ||
+	    memcmp(decoder->buf, psb + part_psb, rest_psb))
+		return 0;
+
+	return rest_psb;
+}
+
+static int intel_pt_get_split_psb(struct intel_pt_decoder *decoder,
+				  int part_psb)
+{
+	int rest_psb, ret;
+
+	decoder->pos += decoder->len;
+	decoder->len = 0;
+
+	ret = intel_pt_get_next_data(decoder);
+	if (ret)
+		return ret;
+
+	rest_psb = intel_pt_rest_psb(decoder, part_psb);
+	if (!rest_psb)
+		return 0;
+
+	decoder->pos -= part_psb;
+	decoder->next_buf = decoder->buf + rest_psb;
+	decoder->next_len = decoder->len - rest_psb;
+	memcpy(decoder->temp_buf, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	decoder->buf = decoder->temp_buf;
+	decoder->len = INTEL_PT_PSB_LEN;
+
+	return 0;
+}
+
+static int intel_pt_scan_for_psb(struct intel_pt_decoder *decoder)
+{
+	unsigned char *next;
+	int ret;
+
+	intel_pt_log("Scanning for PSB\n");
+	while (1) {
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		next = memmem(decoder->buf, decoder->len, INTEL_PT_PSB_STR,
+			      INTEL_PT_PSB_LEN);
+		if (!next) {
+			int part_psb;
+
+			part_psb = intel_pt_part_psb(decoder);
+			if (part_psb) {
+				ret = intel_pt_get_split_psb(decoder, part_psb);
+				if (ret)
+					return ret;
+			} else {
+				decoder->pos += decoder->len;
+				decoder->len = 0;
+			}
+			continue;
+		}
+
+		decoder->pkt_step = next - decoder->buf;
+		return intel_pt_get_next_packet(decoder);
+	}
+}
+
+static int intel_pt_sync(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	decoder->pge = false;
+	decoder->continuous_period = false;
+	decoder->last_ip = 0;
+	decoder->ip = 0;
+	intel_pt_clear_stack(&decoder->stack);
+
+	err = intel_pt_scan_for_psb(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_NO_IP;
+
+	err = intel_pt_walk_psb(decoder);
+	if (err)
+		return err;
+
+	if (decoder->ip) {
+		decoder->state.type = 0; /* Do not have a sample */
+		decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	} else {
+		return intel_pt_sync_ip(decoder);
+	}
+
+	return 0;
+}
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	do {
+		decoder->state.type = INTEL_PT_BRANCH;
+		decoder->state.flags = 0;
+
+		switch (decoder->pkt_state) {
+		case INTEL_PT_STATE_NO_PSB:
+			err = intel_pt_sync(decoder);
+			break;
+		case INTEL_PT_STATE_NO_IP:
+			decoder->last_ip = 0;
+			/* Fall through */
+		case INTEL_PT_STATE_ERR_RESYNC:
+			err = intel_pt_sync_ip(decoder);
+			break;
+		case INTEL_PT_STATE_IN_SYNC:
+			err = intel_pt_walk_trace(decoder);
+			break;
+		case INTEL_PT_STATE_TNT:
+			err = intel_pt_walk_tnt(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_trace(decoder);
+			break;
+		case INTEL_PT_STATE_TIP:
+		case INTEL_PT_STATE_TIP_PGD:
+			err = intel_pt_walk_tip(decoder);
+			break;
+		case INTEL_PT_STATE_FUP:
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			err = intel_pt_walk_fup(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_fup_tip(decoder);
+			else if (!err)
+				decoder->pkt_state = INTEL_PT_STATE_FUP;
+			break;
+		case INTEL_PT_STATE_FUP_NO_TIP:
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			err = intel_pt_walk_fup(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_trace(decoder);
+			break;
+		default:
+			err = intel_pt_bug(decoder);
+			break;
+		}
+	} while (err == -ENOLINK);
+
+	decoder->state.err = err ? intel_pt_ext_err(err) : 0;
+	decoder->state.timestamp = decoder->timestamp;
+	decoder->state.est_timestamp = decoder->timestamp +
+				       (decoder->timestamp_insn_cnt << 1);
+	decoder->state.cr3 = decoder->cr3;
+
+	if (err)
+		decoder->state.from_ip = decoder->ip;
+
+	return &decoder->state;
+}
+
+static bool intel_pt_at_psb(unsigned char *buf, size_t len)
+{
+	if (len < INTEL_PT_PSB_LEN)
+		return false;
+	return memmem(buf, INTEL_PT_PSB_LEN, INTEL_PT_PSB_STR,
+		      INTEL_PT_PSB_LEN);
+}
+
+/**
+ * intel_pt_next_psb - move buffer pointer to the start of the next PSB packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the next PSB packet if
+ * there is one, otherwise the buffer pointer is unchanged.  If @buf is updated,
+ * @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_next_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	next = memmem(*buf, *len, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_step_psb - move buffer pointer to the start of the following PSB
+ *                     packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the following PSB packet
+ * (skipping the PSB at @buf itself) if there is one, otherwise the buffer
+ * pointer is unchanged.  If @buf is updated, @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_step_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	if (!*len)
+		return false;
+
+	next = memmem(*buf + 1, *len - 1, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_last_psb - find the last PSB packet in a buffer.
+ * @buf: buffer
+ * @len: size of buffer
+ *
+ * This function finds the last PSB in a buffer.
+ *
+ * Return: A pointer to the last PSB in @buf if found, %NULL otherwise.
+ */
+static unsigned char *intel_pt_last_psb(unsigned char *buf, size_t len)
+{
+	const char *n = INTEL_PT_PSB_STR;
+	unsigned char *p;
+	size_t k;
+
+	if (len < INTEL_PT_PSB_LEN)
+		return NULL;
+
+	k = len - INTEL_PT_PSB_LEN + 1;
+	while (1) {
+		p = memrchr(buf, n[0], k);
+		if (!p)
+			return NULL;
+		if (!memcmp(p + 1, n + 1, INTEL_PT_PSB_LEN - 1))
+			return p;
+		k = p - buf;
+		if (!k)
+			return NULL;
+	}
+}
+
+/**
+ * intel_pt_next_tsc - find and return next TSC.
+ * @buf: buffer
+ * @len: size of buffer
+ * @tsc: TSC value returned
+ *
+ * Find a TSC packet in @buf and return the TSC value.  This function assumes
+ * that @buf starts at a PSB and that PSB+ will contain TSC and so stops if a
+ * PSBEND packet is found.
+ *
+ * Return: %true if TSC is found, false otherwise.
+ */
+static bool intel_pt_next_tsc(unsigned char *buf, size_t len, uint64_t *tsc)
+{
+	struct intel_pt_pkt packet;
+	int ret;
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret <= 0)
+			return false;
+		if (packet.type == INTEL_PT_TSC) {
+			*tsc = packet.payload;
+			return true;
+		}
+		if (packet.type == INTEL_PT_PSBEND)
+			return false;
+		buf += ret;
+		len -= ret;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_tsc_cmp - compare 7-byte TSCs.
+ * @tsc1: first TSC to compare
+ * @tsc2: second TSC to compare
+ *
+ * This function compares 7-byte TSC values allowing for the possibility that
+ * TSC wrapped around.  Generally it is not possible to know if TSC has wrapped
+ * around so for that purpose this function assumes the absolute difference is
+ * less than half the maximum difference.
+ *
+ * Return: %-1 if @tsc1 is before @tsc2, %0 if @tsc1 == @tsc2, %1 if @tsc1 is
+ * after @tsc2.
+ */
+static int intel_pt_tsc_cmp(uint64_t tsc1, uint64_t tsc2)
+{
+	const uint64_t halfway = (1ULL << 55);
+
+	if (tsc1 == tsc2)
+		return 0;
+
+	if (tsc1 < tsc2) {
+		if (tsc2 - tsc1 < halfway)
+			return -1;
+		else
+			return 1;
+	} else {
+		if (tsc1 - tsc2 < halfway)
+			return 1;
+		else
+			return -1;
+	}
+}
+
+/**
+ * intel_pt_find_overlap_tsc - determine start of non-overlapped trace data
+ *                             using TSC.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ *
+ * If the trace contains TSC we can look at the last TSC of @buf_a and the
+ * first TSC of @buf_b in order to determine if the buffers overlap, and then
+ * walk forward in @buf_b until a later TSC is found.  A precondition is that
+ * @buf_a and @buf_b are positioned at a PSB.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+static unsigned char *intel_pt_find_overlap_tsc(unsigned char *buf_a,
+						size_t len_a,
+						unsigned char *buf_b,
+						size_t len_b)
+{
+	uint64_t tsc_a, tsc_b;
+	unsigned char *p;
+	size_t len;
+
+	p = intel_pt_last_psb(buf_a, len_a);
+	if (!p)
+		return buf_b; /* No PSB in buf_a => no overlap */
+
+	len = len_a - (p - buf_a);
+	if (!intel_pt_next_tsc(p, len, &tsc_a)) {
+		/* The last PSB+ in buf_a is incomplete, so go back one more */
+		len_a -= len;
+		p = intel_pt_last_psb(buf_a, len_a);
+		if (!p)
+			return buf_b; /* No full PSB+ => assume no overlap */
+		len = len_a - (p - buf_a);
+		if (!intel_pt_next_tsc(p, len, &tsc_a))
+			return buf_b; /* No TSC in buf_a => assume no overlap */
+	}
+
+	while (1) {
+		/* Ignore PSB+ with no TSC */
+		if (intel_pt_next_tsc(buf_b, len_b, &tsc_b) &&
+		    intel_pt_tsc_cmp(tsc_a, tsc_b) < 0)
+			return buf_b; /* tsc_a < tsc_b => no overlap */
+
+		if (!intel_pt_step_psb(&buf_b, &len_b))
+			return buf_b + len_b; /* No PSB in buf_b => no data */
+	}
+}
+
+/**
+ * intel_pt_find_overlap - determine start of non-overlapped trace data.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ * @have_tsc: can use TSC packets to detect overlap
+ *
+ * When trace samples or snapshots are recorded there is the possibility that
+ * the data overlaps.  Note that, for the purposes of decoding, data is only
+ * useful if it begins with a PSB packet.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc)
+{
+	unsigned char *found;
+
+	/* Buffer 'b' must start at PSB so throw away everything before that */
+	if (!intel_pt_next_psb(&buf_b, &len_b))
+		return buf_b + len_b; /* No PSB */
+
+	if (!intel_pt_next_psb(&buf_a, &len_a))
+		return buf_b; /* No overlap */
+
+	if (have_tsc) {
+		found = intel_pt_find_overlap_tsc(buf_a, len_a, buf_b, len_b);
+		if (found)
+			return found;
+	}
+
+	/*
+	 * Buffer 'b' cannot end within buffer 'a' so, for comparison purposes,
+	 * we can ignore the first part of buffer 'a'.
+	 */
+	while (len_b < len_a) {
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+	}
+
+	/* Now len_b >= len_a */
+	if (len_b > len_a) {
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+
+	while (1) {
+		/* Potential overlap so check the bytes */
+		found = memmem(buf_a, len_a, buf_b, len_a);
+		if (found)
+			return buf_b + len_a;
+
+		/* Try again at next PSB in buffer 'a' */
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
new file mode 100644
index 000000000000..955263adfd8d
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -0,0 +1,102 @@
+/*
+ * intel_pt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_DECODER_H__
+#define INCLUDE__INTEL_PT_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+#include "intel-pt-insn-decoder.h"
+
+#define INTEL_PT_IN_TX		(1 << 0)
+#define INTEL_PT_ABORT_TX	(1 << 1)
+#define INTEL_PT_ASYNC		(1 << 2)
+
+enum intel_pt_sample_type {
+	INTEL_PT_BRANCH		= 1 << 0,
+	INTEL_PT_INSTRUCTION	= 1 << 1,
+	INTEL_PT_TRANSACTION	= 1 << 2,
+};
+
+enum intel_pt_period_type {
+	INTEL_PT_PERIOD_NONE,
+	INTEL_PT_PERIOD_INSTRUCTIONS,
+	INTEL_PT_PERIOD_TICKS,
+};
+
+enum {
+	INTEL_PT_ERR_NOMEM = 1,
+	INTEL_PT_ERR_INTERN,
+	INTEL_PT_ERR_BADPKT,
+	INTEL_PT_ERR_NODATA,
+	INTEL_PT_ERR_NOINSN,
+	INTEL_PT_ERR_MISMAT,
+	INTEL_PT_ERR_OVR,
+	INTEL_PT_ERR_LOST,
+	INTEL_PT_ERR_UNK,
+	INTEL_PT_ERR_MAX,
+};
+
+struct intel_pt_state {
+	enum intel_pt_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint64_t est_timestamp;
+	uint64_t trace_nr;
+	uint32_t flags;
+	enum intel_pt_insn_op insn_op;
+	int insn_len;
+};
+
+struct intel_pt_insn;
+
+struct intel_pt_buffer {
+	const unsigned char *buf;
+	size_t len;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct intel_pt_params {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*walk_insn)(struct intel_pt_insn *intel_pt_insn,
+			 uint64_t *insn_cnt_ptr, uint64_t *ip, uint64_t to_ip,
+			 uint64_t max_insn_cnt, void *data);
+	void *data;
+	bool return_compression;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+};
+
+struct intel_pt_decoder;
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params);
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder);
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder);
+
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc);
+
+int intel_pt__strerror(int code, char *buf, size_t buflen);
+
+#endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 6/8] perf tools: Add Intel PT support
  2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
                   ` (4 preceding siblings ...)
  2015-06-26 22:02 ` [PATCH 5/8] perf tools: Add Intel PT decoder Arnaldo Carvalho de Melo
@ 2015-06-26 22:02 ` Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 7/8] perf tools: Take Intel PT into use Arnaldo Carvalho de Melo
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 22:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Adrian Hunter, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for Intel Processor Trace.

Intel PT support fits within the new auxtrace infrastructure.  Recording
is supporting by identifying the Intel PT PMU, parsing options and
setting up events.

Decoding is supported by queuing up trace data by cpu or thread and then
decoding synchronously delivering synthesized event samples into the
session processing for tools to consume.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-9-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/arch/x86/util/Build      |    2 +
 tools/perf/arch/x86/util/intel-pt.c |  752 ++++++++++++++
 tools/perf/util/Build               |    1 +
 tools/perf/util/intel-pt.c          | 1889 +++++++++++++++++++++++++++++++++++
 tools/perf/util/intel-pt.h          |   51 +
 5 files changed, 2695 insertions(+)
 create mode 100644 tools/perf/arch/x86/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.h

diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index cfbccc4e3187..139608878888 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -6,3 +6,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
 
 libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
 libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
+
+libperf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
new file mode 100644
index 000000000000..da7d2c15e611
--- /dev/null
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -0,0 +1,752 @@
+/*
+ * intel_pt.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdbool.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "../../perf.h"
+#include "../../util/session.h"
+#include "../../util/event.h"
+#include "../../util/evlist.h"
+#include "../../util/evsel.h"
+#include "../../util/cpumap.h"
+#include "../../util/parse-options.h"
+#include "../../util/parse-events.h"
+#include "../../util/pmu.h"
+#include "../../util/debug.h"
+#include "../../util/auxtrace.h"
+#include "../../util/tsc.h"
+#include "../../util/intel-pt.h"
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define KiB_MASK(x) (KiB(x) - 1)
+#define MiB_MASK(x) (MiB(x) - 1)
+
+#define INTEL_PT_DEFAULT_SAMPLE_SIZE	KiB(4)
+
+#define INTEL_PT_MAX_SAMPLE_SIZE	KiB(60)
+
+#define INTEL_PT_PSB_PERIOD_NEAR	256
+
+struct intel_pt_snapshot_ref {
+	void *ref_buf;
+	size_t ref_offset;
+	bool wrapped;
+};
+
+struct intel_pt_recording {
+	struct auxtrace_record		itr;
+	struct perf_pmu			*intel_pt_pmu;
+	int				have_sched_switch;
+	struct perf_evlist		*evlist;
+	bool				snapshot_mode;
+	bool				snapshot_init_done;
+	size_t				snapshot_size;
+	size_t				snapshot_ref_buf_size;
+	int				snapshot_ref_cnt;
+	struct intel_pt_snapshot_ref	*snapshot_refs;
+};
+
+static int intel_pt_parse_terms_with_default(struct list_head *formats,
+					     const char *str,
+					     u64 *config)
+{
+	struct list_head *terms;
+	struct perf_event_attr attr = { .size = 0, };
+	int err;
+
+	terms = malloc(sizeof(struct list_head));
+	if (!terms)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(terms);
+
+	err = parse_events_terms(terms, str);
+	if (err)
+		goto out_free;
+
+	attr.config = *config;
+	err = perf_pmu__config_terms(formats, &attr, terms, true, NULL);
+	if (err)
+		goto out_free;
+
+	*config = attr.config;
+out_free:
+	parse_events__free_terms(terms);
+	return err;
+}
+
+static int intel_pt_parse_terms(struct list_head *formats, const char *str,
+				u64 *config)
+{
+	*config = 0;
+	return intel_pt_parse_terms_with_default(formats, str, config);
+}
+
+static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu __maybe_unused,
+				  struct perf_evlist *evlist __maybe_unused)
+{
+	return 256;
+}
+
+static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	u64 config;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &config);
+	return config;
+}
+
+static int intel_pt_parse_snapshot_options(struct auxtrace_record *itr,
+					   struct record_opts *opts,
+					   const char *str)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->auxtrace_snapshot_mode = true;
+	opts->auxtrace_snapshot_size = snapshot_size;
+
+	ptr->snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+struct perf_event_attr *
+intel_pt_pmu_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	struct perf_event_attr *attr;
+
+	attr = zalloc(sizeof(struct perf_event_attr));
+	if (!attr)
+		return NULL;
+
+	attr->config = intel_pt_default_config(intel_pt_pmu);
+
+	intel_pt_pmu->selectable = true;
+
+	return attr;
+}
+
+static size_t intel_pt_info_priv_size(struct auxtrace_record *itr __maybe_unused)
+{
+	return INTEL_PT_AUXTRACE_PRIV_SIZE;
+}
+
+static int intel_pt_info_fill(struct auxtrace_record *itr,
+			      struct perf_session *session,
+			      struct auxtrace_info_event *auxtrace_info,
+			      size_t priv_size)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	struct perf_event_mmap_page *pc;
+	struct perf_tsc_conversion tc = { .time_mult = 0, };
+	bool cap_user_time_zero = false, per_cpu_mmaps;
+	u64 tsc_bit, noretcomp_bit;
+	int err;
+
+	if (priv_size != INTEL_PT_AUXTRACE_PRIV_SIZE)
+		return -EINVAL;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+	intel_pt_parse_terms(&intel_pt_pmu->format, "noretcomp",
+			     &noretcomp_bit);
+
+	if (!session->evlist->nr_mmaps)
+		return -EINVAL;
+
+	pc = session->evlist->mmap[0].base;
+	if (pc) {
+		err = perf_read_tsc_conversion(pc, &tc);
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+		} else {
+			cap_user_time_zero = tc.time_mult != 0;
+		}
+		if (!cap_user_time_zero)
+			ui__warning("Intel Processor Trace: TSC not available\n");
+	}
+
+	per_cpu_mmaps = !cpu_map__empty(session->evlist->cpus);
+
+	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
+	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
+	auxtrace_info->priv[INTEL_PT_TIME_SHIFT] = tc.time_shift;
+	auxtrace_info->priv[INTEL_PT_TIME_MULT] = tc.time_mult;
+	auxtrace_info->priv[INTEL_PT_TIME_ZERO] = tc.time_zero;
+	auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO] = cap_user_time_zero;
+	auxtrace_info->priv[INTEL_PT_TSC_BIT] = tsc_bit;
+	auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT] = noretcomp_bit;
+	auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH] = ptr->have_sched_switch;
+	auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE] = ptr->snapshot_mode;
+	auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS] = per_cpu_mmaps;
+
+	return 0;
+}
+
+static int intel_pt_track_switches(struct perf_evlist *evlist)
+{
+	const char *sched_switch = "sched:sched_switch";
+	struct perf_evsel *evsel;
+	int err;
+
+	if (!perf_evlist__can_select_event(evlist, sched_switch))
+		return -EPERM;
+
+	err = parse_events(evlist, sched_switch, NULL);
+	if (err) {
+		pr_debug2("%s: failed to parse %s, error %d\n",
+			  __func__, sched_switch, err);
+		return err;
+	}
+
+	evsel = perf_evlist__last(evlist);
+
+	perf_evsel__set_sample_bit(evsel, CPU);
+	perf_evsel__set_sample_bit(evsel, TIME);
+
+	evsel->system_wide = true;
+	evsel->no_aux_samples = true;
+	evsel->immediate = true;
+
+	return 0;
+}
+
+static int intel_pt_recording_options(struct auxtrace_record *itr,
+				      struct perf_evlist *evlist,
+				      struct record_opts *opts)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	bool have_timing_info;
+	struct perf_evsel *evsel, *intel_pt_evsel = NULL;
+	const struct cpu_map *cpus = evlist->cpus;
+	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+	u64 tsc_bit;
+
+	ptr->evlist = evlist;
+	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_pt_pmu->type) {
+			if (intel_pt_evsel) {
+				pr_err("There may be only one " INTEL_PT_PMU_NAME " event\n");
+				return -EINVAL;
+			}
+			evsel->attr.freq = 0;
+			evsel->attr.sample_period = 1;
+			intel_pt_evsel = evsel;
+			opts->full_auxtrace = true;
+		}
+	}
+
+	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
+		pr_err("Snapshot mode (-S option) requires " INTEL_PT_PMU_NAME " PMU event (-e " INTEL_PT_PMU_NAME ")\n");
+		return -EINVAL;
+	}
+
+	if (opts->use_clockid) {
+		pr_err("Cannot use clockid (-k option) with " INTEL_PT_PMU_NAME "\n");
+		return -EINVAL;
+	}
+
+	if (!opts->full_auxtrace)
+		return 0;
+
+	/* Set default sizes for snapshot mode */
+	if (opts->auxtrace_snapshot_mode) {
+		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
+
+		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
+			if (privileged) {
+				opts->auxtrace_mmap_pages = MiB(4) / page_size;
+			} else {
+				opts->auxtrace_mmap_pages = KiB(128) / page_size;
+				if (opts->mmap_pages == UINT_MAX)
+					opts->mmap_pages = KiB(256) / page_size;
+			}
+		} else if (!opts->auxtrace_mmap_pages && !privileged &&
+			   opts->mmap_pages == UINT_MAX) {
+			opts->mmap_pages = KiB(256) / page_size;
+		}
+		if (!opts->auxtrace_snapshot_size)
+			opts->auxtrace_snapshot_size =
+				opts->auxtrace_mmap_pages * (size_t)page_size;
+		if (!opts->auxtrace_mmap_pages) {
+			size_t sz = opts->auxtrace_snapshot_size;
+
+			sz = round_up(sz, page_size) / page_size;
+			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
+		}
+		if (opts->auxtrace_snapshot_size >
+				opts->auxtrace_mmap_pages * (size_t)page_size) {
+			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
+			       opts->auxtrace_snapshot_size,
+			       opts->auxtrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
+			return -EINVAL;
+		}
+		pr_debug2("Intel PT snapshot size: %zu\n",
+			  opts->auxtrace_snapshot_size);
+		if (psb_period &&
+		    opts->auxtrace_snapshot_size <= psb_period +
+						  INTEL_PT_PSB_PERIOD_NEAR)
+			ui__warning("Intel PT snapshot size (%zu) may be too small for PSB period (%zu)\n",
+				    opts->auxtrace_snapshot_size, psb_period);
+	}
+
+	/* Set default sizes for full trace mode */
+	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
+		if (privileged) {
+			opts->auxtrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->auxtrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	}
+
+	/* Validate auxtrace_mmap_pages */
+	if (opts->auxtrace_mmap_pages) {
+		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
+		size_t min_sz;
+
+		if (opts->auxtrace_snapshot_mode)
+			min_sz = KiB(4);
+		else
+			min_sz = KiB(8);
+
+		if (sz < min_sz || !is_power_of_2(sz)) {
+			pr_err("Invalid mmap size for Intel Processor Trace: must be at least %zuKiB and a power of 2\n",
+			       min_sz / 1024);
+			return -EINVAL;
+		}
+	}
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+
+	if (opts->full_auxtrace && (intel_pt_evsel->attr.config & tsc_bit))
+		have_timing_info = true;
+	else
+		have_timing_info = false;
+
+	/*
+	 * Per-cpu recording needs sched_switch events to distinguish different
+	 * threads.
+	 */
+	if (have_timing_info && !cpu_map__empty(cpus)) {
+		int err;
+
+		err = intel_pt_track_switches(evlist);
+		if (err == -EPERM)
+			pr_debug2("Unable to select sched:sched_switch\n");
+		else if (err)
+			return err;
+		else
+			ptr->have_sched_switch = 1;
+	}
+
+	if (intel_pt_evsel) {
+		/*
+		 * To obtain the auxtrace buffer file descriptor, the auxtrace
+		 * event must come first.
+		 */
+		perf_evlist__to_front(evlist, intel_pt_evsel);
+		/*
+		 * In the case of per-cpu mmaps, we need the CPU on the
+		 * AUX event.
+		 */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(intel_pt_evsel, CPU);
+	}
+
+	/* Add dummy event to keep tracking */
+	if (opts->full_auxtrace) {
+		struct perf_evsel *tracking_evsel;
+		int err;
+
+		err = parse_events(evlist, "dummy:u", NULL);
+		if (err)
+			return err;
+
+		tracking_evsel = perf_evlist__last(evlist);
+
+		perf_evlist__set_tracking_event(evlist, tracking_evsel);
+
+		tracking_evsel->attr.freq = 0;
+		tracking_evsel->attr.sample_period = 1;
+
+		/* In per-cpu case, always need the time of mmap events etc */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(tracking_evsel, TIME);
+	}
+
+	/*
+	 * Warn the user when we do not have enough information to decode i.e.
+	 * per-cpu with no sched_switch (except workload-only).
+	 */
+	if (!ptr->have_sched_switch && !cpu_map__empty(cpus) &&
+	    !target__none(&opts->target))
+		ui__warning("Intel Processor Trace decoding will not be possible except for kernel tracing!\n");
+
+	return 0;
+}
+
+static int intel_pt_snapshot_start(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__disable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_snapshot_finish(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_alloc_snapshot_refs(struct intel_pt_recording *ptr, int idx)
+{
+	const size_t sz = sizeof(struct intel_pt_snapshot_ref);
+	int cnt = ptr->snapshot_ref_cnt, new_cnt = cnt * 2;
+	struct intel_pt_snapshot_ref *refs;
+
+	if (!new_cnt)
+		new_cnt = 16;
+
+	while (new_cnt <= idx)
+		new_cnt *= 2;
+
+	refs = calloc(new_cnt, sz);
+	if (!refs)
+		return -ENOMEM;
+
+	memcpy(refs, ptr->snapshot_refs, cnt * sz);
+
+	ptr->snapshot_refs = refs;
+	ptr->snapshot_ref_cnt = new_cnt;
+
+	return 0;
+}
+
+static void intel_pt_free_snapshot_refs(struct intel_pt_recording *ptr)
+{
+	int i;
+
+	for (i = 0; i < ptr->snapshot_ref_cnt; i++)
+		zfree(&ptr->snapshot_refs[i].ref_buf);
+	zfree(&ptr->snapshot_refs);
+}
+
+static void intel_pt_recording_free(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+
+	intel_pt_free_snapshot_refs(ptr);
+	free(ptr);
+}
+
+static int intel_pt_alloc_snapshot_ref(struct intel_pt_recording *ptr, int idx,
+				       size_t snapshot_buf_size)
+{
+	size_t ref_buf_size = ptr->snapshot_ref_buf_size;
+	void *ref_buf;
+
+	ref_buf = zalloc(ref_buf_size);
+	if (!ref_buf)
+		return -ENOMEM;
+
+	ptr->snapshot_refs[idx].ref_buf = ref_buf;
+	ptr->snapshot_refs[idx].ref_offset = snapshot_buf_size - ref_buf_size;
+
+	return 0;
+}
+
+static size_t intel_pt_snapshot_ref_buf_size(struct intel_pt_recording *ptr,
+					     size_t snapshot_buf_size)
+{
+	const size_t max_size = 256 * 1024;
+	size_t buf_size = 0, psb_period;
+
+	if (ptr->snapshot_size <= 64 * 1024)
+		return 0;
+
+	psb_period = intel_pt_psb_period(ptr->intel_pt_pmu, ptr->evlist);
+	if (psb_period)
+		buf_size = psb_period * 2;
+
+	if (!buf_size || buf_size > max_size)
+		buf_size = max_size;
+
+	if (buf_size >= snapshot_buf_size)
+		return 0;
+
+	if (buf_size >= ptr->snapshot_size / 2)
+		return 0;
+
+	return buf_size;
+}
+
+static int intel_pt_snapshot_init(struct intel_pt_recording *ptr,
+				  size_t snapshot_buf_size)
+{
+	if (ptr->snapshot_init_done)
+		return 0;
+
+	ptr->snapshot_init_done = true;
+
+	ptr->snapshot_ref_buf_size = intel_pt_snapshot_ref_buf_size(ptr,
+							snapshot_buf_size);
+
+	return 0;
+}
+
+/**
+ * intel_pt_compare_buffers - compare bytes in a buffer to a circular buffer.
+ * @buf1: first buffer
+ * @compare_size: number of bytes to compare
+ * @buf2: second buffer (a circular buffer)
+ * @offs2: offset in second buffer
+ * @buf2_size: size of second buffer
+ *
+ * The comparison allows for the possibility that the bytes to compare in the
+ * circular buffer are not contiguous.  It is assumed that @compare_size <=
+ * @buf2_size.  This function returns %false if the bytes are identical, %true
+ * otherwise.
+ */
+static bool intel_pt_compare_buffers(void *buf1, size_t compare_size,
+				     void *buf2, size_t offs2, size_t buf2_size)
+{
+	size_t end2 = offs2 + compare_size, part_size;
+
+	if (end2 <= buf2_size)
+		return memcmp(buf1, buf2 + offs2, compare_size);
+
+	part_size = end2 - buf2_size;
+	if (memcmp(buf1, buf2 + offs2, part_size))
+		return true;
+
+	compare_size -= part_size;
+
+	return memcmp(buf1 + part_size, buf2, compare_size);
+}
+
+static bool intel_pt_compare_ref(void *ref_buf, size_t ref_offset,
+				 size_t ref_size, size_t buf_size,
+				 void *data, size_t head)
+{
+	size_t ref_end = ref_offset + ref_size;
+
+	if (ref_end > buf_size) {
+		if (head > ref_offset || head < ref_end - buf_size)
+			return true;
+	} else if (head > ref_offset && head < ref_end) {
+		return true;
+	}
+
+	return intel_pt_compare_buffers(ref_buf, ref_size, data, ref_offset,
+					buf_size);
+}
+
+static void intel_pt_copy_ref(void *ref_buf, size_t ref_size, size_t buf_size,
+			      void *data, size_t head)
+{
+	if (head >= ref_size) {
+		memcpy(ref_buf, data + head - ref_size, ref_size);
+	} else {
+		memcpy(ref_buf, data, head);
+		ref_size -= head;
+		memcpy(ref_buf + head, data + buf_size - ref_size, ref_size);
+	}
+}
+
+static bool intel_pt_wrapped(struct intel_pt_recording *ptr, int idx,
+			     struct auxtrace_mmap *mm, unsigned char *data,
+			     u64 head)
+{
+	struct intel_pt_snapshot_ref *ref = &ptr->snapshot_refs[idx];
+	bool wrapped;
+
+	wrapped = intel_pt_compare_ref(ref->ref_buf, ref->ref_offset,
+				       ptr->snapshot_ref_buf_size, mm->len,
+				       data, head);
+
+	intel_pt_copy_ref(ref->ref_buf, ptr->snapshot_ref_buf_size, mm->len,
+			  data, head);
+
+	return wrapped;
+}
+
+static bool intel_pt_first_wrap(u64 *data, size_t buf_size)
+{
+	int i, a, b;
+
+	b = buf_size >> 3;
+	a = b - 512;
+	if (a < 0)
+		a = 0;
+
+	for (i = a; i < b; i++) {
+		if (data[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int intel_pt_find_snapshot(struct auxtrace_record *itr, int idx,
+				  struct auxtrace_mmap *mm, unsigned char *data,
+				  u64 *head, u64 *old)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	bool wrapped;
+	int err;
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head);
+
+	err = intel_pt_snapshot_init(ptr, mm->len);
+	if (err)
+		goto out_err;
+
+	if (idx >= ptr->snapshot_ref_cnt) {
+		err = intel_pt_alloc_snapshot_refs(ptr, idx);
+		if (err)
+			goto out_err;
+	}
+
+	if (ptr->snapshot_ref_buf_size) {
+		if (!ptr->snapshot_refs[idx].ref_buf) {
+			err = intel_pt_alloc_snapshot_ref(ptr, idx, mm->len);
+			if (err)
+				goto out_err;
+		}
+		wrapped = intel_pt_wrapped(ptr, idx, mm, data, *head);
+	} else {
+		wrapped = ptr->snapshot_refs[idx].wrapped;
+		if (!wrapped && intel_pt_first_wrap((u64 *)data, mm->len)) {
+			ptr->snapshot_refs[idx].wrapped = true;
+			wrapped = true;
+		}
+	}
+
+	/*
+	 * In full trace mode 'head' continually increases.  However in snapshot
+	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
+	 * are adjusted to match the full trace case which expects that 'old' is
+	 * always less than 'head'.
+	 */
+	if (wrapped) {
+		*old = *head;
+		*head += mm->len;
+	} else {
+		if (mm->mask)
+			*old &= mm->mask;
+		else
+			*old %= mm->len;
+		if (*old > *head)
+			*head += mm->len;
+	}
+
+	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
+		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
+
+	return 0;
+
+out_err:
+	pr_err("%s: failed, error %d\n", __func__, err);
+	return err;
+}
+
+static u64 intel_pt_reference(struct auxtrace_record *itr __maybe_unused)
+{
+	return rdtsc();
+}
+
+static int intel_pt_read_finish(struct auxtrace_record *itr, int idx)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event_idx(ptr->evlist, evsel,
+							     idx);
+	}
+	return -EINVAL;
+}
+
+struct auxtrace_record *intel_pt_recording_init(int *err)
+{
+	struct perf_pmu *intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+	struct intel_pt_recording *ptr;
+
+	if (!intel_pt_pmu)
+		return NULL;
+
+	ptr = zalloc(sizeof(struct intel_pt_recording));
+	if (!ptr) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	ptr->intel_pt_pmu = intel_pt_pmu;
+	ptr->itr.recording_options = intel_pt_recording_options;
+	ptr->itr.info_priv_size = intel_pt_info_priv_size;
+	ptr->itr.info_fill = intel_pt_info_fill;
+	ptr->itr.free = intel_pt_recording_free;
+	ptr->itr.snapshot_start = intel_pt_snapshot_start;
+	ptr->itr.snapshot_finish = intel_pt_snapshot_finish;
+	ptr->itr.find_snapshot = intel_pt_find_snapshot;
+	ptr->itr.parse_snapshot_options = intel_pt_parse_snapshot_options;
+	ptr->itr.reference = intel_pt_reference;
+	ptr->itr.read_finish = intel_pt_read_finish;
+	return &ptr->itr;
+}
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e1b505a8055b..296de661933f 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -77,6 +77,7 @@ libperf-y += cloexec.o
 libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
+libperf-$(CONFIG_AUXTRACE) += intel-pt.o
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
new file mode 100644
index 000000000000..6d66879e1e33
--- /dev/null
+++ b/tools/perf/util/intel-pt.c
@@ -0,0 +1,1889 @@
+/*
+ * intel_pt.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+#include "../perf.h"
+#include "session.h"
+#include "machine.h"
+#include "tool.h"
+#include "event.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "map.h"
+#include "color.h"
+#include "util.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "symbol.h"
+#include "callchain.h"
+#include "dso.h"
+#include "debug.h"
+#include "auxtrace.h"
+#include "tsc.h"
+#include "intel-pt.h"
+
+#include "intel-pt-decoder/intel-pt-log.h"
+#include "intel-pt-decoder/intel-pt-decoder.h"
+#include "intel-pt-decoder/intel-pt-insn-decoder.h"
+#include "intel-pt-decoder/intel-pt-pkt-decoder.h"
+
+#define MAX_TIMESTAMP (~0ULL)
+
+struct intel_pt {
+	struct auxtrace auxtrace;
+	struct auxtrace_queues queues;
+	struct auxtrace_heap heap;
+	u32 auxtrace_type;
+	struct perf_session *session;
+	struct machine *machine;
+	struct perf_evsel *switch_evsel;
+	struct thread *unknown_thread;
+	bool timeless_decoding;
+	bool sampling_mode;
+	bool snapshot_mode;
+	bool per_cpu_mmaps;
+	bool have_tsc;
+	bool data_queued;
+	bool est_tsc;
+	bool sync_switch;
+	bool est_tsc_orig;
+	int have_sched_switch;
+	u32 pmu_type;
+	u64 kernel_start;
+	u64 switch_ip;
+	u64 ptss_ip;
+
+	struct perf_tsc_conversion tc;
+	bool cap_user_time_zero;
+
+	struct itrace_synth_opts synth_opts;
+
+	bool sample_instructions;
+	u64 instructions_sample_type;
+	u64 instructions_sample_period;
+	u64 instructions_id;
+
+	bool sample_branches;
+	u32 branches_filter;
+	u64 branches_sample_type;
+	u64 branches_id;
+
+	bool sample_transactions;
+	u64 transactions_sample_type;
+	u64 transactions_id;
+
+	bool synth_needs_swap;
+
+	u64 tsc_bit;
+	u64 noretcomp_bit;
+};
+
+enum switch_state {
+	INTEL_PT_SS_NOT_TRACING,
+	INTEL_PT_SS_UNKNOWN,
+	INTEL_PT_SS_TRACING,
+	INTEL_PT_SS_EXPECTING_SWITCH_EVENT,
+	INTEL_PT_SS_EXPECTING_SWITCH_IP,
+};
+
+struct intel_pt_queue {
+	struct intel_pt *pt;
+	unsigned int queue_nr;
+	struct auxtrace_buffer *buffer;
+	void *decoder;
+	const struct intel_pt_state *state;
+	struct ip_callchain *chain;
+	union perf_event *event_buf;
+	bool on_heap;
+	bool stop;
+	bool step_through_buffers;
+	bool use_buffer_pid_tid;
+	pid_t pid, tid;
+	int cpu;
+	int switch_state;
+	pid_t next_tid;
+	struct thread *thread;
+	bool exclude_kernel;
+	bool have_sample;
+	u64 time;
+	u64 timestamp;
+	u32 flags;
+	u16 insn_len;
+};
+
+static void intel_pt_dump(struct intel_pt *pt __maybe_unused,
+			  unsigned char *buf, size_t len)
+{
+	struct intel_pt_pkt packet;
+	size_t pos = 0;
+	int ret, pkt_len, i;
+	char desc[INTEL_PT_PKT_DESC_MAX];
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color,
+		      ". ... Intel Processor Trace data: size %zu bytes\n",
+		      len);
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret > 0)
+			pkt_len = ret;
+		else
+			pkt_len = 1;
+		printf(".");
+		color_fprintf(stdout, color, "  %08x: ", pos);
+		for (i = 0; i < pkt_len; i++)
+			color_fprintf(stdout, color, " %02x", buf[i]);
+		for (; i < 16; i++)
+			color_fprintf(stdout, color, "   ");
+		if (ret > 0) {
+			ret = intel_pt_pkt_desc(&packet, desc,
+						INTEL_PT_PKT_DESC_MAX);
+			if (ret > 0)
+				color_fprintf(stdout, color, " %s\n", desc);
+		} else {
+			color_fprintf(stdout, color, " Bad packet!\n");
+		}
+		pos += pkt_len;
+		buf += pkt_len;
+		len -= pkt_len;
+	}
+}
+
+static void intel_pt_dump_event(struct intel_pt *pt, unsigned char *buf,
+				size_t len)
+{
+	printf(".\n");
+	intel_pt_dump(pt, buf, len);
+}
+
+static int intel_pt_do_fix_overlap(struct intel_pt *pt, struct auxtrace_buffer *a,
+				   struct auxtrace_buffer *b)
+{
+	void *start;
+
+	start = intel_pt_find_overlap(a->data, a->size, b->data, b->size,
+				      pt->have_tsc);
+	if (!start)
+		return -EINVAL;
+	b->use_size = b->data + b->size - start;
+	b->use_data = start;
+	return 0;
+}
+
+static void intel_pt_use_buffer_pid_tid(struct intel_pt_queue *ptq,
+					struct auxtrace_queue *queue,
+					struct auxtrace_buffer *buffer)
+{
+	if (queue->cpu == -1 && buffer->cpu != -1)
+		ptq->cpu = buffer->cpu;
+
+	ptq->pid = buffer->pid;
+	ptq->tid = buffer->tid;
+
+	intel_pt_log("queue %u cpu %d pid %d tid %d\n",
+		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+
+	ptq->thread = NULL;
+
+	if (ptq->tid != -1) {
+		if (ptq->pid != -1)
+			ptq->thread = machine__findnew_thread(ptq->pt->machine,
+							      ptq->pid,
+							      ptq->tid);
+		else
+			ptq->thread = machine__find_thread(ptq->pt->machine, -1,
+							   ptq->tid);
+	}
+}
+
+/* This function assumes data is processed sequentially only */
+static int intel_pt_get_trace(struct intel_pt_buffer *b, void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct auxtrace_buffer *buffer = ptq->buffer, *old_buffer = buffer;
+	struct auxtrace_queue *queue;
+
+	if (ptq->stop) {
+		b->len = 0;
+		return 0;
+	}
+
+	queue = &ptq->pt->queues.queue_array[ptq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	ptq->buffer = buffer;
+
+	if (!buffer->data) {
+		int fd = perf_data_file__fd(ptq->pt->session->file);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (ptq->pt->snapshot_mode && !buffer->consecutive && old_buffer &&
+	    intel_pt_do_fix_overlap(ptq->pt, old_buffer, buffer))
+		return -ENOMEM;
+
+	if (old_buffer)
+		auxtrace_buffer__drop_data(old_buffer);
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+	b->ref_timestamp = buffer->reference;
+
+	if (!old_buffer || ptq->pt->sampling_mode || (ptq->pt->snapshot_mode &&
+						      !buffer->consecutive)) {
+		b->consecutive = false;
+		b->trace_nr = buffer->buffer_nr;
+	} else {
+		b->consecutive = true;
+	}
+
+	if (ptq->use_buffer_pid_tid && (ptq->pid != buffer->pid ||
+					ptq->tid != buffer->tid))
+		intel_pt_use_buffer_pid_tid(ptq, queue, buffer);
+
+	if (ptq->step_through_buffers)
+		ptq->stop = true;
+
+	if (!b->len)
+		return intel_pt_get_trace(b, data);
+
+	return 0;
+}
+
+struct intel_pt_cache_entry {
+	struct auxtrace_cache_entry	entry;
+	u64				insn_cnt;
+	u64				byte_cnt;
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+};
+
+static int intel_pt_config_div(const char *var, const char *value, void *data)
+{
+	int *d = data;
+	long val;
+
+	if (!strcmp(var, "intel-pt.cache-divisor")) {
+		val = strtol(value, NULL, 0);
+		if (val > 0 && val <= INT_MAX)
+			*d = val;
+	}
+
+	return 0;
+}
+
+static int intel_pt_cache_divisor(void)
+{
+	static int d;
+
+	if (d)
+		return d;
+
+	perf_config(intel_pt_config_div, &d);
+
+	if (!d)
+		d = 64;
+
+	return d;
+}
+
+static unsigned int intel_pt_cache_size(struct dso *dso,
+					struct machine *machine)
+{
+	off_t size;
+
+	size = dso__data_size(dso, machine);
+	size /= intel_pt_cache_divisor();
+	if (size < 1000)
+		return 10;
+	if (size > (1 << 21))
+		return 21;
+	return 32 - __builtin_clz(size);
+}
+
+static struct auxtrace_cache *intel_pt_cache(struct dso *dso,
+					     struct machine *machine)
+{
+	struct auxtrace_cache *c;
+	unsigned int bits;
+
+	if (dso->auxtrace_cache)
+		return dso->auxtrace_cache;
+
+	bits = intel_pt_cache_size(dso, machine);
+
+	/* Ignoring cache creation failure */
+	c = auxtrace_cache__new(bits, sizeof(struct intel_pt_cache_entry), 200);
+
+	dso->auxtrace_cache = c;
+
+	return c;
+}
+
+static int intel_pt_cache_add(struct dso *dso, struct machine *machine,
+			      u64 offset, u64 insn_cnt, u64 byte_cnt,
+			      struct intel_pt_insn *intel_pt_insn)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+	struct intel_pt_cache_entry *e;
+	int err;
+
+	if (!c)
+		return -ENOMEM;
+
+	e = auxtrace_cache__alloc_entry(c);
+	if (!e)
+		return -ENOMEM;
+
+	e->insn_cnt = insn_cnt;
+	e->byte_cnt = byte_cnt;
+	e->op = intel_pt_insn->op;
+	e->branch = intel_pt_insn->branch;
+	e->length = intel_pt_insn->length;
+	e->rel = intel_pt_insn->rel;
+
+	err = auxtrace_cache__add(c, offset, &e->entry);
+	if (err)
+		auxtrace_cache__free_entry(c, e);
+
+	return err;
+}
+
+static struct intel_pt_cache_entry *
+intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+
+	if (!c)
+		return NULL;
+
+	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
+}
+
+static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
+				   uint64_t *insn_cnt_ptr, uint64_t *ip,
+				   uint64_t to_ip, uint64_t max_insn_cnt,
+				   void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct machine *machine = ptq->pt->machine;
+	struct thread *thread;
+	struct addr_location al;
+	unsigned char buf[1024];
+	size_t bufsz;
+	ssize_t len;
+	int x86_64;
+	u8 cpumode;
+	u64 offset, start_offset, start_ip;
+	u64 insn_cnt = 0;
+	bool one_map = true;
+
+	if (to_ip && *ip == to_ip)
+		goto out_no_cache;
+
+	bufsz = intel_pt_insn_max_size();
+
+	if (*ip >= ptq->pt->kernel_start)
+		cpumode = PERF_RECORD_MISC_KERNEL;
+	else
+		cpumode = PERF_RECORD_MISC_USER;
+
+	thread = ptq->thread;
+	if (!thread) {
+		if (cpumode != PERF_RECORD_MISC_KERNEL)
+			return -EINVAL;
+		thread = ptq->pt->unknown_thread;
+	}
+
+	while (1) {
+		thread__find_addr_map(thread, cpumode, MAP__FUNCTION, *ip, &al);
+		if (!al.map || !al.map->dso)
+			return -EINVAL;
+
+		if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
+		    dso__data_status_seen(al.map->dso,
+					  DSO_DATA_STATUS_SEEN_ITRACE))
+			return -ENOENT;
+
+		offset = al.map->map_ip(al.map, *ip);
+
+		if (!to_ip && one_map) {
+			struct intel_pt_cache_entry *e;
+
+			e = intel_pt_cache_lookup(al.map->dso, machine, offset);
+			if (e &&
+			    (!max_insn_cnt || e->insn_cnt <= max_insn_cnt)) {
+				*insn_cnt_ptr = e->insn_cnt;
+				*ip += e->byte_cnt;
+				intel_pt_insn->op = e->op;
+				intel_pt_insn->branch = e->branch;
+				intel_pt_insn->length = e->length;
+				intel_pt_insn->rel = e->rel;
+				intel_pt_log_insn_no_data(intel_pt_insn, *ip);
+				return 0;
+			}
+		}
+
+		start_offset = offset;
+		start_ip = *ip;
+
+		/* Load maps to ensure dso->is_64_bit has been updated */
+		map__load(al.map, machine->symbol_filter);
+
+		x86_64 = al.map->dso->is_64_bit;
+
+		while (1) {
+			len = dso__data_read_offset(al.map->dso, machine,
+						    offset, buf, bufsz);
+			if (len <= 0)
+				return -EINVAL;
+
+			if (intel_pt_get_insn(buf, len, x86_64, intel_pt_insn))
+				return -EINVAL;
+
+			intel_pt_log_insn(intel_pt_insn, *ip);
+
+			insn_cnt += 1;
+
+			if (intel_pt_insn->branch != INTEL_PT_BR_NO_BRANCH)
+				goto out;
+
+			if (max_insn_cnt && insn_cnt >= max_insn_cnt)
+				goto out_no_cache;
+
+			*ip += intel_pt_insn->length;
+
+			if (to_ip && *ip == to_ip)
+				goto out_no_cache;
+
+			if (*ip >= al.map->end)
+				break;
+
+			offset += intel_pt_insn->length;
+		}
+		one_map = false;
+	}
+out:
+	*insn_cnt_ptr = insn_cnt;
+
+	if (!one_map)
+		goto out_no_cache;
+
+	/*
+	 * Didn't lookup in the 'to_ip' case, so do it now to prevent duplicate
+	 * entries.
+	 */
+	if (to_ip) {
+		struct intel_pt_cache_entry *e;
+
+		e = intel_pt_cache_lookup(al.map->dso, machine, start_offset);
+		if (e)
+			return 0;
+	}
+
+	/* Ignore cache errors */
+	intel_pt_cache_add(al.map->dso, machine, start_offset, insn_cnt,
+			   *ip - start_ip, intel_pt_insn);
+
+	return 0;
+
+out_no_cache:
+	*insn_cnt_ptr = insn_cnt;
+	return 0;
+}
+
+static bool intel_pt_get_config(struct intel_pt *pt,
+				struct perf_event_attr *attr, u64 *config)
+{
+	if (attr->type == pt->pmu_type) {
+		if (config)
+			*config = attr->config;
+		return true;
+	}
+
+	return false;
+}
+
+static bool intel_pt_exclude_kernel(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
+		    !evsel->attr.exclude_kernel)
+			return false;
+	}
+	return true;
+}
+
+static bool intel_pt_return_compression(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	u64 config;
+
+	if (!pt->noretcomp_bit)
+		return true;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config) &&
+		    (config & pt->noretcomp_bit))
+			return false;
+	}
+	return true;
+}
+
+static bool intel_pt_timeless_decoding(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	bool timeless_decoding = true;
+	u64 config;
+
+	if (!pt->tsc_bit || !pt->cap_user_time_zero)
+		return true;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (!(evsel->attr.sample_type & PERF_SAMPLE_TIME))
+			return true;
+		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
+			if (config & pt->tsc_bit)
+				timeless_decoding = false;
+			else
+				return true;
+		}
+	}
+	return timeless_decoding;
+}
+
+static bool intel_pt_tracing_kernel(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
+		    !evsel->attr.exclude_kernel)
+			return true;
+	}
+	return false;
+}
+
+static bool intel_pt_have_tsc(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	bool have_tsc = false;
+	u64 config;
+
+	if (!pt->tsc_bit)
+		return false;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
+			if (config & pt->tsc_bit)
+				have_tsc = true;
+			else
+				return false;
+		}
+	}
+	return have_tsc;
+}
+
+static u64 intel_pt_ns_to_ticks(const struct intel_pt *pt, u64 ns)
+{
+	u64 quot, rem;
+
+	quot = ns / pt->tc.time_mult;
+	rem  = ns % pt->tc.time_mult;
+	return (quot << pt->tc.time_shift) + (rem << pt->tc.time_shift) /
+		pt->tc.time_mult;
+}
+
+static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
+						   unsigned int queue_nr)
+{
+	struct intel_pt_params params = { .get_trace = 0, };
+	struct intel_pt_queue *ptq;
+
+	ptq = zalloc(sizeof(struct intel_pt_queue));
+	if (!ptq)
+		return NULL;
+
+	if (pt->synth_opts.callchain) {
+		size_t sz = sizeof(struct ip_callchain);
+
+		sz += pt->synth_opts.callchain_sz * sizeof(u64);
+		ptq->chain = zalloc(sz);
+		if (!ptq->chain)
+			goto out_free;
+	}
+
+	ptq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!ptq->event_buf)
+		goto out_free;
+
+	ptq->pt = pt;
+	ptq->queue_nr = queue_nr;
+	ptq->exclude_kernel = intel_pt_exclude_kernel(pt);
+	ptq->pid = -1;
+	ptq->tid = -1;
+	ptq->cpu = -1;
+	ptq->next_tid = -1;
+
+	params.get_trace = intel_pt_get_trace;
+	params.walk_insn = intel_pt_walk_next_insn;
+	params.data = ptq;
+	params.return_compression = intel_pt_return_compression(pt);
+
+	if (pt->synth_opts.instructions) {
+		if (pt->synth_opts.period) {
+			switch (pt->synth_opts.period_type) {
+			case PERF_ITRACE_PERIOD_INSTRUCTIONS:
+				params.period_type =
+						INTEL_PT_PERIOD_INSTRUCTIONS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_TICKS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_NANOSECS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = intel_pt_ns_to_ticks(pt,
+							pt->synth_opts.period);
+				break;
+			default:
+				break;
+			}
+		}
+
+		if (!params.period) {
+			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
+			params.period = 1000;
+		}
+	}
+
+	ptq->decoder = intel_pt_decoder_new(&params);
+	if (!ptq->decoder)
+		goto out_free;
+
+	return ptq;
+
+out_free:
+	zfree(&ptq->event_buf);
+	zfree(&ptq->chain);
+	free(ptq);
+	return NULL;
+}
+
+static void intel_pt_free_queue(void *priv)
+{
+	struct intel_pt_queue *ptq = priv;
+
+	if (!ptq)
+		return;
+	intel_pt_decoder_free(ptq->decoder);
+	zfree(&ptq->event_buf);
+	zfree(&ptq->chain);
+	free(ptq);
+}
+
+static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
+				     struct auxtrace_queue *queue)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (queue->tid == -1 || pt->have_sched_switch) {
+		ptq->tid = machine__get_current_tid(pt->machine, ptq->cpu);
+		ptq->thread = NULL;
+	}
+
+	if (!ptq->thread && ptq->tid != -1)
+		ptq->thread = machine__find_thread(pt->machine, -1, ptq->tid);
+
+	if (ptq->thread) {
+		ptq->pid = ptq->thread->pid_;
+		if (queue->cpu == -1)
+			ptq->cpu = ptq->thread->cpu;
+	}
+}
+
+static void intel_pt_sample_flags(struct intel_pt_queue *ptq)
+{
+	if (ptq->state->flags & INTEL_PT_ABORT_TX) {
+		ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT;
+	} else if (ptq->state->flags & INTEL_PT_ASYNC) {
+		if (ptq->state->to_ip)
+			ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+				     PERF_IP_FLAG_ASYNC |
+				     PERF_IP_FLAG_INTERRUPT;
+		else
+			ptq->flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_END;
+		ptq->insn_len = 0;
+	} else {
+		if (ptq->state->from_ip)
+			ptq->flags = intel_pt_insn_type(ptq->state->insn_op);
+		else
+			ptq->flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_BEGIN;
+		if (ptq->state->flags & INTEL_PT_IN_TX)
+			ptq->flags |= PERF_IP_FLAG_IN_TX;
+		ptq->insn_len = ptq->state->insn_len;
+	}
+}
+
+static int intel_pt_setup_queue(struct intel_pt *pt,
+				struct auxtrace_queue *queue,
+				unsigned int queue_nr)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (list_empty(&queue->head))
+		return 0;
+
+	if (!ptq) {
+		ptq = intel_pt_alloc_queue(pt, queue_nr);
+		if (!ptq)
+			return -ENOMEM;
+		queue->priv = ptq;
+
+		if (queue->cpu != -1)
+			ptq->cpu = queue->cpu;
+		ptq->tid = queue->tid;
+
+		if (pt->sampling_mode) {
+			if (pt->timeless_decoding)
+				ptq->step_through_buffers = true;
+			if (pt->timeless_decoding || !pt->have_sched_switch)
+				ptq->use_buffer_pid_tid = true;
+		}
+	}
+
+	if (!ptq->on_heap &&
+	    (!pt->sync_switch ||
+	     ptq->switch_state != INTEL_PT_SS_EXPECTING_SWITCH_EVENT)) {
+		const struct intel_pt_state *state;
+		int ret;
+
+		if (pt->timeless_decoding)
+			return 0;
+
+		intel_pt_log("queue %u getting timestamp\n", queue_nr);
+		intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+			     queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+		while (1) {
+			state = intel_pt_decode(ptq->decoder);
+			if (state->err) {
+				if (state->err == INTEL_PT_ERR_NODATA) {
+					intel_pt_log("queue %u has no timestamp\n",
+						     queue_nr);
+					return 0;
+				}
+				continue;
+			}
+			if (state->timestamp)
+				break;
+		}
+
+		ptq->timestamp = state->timestamp;
+		intel_pt_log("queue %u timestamp 0x%" PRIx64 "\n",
+			     queue_nr, ptq->timestamp);
+		ptq->state = state;
+		ptq->have_sample = true;
+		intel_pt_sample_flags(ptq);
+		ret = auxtrace_heap__add(&pt->heap, queue_nr, ptq->timestamp);
+		if (ret)
+			return ret;
+		ptq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int intel_pt_setup_queues(struct intel_pt *pt)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < pt->queues.nr_queues; i++) {
+		ret = intel_pt_setup_queue(pt, &pt->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int intel_pt_inject_event(union perf_event *event,
+				 struct perf_sample *sample, u64 type,
+				 bool swapped)
+{
+	event->header.size = perf_event__sample_event_size(sample, type, 0);
+	return perf_event__synthesize_sample(event, type, 0, sample, swapped);
+}
+
+static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->branches_id;
+	sample.stream_id = ptq->pt->branches_id;
+	sample.period = 1;
+	sample.cpu = ptq->cpu;
+
+	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
+		return 0;
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->branches_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver branch event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->instructions_id;
+	sample.stream_id = ptq->pt->instructions_id;
+	sample.period = ptq->pt->instructions_sample_period;
+	sample.cpu = ptq->cpu;
+
+	if (pt->synth_opts.callchain) {
+		thread_stack__sample(ptq->thread, ptq->chain,
+				     pt->synth_opts.callchain_sz, sample.ip);
+		sample.callchain = ptq->chain;
+	}
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->instructions_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver instruction event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->transactions_id;
+	sample.stream_id = ptq->pt->transactions_id;
+	sample.period = 1;
+	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
+
+	if (pt->synth_opts.callchain) {
+		thread_stack__sample(ptq->thread, ptq->chain,
+				     pt->synth_opts.callchain_sz, sample.ip);
+		sample.callchain = ptq->chain;
+	}
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->transactions_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver transaction event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
+				pid_t pid, pid_t tid, u64 ip)
+{
+	union perf_event event;
+	char msg[MAX_AUXTRACE_ERROR_MSG];
+	int err;
+
+	intel_pt__strerror(code, msg, MAX_AUXTRACE_ERROR_MSG);
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     code, cpu, pid, tid, ip, msg);
+
+	err = perf_session__deliver_synth_event(pt->session, &event, NULL);
+	if (err)
+		pr_err("Intel Processor Trace: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static int intel_pt_next_tid(struct intel_pt *pt, struct intel_pt_queue *ptq)
+{
+	struct auxtrace_queue *queue;
+	pid_t tid = ptq->next_tid;
+	int err;
+
+	if (tid == -1)
+		return 0;
+
+	intel_pt_log("switch: cpu %d tid %d\n", ptq->cpu, tid);
+
+	err = machine__set_current_tid(pt->machine, ptq->cpu, -1, tid);
+
+	queue = &pt->queues.queue_array[ptq->queue_nr];
+	intel_pt_set_pid_tid_cpu(pt, queue);
+
+	ptq->next_tid = -1;
+
+	return err;
+}
+
+static inline bool intel_pt_is_switch_ip(struct intel_pt_queue *ptq, u64 ip)
+{
+	struct intel_pt *pt = ptq->pt;
+
+	return ip == pt->switch_ip &&
+	       (ptq->flags & PERF_IP_FLAG_BRANCH) &&
+	       !(ptq->flags & (PERF_IP_FLAG_CONDITIONAL | PERF_IP_FLAG_ASYNC |
+			       PERF_IP_FLAG_INTERRUPT | PERF_IP_FLAG_TX_ABORT));
+}
+
+static int intel_pt_sample(struct intel_pt_queue *ptq)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct intel_pt *pt = ptq->pt;
+	int err;
+
+	if (!ptq->have_sample)
+		return 0;
+
+	ptq->have_sample = false;
+
+	if (pt->sample_instructions &&
+	    (state->type & INTEL_PT_INSTRUCTION)) {
+		err = intel_pt_synth_instruction_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (pt->sample_transactions &&
+	    (state->type & INTEL_PT_TRANSACTION)) {
+		err = intel_pt_synth_transaction_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (!(state->type & INTEL_PT_BRANCH))
+		return 0;
+
+	if (pt->synth_opts.callchain)
+		thread_stack__event(ptq->thread, ptq->flags, state->from_ip,
+				    state->to_ip, ptq->insn_len,
+				    state->trace_nr);
+
+	if (pt->sample_branches) {
+		err = intel_pt_synth_branch_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (!pt->sync_switch)
+		return 0;
+
+	if (intel_pt_is_switch_ip(ptq, state->to_ip)) {
+		switch (ptq->switch_state) {
+		case INTEL_PT_SS_UNKNOWN:
+		case INTEL_PT_SS_EXPECTING_SWITCH_IP:
+			err = intel_pt_next_tid(pt, ptq);
+			if (err)
+				return err;
+			ptq->switch_state = INTEL_PT_SS_TRACING;
+			break;
+		default:
+			ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_EVENT;
+			return 1;
+		}
+	} else if (!state->to_ip) {
+		ptq->switch_state = INTEL_PT_SS_NOT_TRACING;
+	} else if (ptq->switch_state == INTEL_PT_SS_NOT_TRACING) {
+		ptq->switch_state = INTEL_PT_SS_UNKNOWN;
+	} else if (ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
+		   state->to_ip == pt->ptss_ip &&
+		   (ptq->flags & PERF_IP_FLAG_CALL)) {
+		ptq->switch_state = INTEL_PT_SS_TRACING;
+	}
+
+	return 0;
+}
+
+static u64 intel_pt_switch_ip(struct machine *machine, u64 *ptss_ip)
+{
+	struct map *map;
+	struct symbol *sym, *start;
+	u64 ip, switch_ip = 0;
+
+	if (ptss_ip)
+		*ptss_ip = 0;
+
+	map = machine__kernel_map(machine, MAP__FUNCTION);
+	if (!map)
+		return 0;
+
+	if (map__load(map, machine->symbol_filter))
+		return 0;
+
+	start = dso__first_symbol(map->dso, MAP__FUNCTION);
+
+	for (sym = start; sym; sym = dso__next_symbol(sym)) {
+		if (sym->binding == STB_GLOBAL &&
+		    !strcmp(sym->name, "__switch_to")) {
+			ip = map->unmap_ip(map, sym->start);
+			if (ip >= map->start && ip < map->end) {
+				switch_ip = ip;
+				break;
+			}
+		}
+	}
+
+	if (!switch_ip || !ptss_ip)
+		return 0;
+
+	for (sym = start; sym; sym = dso__next_symbol(sym)) {
+		if (!strcmp(sym->name, "perf_trace_sched_switch")) {
+			ip = map->unmap_ip(map, sym->start);
+			if (ip >= map->start && ip < map->end) {
+				*ptss_ip = ip;
+				break;
+			}
+		}
+	}
+
+	return switch_ip;
+}
+
+static int intel_pt_run_decoder(struct intel_pt_queue *ptq, u64 *timestamp)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct intel_pt *pt = ptq->pt;
+	int err;
+
+	if (!pt->kernel_start) {
+		pt->kernel_start = machine__kernel_start(pt->machine);
+		if (pt->per_cpu_mmaps && pt->have_sched_switch &&
+		    !pt->timeless_decoding && intel_pt_tracing_kernel(pt) &&
+		    !pt->sampling_mode) {
+			pt->switch_ip = intel_pt_switch_ip(pt->machine,
+							   &pt->ptss_ip);
+			if (pt->switch_ip) {
+				intel_pt_log("switch_ip: %"PRIx64" ptss_ip: %"PRIx64"\n",
+					     pt->switch_ip, pt->ptss_ip);
+				pt->sync_switch = true;
+				pt->est_tsc_orig = pt->est_tsc;
+				pt->est_tsc = false;
+			}
+		}
+	}
+
+	intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+	while (1) {
+		err = intel_pt_sample(ptq);
+		if (err)
+			return err;
+
+		state = intel_pt_decode(ptq->decoder);
+		if (state->err) {
+			if (state->err == INTEL_PT_ERR_NODATA)
+				return 1;
+			if (pt->sync_switch &&
+			    state->from_ip >= pt->kernel_start) {
+				pt->sync_switch = false;
+				pt->est_tsc = pt->est_tsc_orig;
+				intel_pt_next_tid(pt, ptq);
+			}
+			if (pt->synth_opts.errors) {
+				err = intel_pt_synth_error(pt, state->err,
+							   ptq->cpu, ptq->pid,
+							   ptq->tid,
+							   state->from_ip);
+				if (err)
+					return err;
+			}
+			continue;
+		}
+
+		ptq->state = state;
+		ptq->have_sample = true;
+		intel_pt_sample_flags(ptq);
+
+		/* Use estimated TSC upon return to user space */
+		if (pt->est_tsc) {
+			if (state->from_ip >= pt->kernel_start &&
+			    state->to_ip &&
+			    state->to_ip < pt->kernel_start)
+				ptq->timestamp = state->est_timestamp;
+			else if (state->timestamp > ptq->timestamp)
+				ptq->timestamp = state->timestamp;
+		/* Use estimated TSC in unknown switch state */
+		} else if (pt->sync_switch &&
+			   ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
+			   state->to_ip == pt->switch_ip &&
+			   (ptq->flags & PERF_IP_FLAG_CALL) &&
+			   ptq->next_tid == -1) {
+			ptq->timestamp = state->est_timestamp;
+		} else if (state->timestamp > ptq->timestamp) {
+			ptq->timestamp = state->timestamp;
+		}
+
+		if (!pt->timeless_decoding && ptq->timestamp >= *timestamp) {
+			*timestamp = ptq->timestamp;
+			return 0;
+		}
+	}
+	return 0;
+}
+
+static inline int intel_pt_update_queues(struct intel_pt *pt)
+{
+	if (pt->queues.new_data) {
+		pt->queues.new_data = false;
+		return intel_pt_setup_queues(pt);
+	}
+	return 0;
+}
+
+static int intel_pt_process_queues(struct intel_pt *pt, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct intel_pt_queue *ptq;
+
+		if (!pt->heap.heap_cnt)
+			return 0;
+
+		if (pt->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = pt->heap.heap_array[0].queue_nr;
+		queue = &pt->queues.queue_array[queue_nr];
+		ptq = queue->priv;
+
+		intel_pt_log("queue %u processing 0x%" PRIx64 " to 0x%" PRIx64 "\n",
+			     queue_nr, pt->heap.heap_array[0].ordinal,
+			     timestamp);
+
+		auxtrace_heap__pop(&pt->heap);
+
+		if (pt->heap.heap_cnt) {
+			ts = pt->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		intel_pt_set_pid_tid_cpu(pt, queue);
+
+		ret = intel_pt_run_decoder(ptq, &ts);
+
+		if (ret < 0) {
+			auxtrace_heap__add(&pt->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&pt->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			ptq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_pt_process_timeless_queues(struct intel_pt *pt, pid_t tid,
+					    u64 time_)
+{
+	struct auxtrace_queues *queues = &pt->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &pt->queues.queue_array[i];
+		struct intel_pt_queue *ptq = queue->priv;
+
+		if (ptq && (tid == -1 || ptq->tid == tid)) {
+			ptq->time = time_;
+			intel_pt_set_pid_tid_cpu(pt, queue);
+			intel_pt_run_decoder(ptq, &ts);
+		}
+	}
+	return 0;
+}
+
+static int intel_pt_lost(struct intel_pt *pt, struct perf_sample *sample)
+{
+	return intel_pt_synth_error(pt, INTEL_PT_ERR_LOST, sample->cpu,
+				    sample->pid, sample->tid, 0);
+}
+
+static struct intel_pt_queue *intel_pt_cpu_to_ptq(struct intel_pt *pt, int cpu)
+{
+	unsigned i, j;
+
+	if (cpu < 0 || !pt->queues.nr_queues)
+		return NULL;
+
+	if ((unsigned)cpu >= pt->queues.nr_queues)
+		i = pt->queues.nr_queues - 1;
+	else
+		i = cpu;
+
+	if (pt->queues.queue_array[i].cpu == cpu)
+		return pt->queues.queue_array[i].priv;
+
+	for (j = 0; i > 0; j++) {
+		if (pt->queues.queue_array[--i].cpu == cpu)
+			return pt->queues.queue_array[i].priv;
+	}
+
+	for (; j < pt->queues.nr_queues; j++) {
+		if (pt->queues.queue_array[j].cpu == cpu)
+			return pt->queues.queue_array[j].priv;
+	}
+
+	return NULL;
+}
+
+static int intel_pt_process_switch(struct intel_pt *pt,
+				   struct perf_sample *sample)
+{
+	struct intel_pt_queue *ptq;
+	struct perf_evsel *evsel;
+	pid_t tid;
+	int cpu, err;
+
+	evsel = perf_evlist__id2evsel(pt->session->evlist, sample->id);
+	if (evsel != pt->switch_evsel)
+		return 0;
+
+	tid = perf_evsel__intval(evsel, sample, "next_pid");
+	cpu = sample->cpu;
+
+	intel_pt_log("sched_switch: cpu %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     cpu, tid, sample->time, perf_time_to_tsc(sample->time,
+		     &pt->tc));
+
+	if (!pt->sync_switch)
+		goto out;
+
+	ptq = intel_pt_cpu_to_ptq(pt, cpu);
+	if (!ptq)
+		goto out;
+
+	switch (ptq->switch_state) {
+	case INTEL_PT_SS_NOT_TRACING:
+		ptq->next_tid = -1;
+		break;
+	case INTEL_PT_SS_UNKNOWN:
+	case INTEL_PT_SS_TRACING:
+		ptq->next_tid = tid;
+		ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_IP;
+		return 0;
+	case INTEL_PT_SS_EXPECTING_SWITCH_EVENT:
+		if (!ptq->on_heap) {
+			ptq->timestamp = perf_time_to_tsc(sample->time,
+							  &pt->tc);
+			err = auxtrace_heap__add(&pt->heap, ptq->queue_nr,
+						 ptq->timestamp);
+			if (err)
+				return err;
+			ptq->on_heap = true;
+		}
+		ptq->switch_state = INTEL_PT_SS_TRACING;
+		break;
+	case INTEL_PT_SS_EXPECTING_SWITCH_IP:
+		ptq->next_tid = tid;
+		intel_pt_log("ERROR: cpu %d expecting switch ip\n", cpu);
+		break;
+	default:
+		break;
+	}
+out:
+	return machine__set_current_tid(pt->machine, cpu, -1, tid);
+}
+
+static int intel_pt_process_itrace_start(struct intel_pt *pt,
+					 union perf_event *event,
+					 struct perf_sample *sample)
+{
+	if (!pt->per_cpu_mmaps)
+		return 0;
+
+	intel_pt_log("itrace_start: cpu %d pid %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     sample->cpu, event->itrace_start.pid,
+		     event->itrace_start.tid, sample->time,
+		     perf_time_to_tsc(sample->time, &pt->tc));
+
+	return machine__set_current_tid(pt->machine, sample->cpu,
+					event->itrace_start.pid,
+					event->itrace_start.tid);
+}
+
+static int intel_pt_process_event(struct perf_session *session,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	u64 timestamp;
+	int err = 0;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("Intel Processor Trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time)
+		timestamp = perf_time_to_tsc(sample->time, &pt->tc);
+	else
+		timestamp = 0;
+
+	if (timestamp || pt->timeless_decoding) {
+		err = intel_pt_update_queues(pt);
+		if (err)
+			return err;
+	}
+
+	if (pt->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = intel_pt_process_timeless_queues(pt,
+							       event->comm.tid,
+							       sample->time);
+		}
+	} else if (timestamp) {
+		err = intel_pt_process_queues(pt, timestamp);
+	}
+	if (err)
+		return err;
+
+	if (event->header.type == PERF_RECORD_AUX &&
+	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
+	    pt->synth_opts.errors)
+		err = intel_pt_lost(pt, sample);
+
+	if (pt->switch_evsel && event->header.type == PERF_RECORD_SAMPLE)
+		err = intel_pt_process_switch(pt, sample);
+	else if (event->header.type == PERF_RECORD_ITRACE_START)
+		err = intel_pt_process_itrace_start(pt, event, sample);
+
+	return err;
+}
+
+static int intel_pt_flush(struct perf_session *session, struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = intel_pt_update_queues(pt);
+	if (ret < 0)
+		return ret;
+
+	if (pt->timeless_decoding)
+		return intel_pt_process_timeless_queues(pt, -1,
+							MAX_TIMESTAMP - 1);
+
+	return intel_pt_process_queues(pt, MAX_TIMESTAMP);
+}
+
+static void intel_pt_free_events(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	struct auxtrace_queues *queues = &pt->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		intel_pt_free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = NULL;
+	}
+	intel_pt_log_disable();
+	auxtrace_queues__free(queues);
+}
+
+static void intel_pt_free(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+
+	auxtrace_heap__free(&pt->heap);
+	intel_pt_free_events(session);
+	session->auxtrace = NULL;
+	thread__delete(pt->unknown_thread);
+	free(pt);
+}
+
+static int intel_pt_process_auxtrace_event(struct perf_session *session,
+					   union perf_event *event,
+					   struct perf_tool *tool __maybe_unused)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+
+	if (pt->sampling_mode)
+		return 0;
+
+	if (!pt->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data_file__fd(session->file);
+		int err;
+
+		if (perf_data_file__is_pipe(session->file)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&pt->queues, session, event,
+						 data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				intel_pt_dump_event(pt, buffer->data,
+						    buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
+		}
+	}
+
+	return 0;
+}
+
+struct intel_pt_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int intel_pt_event_synth(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample __maybe_unused,
+				struct machine *machine __maybe_unused)
+{
+	struct intel_pt_synth *intel_pt_synth =
+			container_of(tool, struct intel_pt_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(intel_pt_synth->session, event,
+						 NULL);
+}
+
+static int intel_pt_synth_event(struct perf_session *session,
+				struct perf_event_attr *attr, u64 id)
+{
+	struct intel_pt_synth intel_pt_synth;
+
+	memset(&intel_pt_synth, 0, sizeof(struct intel_pt_synth));
+	intel_pt_synth.session = session;
+
+	return perf_event__synthesize_attr(&intel_pt_synth.dummy_tool, attr, 1,
+					   &id, intel_pt_event_synth);
+}
+
+static int intel_pt_synth_events(struct intel_pt *pt,
+				 struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == pt->pmu_type && evsel->ids) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("There are no selected events with Intel Processor Trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	if (pt->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+	if (!pt->per_cpu_mmaps)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+	if (!id)
+		id = 1;
+
+	if (pt->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		if (pt->synth_opts.period_type == PERF_ITRACE_PERIOD_NANOSECS)
+			attr.sample_period =
+				intel_pt_ns_to_ticks(pt, pt->synth_opts.period);
+		else
+			attr.sample_period = pt->synth_opts.period;
+		pt->instructions_sample_period = attr.sample_period;
+		if (pt->synth_opts.callchain)
+			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'instructions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'instructions' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_instructions = true;
+		pt->instructions_sample_type = attr.sample_type;
+		pt->instructions_id = id;
+		id += 1;
+	}
+
+	if (pt->synth_opts.transactions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = 1;
+		if (pt->synth_opts.callchain)
+			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'transactions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'transactions' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_transactions = true;
+		pt->transactions_id = id;
+		id += 1;
+		evlist__for_each(evlist, evsel) {
+			if (evsel->id && evsel->id[0] == pt->transactions_id) {
+				if (evsel->name)
+					zfree(&evsel->name);
+				evsel->name = strdup("transactions");
+				break;
+			}
+		}
+	}
+
+	if (pt->synth_opts.branches) {
+		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		attr.sample_period = 1;
+		attr.sample_type |= PERF_SAMPLE_ADDR;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'branches' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_branches = true;
+		pt->branches_sample_type = attr.sample_type;
+		pt->branches_id = id;
+	}
+
+	pt->synth_needs_swap = evsel->needs_swap;
+
+	return 0;
+}
+
+static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each_reverse(evlist, evsel) {
+		const char *name = perf_evsel__name(evsel);
+
+		if (!strcmp(name, "sched:sched_switch"))
+			return evsel;
+	}
+
+	return NULL;
+}
+
+static const char * const intel_pt_info_fmts[] = {
+	[INTEL_PT_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
+	[INTEL_PT_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
+	[INTEL_PT_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
+	[INTEL_PT_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
+	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
+	[INTEL_PT_TSC_BIT]		= "  TSC bit            %#"PRIx64"\n",
+	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit      %#"PRIx64"\n",
+	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch  %"PRId64"\n",
+	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
+	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps       %"PRId64"\n",
+};
+
+static void intel_pt_print_info(u64 *arr, int start, int finish)
+{
+	int i;
+
+	if (!dump_trace)
+		return;
+
+	for (i = start; i <= finish; i++)
+		fprintf(stdout, intel_pt_info_fmts[i], arr[i]);
+}
+
+int intel_pt_process_auxtrace_info(union perf_event *event,
+				   struct perf_session *session)
+{
+	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
+	size_t min_sz = sizeof(u64) * INTEL_PT_PER_CPU_MMAPS;
+	struct intel_pt *pt;
+	int err;
+
+	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
+					min_sz)
+		return -EINVAL;
+
+	pt = zalloc(sizeof(struct intel_pt));
+	if (!pt)
+		return -ENOMEM;
+
+	err = auxtrace_queues__init(&pt->queues);
+	if (err)
+		goto err_free;
+
+	intel_pt_log_set_name(INTEL_PT_PMU_NAME);
+
+	pt->session = session;
+	pt->machine = &session->machines.host; /* No kvm support */
+	pt->auxtrace_type = auxtrace_info->type;
+	pt->pmu_type = auxtrace_info->priv[INTEL_PT_PMU_TYPE];
+	pt->tc.time_shift = auxtrace_info->priv[INTEL_PT_TIME_SHIFT];
+	pt->tc.time_mult = auxtrace_info->priv[INTEL_PT_TIME_MULT];
+	pt->tc.time_zero = auxtrace_info->priv[INTEL_PT_TIME_ZERO];
+	pt->cap_user_time_zero = auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO];
+	pt->tsc_bit = auxtrace_info->priv[INTEL_PT_TSC_BIT];
+	pt->noretcomp_bit = auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT];
+	pt->have_sched_switch = auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH];
+	pt->snapshot_mode = auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE];
+	pt->per_cpu_mmaps = auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS];
+	intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_PMU_TYPE,
+			    INTEL_PT_PER_CPU_MMAPS);
+
+	pt->timeless_decoding = intel_pt_timeless_decoding(pt);
+	pt->have_tsc = intel_pt_have_tsc(pt);
+	pt->sampling_mode = false;
+	pt->est_tsc = pt->per_cpu_mmaps && !pt->timeless_decoding;
+
+	pt->unknown_thread = thread__new(999999999, 999999999);
+	if (!pt->unknown_thread) {
+		err = -ENOMEM;
+		goto err_free_queues;
+	}
+	err = thread__set_comm(pt->unknown_thread, "unknown", 0);
+	if (err)
+		goto err_delete_thread;
+	if (thread__init_map_groups(pt->unknown_thread, pt->machine)) {
+		err = -ENOMEM;
+		goto err_delete_thread;
+	}
+
+	pt->auxtrace.process_event = intel_pt_process_event;
+	pt->auxtrace.process_auxtrace_event = intel_pt_process_auxtrace_event;
+	pt->auxtrace.flush_events = intel_pt_flush;
+	pt->auxtrace.free_events = intel_pt_free_events;
+	pt->auxtrace.free = intel_pt_free;
+	session->auxtrace = &pt->auxtrace;
+
+	if (dump_trace)
+		return 0;
+
+	if (pt->have_sched_switch == 1) {
+		pt->switch_evsel = intel_pt_find_sched_switch(session->evlist);
+		if (!pt->switch_evsel) {
+			pr_err("%s: missing sched_switch event\n", __func__);
+			goto err_delete_thread;
+		}
+	}
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
+		pt->synth_opts = *session->itrace_synth_opts;
+	} else {
+		itrace_synth_opts__set_default(&pt->synth_opts);
+		if (use_browser != -1) {
+			pt->synth_opts.branches = false;
+			pt->synth_opts.callchain = true;
+		}
+	}
+
+	if (pt->synth_opts.log)
+		intel_pt_log_enable();
+
+	if (pt->synth_opts.calls)
+		pt->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+				       PERF_IP_FLAG_TRACE_END;
+	if (pt->synth_opts.returns)
+		pt->branches_filter |= PERF_IP_FLAG_RETURN |
+				       PERF_IP_FLAG_TRACE_BEGIN;
+
+	if (pt->synth_opts.callchain && !symbol_conf.use_callchain) {
+		symbol_conf.use_callchain = true;
+		if (callchain_register_param(&callchain_param) < 0) {
+			symbol_conf.use_callchain = false;
+			pt->synth_opts.callchain = false;
+		}
+	}
+
+	err = intel_pt_synth_events(pt, session);
+	if (err)
+		goto err_delete_thread;
+
+	err = auxtrace_queues__process_index(&pt->queues, session);
+	if (err)
+		goto err_delete_thread;
+
+	if (pt->queues.populated)
+		pt->data_queued = true;
+
+	if (pt->timeless_decoding)
+		pr_debug2("Intel PT decoding without timestamps\n");
+
+	return 0;
+
+err_delete_thread:
+	thread__delete(pt->unknown_thread);
+err_free_queues:
+	intel_pt_log_disable();
+	auxtrace_queues__free(&pt->queues);
+	session->auxtrace = NULL;
+err_free:
+	free(pt);
+	return err;
+}
diff --git a/tools/perf/util/intel-pt.h b/tools/perf/util/intel-pt.h
new file mode 100644
index 000000000000..a1bfe93473ba
--- /dev/null
+++ b/tools/perf/util/intel-pt.h
@@ -0,0 +1,51 @@
+/*
+ * intel_pt.h: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__PERF_INTEL_PT_H__
+#define INCLUDE__PERF_INTEL_PT_H__
+
+#define INTEL_PT_PMU_NAME "intel_pt"
+
+enum {
+	INTEL_PT_PMU_TYPE,
+	INTEL_PT_TIME_SHIFT,
+	INTEL_PT_TIME_MULT,
+	INTEL_PT_TIME_ZERO,
+	INTEL_PT_CAP_USER_TIME_ZERO,
+	INTEL_PT_TSC_BIT,
+	INTEL_PT_NORETCOMP_BIT,
+	INTEL_PT_HAVE_SCHED_SWITCH,
+	INTEL_PT_SNAPSHOT_MODE,
+	INTEL_PT_PER_CPU_MMAPS,
+	INTEL_PT_AUXTRACE_PRIV_MAX,
+};
+
+#define INTEL_PT_AUXTRACE_PRIV_SIZE (INTEL_PT_AUXTRACE_PRIV_MAX * sizeof(u64))
+
+struct auxtrace_record;
+struct perf_tool;
+union perf_event;
+struct perf_session;
+struct perf_event_attr;
+struct perf_pmu;
+
+struct auxtrace_record *intel_pt_recording_init(int *err);
+
+int intel_pt_process_auxtrace_info(union perf_event *event,
+				   struct perf_session *session);
+
+struct perf_event_attr *intel_pt_pmu_default_config(struct perf_pmu *pmu);
+
+#endif
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 7/8] perf tools: Take Intel PT into use
  2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
                   ` (5 preceding siblings ...)
  2015-06-26 22:02 ` [PATCH 6/8] perf tools: Add Intel PT support Arnaldo Carvalho de Melo
@ 2015-06-26 22:02 ` Arnaldo Carvalho de Melo
  2015-06-26 22:02 ` [PATCH 8/8] perf tools: Add Intel BTS support Arnaldo Carvalho de Melo
  2015-06-30  4:58 ` [GIT PULL 0/8] perf/pt -> Intel PT/BTS Ingo Molnar
  8 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 22:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Adrian Hunter, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

To record an AUX area, the weak function auxtrace_record__init() must be
implemented.

Equally to decode an AUX area, the AUX area tracing type must be added
to the perf_event__process_auxtrace_info() function.

This patch makes those two changes plus hooks up default config for the
intel_pt PMU.  Also some brief documentation is provided for using the
tools with intel_pt.

E.g:

  [root@perf4 ~]# dmesg
  451 [0.405807] Performance Events: PEBS fmt2+, 16-deep LBR, Broadwell events, full-width counters, Intel PMU driver.
  [root@perf4 ~]# perf --version
  perf version 4.1.g53874a
  [root@perf4 ~]#  perf record -e intel_pt//u -a sleep 10
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 0.383 MB perf.data ]
  [root@perf4 ~]# perf evlist
  intel_pt//u
  sched:sched_switch
  dummy:u
  [root@perf4 ~]# perf report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 0  of event 'intel_pt//u'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......
  #

  # Samples: 393  of event 'sched:sched_switch'
  # Event count (approx.): 393
  #
  # Overhead  Command         Shared Object     Symbol
  # ........  ..............  ................  ..............
    49.62%  swapper         [kernel.vmlinux]  [k] __schedule
    10.69%  rcu_sched       [kernel.vmlinux]  [k] __schedule
     6.62%  rcuos/0         [kernel.vmlinux]  [k] __schedule
     5.60%  kworker/0:1     [kernel.vmlinux]  [k] __schedule
     3.56%  rcuos/3         [kernel.vmlinux]  [k] __schedule
     3.05%  kworker/u384:2  [kernel.vmlinux]  [k] __schedule
     2.54%  kworker/2:0     [kernel.vmlinux]  [k] __schedule
     2.54%  tuned           [kernel.vmlinux]  [k] __schedule
  <SNIP>
  # Samples: 0  of event 'dummy:u'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......

  # Samples: 28  of event 'instructions:u'
  # Event count (approx.): 5030172
  #
  # Overhead  Command     Shared Object        Symbol
  # ........  ..........  ...................  ................................
  #
    21.43%  tuned       libpython2.7.so.1.0  [.] PyEval_EvalFrameEx
                 |
                 ---PyEval_EvalFrameEx
                    |
                    |--83.33%-- PyEval_EvalCodeEx
                    |          PyEval_EvalFrameEx
                    |          |
                    |          |--60.00%-- PyEval_EvalCodeEx
                    |          |          PyEval_EvalFrameEx
                    |          |          PyEval_EvalFrameEx
                    |          |
                    |           --40.00%-- PyEval_EvalFrameEx
                    |
                     --16.67%-- PyEval_EvalFrameEx
                               PyEval_EvalCodeEx
                               PyEval_EvalFrameEx
                               PyEval_EvalCodeEx
                               PyEval_EvalFrameEx
                               PyEval_EvalFrameEx

    14.29%  tuned       libpython2.7.so.1.0  [.] _PyType_Lookup
                 |
                 ---_PyType_Lookup
                    _PyObject_GenericGetAttrWithDict
                    PyEval_EvalFrameEx
                    PyEval_EvalCodeEx
                    PyEval_EvalFrameEx
                    PyEval_EvalCodeEx
                    PyEval_EvalFrameEx
                    |
                    |--75.00%-- PyEval_EvalFrameEx
                    |
                     --25.00%-- PyEval_EvalCodeEx
                               PyEval_EvalFrameEx
                               PyEval_EvalFrameEx

     3.57%  irqbalance  irqbalance           [.] 0x0000000000004038
            |
            ---0x4038
               0x4761
               0x4761
               0x4761
               0x49f1
               0x2295

     3.57%  irqbalance  libc-2.17.so         [.] __GI_____strtoull_l_internal
            |
            ---__GI_____strtoull_l_internal
               0x6f49
               0x229a

     3.57%  irqbalance  libc-2.17.so         [.] __strchrnul
            |
            ---__strchrnul
               vfprintf
               __vsprintf_chk
               __sprintf_chk
               0x2724
               0x4038
               0x2331

     3.57%  irqbalance  libc-2.17.so         [.] __strstr_sse42
            |
            ---__strstr_sse42
               0x71e0
               0x229f

  # And now to some userspace ftrace on uninstrumented binaries 8-) :
  # Hand edited to make it a bit more compact, replacing /home/acme/bin/perf
  # with /bin/perf:

  [root@perf4 ~]# perf script
     perf 8921 [3] 7.310889: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310889: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310889: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310889: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310890: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310890: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310890: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310890: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310893: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310893: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       4816a8 perf_evlist__enable (/bin/perf) => 4815f8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       4815fe perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310893: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310893: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310956: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310956: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310956: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310956: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310961: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310961: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310961: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310961: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.310968: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310968: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       4816a8 perf_evlist__enable (/bin/perf) => 4815f8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       4815fe perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.310968: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.310968: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.311040: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311040: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481630 perf_evlist__enable (/bin/perf) => 4816d8 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       4816de perf_evlist__enable (/bin/perf) => 48164f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.311040: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311040: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.311046: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311046: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       481694 perf_evlist__enable (/bin/perf) => 481614 perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       481652 perf_evlist__enable (/bin/perf) => 48165f perf_evlist__enable (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       481684 perf_evlist__enable (/bin/perf) => 41d250 ioctl@plt (/bin/perf)
     perf 8921 [3] 7.311046: 1 branches:u:       41d250 ioctl@plt (/bin/perf) => 7fcecadbf250 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311046: 1 branches:u: 7fcecadbf255 __GI___ioctl (/usr/lib64/libc-2.17.so) => 0 [unknown] ([unknown])
     perf 8921 [3] 7.311050: 1 branches:u:            0 [unknown] ([unknown]) => 7fcecadbf257 __GI___ioctl (/usr/lib64/libc-2.17.so)
     perf 8921 [3] 7.311050: 1 branches:u: 7fcecadbf25f __GI___ioctl (/usr/lib64/libc-2.17.so) => 481689 perf_evlist__enable (/bin/perf)
:

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-10-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/intel-pt.txt | 588 ++++++++++++++++++++++++++++++++++
 tools/perf/arch/x86/util/Build        |   2 +
 tools/perf/arch/x86/util/auxtrace.c   |  38 +++
 tools/perf/arch/x86/util/pmu.c        |  15 +
 tools/perf/util/auxtrace.c            |   5 +-
 tools/perf/util/pmu.c                 |   4 +-
 6 files changed, 649 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-pt.txt
 create mode 100644 tools/perf/arch/x86/util/auxtrace.c
 create mode 100644 tools/perf/arch/x86/util/pmu.c

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
new file mode 100644
index 000000000000..2866b62eb293
--- /dev/null
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -0,0 +1,588 @@
+Intel Processor Trace
+=====================
+
+Overview
+========
+
+Intel Processor Trace (Intel PT) is an extension of Intel Architecture that
+collects information about software execution such as control flow, execution
+modes and timings and formats it into highly compressed binary packets.
+Technical details are documented in the Intel 64 and IA-32 Architectures
+Software Developer Manuals, Chapter 36 Intel Processor Trace.
+
+Intel PT is first supported in Intel Core M and 5th generation Intel Core
+processors that are based on the Intel micro-architecture code name Broadwell.
+
+Trace data is collected by 'perf record' and stored within the perf.data file.
+See below for options to 'perf record'.
+
+Trace data must be 'decoded' which involves walking the object code and matching
+the trace data packets. For example a TNT packet only tells whether a
+conditional branch was taken or not taken, so to make use of that packet the
+decoder must know precisely which instruction was being executed.
+
+Decoding is done on-the-fly.  The decoder outputs samples in the same format as
+samples output by perf hardware events, for example as though the "instructions"
+or "branches" events had been recorded.  Presently 3 tools support this:
+'perf script', 'perf report' and 'perf inject'.  See below for more information
+on using those tools.
+
+The main distinguishing feature of Intel PT is that the decoder can determine
+the exact flow of software execution.  Intel PT can be used to understand why
+and how did software get to a certain point, or behave a certain way.  The
+software does not have to be recompiled, so Intel PT works with debug or release
+builds, however the executed images are needed - which makes use in JIT-compiled
+environments, or with self-modified code, a challenge.  Also symbols need to be
+provided to make sense of addresses.
+
+A limitation of Intel PT is that it produces huge amounts of trace data
+(hundreds of megabytes per second per core) which takes a long time to decode,
+for example two or three orders of magnitude longer than it took to collect.
+Another limitation is the performance impact of tracing, something that will
+vary depending on the use-case and architecture.
+
+
+Quickstart
+==========
+
+It is important to start small.  That is because it is easy to capture vastly
+more data than can possibly be processed.
+
+The simplest thing to do with Intel PT is userspace profiling of small programs.
+Data is captured with 'perf record' e.g. to trace 'ls' userspace-only:
+
+	perf record -e intel_pt//u ls
+
+And profiled with 'perf report' e.g.
+
+	perf report
+
+To also trace kernel space presents a problem, namely kernel self-modifying
+code.  A fairly good kernel image is available in /proc/kcore but to get an
+accurate image a copy of /proc/kcore needs to be made under the same conditions
+as the data capture.  A script perf-with-kcore can do that, but beware that the
+script makes use of 'sudo' to copy /proc/kcore.  If you have perf installed
+locally from the source tree you can do:
+
+	~/libexec/perf-core/perf-with-kcore record pt_ls -e intel_pt// -- ls
+
+which will create a directory named 'pt_ls' and put the perf.data file and
+copies of /proc/kcore, /proc/kallsyms and /proc/modules into it.  Then to use
+'perf report' becomes:
+
+	~/libexec/perf-core/perf-with-kcore report pt_ls
+
+Because samples are synthesized after-the-fact, the sampling period can be
+selected for reporting. e.g. sample every microsecond
+
+	~/libexec/perf-core/perf-with-kcore report pt_ls --itrace=i1usge
+
+See the sections below for more information about the --itrace option.
+
+Beware the smaller the period, the more samples that are produced, and the
+longer it takes to process them.
+
+Also note that the coarseness of Intel PT timing information will start to
+distort the statistical value of the sampling as the sampling period becomes
+smaller.
+
+To represent software control flow, "branches" samples are produced.  By default
+a branch sample is synthesized for every single branch.  To get an idea what
+data is available you can use the 'perf script' tool with no parameters, which
+will list all the samples.
+
+	perf record -e intel_pt//u ls
+	perf script
+
+An interesting field that is not printed by default is 'flags' which can be
+displayed as follows:
+
+	perf script -Fcomm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,flags
+
+The flags are "bcrosyiABEx" which stand for branch, call, return, conditional,
+system, asynchronous, interrupt, transaction abort, trace begin, trace end, and
+in transaction, respectively.
+
+While it is possible to create scripts to analyze the data, an alternative
+approach is available to export the data to a postgresql database.  Refer to
+script export-to-postgresql.py for more details, and to script
+call-graph-from-postgresql.py for an example of using the database.
+
+As mentioned above, it is easy to capture too much data.  One way to limit the
+data captured is to use 'snapshot' mode which is explained further below.
+Refer to 'new snapshot option' and 'Intel PT modes of operation' further below.
+
+Another problem that will be experienced is decoder errors.  They can be caused
+by inability to access the executed image, self-modified or JIT-ed code, or the
+inability to match side-band information (such as context switches and mmaps)
+which results in the decoder not knowing what code was executed.
+
+There is also the problem of perf not being able to copy the data fast enough,
+resulting in data lost because the buffer was full.  See 'Buffer handling' below
+for more details.
+
+
+perf record
+===========
+
+new event
+---------
+
+The Intel PT kernel driver creates a new PMU for Intel PT.  PMU events are
+selected by providing the PMU name followed by the "config" separated by slashes.
+An enhancement has been made to allow default "config" e.g. the option
+
+	-e intel_pt//
+
+will use a default config value.  Currently that is the same as
+
+	-e intel_pt/tsc,noretcomp=0/
+
+which is the same as
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+The config terms are listed in /sys/devices/intel_pt/format.  They are bit
+fields within the config member of the struct perf_event_attr which is
+passed to the kernel by the perf_event_open system call.  They correspond to bit
+fields in the IA32_RTIT_CTL MSR.  Here is a list of them and their definitions:
+
+	$ for f in `ls /sys/devices/intel_pt/format`;do
+	> echo $f
+	> cat /sys/devices/intel_pt/format/$f
+	> done
+	noretcomp
+	config:11
+	tsc
+	config:10
+
+Note that the default config must be overridden for each term i.e.
+
+	-e intel_pt/noretcomp=0/
+
+is the same as:
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+So, to disable TSC packets use:
+
+	-e intel_pt/tsc=0/
+
+It is also possible to specify the config value explicitly:
+
+	-e intel_pt/config=0x400/
+
+Note that, as with all events, the event is suffixed with event modifiers:
+
+	u	userspace
+	k	kernel
+	h	hypervisor
+	G	guest
+	H	host
+	p	precise ip
+
+'h', 'G' and 'H' are for virtualization which is not supported by Intel PT.
+'p' is also not relevant to Intel PT.  So only options 'u' and 'k' are
+meaningful for Intel PT.
+
+perf_event_attr is displayed if the -vv option is used e.g.
+
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             6
+	size                             112
+	config                           0x400
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	enable_on_exec                   1
+	sample_id_all                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+
+
+new snapshot option
+-------------------
+
+To select snapshot mode a new option has been added:
+
+	-S
+
+Optionally it can be followed by the snapshot size e.g.
+
+	-S0x100000
+
+The default snapshot size is the auxtrace mmap size.  If neither auxtrace mmap size
+nor snapshot size is specified, then the default is 4MiB for privileged users
+(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
+If an unprivileged user does not specify mmap pages, the mmap pages will be
+reduced as described in the 'new auxtrace mmap size option' section below.
+
+The snapshot size is displayed if the option -vv is used e.g.
+
+	Intel PT snapshot size: %zu
+
+
+new auxtrace mmap size option
+---------------------------
+
+Intel PT buffer size is specified by an addition to the -m option e.g.
+
+	-m,16
+
+selects a buffer size of 16 pages i.e. 64KiB.
+
+Note that the existing functionality of -m is unchanged.  The auxtrace mmap size
+is specified by the optional addition of a comma and the value.
+
+The default auxtrace mmap size for Intel PT is 4MiB/page_size for privileged users
+(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
+If an unprivileged user does not specify mmap pages, the mmap pages will be
+reduced from the default 512KiB/page_size to 256KiB/page_size, otherwise the
+user is likely to get an error as they exceed their mlock limit (Max locked
+memory as shown in /proc/self/limits).  Note that perf does not count the first
+512KiB (actually /proc/sys/kernel/perf_event_mlock_kb minus 1 page) per cpu
+against the mlock limit so an unprivileged user is allowed 512KiB per cpu plus
+their mlock limit (which defaults to 64KiB but is not multiplied by the number
+of cpus).
+
+In full-trace mode, powers of two are allowed for buffer size, with a minimum
+size of 2 pages.  In snapshot mode, it is the same but the minimum size is
+1 page.
+
+The mmap size and auxtrace mmap size are displayed if the -vv option is used e.g.
+
+	mmap length 528384
+	auxtrace mmap length 4198400
+
+
+Intel PT modes of operation
+---------------------------
+
+Intel PT can be used in 2 modes:
+	full-trace mode
+	snapshot mode
+
+Full-trace mode traces continuously e.g.
+
+	perf record -e intel_pt//u uname
+
+Snapshot mode captures the available data when a signal is sent e.g.
+
+	perf record -v -e intel_pt//u -S ./loopy 1000000000 &
+	[1] 11435
+	kill -USR2 11435
+	Recording AUX area tracing snapshot
+
+Note that the signal sent is SIGUSR2.
+Note that "Recording AUX area tracing snapshot" is displayed because the -v
+option is used.
+
+The 2 modes cannot be used together.
+
+
+Buffer handling
+---------------
+
+There may be buffer limitations (i.e. single ToPa entry) which means that actual
+buffer sizes are limited to powers of 2 up to 4MiB (MAX_ORDER).  In order to
+provide other sizes, and in particular an arbitrarily large size, multiple
+buffers are logically concatenated.  However an interrupt must be used to switch
+between buffers.  That has two potential problems:
+	a) the interrupt may not be handled in time so that the current buffer
+	becomes full and some trace data is lost.
+	b) the interrupts may slow the system and affect the performance
+	results.
+
+If trace data is lost, the driver sets 'truncated' in the PERF_RECORD_AUX event
+which the tools report as an error.
+
+In full-trace mode, the driver waits for data to be copied out before allowing
+the (logical) buffer to wrap-around.  If data is not copied out quickly enough,
+again 'truncated' is set in the PERF_RECORD_AUX event.  If the driver has to
+wait, the intel_pt event gets disabled.  Because it is difficult to know when
+that happens, perf tools always re-enable the intel_pt event after copying out
+data.
+
+
+Intel PT and build ids
+----------------------
+
+By default "perf record" post-processes the event stream to find all build ids
+for executables for all addresses sampled.  Deliberately, Intel PT is not
+decoded for that purpose (it would take too long).  Instead the build ids for
+all executables encountered (due to mmap, comm or task events) are included
+in the perf.data file.
+
+To see buildids included in the perf.data file use the command:
+
+	perf buildid-list
+
+If the perf.data file contains Intel PT data, that is the same as:
+
+	perf buildid-list --with-hits
+
+
+Snapshot mode and event disabling
+---------------------------------
+
+In order to make a snapshot, the intel_pt event is disabled using an IOCTL,
+namely PERF_EVENT_IOC_DISABLE.  However doing that can also disable the
+collection of side-band information.  In order to prevent that,  a dummy
+software event has been introduced that permits tracking events (like mmaps) to
+continue to be recorded while intel_pt is disabled.  That is important to ensure
+there is complete side-band information to allow the decoding of subsequent
+snapshots.
+
+A test has been created for that.  To find the test:
+
+	perf test list
+	...
+	23: Test using a dummy software event to keep tracking
+
+To run the test:
+
+	perf test 23
+	23: Test using a dummy software event to keep tracking     : Ok
+
+
+perf record modes (nothing new here)
+------------------------------------
+
+perf record essentially operates in one of three modes:
+	per thread
+	per cpu
+	workload only
+
+"per thread" mode is selected by -t or by --per-thread (with -p or -u or just a
+workload).
+"per cpu" is selected by -C or -a.
+"workload only" mode is selected by not using the other options but providing a
+command to run (i.e. the workload).
+
+In per-thread mode an exact list of threads is traced.  There is no inheritance.
+Each thread has its own event buffer.
+
+In per-cpu mode all processes (or processes from the selected cgroup i.e. -G
+option, or processes selected with -p or -u) are traced.  Each cpu has its own
+buffer. Inheritance is allowed.
+
+In workload-only mode, the workload is traced but with per-cpu buffers.
+Inheritance is allowed.  Note that you can now trace a workload in per-thread
+mode by using the --per-thread option.
+
+
+Privileged vs non-privileged users
+----------------------------------
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users
+have memory limits imposed upon them.  That affects what buffer sizes they can
+have as outlined above.
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users are
+not permitted to use tracepoints which means there is insufficient side-band
+information to decode Intel PT in per-cpu mode, and potentially workload-only
+mode too if the workload creates new processes.
+
+Note also, that to use tracepoints, read-access to debugfs is required.  So if
+debugfs is not mounted or the user does not have read-access, it will again not
+be possible to decode Intel PT in per-cpu mode.
+
+
+sched_switch tracepoint
+-----------------------
+
+The sched_switch tracepoint is used to provide side-band data for Intel PT
+decoding.  sched_switch events are automatically added. e.g. the second event
+shown below
+
+	$ perf record -vv -e intel_pt//u uname
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             6
+	size                             112
+	config                           0x400
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	enable_on_exec                   1
+	sample_id_all                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             2
+	size                             112
+	config                           0x108
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
+	read_format                      ID
+	inherit                          1
+	sample_id_all                    1
+	exclude_guest                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             1
+	size                             112
+	config                           0x9
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	mmap                             1
+	comm                             1
+	enable_on_exec                   1
+	task                             1
+	sample_id_all                    1
+	mmap2                            1
+	comm_exec                        1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	mmap size 528384B
+	AUX area mmap length 4194304
+	perf event ring buffer mmapped per cpu
+	Synthesizing auxtrace information
+	Linux
+	[ perf record: Woken up 1 times to write data ]
+	[ perf record: Captured and wrote 0.042 MB perf.data ]
+
+Note, the sched_switch event is only added if the user is permitted to use it
+and only in per-cpu mode.
+
+Note also, the sched_switch event is only added if TSC packets are requested.
+That is because, in the absence of timing information, the sched_switch events
+cannot be matched against the Intel PT trace.
+
+
+perf script
+===========
+
+By default, perf script will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace.
+
+
+New --itrace option
+-------------------
+
+Having no option is the same as
+
+	--itrace
+
+which, in turn, is the same as
+
+	--itrace=ibxe
+
+The letters are:
+
+	i	synthesize "instructions" events
+	b	synthesize "branches" events
+	x	synthesize "transactions" events
+	c	synthesize branches events (calls only)
+	r	synthesize branches events (returns only)
+	e	synthesize tracing error events
+	d	create a debug log
+	g	synthesize a call chain (use with i or x)
+
+"Instructions" events look like they were recorded by "perf record -e
+instructions".
+
+"Branches" events look like they were recorded by "perf record -e branches". "c"
+and "r" can be combined to get calls and returns.
+
+"Transactions" events correspond to the start or end of transactions. The
+'flags' field can be used in perf script to determine whether the event is a
+tranasaction start, commit or abort.
+
+Error events are new.  They show where the decoder lost the trace.  Error events
+are quite important.  Users must know if what they are seeing is a complete
+picture or not.
+
+The "d" option will cause the creation of a file "intel_pt.log" containing all
+decoded packets and instructions.  Note that this option slows down the decoder
+and that the resulting file may be very large.
+
+In addition, the period of the "instructions" event can be specified. e.g.
+
+	--itrace=i10us
+
+sets the period to 10us i.e. one  instruction sample is synthesized for each 10
+microseconds of trace.  Alternatives to "us" are "ms" (milliseconds),
+"ns" (nanoseconds), "t" (TSC ticks) or "i" (instructions).
+
+"ms", "us" and "ns" are converted to TSC ticks.
+
+The timing information included with Intel PT does not give the time of every
+instruction.  Consequently, for the purpose of sampling, the decoder estimates
+the time since the last timing packet based on 1 tick per instruction.  The time
+on the sample is *not* adjusted and reflects the last known value of TSC.
+
+For Intel PT, the default period is 100us.
+
+Also the call chain size (default 16, max. 1024) for instructions or
+transactions events can be specified. e.g.
+
+	--itrace=ig32
+	--itrace=xg32
+
+To disable trace decoding entirely, use the option --no-itrace.
+
+
+dump option
+-----------
+
+perf script has an option (-D) to "dump" the events i.e. display the binary
+data.
+
+When -D is used, Intel PT packets are displayed.  The packet decoder does not
+pay attention to PSB packets, but just decodes the bytes - so the packets seen
+by the actual decoder may not be identical in places where the data is corrupt.
+One example of that would be when the buffer-switching interrupt has been too
+slow, and the buffer has been filled completely.  In that case, the last packet
+in the buffer might be truncated and immediately followed by a PSB as the trace
+continues in the next buffer.
+
+To disable the display of Intel PT packets, combine the -D option with
+--no-itrace.
+
+
+perf report
+===========
+
+By default, perf report will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace exactly the same as
+perf script, with the exception that the default is --itrace=igxe.
+
+
+perf inject
+===========
+
+perf inject also accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+	perf inject --itrace -i perf.data -o perf.data.new
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 139608878888..a8be9f9d0462 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -1,5 +1,6 @@
 libperf-y += header.o
 libperf-y += tsc.o
+libperf-y += pmu.o
 libperf-y += kvm-stat.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
@@ -7,4 +8,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
 libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 
+libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
new file mode 100644
index 000000000000..e7654b506312
--- /dev/null
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -0,0 +1,38 @@
+/*
+ * auxtrace.c: AUX area tracing support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include "../../util/header.h"
+#include "../../util/auxtrace.h"
+#include "../../util/intel-pt.h"
+
+struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe_unused,
+					      int *err)
+{
+	char buffer[64];
+	int ret;
+
+	*err = 0;
+
+	ret = get_cpuid(buffer, sizeof(buffer));
+	if (ret) {
+		*err = ret;
+		return NULL;
+	}
+
+	if (!strncmp(buffer, "GenuineIntel,", 13))
+		return intel_pt_recording_init(err);
+
+	return NULL;
+}
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
new file mode 100644
index 000000000000..fd11cc3ce780
--- /dev/null
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -0,0 +1,15 @@
+#include <string.h>
+
+#include <linux/perf_event.h>
+
+#include "../../util/intel-pt.h"
+#include "../../util/pmu.h"
+
+struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
+{
+#ifdef HAVE_AUXTRACE_SUPPORT
+	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
+		return intel_pt_pmu_default_config(pmu);
+#endif
+	return NULL;
+}
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index ea29022a4f52..1b3cc297290d 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -47,6 +47,8 @@
 #include "debug.h"
 #include "parse-options.h"
 
+#include "intel-pt.h"
+
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 			struct auxtrace_mmap_params *mp,
 			void *userpg, int fd)
@@ -876,7 +878,7 @@ static bool auxtrace__dont_decode(struct perf_session *session)
 
 int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 				      union perf_event *event,
-				      struct perf_session *session __maybe_unused)
+				      struct perf_session *session)
 {
 	enum auxtrace_type type = event->auxtrace_info.type;
 
@@ -885,6 +887,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 
 	switch (type) {
 	case PERF_AUXTRACE_INTEL_PT:
+		return intel_pt_process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 7bcb8c315615..b5ea26bc290d 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -462,8 +462,8 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	LIST_HEAD(aliases);
 	__u32 type;
 
-	/* No support for intel_bts or intel_pt so disallow them */
-	if (!strcmp(name, "intel_bts") || !strcmp(name, "intel_pt"))
+	/* No support for intel_bts so disallow it */
+	if (!strcmp(name, "intel_bts"))
 		return NULL;
 
 	/*
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* [PATCH 8/8] perf tools: Add Intel BTS support
  2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
                   ` (6 preceding siblings ...)
  2015-06-26 22:02 ` [PATCH 7/8] perf tools: Take Intel PT into use Arnaldo Carvalho de Melo
@ 2015-06-26 22:02 ` Arnaldo Carvalho de Melo
  2015-06-30  4:58 ` [GIT PULL 0/8] perf/pt -> Intel PT/BTS Ingo Molnar
  8 siblings, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 22:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: linux-kernel, Adrian Hunter, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Jiri Olsa,
	Arnaldo Carvalho de Melo

From: Adrian Hunter <adrian.hunter@intel.com>

Intel BTS support fits within the new auxtrace infrastructure.
Recording is supporting by identifying the Intel BTS PMU, parsing
options and setting up events.

Decoding is supported by queuing up trace data by thread and then
decoding synchronously delivering synthesized event samples into the
session processing for tools to consume.

E.g:

  [root@zoo ~]# perf record --per-thread -e intel_bts// ls
  anaconda-ks.cfg  b  bin  lib64	libexec  new  old  perf.data
  perf.data.old  stream_test  tg.run
  [ perf record: Woken up 2 times to write data ]
  [ perf record: Captured and wrote 4.242 MB perf.data ]
  [root@zoo ~]# perf evlist
  intel_bts//
  dummy:u
  [root@zoo ~]# perf evlist -v
  intel_bts//: type: 7, size: 112, { sample_period, sample_freq }: 1,
  sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
  enable_on_exec: 1, sample_id_all: 1, exclude_guest: 1 dummy:u:
  type: 1, size: 112, config: 0x9, { sample_period, sample_freq }: 1,
  sample_type: IP|TID|IDENTIFIER, read_format: ID, disabled: 1,
  exclude_kernel: 1, exclude_hv: 1, mmap: 1, comm: 1, enable_on_exec: 1,
  task: 1, sample_id_all: 1, mmap2: 1, comm_exec: 1
  [root@zoo ~]# perf report --stdio
  # To display the perf.data header info, please use --header/--header-only options.
  #
  #
  # Total Lost Samples: 0
  #
  # Samples: 0  of event 'intel_bts//'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......
  #

  # Samples: 0  of event 'dummy:u'
  # Event count (approx.): 0
  #
  # Overhead  Command  Shared Object  Symbol
  # ........  .......  .............  ......
  #

  # Samples: 184K of event 'branches'
  # Event count (approx.): 184522
  #
  # Overhead  Command  Shared Object       Symbol
  # ........  .......  ..................  ..............................................
  #
     7.36%  ls       [kernel.kallsyms]   [.] unmap_single_vma
     4.69%  ls       ld-2.20.so          [.] strcmp
     4.66%  ls       libc-2.20.so        [.] _dl_addr
     3.43%  ls       [kernel.kallsyms]   [.] change_protection
     3.01%  ls       ld-2.20.so          [.] do_lookup_x
     2.18%  ls       [kernel.kallsyms]   [.] filemap_map_pages
     2.09%  ls       ld-2.20.so          [.] _dl_name_match_p
     1.93%  ls       ld-2.20.so          [.] _dl_lookup_symbol_x
     1.91%  ls       [kernel.kallsyms]   [.] page_remove_rmap
     1.72%  ls       [kernel.kallsyms]   [.] do_set_pte
     1.48%  ls       ld-2.20.so          [.] _dl_relocate_object
     1.34%  ls       [kernel.kallsyms]   [.] mem_cgroup_begin_page_stat
     1.11%  ls       [kernel.kallsyms]   [.] page_add_file_rmap
     1.00%  ls       [kernel.kallsyms]   [.] mark_page_accessed
     0.97%  ls       [kernel.kallsyms]   [.] perf_event_aux
     0.95%  ls       [kernel.kallsyms]   [.] handle_mm_fault
     0.94%  ls       [kernel.kallsyms]   [.] get_page_from_freelist
  <SNIP>
  [root@zoo ~]# perf record --per-thread -e intel_bts//u ls
  <SNIP>
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 1.278 MB perf.data ]
  [root@zoo ~]# perf report --stdio
  <SNIP>
  # Samples: 55K of event 'branches:u'
  # Event count (approx.): 55165
  #
  # Overhead  Command  Shared Object       Symbol
  # ........  .......  ..................  ......................................
  #
    15.69%  ls       ld-2.20.so          [.] strcmp
    15.58%  ls       libc-2.20.so        [.] _dl_addr
    10.05%  ls       ld-2.20.so          [.] do_lookup_x
     6.98%  ls       ld-2.20.so          [.] _dl_name_match_p
     6.46%  ls       ld-2.20.so          [.] _dl_lookup_symbol_x
     4.95%  ls       ld-2.20.so          [.] _dl_relocate_object
     2.96%  ls       ls                  [.] quotearg_buffer_restyled
     2.78%  ls       libc-2.20.so        [.] getenv
     1.91%  ls       ld-2.20.so          [.] _dl_cache_libcmp
     1.76%  ls       libc-2.20.so        [.] __memmove_sse2
     1.75%  ls       ld-2.20.so          [.] check_match.isra.0
     1.47%  ls       ld-2.20.so          [.] _dl_map_object_deps
     1.27%  ls       ld-2.20.so          [.] _dl_map_object_from_fd
     1.17%  ls       ls                  [.] quote_name
     1.16%  ls       ld-2.20.so          [.] _dl_check_map_versions
   <SNIP>
     0.19%  ls       [kernel.kallsyms]   [.] entry_SYSCALL_64_fastpath
     0.19%  ls       [kernel.kallsyms]   [.] native_irq_return_iret
     0.19%  ls       libc-2.20.so        [.] _IO_file_xsputn@@GLIBC_2.2.5
   <SNIP>

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-12-git-send-email-adrian.hunter@intel.com
[ Add a fix for an infinite loop in intel_bts_process_buffer
  misplaced in a followup patch in the original patchkit ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/Documentation/intel-bts.txt |  86 ++++
 tools/perf/arch/x86/util/Build         |   1 +
 tools/perf/arch/x86/util/auxtrace.c    |  49 +-
 tools/perf/arch/x86/util/intel-bts.c   | 458 +++++++++++++++++++
 tools/perf/arch/x86/util/pmu.c         |   3 +
 tools/perf/util/Build                  |   1 +
 tools/perf/util/auxtrace.c             |   3 +
 tools/perf/util/auxtrace.h             |   1 +
 tools/perf/util/intel-bts.c            | 791 +++++++++++++++++++++++++++++++++
 tools/perf/util/intel-bts.h            |  43 ++
 tools/perf/util/pmu.c                  |   4 -
 11 files changed, 1434 insertions(+), 6 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-bts.txt
 create mode 100644 tools/perf/arch/x86/util/intel-bts.c
 create mode 100644 tools/perf/util/intel-bts.c
 create mode 100644 tools/perf/util/intel-bts.h

diff --git a/tools/perf/Documentation/intel-bts.txt b/tools/perf/Documentation/intel-bts.txt
new file mode 100644
index 000000000000..8bdc93bd7fdb
--- /dev/null
+++ b/tools/perf/Documentation/intel-bts.txt
@@ -0,0 +1,86 @@
+Intel Branch Trace Store
+========================
+
+Overview
+========
+
+Intel BTS could be regarded as a predecessor to Intel PT and has some
+similarities because it can also identify every branch a program takes.  A
+notable difference is that Intel BTS has no timing information and as a
+consequence the present implementation is limited to per-thread recording.
+
+While decoding Intel BTS does not require walking the object code, the object
+code is still needed to pair up calls and returns correctly, consequently much
+of the Intel PT documentation applies also to Intel BTS.  Refer to the Intel PT
+documentation and consider that the PMU 'intel_bts' can usually be used in
+place of 'intel_pt' in the examples provided, with the proviso that per-thread
+recording must also be stipulated i.e. the --per-thread option for
+'perf record'.
+
+
+perf record
+===========
+
+new event
+---------
+
+The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
+option is:
+
+	-e intel_bts//
+
+Currently Intel BTS is limited to per-thread tracing so the --per-thread option
+is also needed.
+
+
+snapshot option
+---------------
+
+The snapshot option is the same as Intel PT (refer Intel PT documentation).
+
+
+auxtrace mmap size option
+-----------------------
+
+The mmap size option is the same as Intel PT (refer Intel PT documentation).
+
+
+perf script
+===========
+
+By default, perf script will decode trace data found in the perf.data file.
+This can be further controlled by option --itrace.  The --itrace option is
+the same as Intel PT (refer Intel PT documentation) except that neither
+"instructions" events nor "transactions" events (and consequently call
+chains) are supported.
+
+To disable trace decoding entirely, use the option --no-itrace.
+
+
+dump option
+-----------
+
+perf script has an option (-D) to "dump" the events i.e. display the binary
+data.
+
+When -D is used, Intel BTS packets are displayed.
+
+To disable the display of Intel BTS packets, combine the -D option with
+--no-itrace.
+
+
+perf report
+===========
+
+By default, perf report will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace exactly the same as
+perf script.
+
+
+perf inject
+===========
+
+perf inject also accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+	perf inject --itrace -i perf.data -o perf.data.new
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index a8be9f9d0462..2c55e1b336c5 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -10,3 +10,4 @@ libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
+libperf-$(CONFIG_AUXTRACE) += intel-bts.o
diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
index e7654b506312..7a7805583e3f 100644
--- a/tools/perf/arch/x86/util/auxtrace.c
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -13,11 +13,56 @@
  *
  */
 
+#include <stdbool.h>
+
 #include "../../util/header.h"
+#include "../../util/debug.h"
+#include "../../util/pmu.h"
 #include "../../util/auxtrace.h"
 #include "../../util/intel-pt.h"
+#include "../../util/intel-bts.h"
+#include "../../util/evlist.h"
+
+static
+struct auxtrace_record *auxtrace_record__init_intel(struct perf_evlist *evlist,
+						    int *err)
+{
+	struct perf_pmu *intel_pt_pmu;
+	struct perf_pmu *intel_bts_pmu;
+	struct perf_evsel *evsel;
+	bool found_pt = false;
+	bool found_bts = false;
+
+	intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+	intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+
+	if (evlist) {
+		evlist__for_each(evlist, evsel) {
+			if (intel_pt_pmu &&
+			    evsel->attr.type == intel_pt_pmu->type)
+				found_pt = true;
+			if (intel_bts_pmu &&
+			    evsel->attr.type == intel_bts_pmu->type)
+				found_bts = true;
+		}
+	}
+
+	if (found_pt && found_bts) {
+		pr_err("intel_pt and intel_bts may not be used together\n");
+		*err = -EINVAL;
+		return NULL;
+	}
+
+	if (found_pt)
+		return intel_pt_recording_init(err);
+
+	if (found_bts)
+		return intel_bts_recording_init(err);
 
-struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe_unused,
+	return NULL;
+}
+
+struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist,
 					      int *err)
 {
 	char buffer[64];
@@ -32,7 +77,7 @@ struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe
 	}
 
 	if (!strncmp(buffer, "GenuineIntel,", 13))
-		return intel_pt_recording_init(err);
+		return auxtrace_record__init_intel(evlist, err);
 
 	return NULL;
 }
diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
new file mode 100644
index 000000000000..9b94ce520917
--- /dev/null
+++ b/tools/perf/arch/x86/util/intel-bts.c
@@ -0,0 +1,458 @@
+/*
+ * intel-bts.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "../../util/cpumap.h"
+#include "../../util/evsel.h"
+#include "../../util/evlist.h"
+#include "../../util/session.h"
+#include "../../util/util.h"
+#include "../../util/pmu.h"
+#include "../../util/debug.h"
+#include "../../util/tsc.h"
+#include "../../util/auxtrace.h"
+#include "../../util/intel-bts.h"
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define KiB_MASK(x) (KiB(x) - 1)
+#define MiB_MASK(x) (MiB(x) - 1)
+
+#define INTEL_BTS_DFLT_SAMPLE_SIZE	KiB(4)
+
+#define INTEL_BTS_MAX_SAMPLE_SIZE	KiB(60)
+
+struct intel_bts_snapshot_ref {
+	void	*ref_buf;
+	size_t	ref_offset;
+	bool	wrapped;
+};
+
+struct intel_bts_recording {
+	struct auxtrace_record		itr;
+	struct perf_pmu			*intel_bts_pmu;
+	struct perf_evlist		*evlist;
+	bool				snapshot_mode;
+	size_t				snapshot_size;
+	int				snapshot_ref_cnt;
+	struct intel_bts_snapshot_ref	*snapshot_refs;
+};
+
+struct branch {
+	u64 from;
+	u64 to;
+	u64 misc;
+};
+
+static size_t intel_bts_info_priv_size(struct auxtrace_record *itr __maybe_unused)
+{
+	return INTEL_BTS_AUXTRACE_PRIV_SIZE;
+}
+
+static int intel_bts_info_fill(struct auxtrace_record *itr,
+			       struct perf_session *session,
+			       struct auxtrace_info_event *auxtrace_info,
+			       size_t priv_size)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
+	struct perf_event_mmap_page *pc;
+	struct perf_tsc_conversion tc = { .time_mult = 0, };
+	bool cap_user_time_zero = false;
+	int err;
+
+	if (priv_size != INTEL_BTS_AUXTRACE_PRIV_SIZE)
+		return -EINVAL;
+
+	if (!session->evlist->nr_mmaps)
+		return -EINVAL;
+
+	pc = session->evlist->mmap[0].base;
+	if (pc) {
+		err = perf_read_tsc_conversion(pc, &tc);
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+		} else {
+			cap_user_time_zero = tc.time_mult != 0;
+		}
+		if (!cap_user_time_zero)
+			ui__warning("Intel BTS: TSC not available\n");
+	}
+
+	auxtrace_info->type = PERF_AUXTRACE_INTEL_BTS;
+	auxtrace_info->priv[INTEL_BTS_PMU_TYPE] = intel_bts_pmu->type;
+	auxtrace_info->priv[INTEL_BTS_TIME_SHIFT] = tc.time_shift;
+	auxtrace_info->priv[INTEL_BTS_TIME_MULT] = tc.time_mult;
+	auxtrace_info->priv[INTEL_BTS_TIME_ZERO] = tc.time_zero;
+	auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO] = cap_user_time_zero;
+	auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE] = btsr->snapshot_mode;
+
+	return 0;
+}
+
+static int intel_bts_recording_options(struct auxtrace_record *itr,
+				       struct perf_evlist *evlist,
+				       struct record_opts *opts)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
+	struct perf_evsel *evsel, *intel_bts_evsel = NULL;
+	const struct cpu_map *cpus = evlist->cpus;
+	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+
+	btsr->evlist = evlist;
+	btsr->snapshot_mode = opts->auxtrace_snapshot_mode;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_bts_pmu->type) {
+			if (intel_bts_evsel) {
+				pr_err("There may be only one " INTEL_BTS_PMU_NAME " event\n");
+				return -EINVAL;
+			}
+			evsel->attr.freq = 0;
+			evsel->attr.sample_period = 1;
+			intel_bts_evsel = evsel;
+			opts->full_auxtrace = true;
+		}
+	}
+
+	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
+		pr_err("Snapshot mode (-S option) requires " INTEL_BTS_PMU_NAME " PMU event (-e " INTEL_BTS_PMU_NAME ")\n");
+		return -EINVAL;
+	}
+
+	if (!opts->full_auxtrace)
+		return 0;
+
+	if (opts->full_auxtrace && !cpu_map__empty(cpus)) {
+		pr_err(INTEL_BTS_PMU_NAME " does not support per-cpu recording\n");
+		return -EINVAL;
+	}
+
+	/* Set default sizes for snapshot mode */
+	if (opts->auxtrace_snapshot_mode) {
+		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
+			if (privileged) {
+				opts->auxtrace_mmap_pages = MiB(4) / page_size;
+			} else {
+				opts->auxtrace_mmap_pages = KiB(128) / page_size;
+				if (opts->mmap_pages == UINT_MAX)
+					opts->mmap_pages = KiB(256) / page_size;
+			}
+		} else if (!opts->auxtrace_mmap_pages && !privileged &&
+			   opts->mmap_pages == UINT_MAX) {
+			opts->mmap_pages = KiB(256) / page_size;
+		}
+		if (!opts->auxtrace_snapshot_size)
+			opts->auxtrace_snapshot_size =
+				opts->auxtrace_mmap_pages * (size_t)page_size;
+		if (!opts->auxtrace_mmap_pages) {
+			size_t sz = opts->auxtrace_snapshot_size;
+
+			sz = round_up(sz, page_size) / page_size;
+			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
+		}
+		if (opts->auxtrace_snapshot_size >
+				opts->auxtrace_mmap_pages * (size_t)page_size) {
+			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
+			       opts->auxtrace_snapshot_size,
+			       opts->auxtrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
+			return -EINVAL;
+		}
+		pr_debug2("Intel BTS snapshot size: %zu\n",
+			  opts->auxtrace_snapshot_size);
+	}
+
+	/* Set default sizes for full trace mode */
+	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
+		if (privileged) {
+			opts->auxtrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->auxtrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	}
+
+	/* Validate auxtrace_mmap_pages */
+	if (opts->auxtrace_mmap_pages) {
+		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
+		size_t min_sz;
+
+		if (opts->auxtrace_snapshot_mode)
+			min_sz = KiB(4);
+		else
+			min_sz = KiB(8);
+
+		if (sz < min_sz || !is_power_of_2(sz)) {
+			pr_err("Invalid mmap size for Intel BTS: must be at least %zuKiB and a power of 2\n",
+			       min_sz / 1024);
+			return -EINVAL;
+		}
+	}
+
+	if (intel_bts_evsel) {
+		/*
+		 * To obtain the auxtrace buffer file descriptor, the auxtrace event
+		 * must come first.
+		 */
+		perf_evlist__to_front(evlist, intel_bts_evsel);
+		/*
+		 * In the case of per-cpu mmaps, we need the CPU on the
+		 * AUX event.
+		 */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(intel_bts_evsel, CPU);
+	}
+
+	/* Add dummy event to keep tracking */
+	if (opts->full_auxtrace) {
+		struct perf_evsel *tracking_evsel;
+		int err;
+
+		err = parse_events(evlist, "dummy:u", NULL);
+		if (err)
+			return err;
+
+		tracking_evsel = perf_evlist__last(evlist);
+
+		perf_evlist__set_tracking_event(evlist, tracking_evsel);
+
+		tracking_evsel->attr.freq = 0;
+		tracking_evsel->attr.sample_period = 1;
+	}
+
+	return 0;
+}
+
+static int intel_bts_parse_snapshot_options(struct auxtrace_record *itr,
+					    struct record_opts *opts,
+					    const char *str)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->auxtrace_snapshot_mode = true;
+	opts->auxtrace_snapshot_size = snapshot_size;
+
+	btsr->snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+static u64 intel_bts_reference(struct auxtrace_record *itr __maybe_unused)
+{
+	return rdtsc();
+}
+
+static int intel_bts_alloc_snapshot_refs(struct intel_bts_recording *btsr,
+					 int idx)
+{
+	const size_t sz = sizeof(struct intel_bts_snapshot_ref);
+	int cnt = btsr->snapshot_ref_cnt, new_cnt = cnt * 2;
+	struct intel_bts_snapshot_ref *refs;
+
+	if (!new_cnt)
+		new_cnt = 16;
+
+	while (new_cnt <= idx)
+		new_cnt *= 2;
+
+	refs = calloc(new_cnt, sz);
+	if (!refs)
+		return -ENOMEM;
+
+	memcpy(refs, btsr->snapshot_refs, cnt * sz);
+
+	btsr->snapshot_refs = refs;
+	btsr->snapshot_ref_cnt = new_cnt;
+
+	return 0;
+}
+
+static void intel_bts_free_snapshot_refs(struct intel_bts_recording *btsr)
+{
+	int i;
+
+	for (i = 0; i < btsr->snapshot_ref_cnt; i++)
+		zfree(&btsr->snapshot_refs[i].ref_buf);
+	zfree(&btsr->snapshot_refs);
+}
+
+static void intel_bts_recording_free(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+
+	intel_bts_free_snapshot_refs(btsr);
+	free(btsr);
+}
+
+static int intel_bts_snapshot_start(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__disable_event(btsr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_bts_snapshot_finish(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__enable_event(btsr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static bool intel_bts_first_wrap(u64 *data, size_t buf_size)
+{
+	int i, a, b;
+
+	b = buf_size >> 3;
+	a = b - 512;
+	if (a < 0)
+		a = 0;
+
+	for (i = a; i < b; i++) {
+		if (data[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int intel_bts_find_snapshot(struct auxtrace_record *itr, int idx,
+				   struct auxtrace_mmap *mm, unsigned char *data,
+				   u64 *head, u64 *old)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	bool wrapped;
+	int err;
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head);
+
+	if (idx >= btsr->snapshot_ref_cnt) {
+		err = intel_bts_alloc_snapshot_refs(btsr, idx);
+		if (err)
+			goto out_err;
+	}
+
+	wrapped = btsr->snapshot_refs[idx].wrapped;
+	if (!wrapped && intel_bts_first_wrap((u64 *)data, mm->len)) {
+		btsr->snapshot_refs[idx].wrapped = true;
+		wrapped = true;
+	}
+
+	/*
+	 * In full trace mode 'head' continually increases.  However in snapshot
+	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
+	 * are adjusted to match the full trace case which expects that 'old' is
+	 * always less than 'head'.
+	 */
+	if (wrapped) {
+		*old = *head;
+		*head += mm->len;
+	} else {
+		if (mm->mask)
+			*old &= mm->mask;
+		else
+			*old %= mm->len;
+		if (*old > *head)
+			*head += mm->len;
+	}
+
+	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
+		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
+
+	return 0;
+
+out_err:
+	pr_err("%s: failed, error %d\n", __func__, err);
+	return err;
+}
+
+static int intel_bts_read_finish(struct auxtrace_record *itr, int idx)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__enable_event_idx(btsr->evlist,
+							     evsel, idx);
+	}
+	return -EINVAL;
+}
+
+struct auxtrace_record *intel_bts_recording_init(int *err)
+{
+	struct perf_pmu *intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+	struct intel_bts_recording *btsr;
+
+	if (!intel_bts_pmu)
+		return NULL;
+
+	btsr = zalloc(sizeof(struct intel_bts_recording));
+	if (!btsr) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	btsr->intel_bts_pmu = intel_bts_pmu;
+	btsr->itr.recording_options = intel_bts_recording_options;
+	btsr->itr.info_priv_size = intel_bts_info_priv_size;
+	btsr->itr.info_fill = intel_bts_info_fill;
+	btsr->itr.free = intel_bts_recording_free;
+	btsr->itr.snapshot_start = intel_bts_snapshot_start;
+	btsr->itr.snapshot_finish = intel_bts_snapshot_finish;
+	btsr->itr.find_snapshot = intel_bts_find_snapshot;
+	btsr->itr.parse_snapshot_options = intel_bts_parse_snapshot_options;
+	btsr->itr.reference = intel_bts_reference;
+	btsr->itr.read_finish = intel_bts_read_finish;
+	btsr->itr.alignment = sizeof(struct branch);
+	return &btsr->itr;
+}
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
index fd11cc3ce780..79fe07158d00 100644
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -3,6 +3,7 @@
 #include <linux/perf_event.h>
 
 #include "../../util/intel-pt.h"
+#include "../../util/intel-bts.h"
 #include "../../util/pmu.h"
 
 struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
@@ -10,6 +11,8 @@ struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __mayb
 #ifdef HAVE_AUXTRACE_SUPPORT
 	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
 		return intel_pt_pmu_default_config(pmu);
+	if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME))
+		pmu->selectable = true;
 #endif
 	return NULL;
 }
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 296de661933f..09340834fbd1 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -78,6 +78,7 @@ libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
+libperf-$(CONFIG_AUXTRACE) += intel-bts.o
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 1b3cc297290d..3f3b40fa87c8 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -48,6 +48,7 @@
 #include "parse-options.h"
 
 #include "intel-pt.h"
+#include "intel-bts.h"
 
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 			struct auxtrace_mmap_params *mp,
@@ -888,6 +889,8 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 	switch (type) {
 	case PERF_AUXTRACE_INTEL_PT:
 		return intel_pt_process_auxtrace_info(event, session);
+	case PERF_AUXTRACE_INTEL_BTS:
+		return intel_bts_process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 7d12f33a3a06..bf72b77a588a 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -40,6 +40,7 @@ struct events_stats;
 enum auxtrace_type {
 	PERF_AUXTRACE_UNKNOWN,
 	PERF_AUXTRACE_INTEL_PT,
+	PERF_AUXTRACE_INTEL_BTS,
 };
 
 enum itrace_period_type {
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
new file mode 100644
index 000000000000..68bb6fede55b
--- /dev/null
+++ b/tools/perf/util/intel-bts.c
@@ -0,0 +1,791 @@
+/*
+ * intel-bts.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <endian.h>
+#include <byteswap.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "cpumap.h"
+#include "color.h"
+#include "evsel.h"
+#include "evlist.h"
+#include "machine.h"
+#include "session.h"
+#include "util.h"
+#include "debug.h"
+#include "tsc.h"
+#include "auxtrace.h"
+#include "intel-bts.h"
+
+#define MAX_TIMESTAMP (~0ULL)
+
+#define INTEL_BTS_ERR_NOINSN  5
+#define INTEL_BTS_ERR_LOST    9
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le64_to_cpu bswap_64
+#else
+#define le64_to_cpu
+#endif
+
+struct intel_bts {
+	struct auxtrace			auxtrace;
+	struct auxtrace_queues		queues;
+	struct auxtrace_heap		heap;
+	u32				auxtrace_type;
+	struct perf_session		*session;
+	struct machine			*machine;
+	bool				sampling_mode;
+	bool				snapshot_mode;
+	bool				data_queued;
+	u32				pmu_type;
+	struct perf_tsc_conversion	tc;
+	bool				cap_user_time_zero;
+	struct itrace_synth_opts	synth_opts;
+	bool				sample_branches;
+	u64				branches_sample_type;
+	u64				branches_id;
+	size_t				branches_event_size;
+	bool				synth_needs_swap;
+};
+
+struct intel_bts_queue {
+	struct intel_bts	*bts;
+	unsigned int		queue_nr;
+	struct auxtrace_buffer	*buffer;
+	bool			on_heap;
+	bool			done;
+	pid_t			pid;
+	pid_t			tid;
+	int			cpu;
+	u64			time;
+};
+
+struct branch {
+	u64 from;
+	u64 to;
+	u64 misc;
+};
+
+static void intel_bts_dump(struct intel_bts *bts __maybe_unused,
+			   unsigned char *buf, size_t len)
+{
+	struct branch *branch;
+	size_t i, pos = 0, br_sz = sizeof(struct branch), sz;
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color,
+		      ". ... Intel BTS data: size %zu bytes\n",
+		      len);
+
+	while (len) {
+		if (len >= br_sz)
+			sz = br_sz;
+		else
+			sz = len;
+		printf(".");
+		color_fprintf(stdout, color, "  %08x: ", pos);
+		for (i = 0; i < sz; i++)
+			color_fprintf(stdout, color, " %02x", buf[i]);
+		for (; i < br_sz; i++)
+			color_fprintf(stdout, color, "   ");
+		if (len >= br_sz) {
+			branch = (struct branch *)buf;
+			color_fprintf(stdout, color, " %"PRIx64" -> %"PRIx64" %s\n",
+				      le64_to_cpu(branch->from),
+				      le64_to_cpu(branch->to),
+				      le64_to_cpu(branch->misc) & 0x10 ?
+							"pred" : "miss");
+		} else {
+			color_fprintf(stdout, color, " Bad record!\n");
+		}
+		pos += sz;
+		buf += sz;
+		len -= sz;
+	}
+}
+
+static void intel_bts_dump_event(struct intel_bts *bts, unsigned char *buf,
+				 size_t len)
+{
+	printf(".\n");
+	intel_bts_dump(bts, buf, len);
+}
+
+static int intel_bts_lost(struct intel_bts *bts, struct perf_sample *sample)
+{
+	union perf_event event;
+	int err;
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     INTEL_BTS_ERR_LOST, sample->cpu, sample->pid,
+			     sample->tid, 0, "Lost trace data");
+
+	err = perf_session__deliver_synth_event(bts->session, &event, NULL);
+	if (err)
+		pr_err("Intel BTS: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static struct intel_bts_queue *intel_bts_alloc_queue(struct intel_bts *bts,
+						     unsigned int queue_nr)
+{
+	struct intel_bts_queue *btsq;
+
+	btsq = zalloc(sizeof(struct intel_bts_queue));
+	if (!btsq)
+		return NULL;
+
+	btsq->bts = bts;
+	btsq->queue_nr = queue_nr;
+	btsq->pid = -1;
+	btsq->tid = -1;
+	btsq->cpu = -1;
+
+	return btsq;
+}
+
+static int intel_bts_setup_queue(struct intel_bts *bts,
+				 struct auxtrace_queue *queue,
+				 unsigned int queue_nr)
+{
+	struct intel_bts_queue *btsq = queue->priv;
+
+	if (list_empty(&queue->head))
+		return 0;
+
+	if (!btsq) {
+		btsq = intel_bts_alloc_queue(bts, queue_nr);
+		if (!btsq)
+			return -ENOMEM;
+		queue->priv = btsq;
+
+		if (queue->cpu != -1)
+			btsq->cpu = queue->cpu;
+		btsq->tid = queue->tid;
+	}
+
+	if (bts->sampling_mode)
+		return 0;
+
+	if (!btsq->on_heap && !btsq->buffer) {
+		int ret;
+
+		btsq->buffer = auxtrace_buffer__next(queue, NULL);
+		if (!btsq->buffer)
+			return 0;
+
+		ret = auxtrace_heap__add(&bts->heap, queue_nr,
+					 btsq->buffer->reference);
+		if (ret)
+			return ret;
+		btsq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int intel_bts_setup_queues(struct intel_bts *bts)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < bts->queues.nr_queues; i++) {
+		ret = intel_bts_setup_queue(bts, &bts->queues.queue_array[i],
+					    i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static inline int intel_bts_update_queues(struct intel_bts *bts)
+{
+	if (bts->queues.new_data) {
+		bts->queues.new_data = false;
+		return intel_bts_setup_queues(bts);
+	}
+	return 0;
+}
+
+static unsigned char *intel_bts_find_overlap(unsigned char *buf_a, size_t len_a,
+					     unsigned char *buf_b, size_t len_b)
+{
+	size_t offs, len;
+
+	if (len_a > len_b)
+		offs = len_a - len_b;
+	else
+		offs = 0;
+
+	for (; offs < len_a; offs += sizeof(struct branch)) {
+		len = len_a - offs;
+		if (!memcmp(buf_a + offs, buf_b, len))
+			return buf_b + len;
+	}
+
+	return buf_b;
+}
+
+static int intel_bts_do_fix_overlap(struct auxtrace_queue *queue,
+				    struct auxtrace_buffer *b)
+{
+	struct auxtrace_buffer *a;
+	void *start;
+
+	if (b->list.prev == &queue->head)
+		return 0;
+	a = list_entry(b->list.prev, struct auxtrace_buffer, list);
+	start = intel_bts_find_overlap(a->data, a->size, b->data, b->size);
+	if (!start)
+		return -EINVAL;
+	b->use_size = b->data + b->size - start;
+	b->use_data = start;
+	return 0;
+}
+
+static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
+					 struct branch *branch)
+{
+	int ret;
+	struct intel_bts *bts = btsq->bts;
+	union perf_event event;
+	struct perf_sample sample = { .ip = 0, };
+
+	event.sample.header.type = PERF_RECORD_SAMPLE;
+	event.sample.header.misc = PERF_RECORD_MISC_USER;
+	event.sample.header.size = sizeof(struct perf_event_header);
+
+	sample.ip = le64_to_cpu(branch->from);
+	sample.pid = btsq->pid;
+	sample.tid = btsq->tid;
+	sample.addr = le64_to_cpu(branch->to);
+	sample.id = btsq->bts->branches_id;
+	sample.stream_id = btsq->bts->branches_id;
+	sample.period = 1;
+	sample.cpu = btsq->cpu;
+
+	if (bts->synth_opts.inject) {
+		event.sample.header.size = bts->branches_event_size;
+		ret = perf_event__synthesize_sample(&event,
+						    bts->branches_sample_type,
+						    0, &sample,
+						    bts->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(bts->session, &event, &sample);
+	if (ret)
+		pr_err("Intel BTS: failed to deliver branch event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
+				    struct auxtrace_buffer *buffer)
+{
+	struct branch *branch;
+	size_t sz, bsz = sizeof(struct branch);
+	int err = 0;
+
+	if (buffer->use_data) {
+		sz = buffer->use_size;
+		branch = buffer->use_data;
+	} else {
+		sz = buffer->size;
+		branch = buffer->data;
+	}
+
+	if (!btsq->bts->sample_branches)
+		return 0;
+
+	for (; sz > bsz; branch += 1, sz -= bsz) {
+		if (!branch->from && !branch->to)
+			continue;
+		err = intel_bts_synth_branch_sample(btsq, branch);
+		if (err)
+			break;
+	}
+	return err;
+}
+
+static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
+{
+	struct auxtrace_buffer *buffer = btsq->buffer;
+	struct auxtrace_queue *queue;
+	int err;
+
+	if (btsq->done)
+		return 1;
+
+	if (btsq->pid == -1) {
+		struct thread *thread;
+
+		thread = machine__find_thread(btsq->bts->machine, -1, btsq->tid);
+		if (thread)
+			btsq->pid = thread->pid_;
+	}
+
+	queue = &btsq->bts->queues.queue_array[btsq->queue_nr];
+
+	if (!buffer)
+		buffer = auxtrace_buffer__next(queue, NULL);
+
+	if (!buffer) {
+		if (!btsq->bts->sampling_mode)
+			btsq->done = 1;
+		return 1;
+	}
+
+	/* Currently there is no support for split buffers */
+	if (buffer->consecutive)
+		return -EINVAL;
+
+	if (!buffer->data) {
+		int fd = perf_data_file__fd(btsq->bts->session->file);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (btsq->bts->snapshot_mode && !buffer->consecutive &&
+	    intel_bts_do_fix_overlap(queue, buffer))
+		return -ENOMEM;
+
+	err = intel_bts_process_buffer(btsq, buffer);
+
+	auxtrace_buffer__drop_data(buffer);
+
+	btsq->buffer = auxtrace_buffer__next(queue, buffer);
+	if (btsq->buffer) {
+		if (timestamp)
+			*timestamp = btsq->buffer->reference;
+	} else {
+		if (!btsq->bts->sampling_mode)
+			btsq->done = 1;
+	}
+
+	return err;
+}
+
+static int intel_bts_flush_queue(struct intel_bts_queue *btsq)
+{
+	u64 ts = 0;
+	int ret;
+
+	while (1) {
+		ret = intel_bts_process_queue(btsq, &ts);
+		if (ret < 0)
+			return ret;
+		if (ret)
+			break;
+	}
+	return 0;
+}
+
+static int intel_bts_process_tid_exit(struct intel_bts *bts, pid_t tid)
+{
+	struct auxtrace_queues *queues = &bts->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &bts->queues.queue_array[i];
+		struct intel_bts_queue *btsq = queue->priv;
+
+		if (btsq && btsq->tid == tid)
+			return intel_bts_flush_queue(btsq);
+	}
+	return 0;
+}
+
+static int intel_bts_process_queues(struct intel_bts *bts, u64 timestamp)
+{
+	while (1) {
+		unsigned int queue_nr;
+		struct auxtrace_queue *queue;
+		struct intel_bts_queue *btsq;
+		u64 ts = 0;
+		int ret;
+
+		if (!bts->heap.heap_cnt)
+			return 0;
+
+		if (bts->heap.heap_array[0].ordinal > timestamp)
+			return 0;
+
+		queue_nr = bts->heap.heap_array[0].queue_nr;
+		queue = &bts->queues.queue_array[queue_nr];
+		btsq = queue->priv;
+
+		auxtrace_heap__pop(&bts->heap);
+
+		ret = intel_bts_process_queue(btsq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&bts->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&bts->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			btsq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_bts_process_event(struct perf_session *session,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct perf_tool *tool)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	u64 timestamp;
+	int err;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("Intel BTS requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time)
+		timestamp = perf_time_to_tsc(sample->time, &bts->tc);
+	else
+		timestamp = 0;
+
+	err = intel_bts_update_queues(bts);
+	if (err)
+		return err;
+
+	err = intel_bts_process_queues(bts, timestamp);
+	if (err)
+		return err;
+	if (event->header.type == PERF_RECORD_EXIT) {
+		err = intel_bts_process_tid_exit(bts, event->comm.tid);
+		if (err)
+			return err;
+	}
+
+	if (event->header.type == PERF_RECORD_AUX &&
+	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
+	    bts->synth_opts.errors)
+		err = intel_bts_lost(bts, sample);
+
+	return err;
+}
+
+static int intel_bts_process_auxtrace_event(struct perf_session *session,
+					    union perf_event *event,
+					    struct perf_tool *tool __maybe_unused)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+
+	if (bts->sampling_mode)
+		return 0;
+
+	if (!bts->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data_file__fd(session->file);
+		int err;
+
+		if (perf_data_file__is_pipe(session->file)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&bts->queues, session, event,
+						 data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				intel_bts_dump_event(bts, buffer->data,
+						     buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int intel_bts_flush(struct perf_session *session __maybe_unused,
+			   struct perf_tool *tool __maybe_unused)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	int ret;
+
+	if (dump_trace || bts->sampling_mode)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = intel_bts_update_queues(bts);
+	if (ret < 0)
+		return ret;
+
+	return intel_bts_process_queues(bts, MAX_TIMESTAMP);
+}
+
+static void intel_bts_free_queue(void *priv)
+{
+	struct intel_bts_queue *btsq = priv;
+
+	if (!btsq)
+		return;
+	free(btsq);
+}
+
+static void intel_bts_free_events(struct perf_session *session)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	struct auxtrace_queues *queues = &bts->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		intel_bts_free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = NULL;
+	}
+	auxtrace_queues__free(queues);
+}
+
+static void intel_bts_free(struct perf_session *session)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+
+	auxtrace_heap__free(&bts->heap);
+	intel_bts_free_events(session);
+	session->auxtrace = NULL;
+	free(bts);
+}
+
+struct intel_bts_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int intel_bts_event_synth(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_sample *sample __maybe_unused,
+				 struct machine *machine __maybe_unused)
+{
+	struct intel_bts_synth *intel_bts_synth =
+			container_of(tool, struct intel_bts_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(intel_bts_synth->session,
+						 event, NULL);
+}
+
+static int intel_bts_synth_event(struct perf_session *session,
+				 struct perf_event_attr *attr, u64 id)
+{
+	struct intel_bts_synth intel_bts_synth;
+
+	memset(&intel_bts_synth, 0, sizeof(struct intel_bts_synth));
+	intel_bts_synth.session = session;
+
+	return perf_event__synthesize_attr(&intel_bts_synth.dummy_tool, attr, 1,
+					   &id, intel_bts_event_synth);
+}
+
+static int intel_bts_synth_events(struct intel_bts *bts,
+				  struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == bts->pmu_type && evsel->ids) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("There are no selected events with Intel BTS data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+	if (!id)
+		id = 1;
+
+	if (bts->synth_opts.branches) {
+		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		attr.sample_period = 1;
+		attr.sample_type |= PERF_SAMPLE_ADDR;
+		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_bts_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'branches' event type\n",
+			       __func__);
+			return err;
+		}
+		bts->sample_branches = true;
+		bts->branches_sample_type = attr.sample_type;
+		bts->branches_id = id;
+		/*
+		 * We only use sample types from PERF_SAMPLE_MASK so we can use
+		 * __perf_evsel__sample_size() here.
+		 */
+		bts->branches_event_size = sizeof(struct sample_event) +
+				__perf_evsel__sample_size(attr.sample_type);
+	}
+
+	bts->synth_needs_swap = evsel->needs_swap;
+
+	return 0;
+}
+
+static const char * const intel_bts_info_fmts[] = {
+	[INTEL_BTS_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
+	[INTEL_BTS_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
+	[INTEL_BTS_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
+	[INTEL_BTS_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
+	[INTEL_BTS_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
+	[INTEL_BTS_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
+};
+
+static void intel_bts_print_info(u64 *arr, int start, int finish)
+{
+	int i;
+
+	if (!dump_trace)
+		return;
+
+	for (i = start; i <= finish; i++)
+		fprintf(stdout, intel_bts_info_fmts[i], arr[i]);
+}
+
+u64 intel_bts_auxtrace_info_priv[INTEL_BTS_AUXTRACE_PRIV_SIZE];
+
+int intel_bts_process_auxtrace_info(union perf_event *event,
+				    struct perf_session *session)
+{
+	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
+	size_t min_sz = sizeof(u64) * INTEL_BTS_SNAPSHOT_MODE;
+	struct intel_bts *bts;
+	int err;
+
+	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
+					min_sz)
+		return -EINVAL;
+
+	bts = zalloc(sizeof(struct intel_bts));
+	if (!bts)
+		return -ENOMEM;
+
+	err = auxtrace_queues__init(&bts->queues);
+	if (err)
+		goto err_free;
+
+	bts->session = session;
+	bts->machine = &session->machines.host; /* No kvm support */
+	bts->auxtrace_type = auxtrace_info->type;
+	bts->pmu_type = auxtrace_info->priv[INTEL_BTS_PMU_TYPE];
+	bts->tc.time_shift = auxtrace_info->priv[INTEL_BTS_TIME_SHIFT];
+	bts->tc.time_mult = auxtrace_info->priv[INTEL_BTS_TIME_MULT];
+	bts->tc.time_zero = auxtrace_info->priv[INTEL_BTS_TIME_ZERO];
+	bts->cap_user_time_zero =
+			auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO];
+	bts->snapshot_mode = auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE];
+
+	bts->sampling_mode = false;
+
+	bts->auxtrace.process_event = intel_bts_process_event;
+	bts->auxtrace.process_auxtrace_event = intel_bts_process_auxtrace_event;
+	bts->auxtrace.flush_events = intel_bts_flush;
+	bts->auxtrace.free_events = intel_bts_free_events;
+	bts->auxtrace.free = intel_bts_free;
+	session->auxtrace = &bts->auxtrace;
+
+	intel_bts_print_info(&auxtrace_info->priv[0], INTEL_BTS_PMU_TYPE,
+			     INTEL_BTS_SNAPSHOT_MODE);
+
+	if (dump_trace)
+		return 0;
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		bts->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&bts->synth_opts);
+
+	err = intel_bts_synth_events(bts, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&bts->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (bts->queues.populated)
+		bts->data_queued = true;
+
+	return 0;
+
+err_free_queues:
+	auxtrace_queues__free(&bts->queues);
+	session->auxtrace = NULL;
+err_free:
+	free(bts);
+	return err;
+}
diff --git a/tools/perf/util/intel-bts.h b/tools/perf/util/intel-bts.h
new file mode 100644
index 000000000000..ca65e21b3e83
--- /dev/null
+++ b/tools/perf/util/intel-bts.h
@@ -0,0 +1,43 @@
+/*
+ * intel-bts.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__PERF_INTEL_BTS_H__
+#define INCLUDE__PERF_INTEL_BTS_H__
+
+#define INTEL_BTS_PMU_NAME "intel_bts"
+
+enum {
+	INTEL_BTS_PMU_TYPE,
+	INTEL_BTS_TIME_SHIFT,
+	INTEL_BTS_TIME_MULT,
+	INTEL_BTS_TIME_ZERO,
+	INTEL_BTS_CAP_USER_TIME_ZERO,
+	INTEL_BTS_SNAPSHOT_MODE,
+	INTEL_BTS_AUXTRACE_PRIV_MAX,
+};
+
+#define INTEL_BTS_AUXTRACE_PRIV_SIZE (INTEL_BTS_AUXTRACE_PRIV_MAX * sizeof(u64))
+
+struct auxtrace_record;
+struct perf_tool;
+union perf_event;
+struct perf_session;
+
+struct auxtrace_record *intel_bts_recording_init(int *err);
+
+int intel_bts_process_auxtrace_info(union perf_event *event,
+				    struct perf_session *session);
+
+#endif
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index b5ea26bc290d..52d569cda606 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -462,10 +462,6 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	LIST_HEAD(aliases);
 	__u32 type;
 
-	/* No support for intel_bts so disallow it */
-	if (!strcmp(name, "intel_bts"))
-		return NULL;
-
 	/*
 	 * The pmu data we store & need consists of the pmu
 	 * type value and format definitions. Load both right
-- 
2.1.0


^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
                   ` (7 preceding siblings ...)
  2015-06-26 22:02 ` [PATCH 8/8] perf tools: Add Intel BTS support Arnaldo Carvalho de Melo
@ 2015-06-30  4:58 ` Ingo Molnar
  2015-06-30  7:54   ` Adrian Hunter
  8 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2015-06-30  4:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: linux-kernel, Adrian Hunter, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo


* Arnaldo Carvalho de Melo <acme@kernel.org> wrote:

> Hi Ingo,
> 
> 	Please consider pulling, there are several other patches after this,
> but I think that this may be acceptable to showcase the capabilities already
> present, look at the output of 'perf script' and callchains for userspace
> without using any extra debugging info (no need for DWARF, CFI, nothing),
> really cool capabilities... :-)
> 
> 	Adrian wrote some docs and I tested it both on a Ivy Bridge machine
> where there is only BTS and on a Broadwell machine with the whole shebang,
> adding the output of the commands to the csets, to further showcase what is
> there already.
> 
> 	This is on top of my last perf-core-for-mingo tag.
> 
> 	Up to you, please let us know what you think and we'll continue working
> on having this in an acceptable form for merging,
> 
> Regards,
> 
> - Arnaldo
> 
> P.S. Kudos for Adrian for the patience with this process, way more is needed to
> polish this, but the promise is there, cool stuff indeed! :-)
> 
> The following changes since commit 36c8bb56a9f718a9a5f35d1834ca9dcec95deb4a:
> 
>   perf symbols: Check access permission when reading symbol files (2015-06-26 12:11:53 -0300)
> 
> are available in the git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-pt-for-mingo
> 
> for you to fetch changes up to 04759f172270afb28c8004f5cad62ed55710a499:
> 
>   perf tools: Add Intel BTS support (2015-06-26 18:36:11 -0300)
> 
> ----------------------------------------------------------------
> Put Intel PT and BTS into initial use (Adrian Hunter)
> 
> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
> 
> ----------------------------------------------------------------
> Adrian Hunter (8):
>       perf auxtrace: Add Intel PT as an AUX area tracing type
>       perf tools: Add Intel PT packet decoder
>       perf tools: Add Intel PT instruction decoder
>       perf tools: Add Intel PT log
>       perf tools: Add Intel PT decoder
>       perf tools: Add Intel PT support
>       perf tools: Take Intel PT into use
>       perf tools: Add Intel BTS support
> 
>  tools/build/Makefile.build                         |    2 +
>  tools/perf/.gitignore                              |    2 +
>  tools/perf/Documentation/intel-bts.txt             |   86 +
>  tools/perf/Documentation/intel-pt.txt              |  588 ++++++
>  tools/perf/MANIFEST                                |    7 +
>  tools/perf/Makefile.perf                           |   12 +-
>  tools/perf/arch/x86/util/Build                     |    5 +
>  tools/perf/arch/x86/util/auxtrace.c                |   83 +
>  tools/perf/arch/x86/util/intel-bts.c               |  458 +++++
>  tools/perf/arch/x86/util/intel-pt.c                |  752 ++++++++
>  tools/perf/arch/x86/util/pmu.c                     |   18 +
>  tools/perf/util/Build                              |    3 +
>  tools/perf/util/auxtrace.c                         |    9 +-
>  tools/perf/util/auxtrace.h                         |    2 +
>  tools/perf/util/intel-bts.c                        |  791 ++++++++
>  tools/perf/util/intel-bts.h                        |   43 +
>  tools/perf/util/intel-pt-decoder/Build             |   14 +
>  .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1759 ++++++++++++++++++
>  .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  102 ++
>  .../util/intel-pt-decoder/intel-pt-insn-decoder.c  |  246 +++
>  .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |   65 +
>  tools/perf/util/intel-pt-decoder/intel-pt-log.c    |  155 ++
>  tools/perf/util/intel-pt-decoder/intel-pt-log.h    |   52 +
>  .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   |  400 +++++
>  .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |   64 +
>  tools/perf/util/intel-pt.c                         | 1889 ++++++++++++++++++++
>  tools/perf/util/intel-pt.h                         |   51 +
>  tools/perf/util/pmu.c                              |    4 -
>  28 files changed, 7655 insertions(+), 7 deletions(-)
>  create mode 100644 tools/perf/Documentation/intel-bts.txt
>  create mode 100644 tools/perf/Documentation/intel-pt.txt
>  create mode 100644 tools/perf/arch/x86/util/auxtrace.c
>  create mode 100644 tools/perf/arch/x86/util/intel-bts.c
>  create mode 100644 tools/perf/arch/x86/util/intel-pt.c
>  create mode 100644 tools/perf/arch/x86/util/pmu.c
>  create mode 100644 tools/perf/util/intel-bts.c
>  create mode 100644 tools/perf/util/intel-bts.h
>  create mode 100644 tools/perf/util/intel-pt-decoder/Build
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
>  create mode 100644 tools/perf/util/intel-pt.c
>  create mode 100644 tools/perf/util/intel-pt.h

Yeah, so I did a 'newbie test':

I pulled the tree and saw that it has a tools/perf/Documentation/intel-bts.txt 
file and started reading it.

Based on its text:

  The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
  option is:

        -e intel_bts//

  Currently Intel BTS is limited to per-thread tracing so the --per-thread option
  is also needed.

I tried the following command which failed:

  triton:~/tip> perf record -e intel_bts// --per-thread sleep 1
  invalid or unsupported event: 'intel_bts//'
  Run 'perf list' for a list of valid events

   usage: perf record [<options>] [<command>]
      or: perf record [<options>] -- <command> [<options>]

      -e, --event <event>   event selector. use 'perf list' to list available events

That's a really ... unhelpful message. If I typoed something I want to know that. 
If the kernel does not support something, I want to know about that too. Tooling 
telling me: "maybe you typoed something, maybe it's not supported, I really don't 
care" is not very productive.

So this was with a distro kernel, and in the hope that I'm missing some magic new 
kernel feature, I tried it the latest -tip kernel, but it still gives me the same 
failure.

So the test newbie user got stuck after wasting some time.

Me as a kernel developer could probably figure it out, but that's not the point: 
if newbies cannot discover and use our new features then it's as if they didn't 
exist, and I'm not pulling non-existent features! ;-)

Could we please improve all this?

Thanks,


	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-06-30  4:58 ` [GIT PULL 0/8] perf/pt -> Intel PT/BTS Ingo Molnar
@ 2015-06-30  7:54   ` Adrian Hunter
  2015-06-30 10:56     ` Ingo Molnar
  0 siblings, 1 reply; 23+ messages in thread
From: Adrian Hunter @ 2015-06-30  7:54 UTC (permalink / raw)
  To: Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: linux-kernel, Jiri Olsa, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

On 30/06/15 07:58, Ingo Molnar wrote:
> 
> * Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
> 
>> Hi Ingo,
>>
>> 	Please consider pulling, there are several other patches after this,
>> but I think that this may be acceptable to showcase the capabilities already
>> present, look at the output of 'perf script' and callchains for userspace
>> without using any extra debugging info (no need for DWARF, CFI, nothing),
>> really cool capabilities... :-)
>>
>> 	Adrian wrote some docs and I tested it both on a Ivy Bridge machine
>> where there is only BTS and on a Broadwell machine with the whole shebang,
>> adding the output of the commands to the csets, to further showcase what is
>> there already.
>>
>> 	This is on top of my last perf-core-for-mingo tag.
>>
>> 	Up to you, please let us know what you think and we'll continue working
>> on having this in an acceptable form for merging,
>>
>> Regards,
>>
>> - Arnaldo
>>
>> P.S. Kudos for Adrian for the patience with this process, way more is needed to
>> polish this, but the promise is there, cool stuff indeed! :-)
>>
>> The following changes since commit 36c8bb56a9f718a9a5f35d1834ca9dcec95deb4a:
>>
>>   perf symbols: Check access permission when reading symbol files (2015-06-26 12:11:53 -0300)
>>
>> are available in the git repository at:
>>
>>   git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux.git tags/perf-pt-for-mingo
>>
>> for you to fetch changes up to 04759f172270afb28c8004f5cad62ed55710a499:
>>
>>   perf tools: Add Intel BTS support (2015-06-26 18:36:11 -0300)
>>
>> ----------------------------------------------------------------
>> Put Intel PT and BTS into initial use (Adrian Hunter)
>>
>> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
>>
>> ----------------------------------------------------------------
>> Adrian Hunter (8):
>>       perf auxtrace: Add Intel PT as an AUX area tracing type
>>       perf tools: Add Intel PT packet decoder
>>       perf tools: Add Intel PT instruction decoder
>>       perf tools: Add Intel PT log
>>       perf tools: Add Intel PT decoder
>>       perf tools: Add Intel PT support
>>       perf tools: Take Intel PT into use
>>       perf tools: Add Intel BTS support
>>
>>  tools/build/Makefile.build                         |    2 +
>>  tools/perf/.gitignore                              |    2 +
>>  tools/perf/Documentation/intel-bts.txt             |   86 +
>>  tools/perf/Documentation/intel-pt.txt              |  588 ++++++
>>  tools/perf/MANIFEST                                |    7 +
>>  tools/perf/Makefile.perf                           |   12 +-
>>  tools/perf/arch/x86/util/Build                     |    5 +
>>  tools/perf/arch/x86/util/auxtrace.c                |   83 +
>>  tools/perf/arch/x86/util/intel-bts.c               |  458 +++++
>>  tools/perf/arch/x86/util/intel-pt.c                |  752 ++++++++
>>  tools/perf/arch/x86/util/pmu.c                     |   18 +
>>  tools/perf/util/Build                              |    3 +
>>  tools/perf/util/auxtrace.c                         |    9 +-
>>  tools/perf/util/auxtrace.h                         |    2 +
>>  tools/perf/util/intel-bts.c                        |  791 ++++++++
>>  tools/perf/util/intel-bts.h                        |   43 +
>>  tools/perf/util/intel-pt-decoder/Build             |   14 +
>>  .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1759 ++++++++++++++++++
>>  .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  102 ++
>>  .../util/intel-pt-decoder/intel-pt-insn-decoder.c  |  246 +++
>>  .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |   65 +
>>  tools/perf/util/intel-pt-decoder/intel-pt-log.c    |  155 ++
>>  tools/perf/util/intel-pt-decoder/intel-pt-log.h    |   52 +
>>  .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   |  400 +++++
>>  .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |   64 +
>>  tools/perf/util/intel-pt.c                         | 1889 ++++++++++++++++++++
>>  tools/perf/util/intel-pt.h                         |   51 +
>>  tools/perf/util/pmu.c                              |    4 -
>>  28 files changed, 7655 insertions(+), 7 deletions(-)
>>  create mode 100644 tools/perf/Documentation/intel-bts.txt
>>  create mode 100644 tools/perf/Documentation/intel-pt.txt
>>  create mode 100644 tools/perf/arch/x86/util/auxtrace.c
>>  create mode 100644 tools/perf/arch/x86/util/intel-bts.c
>>  create mode 100644 tools/perf/arch/x86/util/intel-pt.c
>>  create mode 100644 tools/perf/arch/x86/util/pmu.c
>>  create mode 100644 tools/perf/util/intel-bts.c
>>  create mode 100644 tools/perf/util/intel-bts.h
>>  create mode 100644 tools/perf/util/intel-pt-decoder/Build
>>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
>>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
>>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
>>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
>>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
>>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h
>>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
>>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
>>  create mode 100644 tools/perf/util/intel-pt.c
>>  create mode 100644 tools/perf/util/intel-pt.h
> 
> Yeah, so I did a 'newbie test':
> 
> I pulled the tree and saw that it has a tools/perf/Documentation/intel-bts.txt 
> file and started reading it.
> 
> Based on its text:
> 
>   The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
>   option is:
> 
>         -e intel_bts//
> 
>   Currently Intel BTS is limited to per-thread tracing so the --per-thread option
>   is also needed.
> 
> I tried the following command which failed:
> 
>   triton:~/tip> perf record -e intel_bts// --per-thread sleep 1
>   invalid or unsupported event: 'intel_bts//'
>   Run 'perf list' for a list of valid events
> 
>    usage: perf record [<options>] [<command>]
>       or: perf record [<options>] -- <command> [<options>]
> 
>       -e, --event <event>   event selector. use 'perf list' to list available events
> 
> That's a really ... unhelpful message. If I typoed something I want to know that. 
> If the kernel does not support something, I want to know about that too. Tooling 
> telling me: "maybe you typoed something, maybe it's not supported, I really don't 
> care" is not very productive.

That is not entirely true. The message says "Run 'perf list' for a list of
valid events" which will tell you if the event is valid. So you can tell the
difference between a typo and unsupported event.

> 
> So this was with a distro kernel, and in the hope that I'm missing some magic new 
> kernel feature, I tried it the latest -tip kernel, but it still gives me the same 
> failure.
> 
> So the test newbie user got stuck after wasting some time.
> 
> Me as a kernel developer could probably figure it out, but that's not the point: 
> if newbies cannot discover and use our new features then it's as if they didn't 
> exist, and I'm not pulling non-existent features! ;-)
> 
> Could we please improve all this?

'perf list' shows the event wasn't supported, so I am not sure what more the
"newbie" could expect.  Do you have any suggestions?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-06-30  7:54   ` Adrian Hunter
@ 2015-06-30 10:56     ` Ingo Molnar
  2015-06-30 13:23       ` Adrian Hunter
  0 siblings, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2015-06-30 10:56 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo


* Adrian Hunter <adrian.hunter@intel.com> wrote:

> > Yeah, so I did a 'newbie test':
> > 
> > I pulled the tree and saw that it has a tools/perf/Documentation/intel-bts.txt 
> > file and started reading it.
> > 
> > Based on its text:
> > 
> >   The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
> >   option is:
> > 
> >         -e intel_bts//
> > 
> >   Currently Intel BTS is limited to per-thread tracing so the --per-thread option
> >   is also needed.
> > 
> > I tried the following command which failed:
> > 
> >   triton:~/tip> perf record -e intel_bts// --per-thread sleep 1
> >   invalid or unsupported event: 'intel_bts//'
> >   Run 'perf list' for a list of valid events
> > 
> >    usage: perf record [<options>] [<command>]
> >       or: perf record [<options>] -- <command> [<options>]
> > 
> >       -e, --event <event>   event selector. use 'perf list' to list available events
> > 
> > That's a really ... unhelpful message. If I typoed something I want to know that. 
> > If the kernel does not support something, I want to know about that too. Tooling 
> > telling me: "maybe you typoed something, maybe it's not supported, I really don't 
> > care" is not very productive.
> 
> That is not entirely true. The message says "Run 'perf list' for a list of valid 
> events" which will tell you if the event is valid. So you can tell the 
> difference between a typo and unsupported event.

Yeah, but my point is: why doesn't the tool do this disambiguation for me? Tools 
are hard enough to use as-is already, no need to put artificial roadblocks in the 
path of first time users.

> > So this was with a distro kernel, and in the hope that I'm missing some magic 
> > new kernel feature, I tried it the latest -tip kernel, but it still gives me 
> > the same failure.
> > 
> > So the test newbie user got stuck after wasting some time.
> > 
> > Me as a kernel developer could probably figure it out, but that's not the 
> > point: if newbies cannot discover and use our new features then it's as if 
> > they didn't exist, and I'm not pulling non-existent features! ;-)
> > 
> > Could we please improve all this?
> 
> 'perf list' shows the event wasn't supported, so I am not sure what more the 
> "newbie" could expect.  Do you have any suggestions?

So I think a first time user would expect a clear message from the computer: what 
was wrong with what he wrote and what should he do to fix it.

Btw., here's the 'perf list' output from a system running the latest -tip kernel:

  vega:~> uname -a
  Linux vega 4.1.0-02935-g390ad45394a3-dirty #567 SMP PREEMPT Mon Jun 29 11:44:48 CEST 2015 x86_64 x86_64 x86_64 GNU/Linux
  vega:~> perf list | grep -i bts
  vega:~> 

so is there any kernel feature dependency? It's unclear. If yes, it should be 
mentioned in the document, and in the tooling output as well. If not then we have 
a bug somewhere.

I.e. you need to smooth the first time user's rocky path to first use as much as 
technically possible. Every single such helping step will literally double the 
number of users who will be able to successfully make use of the new feature.

As a positive example take a look at the newbie's road to 'perf trace':

  vega:~> trace
  Error:  No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
  Hint:   Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'

Aha, useful message, I need to run this as root:

  # trace

     0.000 ( 0.000 ms): sleep/28926  ... [continued]: nanosleep()) = 0
     0.051 ( 0.007 ms): sleep/28926 close(fd: 1                                                           ) = 0
     0.063 ( 0.005 ms): sleep/28926 close(fd: 2                                                           ) = 0
     0.072 ( 0.000 ms): sleep/28926 exit_group(                                      

Ok?

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-06-30 10:56     ` Ingo Molnar
@ 2015-06-30 13:23       ` Adrian Hunter
  2015-06-30 14:39         ` Arnaldo Carvalho de Melo
  2015-07-01  8:19         ` Adrian Hunter
  0 siblings, 2 replies; 23+ messages in thread
From: Adrian Hunter @ 2015-06-30 13:23 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo

On 30/06/15 13:56, Ingo Molnar wrote:
> 
> * Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>>> Yeah, so I did a 'newbie test':
>>>
>>> I pulled the tree and saw that it has a tools/perf/Documentation/intel-bts.txt 
>>> file and started reading it.
>>>
>>> Based on its text:
>>>
>>>   The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
>>>   option is:
>>>
>>>         -e intel_bts//
>>>
>>>   Currently Intel BTS is limited to per-thread tracing so the --per-thread option
>>>   is also needed.
>>>
>>> I tried the following command which failed:
>>>
>>>   triton:~/tip> perf record -e intel_bts// --per-thread sleep 1
>>>   invalid or unsupported event: 'intel_bts//'
>>>   Run 'perf list' for a list of valid events
>>>
>>>    usage: perf record [<options>] [<command>]
>>>       or: perf record [<options>] -- <command> [<options>]
>>>
>>>       -e, --event <event>   event selector. use 'perf list' to list available events
>>>
>>> That's a really ... unhelpful message. If I typoed something I want to know that. 
>>> If the kernel does not support something, I want to know about that too. Tooling 
>>> telling me: "maybe you typoed something, maybe it's not supported, I really don't 
>>> care" is not very productive.
>>
>> That is not entirely true. The message says "Run 'perf list' for a list of valid 
>> events" which will tell you if the event is valid. So you can tell the 
>> difference between a typo and unsupported event.
> 
> Yeah, but my point is: why doesn't the tool do this disambiguation for me? Tools 
> are hard enough to use as-is already, no need to put artificial roadblocks in the 
> path of first time users.

That applies to all events e.g.

# perf record -e sched:sched_swotch sleep 1
invalid or unsupported event: 'sched:sched_swotch'                                                                                       
Run 'perf list' for a list of valid events                                                                                               
                                                                                                                                         
 usage: perf record [<options>] [<command>]                                                                                              
    or: perf record [<options>] -- <command> [<options>]                                                                                 
                                                                                                                                         
    -e, --event <event>   event selector. use 'perf list' to list available events    

So it is a general problem.

> 
>>> So this was with a distro kernel, and in the hope that I'm missing some magic 
>>> new kernel feature, I tried it the latest -tip kernel, but it still gives me 
>>> the same failure.
>>>
>>> So the test newbie user got stuck after wasting some time.
>>>
>>> Me as a kernel developer could probably figure it out, but that's not the 
>>> point: if newbies cannot discover and use our new features then it's as if 
>>> they didn't exist, and I'm not pulling non-existent features! ;-)
>>>
>>> Could we please improve all this?
>>
>> 'perf list' shows the event wasn't supported, so I am not sure what more the 
>> "newbie" could expect.  Do you have any suggestions?
> 
> So I think a first time user would expect a clear message from the computer: what 
> was wrong with what he wrote and what should he do to fix it.
> 
> Btw., here's the 'perf list' output from a system running the latest -tip kernel:
> 
>   vega:~> uname -a
>   Linux vega 4.1.0-02935-g390ad45394a3-dirty #567 SMP PREEMPT Mon Jun 29 11:44:48 CEST 2015 x86_64 x86_64 x86_64 GNU/Linux
>   vega:~> perf list | grep -i bts
>   vega:~> 
> 
> so is there any kernel feature dependency? It's unclear. If yes, it should be 
> mentioned in the document, and in the tooling output as well. If not then we have 
> a bug somewhere.

I am not aware of any dependencies, apart from perf events itself.

Are you sure you compiled perf tools with the new patches ;-)
And it is an Intel CPU?

> 
> I.e. you need to smooth the first time user's rocky path to first use as much as 
> technically possible. Every single such helping step will literally double the 
> number of users who will be able to successfully make use of the new feature.
> 
> As a positive example take a look at the newbie's road to 'perf trace':
> 
>   vega:~> trace
>   Error:  No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
>   Hint:   Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'
> 
> Aha, useful message, I need to run this as root:
> 
>   # trace
> 
>      0.000 ( 0.000 ms): sleep/28926  ... [continued]: nanosleep()) = 0
>      0.051 ( 0.007 ms): sleep/28926 close(fd: 1                                                           ) = 0
>      0.063 ( 0.005 ms): sleep/28926 close(fd: 2                                                           ) = 0
>      0.072 ( 0.000 ms): sleep/28926 exit_group(                                      
> 
> Ok?

Could do something like the following:

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 09f8d2357108..5ab8fee89361 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -666,8 +666,13 @@ int parse_events_add_pmu(struct parse_events_evlist *data,
 	struct perf_evsel *evsel;
 
 	pmu = perf_pmu__find(name);
-	if (!pmu)
+	if (!pmu) {
+		if ((!strcmp(name, "intel_bts") || !strcmp(name, "intel_pt")) &&
+		    data->error)
+			if (asprintf(&data->error->str, "%s is not supported by the running kernel", name) < 0)
+				return -ENOMEM;
 		return -EINVAL;
+	}
 
 	if (pmu->default_config) {
 		memcpy(&attr, pmu->default_config,

Could then add checks for Intel hardware and bts CPU feature flag.



^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-06-30 13:23       ` Adrian Hunter
@ 2015-06-30 14:39         ` Arnaldo Carvalho de Melo
  2015-07-01  8:19         ` Adrian Hunter
  1 sibling, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-30 14:39 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian

Em Tue, Jun 30, 2015 at 04:23:45PM +0300, Adrian Hunter escreveu:
> On 30/06/15 13:56, Ingo Molnar wrote:
> > 
> > * Adrian Hunter <adrian.hunter@intel.com> wrote:
> > 
> >>> Yeah, so I did a 'newbie test':
> >>>
> >>> I pulled the tree and saw that it has a tools/perf/Documentation/intel-bts.txt 
> >>> file and started reading it.
> >>>
> >>> Based on its text:
> >>>
> >>>   The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
> >>>   option is:
> >>>
> >>>         -e intel_bts//
> >>>
> >>>   Currently Intel BTS is limited to per-thread tracing so the --per-thread option
> >>>   is also needed.
> >>>
> >>> I tried the following command which failed:
> >>>
> >>>   triton:~/tip> perf record -e intel_bts// --per-thread sleep 1
> >>>   invalid or unsupported event: 'intel_bts//'
> >>>   Run 'perf list' for a list of valid events
> >>>
> >>>    usage: perf record [<options>] [<command>]
> >>>       or: perf record [<options>] -- <command> [<options>]
> >>>
> >>>       -e, --event <event>   event selector. use 'perf list' to list available events
> >>>
> >>> That's a really ... unhelpful message. If I typoed something I want to know that. 
> >>> If the kernel does not support something, I want to know about that too. Tooling 
> >>> telling me: "maybe you typoed something, maybe it's not supported, I really don't 
> >>> care" is not very productive.
> >>
> >> That is not entirely true. The message says "Run 'perf list' for a list of valid 
> >> events" which will tell you if the event is valid. So you can tell the 
> >> difference between a typo and unsupported event.
> > 
> > Yeah, but my point is: why doesn't the tool do this disambiguation for me? Tools 
> > are hard enough to use as-is already, no need to put artificial roadblocks in the 
> > path of first time users.
> 
> That applies to all events e.g.
> 
> # perf record -e sched:sched_swotch sleep 1
> invalid or unsupported event: 'sched:sched_swotch'                                                                                       
> Run 'perf list' for a list of valid events                                                                                               
>                                                                                                                                          
>  usage: perf record [<options>] [<command>]                                                                                              
>     or: perf record [<options>] -- <command> [<options>]                                                                                 
>                                                                                                                                          
>     -e, --event <event>   event selector. use 'perf list' to list available events    
> 
> So it is a general problem.

Right, guess it is interesting to note at this point that Jiri improved
this area:

[acme@zoo linux]$ perf record -e sched:sched_swotch sleep 1
event syntax error: 'sched:sched_swotch'
                     \___ unknown tracepoint
Run 'perf list' for a list of valid events

 usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -e, --event <event>   event selector. use 'perf list' to list available events
[acme@zoo linux]$

But:

[acme@zoo linux]$ perf record -e intel_pt// usleep 1
invalid or unsupported event: 'intel_pt//'
Run 'perf list' for a list of valid events

 usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -e, --event <event>   event selector. use 'perf list' to list available events
[acme@zoo linux]$

I'll investigate how this can be improved...

- Arnaldo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-06-30 13:23       ` Adrian Hunter
  2015-06-30 14:39         ` Arnaldo Carvalho de Melo
@ 2015-07-01  8:19         ` Adrian Hunter
  2015-07-01 12:47           ` Adrian Hunter
  2015-07-02  9:43           ` Ingo Molnar
  1 sibling, 2 replies; 23+ messages in thread
From: Adrian Hunter @ 2015-07-01  8:19 UTC (permalink / raw)
  To: Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: linux-kernel, Jiri Olsa, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

On 30/06/15 16:23, Adrian Hunter wrote:
> On 30/06/15 13:56, Ingo Molnar wrote:
>>
>> * Adrian Hunter <adrian.hunter@intel.com> wrote:
>>
>>>> Yeah, so I did a 'newbie test':
>>>>
>>>> I pulled the tree and saw that it has a tools/perf/Documentation/intel-bts.txt 
>>>> file and started reading it.
>>>>
>>>> Based on its text:
>>>>
>>>>   The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
>>>>   option is:
>>>>
>>>>         -e intel_bts//
>>>>
>>>>   Currently Intel BTS is limited to per-thread tracing so the --per-thread option
>>>>   is also needed.
>>>>
>>>> I tried the following command which failed:
>>>>
>>>>   triton:~/tip> perf record -e intel_bts// --per-thread sleep 1
>>>>   invalid or unsupported event: 'intel_bts//'
>>>>   Run 'perf list' for a list of valid events
>>>>
>>>>    usage: perf record [<options>] [<command>]
>>>>       or: perf record [<options>] -- <command> [<options>]
>>>>
>>>>       -e, --event <event>   event selector. use 'perf list' to list available events
>>>>
>>>> That's a really ... unhelpful message. If I typoed something I want to know that. 
>>>> If the kernel does not support something, I want to know about that too. Tooling 
>>>> telling me: "maybe you typoed something, maybe it's not supported, I really don't 
>>>> care" is not very productive.
>>>
>>> That is not entirely true. The message says "Run 'perf list' for a list of valid 
>>> events" which will tell you if the event is valid. So you can tell the 
>>> difference between a typo and unsupported event.
>>
>> Yeah, but my point is: why doesn't the tool do this disambiguation for me? Tools 
>> are hard enough to use as-is already, no need to put artificial roadblocks in the 
>> path of first time users.
> 
> That applies to all events e.g.
> 
> # perf record -e sched:sched_swotch sleep 1
> invalid or unsupported event: 'sched:sched_swotch'                                                                                       
> Run 'perf list' for a list of valid events                                                                                               
>                                                                                                                                          
>  usage: perf record [<options>] [<command>]                                                                                              
>     or: perf record [<options>] -- <command> [<options>]                                                                                 
>                                                                                                                                          
>     -e, --event <event>   event selector. use 'perf list' to list available events    
> 
> So it is a general problem.
> 
>>
>>>> So this was with a distro kernel, and in the hope that I'm missing some magic 
>>>> new kernel feature, I tried it the latest -tip kernel, but it still gives me 
>>>> the same failure.
>>>>
>>>> So the test newbie user got stuck after wasting some time.
>>>>
>>>> Me as a kernel developer could probably figure it out, but that's not the 
>>>> point: if newbies cannot discover and use our new features then it's as if 
>>>> they didn't exist, and I'm not pulling non-existent features! ;-)
>>>>
>>>> Could we please improve all this?
>>>
>>> 'perf list' shows the event wasn't supported, so I am not sure what more the 
>>> "newbie" could expect.  Do you have any suggestions?
>>
>> So I think a first time user would expect a clear message from the computer: what 
>> was wrong with what he wrote and what should he do to fix it.
>>
>> Btw., here's the 'perf list' output from a system running the latest -tip kernel:
>>
>>   vega:~> uname -a
>>   Linux vega 4.1.0-02935-g390ad45394a3-dirty #567 SMP PREEMPT Mon Jun 29 11:44:48 CEST 2015 x86_64 x86_64 x86_64 GNU/Linux
>>   vega:~> perf list | grep -i bts
>>   vega:~> 
>>
>> so is there any kernel feature dependency? It's unclear. If yes, it should be 
>> mentioned in the document, and in the tooling output as well. If not then we have 
>> a bug somewhere.
> 
> I am not aware of any dependencies, apart from perf events itself.
> 
> Are you sure you compiled perf tools with the new patches ;-)
> And it is an Intel CPU?
> 
>>
>> I.e. you need to smooth the first time user's rocky path to first use as much as 
>> technically possible. Every single such helping step will literally double the 
>> number of users who will be able to successfully make use of the new feature.
>>
>> As a positive example take a look at the newbie's road to 'perf trace':
>>
>>   vega:~> trace
>>   Error:  No permissions to read /sys/kernel/debug/tracing/events/raw_syscalls/sys_(enter|exit)
>>   Hint:   Try 'sudo mount -o remount,mode=755 /sys/kernel/debug'
>>
>> Aha, useful message, I need to run this as root:
>>
>>   # trace
>>
>>      0.000 ( 0.000 ms): sleep/28926  ... [continued]: nanosleep()) = 0
>>      0.051 ( 0.007 ms): sleep/28926 close(fd: 1                                                           ) = 0
>>      0.063 ( 0.005 ms): sleep/28926 close(fd: 2                                                           ) = 0
>>      0.072 ( 0.000 ms): sleep/28926 exit_group(                                      
>>
>> Ok?
> 
> Could do something like the following:
> 
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 09f8d2357108..5ab8fee89361 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -666,8 +666,13 @@ int parse_events_add_pmu(struct parse_events_evlist *data,
>  	struct perf_evsel *evsel;
>  
>  	pmu = perf_pmu__find(name);
> -	if (!pmu)
> +	if (!pmu) {
> +		if ((!strcmp(name, "intel_bts") || !strcmp(name, "intel_pt")) &&
> +		    data->error)
> +			if (asprintf(&data->error->str, "%s is not supported by the running kernel", name) < 0)
> +				return -ENOMEM;
>  		return -EINVAL;
> +	}
>  
>  	if (pmu->default_config) {
>  		memcpy(&attr, pmu->default_config,
> 
> Could then add checks for Intel hardware and bts CPU feature flag.

How is this?

From: Adrian Hunter <adrian.hunter@intel.com>
Date: Wed, 1 Jul 2015 11:14:50 +0300
Subject: [PATCH] perf tools: Add error messages for missing intel_bts and
 intel_pt support

Add error messages to assist users in determining why there is
no intel_bts or intel_pt support.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/header.c | 15 ++++++++++++++
 tools/perf/util/header.h          |  3 ++-
 tools/perf/util/parse-events.c    | 41 ++++++++++++++++++++++++++++++++++++++-
 3 files changed, 57 insertions(+), 2 deletions(-)

diff --git a/tools/perf/arch/x86/util/header.c b/tools/perf/arch/x86/util/header.c
index 146d12a1cec0..afc5bdfd2d15 100644
--- a/tools/perf/arch/x86/util/header.c
+++ b/tools/perf/arch/x86/util/header.c
@@ -57,3 +57,18 @@ get_cpuid(char *buffer, size_t sz)
 	}
 	return -1;
 }
+
+int have_intel_cpu(void)
+{
+	char buffer[64];
+	int ret;
+
+	ret = get_cpuid(buffer, sizeof(buffer));
+	if (ret)
+		return -1;
+
+	if (!strncmp(buffer, "GenuineIntel,", 13))
+		return 1;
+
+	return 0;
+}
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index d4d57962c591..f6eab49d13d1 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -153,8 +153,9 @@ bool is_perf_magic(u64 magic);
 int write_padded(int fd, const void *bf, size_t count, size_t count_aligned);
 
 /*
- * arch specific callback
+ * arch specific callbacks
  */
 int get_cpuid(char *buffer, size_t sz);
+int have_intel_cpu(void);
 
 #endif /* __PERF_HEADER_H */
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 09f8d2357108..23fb777b40fa 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -656,6 +656,45 @@ static char *pmu_event_name(struct list_head *head_terms)
 	return NULL;
 }
 
+int __weak have_intel_cpu(void)
+{
+	return 0;
+}
+
+static int pmu_not_found_error(char *name, struct parse_events_error *err)
+{
+	int ret;
+
+	if (!err)
+		goto out;
+
+	if (!strcmp(name, "intel_bts")) {
+		ret = have_intel_cpu();
+		if (ret < 0)
+			goto out;
+		if (!ret) {
+			err->str = strdup("intel_bts requires an Intel CPU");
+			goto out;
+		}
+		err->str = strdup("kernel does not support intel_bts (requires 64-bit v4.1 kernel or later and BTS hardware support)");
+		goto out;
+	}
+
+	if (!strcmp(name, "intel_pt")) {
+		ret = have_intel_cpu();
+		if (ret < 0)
+			goto out;
+		if (!ret) {
+			err->str = strdup("intel_pt requires an Intel CPU (Core 5th generation or later)");
+			goto out;
+		}
+		err->str = strdup("kernel does not support intel_pt (requires v4.1 kernel or later and 5th generation Intel Core processor or later)");
+		goto out;
+	}
+out:
+	return -EINVAL;
+}
+
 int parse_events_add_pmu(struct parse_events_evlist *data,
 			 struct list_head *list, char *name,
 			 struct list_head *head_config)
@@ -667,7 +706,7 @@ int parse_events_add_pmu(struct parse_events_evlist *data,
 
 	pmu = perf_pmu__find(name);
 	if (!pmu)
-		return -EINVAL;
+		return pmu_not_found_error(name, data->error);
 
 	if (pmu->default_config) {
 		memcpy(&attr, pmu->default_config,
-- 
1.9.1




^ permalink raw reply related	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-07-01  8:19         ` Adrian Hunter
@ 2015-07-01 12:47           ` Adrian Hunter
  2015-07-02  9:28             ` Ingo Molnar
  2015-07-02  9:43           ` Ingo Molnar
  1 sibling, 1 reply; 23+ messages in thread
From: Adrian Hunter @ 2015-07-01 12:47 UTC (permalink / raw)
  To: Ingo Molnar, Arnaldo Carvalho de Melo
  Cc: linux-kernel, Jiri Olsa, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Arnaldo Carvalho de Melo

On 01/07/15 11:19, Adrian Hunter wrote:
> On 30/06/15 16:23, Adrian Hunter wrote:
>> On 30/06/15 13:56, Ingo Molnar wrote:
>>>
>>> * Adrian Hunter <adrian.hunter@intel.com> wrote:
>>>
>>>>> Yeah, so I did a 'newbie test':
>>>>>
>>>>> I pulled the tree and saw that it has a tools/perf/Documentation/intel-bts.txt 
>>>>> file and started reading it.
>>>>>
>>>>> Based on its text:
>>>>>
>>>>>   The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
>>>>>   option is:
>>>>>
>>>>>         -e intel_bts//
>>>>>
>>>>>   Currently Intel BTS is limited to per-thread tracing so the --per-thread option
>>>>>   is also needed.
>>>>>
>>>>> I tried the following command which failed:
>>>>>
>>>>>   triton:~/tip> perf record -e intel_bts// --per-thread sleep 1
>>>>>   invalid or unsupported event: 'intel_bts//'
>>>>>   Run 'perf list' for a list of valid events
>>>>>
>>>>>    usage: perf record [<options>] [<command>]
>>>>>       or: perf record [<options>] -- <command> [<options>]
>>>>>
>>>>>       -e, --event <event>   event selector. use 'perf list' to list available events
>>>>>
>>>>> That's a really ... unhelpful message. If I typoed something I want to know that. 
>>>>> If the kernel does not support something, I want to know about that too. Tooling 
>>>>> telling me: "maybe you typoed something, maybe it's not supported, I really don't 
>>>>> care" is not very productive.
>>>>
>>>> That is not entirely true. The message says "Run 'perf list' for a list of valid 
>>>> events" which will tell you if the event is valid. So you can tell the 
>>>> difference between a typo and unsupported event.
>>>
>>> Yeah, but my point is: why doesn't the tool do this disambiguation for me? Tools 
>>> are hard enough to use as-is already, no need to put artificial roadblocks in the 
>>> path of first time users.
>>
>> That applies to all events e.g.
>>
>> # perf record -e sched:sched_swotch sleep 1
>> invalid or unsupported event: 'sched:sched_swotch'                                                                                       
>> Run 'perf list' for a list of valid events                                                                                               
>>                                                                                                                                          
>>  usage: perf record [<options>] [<command>]                                                                                              
>>     or: perf record [<options>] -- <command> [<options>]                                                                                 
>>                                                                                                                                          
>>     -e, --event <event>   event selector. use 'perf list' to list available events    
>>
>> So it is a general problem.
>>
>>>
>>>>> So this was with a distro kernel, and in the hope that I'm missing some magic 
>>>>> new kernel feature, I tried it the latest -tip kernel, but it still gives me 
>>>>> the same failure.
>>>>>
>>>>> So the test newbie user got stuck after wasting some time.
>>>>>
>>>>> Me as a kernel developer could probably figure it out, but that's not the 
>>>>> point: if newbies cannot discover and use our new features then it's as if 
>>>>> they didn't exist, and I'm not pulling non-existent features! ;-)
>>>>>
>>>>> Could we please improve all this?
>>>>
>>>> 'perf list' shows the event wasn't supported, so I am not sure what more the 
>>>> "newbie" could expect.  Do you have any suggestions?
>>>
>>> So I think a first time user would expect a clear message from the computer: what 
>>> was wrong with what he wrote and what should he do to fix it.
>>>
>>> Btw., here's the 'perf list' output from a system running the latest -tip kernel:
>>>
>>>   vega:~> uname -a
>>>   Linux vega 4.1.0-02935-g390ad45394a3-dirty #567 SMP PREEMPT Mon Jun 29 11:44:48 CEST 2015 x86_64 x86_64 x86_64 GNU/Linux
>>>   vega:~> perf list | grep -i bts
>>>   vega:~> 
>>>
>>> so is there any kernel feature dependency? It's unclear. If yes, it should be 
>>> mentioned in the document, and in the tooling output as well. If not then we have 
>>> a bug somewhere.
>>
>> I am not aware of any dependencies, apart from perf events itself.
>>
>> Are you sure you compiled perf tools with the new patches ;-)
>> And it is an Intel CPU?
>>

I am going to need to know what hardware it is and cpu feature flags i.e.
/proc/cpuinfo

Also ls /sys/bus/event_source/devices



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-07-01 12:47           ` Adrian Hunter
@ 2015-07-02  9:28             ` Ingo Molnar
  0 siblings, 0 replies; 23+ messages in thread
From: Ingo Molnar @ 2015-07-02  9:28 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo


* Adrian Hunter <adrian.hunter@intel.com> wrote:

> I am going to need to know what hardware it is and cpu feature flags i.e.
> /proc/cpuinfo
> 
> Also ls /sys/bus/event_source/devices

Sorry about the delay, the merge window is busier than usual ...

So I tried it on:

  1) Intel box, old kernel, new tools: threw an error
  2)   AMD box, new kernel, new tools: threw an error
  3)   AMD box, old kernel, new tools: threw an error

failure was expected in all 3 cases, but it threw the exact same error with no 
real indication about what the problem was, while the real problem was different 
for each case:

  1)  good CPU,  bad kernel
  2)   bad CPU, good kernel
  3)   bad CPU,  bad kernel

I'd expect the 'good CPU, good kernel' combination to work:

  4) Intel box, new kernel, new tools

... but haven't tried it yet.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-07-01  8:19         ` Adrian Hunter
  2015-07-01 12:47           ` Adrian Hunter
@ 2015-07-02  9:43           ` Ingo Molnar
  2015-07-02 12:35             ` Adrian Hunter
  2015-07-03 14:31             ` Alexander Shishkin
  1 sibling, 2 replies; 23+ messages in thread
From: Ingo Molnar @ 2015-07-02  9:43 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo


* Adrian Hunter <adrian.hunter@intel.com> wrote:

> How is this?
> 
> From: Adrian Hunter <adrian.hunter@intel.com>
> Date: Wed, 1 Jul 2015 11:14:50 +0300
> Subject: [PATCH] perf tools: Add error messages for missing intel_bts and
>  intel_pt support
> 
> Add error messages to assist users in determining why there is
> no intel_bts or intel_pt support.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/arch/x86/util/header.c | 15 ++++++++++++++
>  tools/perf/util/header.h          |  3 ++-
>  tools/perf/util/parse-events.c    | 41 ++++++++++++++++++++++++++++++++++++++-
>  3 files changed, 57 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/arch/x86/util/header.c b/tools/perf/arch/x86/util/header.c
> index 146d12a1cec0..afc5bdfd2d15 100644
> --- a/tools/perf/arch/x86/util/header.c
> +++ b/tools/perf/arch/x86/util/header.c
> @@ -57,3 +57,18 @@ get_cpuid(char *buffer, size_t sz)
>  	}
>  	return -1;
>  }
> +
> +int have_intel_cpu(void)
> +{
> +	char buffer[64];
> +	int ret;
> +
> +	ret = get_cpuid(buffer, sizeof(buffer));
> +	if (ret)
> +		return -1;
> +
> +	if (!strncmp(buffer, "GenuineIntel,", 13))
> +		return 1;
> +
> +	return 0;
> +}
> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
> index d4d57962c591..f6eab49d13d1 100644
> --- a/tools/perf/util/header.h
> +++ b/tools/perf/util/header.h
> @@ -153,8 +153,9 @@ bool is_perf_magic(u64 magic);
>  int write_padded(int fd, const void *bf, size_t count, size_t count_aligned);
>  
>  /*
> - * arch specific callback
> + * arch specific callbacks
>   */
>  int get_cpuid(char *buffer, size_t sz);
> +int have_intel_cpu(void);
>  
>  #endif /* __PERF_HEADER_H */
> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
> index 09f8d2357108..23fb777b40fa 100644
> --- a/tools/perf/util/parse-events.c
> +++ b/tools/perf/util/parse-events.c
> @@ -656,6 +656,45 @@ static char *pmu_event_name(struct list_head *head_terms)
>  	return NULL;
>  }
>  
> +int __weak have_intel_cpu(void)
> +{
> +	return 0;
> +}
> +
> +static int pmu_not_found_error(char *name, struct parse_events_error *err)
> +{
> +	int ret;
> +
> +	if (!err)
> +		goto out;
> +
> +	if (!strcmp(name, "intel_bts")) {
> +		ret = have_intel_cpu();
> +		if (ret < 0)
> +			goto out;
> +		if (!ret) {
> +			err->str = strdup("intel_bts requires an Intel CPU");
> +			goto out;
> +		}
> +		err->str = strdup("kernel does not support intel_bts (requires 64-bit v4.1 kernel or later and BTS hardware support)");
> +		goto out;
> +	}
> +
> +	if (!strcmp(name, "intel_pt")) {
> +		ret = have_intel_cpu();
> +		if (ret < 0)
> +			goto out;
> +		if (!ret) {
> +			err->str = strdup("intel_pt requires an Intel CPU (Core 5th generation or later)");
> +			goto out;
> +		}
> +		err->str = strdup("kernel does not support intel_pt (requires v4.1 kernel or later and 5th generation Intel Core processor or later)");
> +		goto out;
> +	}
> +out:
> +	return -EINVAL;
> +}
> +
>  int parse_events_add_pmu(struct parse_events_evlist *data,
>  			 struct list_head *list, char *name,
>  			 struct list_head *head_config)
> @@ -667,7 +706,7 @@ int parse_events_add_pmu(struct parse_events_evlist *data,
>  
>  	pmu = perf_pmu__find(name);
>  	if (!pmu)
> -		return -EINVAL;
> +		return pmu_not_found_error(name, data->error);
>  
>  	if (pmu->default_config) {
>  		memcpy(&attr, pmu->default_config,

So I really think we need an extended error reporting feature on the perf kernel 
side, so that a 'natural' error (plus a string) is reported back to tooling, 
instead of the current -EINVAL.

No need to do it for everything, doing it for BTS and related functionality would 
be a good first step to start this.

If you are interested you could try this, or I can try to write something (after 
the merge window).

So the idea would be to convert such opaque error returns:

        if (attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS &&
            !attr->freq && hwc->sample_period == 1) {
                /* BTS is not supported by this architecture. */
                if (!x86_pmu.bts_active)
                        return -EOPNOTSUPP;

into:

			return perf_err(event, -EOPNOTSUPP, "The BTS hardware feature is not available on this CPU.");

Where perf_err() is a function that on one hand returns -EOPNOTSUPP - so 'legacy' 
error handling works as before: the syscall will return -EOPNOTSUPP.

But if a new 'extended error reporting' feature bit is set in the perf_attr, then 
perf_err() also does the following:

 - it copies the error string either back to a user-space pointer which is in the 
   perf_attr (plus a max length field)

 - or an alternative implementation would be to puts the string into the event's 
   ring buffer, with a special (new) event marker - where it can be recovered by 
   tooling.

(I think the first approach is better, because it would work fine for events 
without ring-buffers as well.)

Old tooling won't have the feature flag set and won't have the string pointer in 
perf_attr, so nothing will happen on that case.

New tooling that supports 'extended error reporting' has the feature flag set and 
the kernel will copy the error string into the provided user-space string buffer, 
where tooling could use that string to generate more meaningful error messages:

  perf syscall error: The BTS hardware feature is not available on this CPU.

Does this make sense to you?

Now an additional complication is the fact that BTS can now also be a separate 
PMU, listed under /sys/bus/event_source/devices/intel_bts/.

If it's not listed there, we don't know the exact reason: is it not available 
because it's an old kernel? Or is it the wrong CPU?

We could solve that by extending the sysfs interface and adding an "error" file to 
the PMU directory: which would contain the reason why the driver was not created.

I.e. if the BTS driver was not created, we'd still have 
/sys/bus/event_source/devices/intel_bts/error (and no other file), which gives 
tooling a good way to discover why a particular PMU is not available. This adds a 
tiny bit more RAM overhead, but it's for the better I think, because tooling could 
be a lot more certain about what the capabilities of the kernel are.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-07-02  9:43           ` Ingo Molnar
@ 2015-07-02 12:35             ` Adrian Hunter
  2015-07-02 13:02               ` Arnaldo Carvalho de Melo
  2015-07-02 19:00               ` Ingo Molnar
  2015-07-03 14:31             ` Alexander Shishkin
  1 sibling, 2 replies; 23+ messages in thread
From: Adrian Hunter @ 2015-07-02 12:35 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo, Vince Weaver, Alexander Shishkin

On 02/07/15 12:43, Ingo Molnar wrote:
> 
> * Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>> How is this?
>>
>> From: Adrian Hunter <adrian.hunter@intel.com>
>> Date: Wed, 1 Jul 2015 11:14:50 +0300
>> Subject: [PATCH] perf tools: Add error messages for missing intel_bts and
>>  intel_pt support
>>
>> Add error messages to assist users in determining why there is
>> no intel_bts or intel_pt support.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/arch/x86/util/header.c | 15 ++++++++++++++
>>  tools/perf/util/header.h          |  3 ++-
>>  tools/perf/util/parse-events.c    | 41 ++++++++++++++++++++++++++++++++++++++-
>>  3 files changed, 57 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/arch/x86/util/header.c b/tools/perf/arch/x86/util/header.c
>> index 146d12a1cec0..afc5bdfd2d15 100644
>> --- a/tools/perf/arch/x86/util/header.c
>> +++ b/tools/perf/arch/x86/util/header.c
>> @@ -57,3 +57,18 @@ get_cpuid(char *buffer, size_t sz)
>>  	}
>>  	return -1;
>>  }
>> +
>> +int have_intel_cpu(void)
>> +{
>> +	char buffer[64];
>> +	int ret;
>> +
>> +	ret = get_cpuid(buffer, sizeof(buffer));
>> +	if (ret)
>> +		return -1;
>> +
>> +	if (!strncmp(buffer, "GenuineIntel,", 13))
>> +		return 1;
>> +
>> +	return 0;
>> +}
>> diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
>> index d4d57962c591..f6eab49d13d1 100644
>> --- a/tools/perf/util/header.h
>> +++ b/tools/perf/util/header.h
>> @@ -153,8 +153,9 @@ bool is_perf_magic(u64 magic);
>>  int write_padded(int fd, const void *bf, size_t count, size_t count_aligned);
>>  
>>  /*
>> - * arch specific callback
>> + * arch specific callbacks
>>   */
>>  int get_cpuid(char *buffer, size_t sz);
>> +int have_intel_cpu(void);
>>  
>>  #endif /* __PERF_HEADER_H */
>> diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
>> index 09f8d2357108..23fb777b40fa 100644
>> --- a/tools/perf/util/parse-events.c
>> +++ b/tools/perf/util/parse-events.c
>> @@ -656,6 +656,45 @@ static char *pmu_event_name(struct list_head *head_terms)
>>  	return NULL;
>>  }
>>  
>> +int __weak have_intel_cpu(void)
>> +{
>> +	return 0;
>> +}
>> +
>> +static int pmu_not_found_error(char *name, struct parse_events_error *err)
>> +{
>> +	int ret;
>> +
>> +	if (!err)
>> +		goto out;
>> +
>> +	if (!strcmp(name, "intel_bts")) {
>> +		ret = have_intel_cpu();
>> +		if (ret < 0)
>> +			goto out;
>> +		if (!ret) {
>> +			err->str = strdup("intel_bts requires an Intel CPU");
>> +			goto out;
>> +		}
>> +		err->str = strdup("kernel does not support intel_bts (requires 64-bit v4.1 kernel or later and BTS hardware support)");
>> +		goto out;
>> +	}
>> +
>> +	if (!strcmp(name, "intel_pt")) {
>> +		ret = have_intel_cpu();
>> +		if (ret < 0)
>> +			goto out;
>> +		if (!ret) {
>> +			err->str = strdup("intel_pt requires an Intel CPU (Core 5th generation or later)");
>> +			goto out;
>> +		}
>> +		err->str = strdup("kernel does not support intel_pt (requires v4.1 kernel or later and 5th generation Intel Core processor or later)");
>> +		goto out;
>> +	}
>> +out:
>> +	return -EINVAL;
>> +}
>> +
>>  int parse_events_add_pmu(struct parse_events_evlist *data,
>>  			 struct list_head *list, char *name,
>>  			 struct list_head *head_config)
>> @@ -667,7 +706,7 @@ int parse_events_add_pmu(struct parse_events_evlist *data,
>>  
>>  	pmu = perf_pmu__find(name);
>>  	if (!pmu)
>> -		return -EINVAL;
>> +		return pmu_not_found_error(name, data->error);
>>  
>>  	if (pmu->default_config) {
>>  		memcpy(&attr, pmu->default_config,
> 
> So I really think we need an extended error reporting feature on the perf kernel 
> side, so that a 'natural' error (plus a string) is reported back to tooling, 
> instead of the current -EINVAL.
> 
> No need to do it for everything, doing it for BTS and related functionality would 
> be a good first step to start this.
> 
> If you are interested you could try this, or I can try to write something (after 
> the merge window).

Don't have time sorry.

> 
> So the idea would be to convert such opaque error returns:
> 
>         if (attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS &&
>             !attr->freq && hwc->sample_period == 1) {
>                 /* BTS is not supported by this architecture. */
>                 if (!x86_pmu.bts_active)
>                         return -EOPNOTSUPP;
> 
> into:
> 
> 			return perf_err(event, -EOPNOTSUPP, "The BTS hardware feature is not available on this CPU.");
> 
> Where perf_err() is a function that on one hand returns -EOPNOTSUPP - so 'legacy' 
> error handling works as before: the syscall will return -EOPNOTSUPP.
> 
> But if a new 'extended error reporting' feature bit is set in the perf_attr, then 
> perf_err() also does the following:
> 
>  - it copies the error string either back to a user-space pointer which is in the 
>    perf_attr (plus a max length field)
> 
>  - or an alternative implementation would be to puts the string into the event's 
>    ring buffer, with a special (new) event marker - where it can be recovered by 
>    tooling.
> 
> (I think the first approach is better, because it would work fine for events 
> without ring-buffers as well.)
> 
> Old tooling won't have the feature flag set and won't have the string pointer in 
> perf_attr, so nothing will happen on that case.
> 
> New tooling that supports 'extended error reporting' has the feature flag set and 
> the kernel will copy the error string into the provided user-space string buffer, 
> where tooling could use that string to generate more meaningful error messages:
> 
>   perf syscall error: The BTS hardware feature is not available on this CPU.
> 
> Does this make sense to you?

Normally the warts of syscalls are hidden by libraries.  I am not sure a
library couldn't do just as well with the advantage that it would work for
old kernels too.  A library could also have information about multiple
architectures, not just the one that is actually running e.g. your BTS
example won't work on ARM.

A library could provide a similar API to the one you described.  Either it
just does the syscall and returns, or if extended error information is
requested, it probes the API with various options, checks
perf_event_paranoid, and generally uses whatever information it can to best
figure out what went wrong.

> 
> Now an additional complication is the fact that BTS can now also be a separate 
> PMU, listed under /sys/bus/event_source/devices/intel_bts/.
> 
> If it's not listed there, we don't know the exact reason: is it not available 
> because it's an old kernel? Or is it the wrong CPU?
> 
> We could solve that by extending the sysfs interface and adding an "error" file to 
> the PMU directory: which would contain the reason why the driver was not created.
> 
> I.e. if the BTS driver was not created, we'd still have 
> /sys/bus/event_source/devices/intel_bts/error (and no other file), which gives 
> tooling a good way to discover why a particular PMU is not available. This adds a 
> tiny bit more RAM overhead, but it's for the better I think, because tooling could 
> be a lot more certain about what the capabilities of the kernel are.

That presumes the user understands what is or is not available on their
architecture, because on a non-x86 architecture they still get nothing
x86-specific.

It also runs a bit against the way config works e.g. even if the PMU driver
is config'ed out it still has to provide a sysfs interface.

So, isn't the patch I proposed sufficient for now?


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-07-02 12:35             ` Adrian Hunter
@ 2015-07-02 13:02               ` Arnaldo Carvalho de Melo
  2015-07-02 19:00               ` Ingo Molnar
  1 sibling, 0 replies; 23+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-07-02 13:02 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, David Ahern, Namhyung Kim,
	Peter Zijlstra, Stephane Eranian, Vince Weaver,
	Alexander Shishkin

Em Thu, Jul 02, 2015 at 03:35:17PM +0300, Adrian Hunter escreveu:
> On 02/07/15 12:43, Ingo Molnar wrote:
> > Old tooling won't have the feature flag set and won't have the string pointer in 
> > perf_attr, so nothing will happen on that case.
> > 
> > New tooling that supports 'extended error reporting' has the feature flag set and 
> > the kernel will copy the error string into the provided user-space string buffer, 
> > where tooling could use that string to generate more meaningful error messages:
> > 
> >   perf syscall error: The BTS hardware feature is not available on this CPU.
> > 
> > Does this make sense to you?
 
> Normally the warts of syscalls are hidden by libraries.  I am not sure a
> library couldn't do just as well with the advantage that it would work for
> old kernels too.  A library could also have information about multiple
> architectures, not just the one that is actually running e.g. your BTS
> example won't work on ARM.
 
> A library could provide a similar API to the one you described.  Either it
> just does the syscall and returns, or if extended error information is
> requested, it probes the API with various options, checks
> perf_event_paranoid, and generally uses whatever information it can to best
> figure out what went wrong.

Agreed, that is what tools/perf/util/evsel.c does, e.g.:

static struct {
        bool sample_id_all;
        bool exclude_guest;
        bool mmap2;
        bool cloexec;
        bool clockid;
        bool clockid_wrong;
} perf_missing_features;

And in __perf_evsel__open():

        /*
         * Must probe features in the order they were added to the
         * perf_event_attr interface.
         */
        if (!perf_missing_features.clockid_wrong && evsel->attr.use_clockid) {
                perf_missing_features.clockid_wrong = true;
                goto fallback_missing_features;
        } else if (!perf_missing_features.clockid && evsel->attr.use_clockid) {
                perf_missing_features.clockid = true;
                goto fallback_missing_features;
        } else if (!perf_missing_features.cloexec && (flags & PERF_FLAG_FD_CLOEXEC)) {
                perf_missing_features.cloexec = true;
                goto fallback_missing_features;
        } else if (!perf_missing_features.mmap2 && evsel->attr.mmap2) {
                perf_missing_features.mmap2 = true;
                goto fallback_missing_features;
        } else if (!perf_missing_features.exclude_guest &&
                   (evsel->attr.exclude_guest || evsel->attr.exclude_host)) {
                perf_missing_features.exclude_guest = true;
                goto fallback_missing_features;
        } else if (!perf_missing_features.sample_id_all) {
                perf_missing_features.sample_id_all = true;
                goto retry_sample_id;
        }

So it does already probe for features available in the kernel and reacts
as best as imagined so far to kernels lacking those features.

The whole point here is: who writes the library? Those who need it to
support some feature they want to have merged?

We can use the existing infrastructure to the fullest extent possible,
but at some point we need to extend it, who does it?
 
> > Now an additional complication is the fact that BTS can now also be a separate 
> > PMU, listed under /sys/bus/event_source/devices/intel_bts/.

> > If it's not listed there, we don't know the exact reason: is it not available 
> > because it's an old kernel? Or is it the wrong CPU?

> > We could solve that by extending the sysfs interface and adding an "error" file to 
> > the PMU directory: which would contain the reason why the driver was not created.

> > I.e. if the BTS driver was not created, we'd still have 
> > /sys/bus/event_source/devices/intel_bts/error (and no other file), which gives 
> > tooling a good way to discover why a particular PMU is not available. This adds a 
> > tiny bit more RAM overhead, but it's for the better I think, because tooling could 
> > be a lot more certain about what the capabilities of the kernel are.
 
> That presumes the user understands what is or is not available on their
> architecture, because on a non-x86 architecture they still get nothing
> x86-specific.
 
> It also runs a bit against the way config works e.g. even if the PMU driver
> is config'ed out it still has to provide a sysfs interface.
 
> So, isn't the patch I proposed sufficient for now?

I haven't tested it so far, but it seems to improve things, given the
current infrastructure.

- Arnaldo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-07-02 12:35             ` Adrian Hunter
  2015-07-02 13:02               ` Arnaldo Carvalho de Melo
@ 2015-07-02 19:00               ` Ingo Molnar
  2015-07-03  9:08                 ` Adrian Hunter
  1 sibling, 1 reply; 23+ messages in thread
From: Ingo Molnar @ 2015-07-02 19:00 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo, Vince Weaver, Alexander Shishkin


* Adrian Hunter <adrian.hunter@intel.com> wrote:

> > So I really think we need an extended error reporting feature on the perf 
> > kernel side, so that a 'natural' error (plus a string) is reported back to 
> > tooling, instead of the current -EINVAL.
> > 
> > No need to do it for everything, doing it for BTS and related functionality 
> > would be a good first step to start this.
> > 
> > If you are interested you could try this, or I can try to write something 
> > (after the merge window).
> 
> Don't have time sorry.

So who on the Intel side has time to finish Intel/PT support properly?

As a first step I think we need to disable the Intel/PT kernel side - it was
merged with the express promise that proper tooling would come along promptly,
but if that's not happening due to lack of resources, then the kernel side is
obviously in limbo as well.

> > Does this make sense to you?
> 
> Normally the warts of syscalls are hidden by libraries.  I am not sure a library 
> couldn't do just as well with the advantage that it would work for old kernels 
> too.  A library could also have information about multiple architectures, not 
> just the one that is actually running e.g. your BTS example won't work on ARM.

The method I suggested works _both_ with old and new kernels, with the new kernel 
simply giving better error feedback.

And these are not warts really, it's simply a threshold issue: above a certain 
complexity of features it makes sense to introduce a qualitatively better error 
reporting interface. When I tested Intel PT support I happened to go past that 
threshold and it became obvious that we need it.

> A library could provide a similar API to the one you described.  Either it just 
> does the syscall and returns, or if extended error information is requested, it 
> probes the API with various options, checks perf_event_paranoid, and generally 
> uses whatever information it can to best figure out what went wrong.

So I think eventually a proper libperf will crystalize out of perf's syscall 
wrappers (I suggested it for a long time), but this has very little impact on the 
end user who doesn't care whether it's done in perf, or in some intermediate 
library.

So it can be done in the current perf code just as much - when libperf gets 
introduced it will be factored out into tools/lib/perf/ without much problem and 
then other tools can use it too.

> > I.e. if the BTS driver was not created, we'd still have 
> > /sys/bus/event_source/devices/intel_bts/error (and no other file), which gives 
> > tooling a good way to discover why a particular PMU is not available. This 
> > adds a tiny bit more RAM overhead, but it's for the better I think, because 
> > tooling could be a lot more certain about what the capabilities of the kernel 
> > are.
> 
> That presumes the user understands what is or is not available on their 
> architecture, because on a non-x86 architecture they still get nothing 
> x86-specific.
> 
> It also runs a bit against the way config works e.g. even if the PMU driver is 
> config'ed out it still has to provide a sysfs interface.

I partially agree - but it's really a problem that we currently only get two 
states:

  - the PMU driver is there in sysfs
  - the PMU driver is not there in sysfs

while we really want to a lot more about why it's not there - and it's the kernel 
that knows this best, not tooling.

So maybe instead of having a separate directory, we could have a sysfs file that 
listed all the PMU drivers that the kernel knows about, with a status line that 
tells us whether they are:

  - not configured
  - configured but runtime disabled due to reason X or Y or Z
  - configured and enabled

OTOH having the directories with a single file in them is a close substitute, and 
better meets the sysfs 'one file, one value' principle.

> So, isn't the patch I proposed sufficient for now?

Well, no, because it really does the check in the wrong place and introduces 
possible friction between what tooling thinks is the supported environment and 
what the kernel thinks - and thus postpones the problem.

I could live with this current solution as the initial version if a subsequent 
effort fixes it up properly by adding much better error reporting infrastructure, 
but the 'no time' comment makes me doubt the whole Intel/PT feature set.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-07-02 19:00               ` Ingo Molnar
@ 2015-07-03  9:08                 ` Adrian Hunter
  0 siblings, 0 replies; 23+ messages in thread
From: Adrian Hunter @ 2015-07-03  9:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo, Vince Weaver, Alexander Shishkin

On 02/07/15 22:00, Ingo Molnar wrote:
> 
> * Adrian Hunter <adrian.hunter@intel.com> wrote:
> 
>>> So I really think we need an extended error reporting feature on the perf 
>>> kernel side, so that a 'natural' error (plus a string) is reported back to 
>>> tooling, instead of the current -EINVAL.
>>>
>>> No need to do it for everything, doing it for BTS and related functionality 
>>> would be a good first step to start this.
>>>
>>> If you are interested you could try this, or I can try to write something 
>>> (after the merge window).
>>
>> Don't have time sorry.
> 
> So who on the Intel side has time to finish Intel/PT support properly?
> 
> As a first step I think we need to disable the Intel/PT kernel side - it was
> merged with the express promise that proper tooling would come along promptly,
> but if that's not happening due to lack of resources, then the kernel side is
> obviously in limbo as well.

I am afraid I took your "If you are interested you could try this" literally.

So you are saying you will apply the patches if I develop the extended error
string feature for the perf_event_open syscall?

> 
>>> Does this make sense to you?
>>
>> Normally the warts of syscalls are hidden by libraries.  I am not sure a library 
>> couldn't do just as well with the advantage that it would work for old kernels 
>> too.  A library could also have information about multiple architectures, not 
>> just the one that is actually running e.g. your BTS example won't work on ARM.
> 
> The method I suggested works _both_ with old and new kernels, with the new kernel 
> simply giving better error feedback.
> 
> And these are not warts really, it's simply a threshold issue: above a certain 
> complexity of features it makes sense to introduce a qualitatively better error 
> reporting interface. When I tested Intel PT support I happened to go past that 
> threshold and it became obvious that we need it.
> 
>> A library could provide a similar API to the one you described.  Either it just 
>> does the syscall and returns, or if extended error information is requested, it 
>> probes the API with various options, checks perf_event_paranoid, and generally 
>> uses whatever information it can to best figure out what went wrong.
> 
> So I think eventually a proper libperf will crystalize out of perf's syscall 
> wrappers (I suggested it for a long time), but this has very little impact on the 
> end user who doesn't care whether it's done in perf, or in some intermediate 
> library.
> 
> So it can be done in the current perf code just as much - when libperf gets 
> introduced it will be factored out into tools/lib/perf/ without much problem and 
> then other tools can use it too.
> 
>>> I.e. if the BTS driver was not created, we'd still have 
>>> /sys/bus/event_source/devices/intel_bts/error (and no other file), which gives 
>>> tooling a good way to discover why a particular PMU is not available. This 
>>> adds a tiny bit more RAM overhead, but it's for the better I think, because 
>>> tooling could be a lot more certain about what the capabilities of the kernel 
>>> are.
>>
>> That presumes the user understands what is or is not available on their 
>> architecture, because on a non-x86 architecture they still get nothing 
>> x86-specific.
>>
>> It also runs a bit against the way config works e.g. even if the PMU driver is 
>> config'ed out it still has to provide a sysfs interface.
> 
> I partially agree - but it's really a problem that we currently only get two 
> states:
> 
>   - the PMU driver is there in sysfs
>   - the PMU driver is not there in sysfs
> 
> while we really want to a lot more about why it's not there - and it's the kernel 
> that knows this best, not tooling.
> 
> So maybe instead of having a separate directory, we could have a sysfs file that 
> listed all the PMU drivers that the kernel knows about, with a status line that 
> tells us whether they are:
> 
>   - not configured
>   - configured but runtime disabled due to reason X or Y or Z
>   - configured and enabled
> 
> OTOH having the directories with a single file in them is a close substitute, and 
> better meets the sysfs 'one file, one value' principle.

One file, one value can be done.

What about a directory:

	/sys/bus/event_source/known_pmus/

that contains a subdirectory for each pmu e.g.

	/sys/bus/event_source/known_pmus/intel_pt/

which contains at least 1 file:

	/sys/bus/event_source/known_pmus/intel_pt/status

That makes it easy to add more information by adding new files.

So the core would create and manage the sysfs directories and files.  The
default status would be "Not supported on this architecture". Arch and PMU
drivers would just need to tell the core what their status is.

> 
>> So, isn't the patch I proposed sufficient for now?
> 
> Well, no, because it really does the check in the wrong place and introduces 
> possible friction between what tooling thinks is the supported environment and 
> what the kernel thinks - and thus postpones the problem.
> 
> I could live with this current solution as the initial version if a subsequent 
> effort fixes it up properly by adding much better error reporting infrastructure, 
> but the 'no time' comment makes me doubt the whole Intel/PT feature set.


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: [GIT PULL 0/8] perf/pt -> Intel PT/BTS
  2015-07-02  9:43           ` Ingo Molnar
  2015-07-02 12:35             ` Adrian Hunter
@ 2015-07-03 14:31             ` Alexander Shishkin
  1 sibling, 0 replies; 23+ messages in thread
From: Alexander Shishkin @ 2015-07-03 14:31 UTC (permalink / raw)
  To: Ingo Molnar, Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, linux-kernel, Jiri Olsa, David Ahern,
	Namhyung Kim, Peter Zijlstra, Stephane Eranian,
	Arnaldo Carvalho de Melo

Ingo Molnar <mingo@kernel.org> writes:

> So I really think we need an extended error reporting feature on the perf kernel 
> side, so that a 'natural' error (plus a string) is reported back to tooling, 
> instead of the current -EINVAL.
>
> No need to do it for everything, doing it for BTS and related functionality would 
> be a good first step to start this.
>
> If you are interested you could try this, or I can try to write something (after 
> the merge window).
>
> So the idea would be to convert such opaque error returns:
>
>         if (attr->config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS &&
>             !attr->freq && hwc->sample_period == 1) {
>                 /* BTS is not supported by this architecture. */
>                 if (!x86_pmu.bts_active)
>                         return -EOPNOTSUPP;
>
> into:
>
> 			return perf_err(event, -EOPNOTSUPP, "The BTS hardware feature is not available on this CPU.");

So I poked around this a bit and came up with the patch below to give
this topic some more momentum.

Your average error will then be like this:

#define PERF_MODNAME "perf/x86"

...

               /* BTS is currently only allowed for user-mode. */
               if (!attr->exclude_kernel)
                       return perf_err(event, -EOPNOTSUPP,
                                       "BTS sampling not allowed for kernel space");

which in userspace will translate into:

---cut---
$ perf record -e branches:uk -c1 ls
kernel says: {
        "code": -95,
        "module": "perf/x86",
        "message": "BTS sampling not allowed for kernel space"
}

Error:
No hardware sampling interrupt available.
No APIC? If so then you can boot the kernel with the "lapic" boot parameter to force-enable it.
---cut---

The way I hacked it into tools/perf, it still prints out the old error
message (which, to prove the point once again, is neither here nor
there).

The patch below attaches the error to struct perf_event or copies it
directly to user's buffer in case if we don't have an event at the point
of error. Maybe a slightly easier way of doing it would be to strap it
to task_struct instead, but let's not stir the pot too much this time
around.

Also, to add an extra spin, the report is formatted as a JSON object so
that we could include both human-readable and machine-readable bits.

>From a7de169ef2ca5425cd57286bfb61bfd5f8d15738 Mon Sep 17 00:00:00 2001
From: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Date: Fri, 3 Jul 2015 17:12:54 +0300
Subject: [PATCH] perf: Introduce extended syscall error reporting

It has been pointed several times out that perf syscall error reporting
leaves a lot to be desired [1].

This patch introduces a fairly simple extension that allows call sites
to annotate their error codes with arbitrary strings, which will then
be copied to userspace (if they asked for it) along with the module
name that produced the error message in JSON format. This way, we can
provide both human-readable and machine-parsable information to user and
leave room for extensions in the future.

[1] http://marc.info/?l=linux-kernel&m=141470811013082

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/linux/perf_event.h      | 38 ++++++++++++++++++++++++
 include/uapi/linux/perf_event.h |  8 ++++-
 kernel/events/core.c            | 66 +++++++++++++++++++++++++++++++++++++++--
 3 files changed, 108 insertions(+), 4 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 1b82d44b0a..15bbef478f 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -56,6 +56,37 @@ struct perf_guest_info_callbacks {
 #include <linux/cgroup.h>
 #include <asm/local.h>
 
+#ifndef PERF_MODNAME
+#define PERF_MODNAME KBUILD_MODNAME
+#endif
+
+/*
+ * Extended error reporting: annotate an error code with a string
+ * and a module name to help users diagnase problems with their
+ * attributes and whatnot.
+ */
+struct perf_err_site {
+	const char		*message;
+	const char		*owner;
+	const int		code;
+};
+
+#ifdef CONFIG_PERF_EVENTS
+
+#define __perf_err(__e, __c, __m) ({			\
+	static struct perf_err_site __err_site = {	\
+		.message	= (__m),		\
+		.owner		= PERF_MODNAME,		\
+		.code		= (__c),		\
+	};						\
+	(__e) = &__err_site;				\
+	(__c);						\
+})
+
+#define perf_err(__evt, __c, __m) ({ __perf_err((__evt)->error, (__c), (__m)); })
+
+#endif
+
 struct perf_callchain_entry {
 	__u64				nr;
 	__u64				ip[PERF_MAX_STACK_DEPTH];
@@ -447,6 +478,13 @@ struct perf_event {
 	struct list_head		owner_entry;
 	struct task_struct		*owner;
 
+	/*
+	 * Extended error reporting
+	 */
+	const struct perf_err_site	*error;
+	void __user			*error_buffer;
+	size_t				error_buffer_size;
+
 	/* mmap bits */
 	struct mutex			mmap_mutex;
 	atomic_t			mmap_count;
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index d97f84c080..d1ae1a079c 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -264,6 +264,7 @@ enum perf_event_read_format {
 					/* add: sample_stack_user */
 #define PERF_ATTR_SIZE_VER4	104	/* add: sample_regs_intr */
 #define PERF_ATTR_SIZE_VER5	112	/* add: aux_watermark */
+#define PERF_ATTR_SIZE_VER6	120	/* add: perf_err */
 
 /*
  * Hardware event_id to monitor via a performance monitoring event:
@@ -374,7 +375,12 @@ struct perf_event_attr {
 	 * Wakeup watermark for AUX area
 	 */
 	__u32	aux_watermark;
-	__u32	__reserved_2;	/* align to __u64 */
+
+	/*
+	 * Extended error reporting buffer
+	 */
+	__u32	perf_err_size;
+	__u64	perf_err;
 };
 
 #define perf_flags(attr)	(*(&(attr)->read_format + 1))
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8e13f3e54e..fd345c96d1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -49,6 +49,60 @@
 
 #include <asm/irq_regs.h>
 
+static bool extended_reporting_enabled(struct perf_event_attr *attr)
+{
+	if (attr->size >= PERF_ATTR_SIZE_VER6 &&
+	    attr->perf_err_size > 0)
+		return true;
+
+	return false;
+}
+
+/*
+ * Provide a JSON formatted error report to the user if they asked for it.
+ */
+static void __perf_error_report(struct perf_event_attr *attr,
+				const struct perf_err_site *err_site)
+{
+	void *buffer;
+
+	if (!err_site || !extended_reporting_enabled(attr))
+		return;
+
+	buffer = kasprintf(GFP_KERNEL,
+			   "{\n"
+			   "\t\"code\": %d,\n"
+			   "\t\"module\": \"%s\",\n"
+			   "\t\"message\": \"%s\"\n"
+			   "}\n",
+			   err_site->code, err_site->owner, err_site->message);
+	if (!buffer)
+		return;
+
+	(void)copy_to_user((void __user *)attr->perf_err, buffer,
+			   attr->perf_err_size);
+	kfree(buffer);
+}
+
+/*
+ * Synchronous version of perf_err(), for the paths where we don't have
+ * an event.
+ */
+#define perf_err_attr(__attr, __c, __m) ({		\
+	struct perf_err_site *__site;			\
+	__perf_err(__site, (__c), (__m));		\
+	__perf_error_report(__attr, __site);		\
+	(__c);						\
+})
+
+/*
+ * Report an error before returning from a syscall
+ */
+static void perf_error_report(struct perf_event *event)
+{
+	__perf_error_report(&event->attr, event->error);
+}
+
 static struct workqueue_struct *perf_wq;
 
 typedef int (*remote_function_f)(void *);
@@ -3594,6 +3648,8 @@ static void _free_event(struct perf_event *event)
  */
 static void free_event(struct perf_event *event)
 {
+	perf_error_report(event);
+
 	if (WARN(atomic_long_cmpxchg(&event->refcount, 1, 0) != 1,
 				"unexpected event refcount: %ld; ptr=%p\n",
 				atomic_long_read(&event->refcount), event)) {
@@ -3661,6 +3717,8 @@ static void put_event(struct perf_event *event)
 	if (!atomic_long_dec_and_test(&event->refcount))
 		return;
 
+	perf_error_report(event);
+
 	if (!is_kernel_event(event))
 		perf_remove_from_owner(event);
 
@@ -7634,6 +7692,7 @@ err_ns:
 		perf_detach_cgroup(event);
 	if (event->ns)
 		put_pid_ns(event->ns);
+	perf_error_report(event);
 	kfree(event);
 
 	return ERR_PTR(err);
@@ -7912,15 +7971,16 @@ SYSCALL_DEFINE5(perf_event_open,
 
 	if (!attr.exclude_kernel) {
 		if (perf_paranoid_kernel() && !capable(CAP_SYS_ADMIN))
-			return -EACCES;
+			return perf_err_attr(&attr, -EACCES,
+					     "kernel tracing forbidden for the unprivileged");
 	}
 
 	if (attr.freq) {
 		if (attr.sample_freq > sysctl_perf_event_sample_rate)
-			return -EINVAL;
+			return perf_err_attr(&attr, -EINVAL, "sample_freq too high");
 	} else {
 		if (attr.sample_period & (1ULL << 63))
-			return -EINVAL;
+			return perf_err_attr(&attr, -EINVAL, "sample_period too high");
 	}
 
 	/*
-- 
2.1.4




^ permalink raw reply related	[flat|nested] 23+ messages in thread

end of thread, other threads:[~2015-07-03 14:34 UTC | newest]

Thread overview: 23+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-26 22:02 [GIT PULL 0/8] perf/pt -> Intel PT/BTS Arnaldo Carvalho de Melo
2015-06-26 22:02 ` [PATCH 1/8] perf auxtrace: Add Intel PT as an AUX area tracing type Arnaldo Carvalho de Melo
2015-06-26 22:02 ` [PATCH 2/8] perf tools: Add Intel PT packet decoder Arnaldo Carvalho de Melo
2015-06-26 22:02 ` [PATCH 3/8] perf tools: Add Intel PT instruction decoder Arnaldo Carvalho de Melo
2015-06-26 22:02 ` [PATCH 4/8] perf tools: Add Intel PT log Arnaldo Carvalho de Melo
2015-06-26 22:02 ` [PATCH 5/8] perf tools: Add Intel PT decoder Arnaldo Carvalho de Melo
2015-06-26 22:02 ` [PATCH 6/8] perf tools: Add Intel PT support Arnaldo Carvalho de Melo
2015-06-26 22:02 ` [PATCH 7/8] perf tools: Take Intel PT into use Arnaldo Carvalho de Melo
2015-06-26 22:02 ` [PATCH 8/8] perf tools: Add Intel BTS support Arnaldo Carvalho de Melo
2015-06-30  4:58 ` [GIT PULL 0/8] perf/pt -> Intel PT/BTS Ingo Molnar
2015-06-30  7:54   ` Adrian Hunter
2015-06-30 10:56     ` Ingo Molnar
2015-06-30 13:23       ` Adrian Hunter
2015-06-30 14:39         ` Arnaldo Carvalho de Melo
2015-07-01  8:19         ` Adrian Hunter
2015-07-01 12:47           ` Adrian Hunter
2015-07-02  9:28             ` Ingo Molnar
2015-07-02  9:43           ` Ingo Molnar
2015-07-02 12:35             ` Adrian Hunter
2015-07-02 13:02               ` Arnaldo Carvalho de Melo
2015-07-02 19:00               ` Ingo Molnar
2015-07-03  9:08                 ` Adrian Hunter
2015-07-03 14:31             ` Alexander Shishkin

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.