All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area  and Instruction Tracing
@ 2015-05-29 13:33 Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 01/17] perf db-export: Fix thread ref-counting Adrian Hunter
                   ` (16 more replies)
  0 siblings, 17 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Hi

Here is V6 patches for the introduction of an abstraction for
using the AUX area and Instruction tracing. The patches for
AUX area support have been applied, just leaving patches for
Intel PT and Intel BTS.

The patches can also be found here:

	http://git.infradead.org/users/ahunter/linux-perf.git

An example (unchanged from V3) perf.data file and build id archive
can be found here:

	http://git.infradead.org/~ahunter/tfr/

There is also a tar of the 3 most relevant files with debugging
symbols. These need to be placed in under the correct paths in
/usr/lib/debug to get symbols.

Changes in V6:

   Some minor expansion of commit messages.

   Patches already applied:
      perf tools: Disallow PMU events intel_pt and intel_bts until there is support

   perf db-export: Fix thread ref-counting
      New patch

   perf tools: Ensure thread-stack is flushed
      New patch

   perf tools: Add Intel PT support
      Support thread ref-counting

   perf tools: Add Intel PT decoder
      Fix a bug: FUP packet in PSB to update last IP

   perf tools: Take Intel PT into use
      Add Overview and Quickstart sections to intel_pt.txt

   perf tools: Add Intel BTS support
      Add Overview to intel_bts.txt
      Support thread ref-counting

   perf tools: Add example call-graph script
      Add documentation comments to scripts

Changes in V5:

   Patches already applied:
      perf report: Fix placement of itrace option in documentation
      perf tools: Add AUX area tracing index
      perf tools: Hit all build ids when AUX area tracing
      perf tools: Add build option NO_AUXTRACE to exclude AUX area tracing
      perf auxtrace: Add option to synthesize events for transactions
      perf tools: Add support for PERF_RECORD_AUX
      perf tools: Add support for PERF_RECORD_ITRACE_START
      perf tools: Add AUX area tracing Snapshot Mode
      perf record: Add AUX area tracing Snapshot Mode support

   perf tools: Disallow PMU events intel_pt and intel_bts until there is support
      New patch

   perf tools: Add Intel PT decoder
      Style improvements pointed out by Acme: aligning '=', single line initializing
      Make use of zalloc() not malloc / memset
      Make use of zfree
      Map internal error codes to fixed constants for output
      Change intel_pt_error_message() to intel_pt__strerror()

   perf tools: Add Intel PT support
      Make use of zfree

   perf tools: Take Intel PT into use
      Allow "intel_pt" PMU to be selected as an event

   perf tools: Add Intel BTS support
      Allow "intel_bts" PMU to be selected as an event
      Make use of zfree
      Map internal error codes to fixed constants for output
      Let "intel_bts" show up in 'perf list'

   perf tools: Output sample flags and insn_len from intel_bts
      Map internal error codes to fixed constants for output

Changes on V4:

   perf tools: Amend mmap ref counting for the AUX area mmap
      Dropped because already applied

   perf script: Always allow fields 'addr' and 'cpu' for auxtrace
      Dropped because already applied

   perf report: Add Instruction Tracing support
      Dropped because already applied

   perf report: Fix placement of itrace option in documentation
      New patch

   perf tools: Add AUX area tracing index
      Change size checks for more flexibility i.e.
      - don't mind if an indexed auxtrace_event is bigger than
      struct auxtrace_event
      - don't mind if the auxtrace index does not fill the whole
      file section
      Rename 'index' variable to 'ent' to avoid build errors on
      older gcc

   perf tools: Add build option NO_AUXTRACE to exclude AUX area tracing
      Fix whitespace alignment of NO_AUXTRACE=1
      Add NO_AUXTRACE=1 to make_minimal

   perf tools: Add support for PERF_RECORD_AUX
      Expand commit message

   perf tools: Add AUX area tracing Snapshot Mode
      Whitespace fixups

   perf record: Add AUX area tracing Snapshot Mode support
      Whitespace fixups
      Don't init static variables to 0 or NULL

   perf tools: Add Intel PT packet decoder
      Whitespace fixups

   perf tools: Add Intel PT instruction decoder
      Avoid build error on older (broken) gcc by adding -Wno-override-init
      Avoid build errors due to funny collate sequences i.e. use LC_COLLATE=C etc

   perf tools: Add Intel PT decoder
      Avoid build errors initializing structures to 0

   perf tools: Add Intel PT support
      Avoid build errors initializing structures to 0
      Allow for perf_pmu__config_terms() having an extra parameter now
      Allow for parse_events() having an extra parameter now
      Rename 'div' variable to 'd' to avoid build errors
      Whitespace fixup
      Remove a couple of unused enums

   perf tools: Add Intel BTS support
      Avoid build errors initializing structures to 0
      Allow for parse_events() having an extra parameter now

   perf tools: Put itrace options into an asciidoc include
      New patch

Changes in V3:

   New patch:
      perf tools: Amend mmap ref counting for the AUX area mmap

   Move some code under arch:
      perf tools: Add Intel PT support
      perf tools: Add Intel BTS support

   Updated documentation:
      perf report: Add Instruction Tracing support
      perf auxtrace: Add option to synthesize events for transactions
      perf tools: Take Intel PT into use
      perf tools: Add Intel BTS support

   Patches already applied:
      perf header: Add AUX area tracing feature
      perf evlist: Add support for mmapping an AUX area buffer
      perf tools: Add user events for AUX area tracing
      perf tools: Add support for AUX area recording
      perf record: Add basic AUX area tracing support
      perf record: Extend -m option for AUX area tracing mmap pages
      perf tools: Add a user event for AUX area tracing errors
      perf session: Add hooks to allow transparent decoding of AUX area tracing data
      perf session: Add instruction tracing options
      perf auxtrace: Add helpers for AUX area tracing errors
      perf auxtrace: Add helpers for queuing AUX area tracing data
      perf auxtrace: Add a heap for sorting AUX area tracing queues
      perf auxtrace: Add processing for AUX area tracing events
      perf auxtrace: Add a hashtable for caching
      perf tools: Add member to struct dso for an instruction cache
      perf script: Add Instruction Tracing support
      perf inject: Re-pipe AUX area tracing events
      perf inject: Add Instruction Tracing support
      perf script: Add field option 'flags' to print sample flags
      perf tools: Add aux_watermark member of struct perf_event_attr

Changes in V2:

   Get rid of MIN()
      perf auxtrace: Add helpers for AUX area tracing errors
      perf inject: Re-pipe AUX area tracing events
      perf tools: Add build option NO_AUXTRACE to exclude AUX area tracing


Intel BTS can be used on most recent Intel CPUs. Intel PT
is available on Broadwell.

Examples:

	Trace 'ls' with Intel BTS userspace only

	perf record --per-thread -e intel_bts//u ls
	perf report
	perf script

	Trace 'ls' with Intel BTS kernel and userspace

	~/libexec/perf-core/perf-with-kcore record bts-ls --per-thread -e intel_bts// -- ls
	~/libexec/perf-core/perf-with-kcore report bts-ls
	~/libexec/perf-core/perf-with-kcore script bts-ls

	Trace 'ls' with Intel PT userspace only

	perf record -e intel_pt//u ls
	perf report
	perf script

	Trace 'ls' with Intel PT kernel and userspace

	~/libexec/perf-core/perf-with-kcore record pt-ls -e intel_pt// -- ls
	~/libexec/perf-core/perf-with-kcore report pt-ls
	~/libexec/perf-core/perf-with-kcore script pt-ls


The abstraction has two separate aspects:
	1. recording AUX area data
	2. processing AUX area data

Recording consists of mmapping a separate buffer and copying
the data into the perf.data file.  The buffer is an AUX area
buffer.  The data is written preceded by a new user event
PERF_RECORD_AUXTRACE.  The data is too big to fit in the event
but follows immediately afterward. Session processing has to
skip to get to the next event header in a similar fashion to
the existing PERF_RECORD_HEADER_TRACING_DATA
event.  The main recording patches are:

      perf evlist: Add support for mmapping an AUX area buffer
      perf tools: Add user events for AUX area tracing
      perf tools: Add support for AUX area recording
      perf record: Add basic AUX area tracing support

Processing consists of providing hooks in session processing
to enable a decoder to see all the events and deliver synthesized
events transparently into the event stream.  The main processing
patch is:

      perf session: Add hooks to allow transparent decoding of AUX area tracing data


Adrian Hunter (17):
      perf db-export: Fix thread ref-counting
      perf tools: Ensure thread-stack is flushed
      perf auxtrace: Add Intel PT as an AUX area tracing type
      perf tools: Add Intel PT packet decoder
      perf tools: Add Intel PT instruction decoder
      perf tools: Add Intel PT log
      perf tools: Add Intel PT decoder
      perf tools: Add Intel PT support
      perf tools: Take Intel PT into use
      perf tools: Allow auxtrace data alignment
      perf tools: Add Intel BTS support
      perf tools: Output sample flags and insn_len from intel_pt
      perf tools: Output sample flags and insn_len from intel_bts
      perf tools: Intel PT to always update thread stack trace number
      perf tools: Intel BTS to always update thread stack trace number
      perf tools: Put itrace options into an asciidoc include
      perf tools: Add example call-graph script

 tools/build/Makefile.build                         |    2 +
 tools/perf/.gitignore                              |    2 +
 tools/perf/Documentation/intel-bts.txt             |   86 +
 tools/perf/Documentation/intel-pt.txt              |  588 ++++++
 tools/perf/Documentation/itrace.txt                |   22 +
 tools/perf/Documentation/perf-inject.txt           |   23 +-
 tools/perf/Documentation/perf-report.txt           |   23 +-
 tools/perf/Documentation/perf-script.txt           |   23 +-
 tools/perf/Makefile.perf                           |   12 +-
 tools/perf/arch/x86/util/Build                     |    5 +
 tools/perf/arch/x86/util/auxtrace.c                |   83 +
 tools/perf/arch/x86/util/intel-bts.c               |  458 +++++
 tools/perf/arch/x86/util/intel-pt.c                |  752 ++++++++
 tools/perf/arch/x86/util/pmu.c                     |   18 +
 .../scripts/python/call-graph-from-postgresql.py   |  327 ++++
 tools/perf/scripts/python/export-to-postgresql.py  |   47 +
 tools/perf/util/Build                              |    3 +
 tools/perf/util/auxtrace.c                         |   16 +-
 tools/perf/util/auxtrace.h                         |    3 +
 tools/perf/util/db-export.c                        |   19 +-
 tools/perf/util/intel-bts.c                        |  921 ++++++++++
 tools/perf/util/intel-bts.h                        |   43 +
 tools/perf/util/intel-pt-decoder/Build             |   14 +
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1759 ++++++++++++++++++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  102 ++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  |  246 +++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |   65 +
 tools/perf/util/intel-pt-decoder/intel-pt-log.c    |  155 ++
 tools/perf/util/intel-pt-decoder/intel-pt-log.h    |   52 +
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   |  400 +++++
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |   64 +
 tools/perf/util/intel-pt.c                         | 1895 ++++++++++++++++++++
 tools/perf/util/intel-pt.h                         |   51 +
 tools/perf/util/machine.c                          |   21 +
 tools/perf/util/machine.h                          |    3 +
 tools/perf/util/pmu.c                              |    4 -
 tools/perf/util/session.c                          |   20 +
 tools/perf/util/thread-stack.c                     |   18 +-
 tools/perf/util/thread-stack.h                     |    1 +
 39 files changed, 8259 insertions(+), 87 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-bts.txt
 create mode 100644 tools/perf/Documentation/intel-pt.txt
 create mode 100644 tools/perf/Documentation/itrace.txt
 create mode 100644 tools/perf/arch/x86/util/auxtrace.c
 create mode 100644 tools/perf/arch/x86/util/intel-bts.c
 create mode 100644 tools/perf/arch/x86/util/intel-pt.c
 create mode 100644 tools/perf/arch/x86/util/pmu.c
 create mode 100644 tools/perf/scripts/python/call-graph-from-postgresql.py
 create mode 100644 tools/perf/util/intel-bts.c
 create mode 100644 tools/perf/util/intel-bts.h
 create mode 100644 tools/perf/util/intel-pt-decoder/Build
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
 create mode 100644 tools/perf/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.h


Regards
Adrian

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [PATCH V6 01/17] perf db-export: Fix thread ref-counting
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 18:35   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 02/17] perf tools: Ensure thread-stack is flushed Adrian Hunter
                   ` (15 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Thread ref-counting was not done for get_main_thread() meaning
that there was a thread__get() from machine__find_thread()
that was not being paired with thread__put(). Fix that.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/db-export.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c
index eb7a2ac..1c9689e 100644
--- a/tools/perf/util/db-export.c
+++ b/tools/perf/util/db-export.c
@@ -234,7 +234,7 @@ int db_export__symbol(struct db_export *dbe, struct symbol *sym,
 static struct thread *get_main_thread(struct machine *machine, struct thread *thread)
 {
 	if (thread->pid_ == thread->tid)
-		return thread;
+		return thread__get(thread);
 
 	if (thread->pid_ == -1)
 		return NULL;
@@ -308,19 +308,18 @@ int db_export__sample(struct db_export *dbe, union perf_event *event,
 	if (err)
 		return err;
 
-	/* FIXME: check refcounting for get_main_thread, that calls machine__find_thread... */
 	main_thread = get_main_thread(al->machine, thread);
 	if (main_thread)
 		comm = machine__thread_exec_comm(al->machine, main_thread);
 
 	err = db_export__thread(dbe, thread, al->machine, comm);
 	if (err)
-		return err;
+		goto out_put;
 
 	if (comm) {
 		err = db_export__comm(dbe, comm, main_thread);
 		if (err)
-			return err;
+			goto out_put;
 		es.comm_db_id = comm->db_id;
 	}
 
@@ -328,7 +327,7 @@ int db_export__sample(struct db_export *dbe, union perf_event *event,
 
 	err = db_ids_from_al(dbe, al, &es.dso_db_id, &es.sym_db_id, &es.offset);
 	if (err)
-		return err;
+		goto out_put;
 
 	if ((evsel->attr.sample_type & PERF_SAMPLE_ADDR) &&
 	    sample_addr_correlates_sym(&evsel->attr)) {
@@ -338,20 +337,22 @@ int db_export__sample(struct db_export *dbe, union perf_event *event,
 		err = db_ids_from_al(dbe, &addr_al, &es.addr_dso_db_id,
 				     &es.addr_sym_db_id, &es.addr_offset);
 		if (err)
-			return err;
+			goto out_put;
 		if (dbe->crp) {
 			err = thread_stack__process(thread, comm, sample, al,
 						    &addr_al, es.db_id,
 						    dbe->crp);
 			if (err)
-				return err;
+				goto out_put;
 		}
 	}
 
 	if (dbe->export_sample)
-		return dbe->export_sample(dbe, &es);
+		err = dbe->export_sample(dbe, &es);
 
-	return 0;
+out_put:
+	thread__put(main_thread);
+	return err;
 }
 
 static struct {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 02/17] perf tools: Ensure thread-stack is flushed
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 01/17] perf db-export: Fix thread ref-counting Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-06-18 21:56   ` Arnaldo Carvalho de Melo
  2015-06-19 23:15   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 03/17] perf auxtrace: Add Intel PT as an AUX area tracing type Adrian Hunter
                   ` (14 subsequent siblings)
  16 siblings, 2 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

The thread-stack represents a thread's current stack.  When
a thread exits there can still be many functions on the stack
e.g. exit() can be called many levels deep, so all the callers
will never return.  To get that information output, the
thread-stack must be flushed.

Previously it was assumed the thread-stack would be flushed
when the struct thread was deleted.  With thread ref-counting
it is no longer clear when that will be, if ever. So instead
explicitly flush all the thread-stacks at the end of a session.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/machine.c      | 21 +++++++++++++++++++++
 tools/perf/util/machine.h      |  3 +++
 tools/perf/util/session.c      | 20 ++++++++++++++++++++
 tools/perf/util/thread-stack.c | 18 +++++++++++++-----
 tools/perf/util/thread-stack.h |  1 +
 5 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 0c0e61c..c0c29b9 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1845,6 +1845,27 @@ int machine__for_each_thread(struct machine *machine,
 	return rc;
 }
 
+int machines__for_each_thread(struct machines *machines,
+			      int (*fn)(struct thread *thread, void *p),
+			      void *priv)
+{
+	struct rb_node *nd;
+	int rc = 0;
+
+	rc = machine__for_each_thread(&machines->host, fn, priv);
+	if (rc != 0)
+		return rc;
+
+	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+		struct machine *machine = rb_entry(nd, struct machine, rb_node);
+
+		rc = machine__for_each_thread(machine, fn, priv);
+		if (rc != 0)
+			return rc;
+	}
+	return rc;
+}
+
 int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
 				  struct target *target, struct thread_map *threads,
 				  perf_event__handler_t process, bool data_mmap)
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index c7963c6..6b4a6fb 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -213,6 +213,9 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
 int machine__for_each_thread(struct machine *machine,
 			     int (*fn)(struct thread *thread, void *p),
 			     void *priv);
+int machines__for_each_thread(struct machines *machines,
+			      int (*fn)(struct thread *thread, void *p),
+			      void *priv);
 
 int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
 				  struct target *target, struct thread_map *threads,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 39fe09d..b44bb2a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -16,6 +16,7 @@
 #include "perf_regs.h"
 #include "asm/bug.h"
 #include "auxtrace.h"
+#include "thread-stack.h"
 
 static int perf_session__deliver_event(struct perf_session *session,
 				       union perf_event *event,
@@ -1320,6 +1321,19 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
 	events_stats__auxtrace_error_warn(stats);
 }
 
+static int perf_session__flush_thread_stack(struct thread *thread,
+					    void *p __maybe_unused)
+{
+	return thread_stack__flush(thread);
+}
+
+static int perf_session__flush_thread_stacks(struct perf_session *session)
+{
+	return machines__for_each_thread(&session->machines,
+					 perf_session__flush_thread_stack,
+					 NULL);
+}
+
 volatile int session_done;
 
 static int __perf_session__process_pipe_events(struct perf_session *session)
@@ -1409,6 +1423,9 @@ done:
 	if (err)
 		goto out_err;
 	err = auxtrace__flush_events(session, tool);
+	if (err)
+		goto out_err;
+	err = perf_session__flush_thread_stacks(session);
 out_err:
 	free(buf);
 	perf_session__warn_about_errors(session);
@@ -1559,6 +1576,9 @@ out:
 	if (err)
 		goto out_err;
 	err = auxtrace__flush_events(session, tool);
+	if (err)
+		goto out_err;
+	err = perf_session__flush_thread_stacks(session);
 out_err:
 	ui_progress__finish();
 	perf_session__warn_about_errors(session);
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 9ed59a4..679688e 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -219,7 +219,7 @@ static int thread_stack__call_return(struct thread *thread,
 	return crp->process(&cr, crp->data);
 }
 
-static int thread_stack__flush(struct thread *thread, struct thread_stack *ts)
+static int __thread_stack__flush(struct thread *thread, struct thread_stack *ts)
 {
 	struct call_return_processor *crp = ts->crp;
 	int err;
@@ -242,6 +242,14 @@ static int thread_stack__flush(struct thread *thread, struct thread_stack *ts)
 	return 0;
 }
 
+int thread_stack__flush(struct thread *thread)
+{
+	if (thread->ts)
+		return __thread_stack__flush(thread, thread->ts);
+
+	return 0;
+}
+
 int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
 			u64 to_ip, u16 insn_len, u64 trace_nr)
 {
@@ -264,7 +272,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
 	 */
 	if (trace_nr != thread->ts->trace_nr) {
 		if (thread->ts->trace_nr)
-			thread_stack__flush(thread, thread->ts);
+			__thread_stack__flush(thread, thread->ts);
 		thread->ts->trace_nr = trace_nr;
 	}
 
@@ -297,7 +305,7 @@ void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr)
 
 	if (trace_nr != thread->ts->trace_nr) {
 		if (thread->ts->trace_nr)
-			thread_stack__flush(thread, thread->ts);
+			__thread_stack__flush(thread, thread->ts);
 		thread->ts->trace_nr = trace_nr;
 	}
 }
@@ -305,7 +313,7 @@ void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr)
 void thread_stack__free(struct thread *thread)
 {
 	if (thread->ts) {
-		thread_stack__flush(thread, thread->ts);
+		__thread_stack__flush(thread, thread->ts);
 		zfree(&thread->ts->stack);
 		zfree(&thread->ts);
 	}
@@ -689,7 +697,7 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
 
 	/* Flush stack on exec */
 	if (ts->comm != comm && thread->pid_ == thread->tid) {
-		err = thread_stack__flush(thread, ts);
+		err = __thread_stack__flush(thread, ts);
 		if (err)
 			return err;
 		ts->comm = comm;
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index b843bbe..e1528f1 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -96,6 +96,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
 void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
 void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
 			  size_t sz, u64 ip);
+int thread_stack__flush(struct thread *thread);
 void thread_stack__free(struct thread *thread);
 
 struct call_return_processor *
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 03/17] perf auxtrace: Add Intel PT as an AUX area tracing type
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 01/17] perf db-export: Fix thread ref-counting Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 02/17] perf tools: Ensure thread-stack is flushed Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 04/17] perf tools: Add Intel PT packet decoder Adrian Hunter
                   ` (13 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add the Intel Processor Trace type constant PERF_AUXTRACE_INTEL_PT.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
---
 tools/perf/util/auxtrace.c | 1 +
 tools/perf/util/auxtrace.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index df66966..734c4d2 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -884,6 +884,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 		fprintf(stdout, " type: %u\n", type);
 
 	switch (type) {
+	case PERF_AUXTRACE_INTEL_PT:
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index a171abb..ed98743 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -39,6 +39,7 @@ struct events_stats;
 
 enum auxtrace_type {
 	PERF_AUXTRACE_UNKNOWN,
+	PERF_AUXTRACE_INTEL_PT,
 };
 
 enum itrace_period_type {
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 04/17] perf tools: Add Intel PT packet decoder
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (2 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 03/17] perf auxtrace: Add Intel PT as an AUX area tracing type Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 05/17] perf tools: Add Intel PT instruction decoder Adrian Hunter
                   ` (12 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add support for decoding Intel Processor Trace packets.

This essentially provides intel_pt_get_packet() which
takes a buffer of binary data and returns the decoded
packet.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/Build                              |   1 +
 tools/perf/util/intel-pt-decoder/Build             |   1 +
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   | 400 +++++++++++++++++++++
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |  64 ++++
 4 files changed, 466 insertions(+)
 create mode 100644 tools/perf/util/intel-pt-decoder/Build
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h

diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index e4b676d..86c81f6 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -75,6 +75,7 @@ libperf-$(CONFIG_X86) += tsc.o
 libperf-y += cloexec.o
 libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
new file mode 100644
index 0000000..9d67381
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -0,0 +1 @@
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
new file mode 100644
index 0000000..988c82c
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
@@ -0,0 +1,400 @@
+/*
+ * intel_pt_pkt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "intel-pt-pkt-decoder.h"
+
+#define BIT(n)		(1 << (n))
+
+#define BIT63		((uint64_t)1 << 63)
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le16_to_cpu bswap_16
+#define le32_to_cpu bswap_32
+#define le64_to_cpu bswap_64
+#define memcpy_le64(d, s, n) do { \
+	memcpy((d), (s), (n));    \
+	*(d) = le64_to_cpu(*(d)); \
+} while (0)
+#else
+#define le16_to_cpu
+#define le32_to_cpu
+#define le64_to_cpu
+#define memcpy_le64 memcpy
+#endif
+
+static const char * const packet_name[] = {
+	[INTEL_PT_BAD]		= "Bad Packet!",
+	[INTEL_PT_PAD]		= "PAD",
+	[INTEL_PT_TNT]		= "TNT",
+	[INTEL_PT_TIP_PGD]	= "TIP.PGD",
+	[INTEL_PT_TIP_PGE]	= "TIP.PGE",
+	[INTEL_PT_TSC]		= "TSC",
+	[INTEL_PT_MODE_EXEC]	= "MODE.Exec",
+	[INTEL_PT_MODE_TSX]	= "MODE.TSX",
+	[INTEL_PT_TIP]		= "TIP",
+	[INTEL_PT_FUP]		= "FUP",
+	[INTEL_PT_PSB]		= "PSB",
+	[INTEL_PT_PSBEND]	= "PSBEND",
+	[INTEL_PT_CBR]		= "CBR",
+	[INTEL_PT_PIP]		= "PIP",
+	[INTEL_PT_OVF]		= "OVF",
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type type)
+{
+	return packet_name[type];
+}
+
+static int intel_pt_get_long_tnt(const unsigned char *buf, size_t len,
+				 struct intel_pt_pkt *packet)
+{
+	uint64_t payload;
+	int count;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	payload = le64_to_cpu(*(uint64_t *)buf);
+
+	for (count = 47; count; count--) {
+		if (payload & BIT63)
+			break;
+		payload <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = payload << 1;
+	return 8;
+}
+
+static int intel_pt_get_pip(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	uint64_t payload = 0;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	packet->type = INTEL_PT_PIP;
+	memcpy_le64(&payload, buf + 2, 6);
+	packet->payload = payload >> 1;
+
+	return 8;
+}
+
+static int intel_pt_get_cbr(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 4)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_CBR;
+	packet->payload = buf[2];
+	return 4;
+}
+
+static int intel_pt_get_ovf(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_OVF;
+	return 2;
+}
+
+static int intel_pt_get_psb(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	int i;
+
+	if (len < 16)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	for (i = 2; i < 16; i += 2) {
+		if (buf[i] != 2 || buf[i + 1] != 0x82)
+			return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = INTEL_PT_PSB;
+	return 16;
+}
+
+static int intel_pt_get_psbend(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PSBEND;
+	return 2;
+}
+
+static int intel_pt_get_pad(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PAD;
+	return 1;
+}
+
+static int intel_pt_get_ext(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1]) {
+	case 0xa3: /* Long TNT */
+		return intel_pt_get_long_tnt(buf, len, packet);
+	case 0x43: /* PIP */
+		return intel_pt_get_pip(buf, len, packet);
+	case 0x03: /* CBR */
+		return intel_pt_get_cbr(buf, len, packet);
+	case 0xf3: /* OVF */
+		return intel_pt_get_ovf(packet);
+	case 0x82: /* PSB */
+		return intel_pt_get_psb(buf, len, packet);
+	case 0x23: /* PSBEND */
+		return intel_pt_get_psbend(packet);
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+static int intel_pt_get_short_tnt(unsigned int byte,
+				  struct intel_pt_pkt *packet)
+{
+	int count;
+
+	for (count = 6; count; count--) {
+		if (byte & BIT(7))
+			break;
+		byte <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = (uint64_t)byte << 57;
+
+	return 1;
+}
+
+static int intel_pt_get_ip(enum intel_pt_pkt_type type, unsigned int byte,
+			   const unsigned char *buf, size_t len,
+			   struct intel_pt_pkt *packet)
+{
+	switch (byte >> 5) {
+	case 0:
+		packet->count = 0;
+		break;
+	case 1:
+		if (len < 3)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 2;
+		packet->payload = le16_to_cpu(*(uint16_t *)(buf + 1));
+		break;
+	case 2:
+		if (len < 5)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 4;
+		packet->payload = le32_to_cpu(*(uint32_t *)(buf + 1));
+		break;
+	case 3:
+	case 6:
+		if (len < 7)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 6;
+		memcpy_le64(&packet->payload, buf + 1, 6);
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = type;
+
+	return packet->count + 1;
+}
+
+static int intel_pt_get_mode(const unsigned char *buf, size_t len,
+			     struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1] >> 5) {
+	case 0:
+		packet->type = INTEL_PT_MODE_EXEC;
+		switch (buf[1] & 3) {
+		case 0:
+			packet->payload = 16;
+			break;
+		case 1:
+			packet->payload = 64;
+			break;
+		case 2:
+			packet->payload = 32;
+			break;
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+		break;
+	case 1:
+		packet->type = INTEL_PT_MODE_TSX;
+		if ((buf[1] & 3) == 3)
+			return INTEL_PT_BAD_PACKET;
+		packet->payload = buf[1] & 3;
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	return 2;
+}
+
+static int intel_pt_get_tsc(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_TSC;
+	memcpy_le64(&packet->payload, buf + 1, 7);
+	return 8;
+}
+
+static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
+				  struct intel_pt_pkt *packet)
+{
+	unsigned int byte;
+
+	memset(packet, 0, sizeof(struct intel_pt_pkt));
+
+	if (!len)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	byte = buf[0];
+	if (!(byte & BIT(0))) {
+		if (byte == 0)
+			return intel_pt_get_pad(packet);
+		if (byte == 2)
+			return intel_pt_get_ext(buf, len, packet);
+		return intel_pt_get_short_tnt(byte, packet);
+	}
+
+	switch (byte & 0x1f) {
+	case 0x0D:
+		return intel_pt_get_ip(INTEL_PT_TIP, byte, buf, len, packet);
+	case 0x11:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGE, byte, buf, len,
+				       packet);
+	case 0x01:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGD, byte, buf, len,
+				       packet);
+	case 0x1D:
+		return intel_pt_get_ip(INTEL_PT_FUP, byte, buf, len, packet);
+	case 0x19:
+		switch (byte) {
+		case 0x99:
+			return intel_pt_get_mode(buf, len, packet);
+		case 0x19:
+			return intel_pt_get_tsc(buf, len, packet);
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet)
+{
+	int ret;
+
+	ret = intel_pt_do_get_packet(buf, len, packet);
+	if (ret > 0) {
+		while (ret < 8 && len > (size_t)ret && !buf[ret])
+			ret += 1;
+	}
+	return ret;
+}
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
+		      size_t buf_len)
+{
+	int ret, i;
+	unsigned long long payload = packet->payload;
+	const char *name = intel_pt_pkt_name(packet->type);
+
+	switch (packet->type) {
+	case INTEL_PT_BAD:
+	case INTEL_PT_PAD:
+	case INTEL_PT_PSB:
+	case INTEL_PT_PSBEND:
+	case INTEL_PT_OVF:
+		return snprintf(buf, buf_len, "%s", name);
+	case INTEL_PT_TNT: {
+		size_t blen = buf_len;
+
+		ret = snprintf(buf, blen, "%s ", name);
+		if (ret < 0)
+			return ret;
+		buf += ret;
+		blen -= ret;
+		for (i = 0; i < packet->count; i++) {
+			if (payload & BIT63)
+				ret = snprintf(buf, blen, "T");
+			else
+				ret = snprintf(buf, blen, "N");
+			if (ret < 0)
+				return ret;
+			buf += ret;
+			blen -= ret;
+			payload <<= 1;
+		}
+		ret = snprintf(buf, blen, " (%d)", packet->count);
+		if (ret < 0)
+			return ret;
+		blen -= ret;
+		return buf_len - blen;
+	}
+	case INTEL_PT_TIP_PGD:
+	case INTEL_PT_TIP_PGE:
+	case INTEL_PT_TIP:
+	case INTEL_PT_FUP:
+		if (!(packet->count))
+			return snprintf(buf, buf_len, "%s no ip", name);
+	case INTEL_PT_CBR:
+		return snprintf(buf, buf_len, "%s 0x%llx", name, payload);
+	case INTEL_PT_TSC:
+		if (packet->count)
+			return snprintf(buf, buf_len,
+					"%s 0x%llx CTC 0x%x FC 0x%x",
+					name, payload, packet->count & 0xffff,
+					(packet->count >> 16) & 0x1ff);
+		else
+			return snprintf(buf, buf_len, "%s 0x%llx",
+					name, payload);
+	case INTEL_PT_MODE_EXEC:
+		return snprintf(buf, buf_len, "%s %lld", name, payload);
+	case INTEL_PT_MODE_TSX:
+		return snprintf(buf, buf_len, "%s TXAbort:%u InTX:%u",
+				name, (unsigned)(payload >> 1) & 1,
+				(unsigned)payload & 1);
+	case INTEL_PT_PIP:
+		ret = snprintf(buf, buf_len, "%s 0x%llx",
+			       name, payload);
+		return ret;
+	default:
+		break;
+	}
+	return snprintf(buf, buf_len, "%s 0x%llx (%d)",
+			name, payload, packet->count);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
new file mode 100644
index 0000000..53404fa
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
@@ -0,0 +1,64 @@
+/*
+ * intel_pt_pkt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_PKT_DECODER_H__
+#define INCLUDE__INTEL_PT_PKT_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_PKT_DESC_MAX	256
+
+#define INTEL_PT_NEED_MORE_BYTES	-1
+#define INTEL_PT_BAD_PACKET		-2
+
+#define INTEL_PT_PSB_STR		"\002\202\002\202\002\202\002\202" \
+					"\002\202\002\202\002\202\002\202"
+#define INTEL_PT_PSB_LEN		16
+
+#define INTEL_PT_PKT_MAX_SZ		16
+
+enum intel_pt_pkt_type {
+	INTEL_PT_BAD,
+	INTEL_PT_PAD,
+	INTEL_PT_TNT,
+	INTEL_PT_TIP_PGD,
+	INTEL_PT_TIP_PGE,
+	INTEL_PT_TSC,
+	INTEL_PT_MODE_EXEC,
+	INTEL_PT_MODE_TSX,
+	INTEL_PT_TIP,
+	INTEL_PT_FUP,
+	INTEL_PT_PSB,
+	INTEL_PT_PSBEND,
+	INTEL_PT_CBR,
+	INTEL_PT_PIP,
+	INTEL_PT_OVF,
+};
+
+struct intel_pt_pkt {
+	enum intel_pt_pkt_type	type;
+	int			count;
+	uint64_t		payload;
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type);
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet);
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf, size_t len);
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 05/17] perf tools: Add Intel PT instruction decoder
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (3 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 04/17] perf tools: Add Intel PT packet decoder Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-06-18 22:29   ` Arnaldo Carvalho de Melo
  2015-05-29 13:33 ` [PATCH V6 06/17] perf tools: Add Intel PT log Adrian Hunter
                   ` (11 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add support for decoding instructions for Intel Processor Trace.  The
kernel x86 instruction decoder is used for this.

This essentially provides intel_pt_get_insn() which takes a binary
buffer, uses the kernel's x86 instruction decoder to get details
of the instruction and then categorizes it for consumption by
an Intel PT decoder.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/build/Makefile.build                         |   2 +
 tools/perf/.gitignore                              |   2 +
 tools/perf/Makefile.perf                           |  12 +-
 tools/perf/util/intel-pt-decoder/Build             |  15 +-
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  | 246 +++++++++++++++++++++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |  65 ++++++
 6 files changed, 339 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h

diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
index 10df572..7ad74e4 100644
--- a/tools/build/Makefile.build
+++ b/tools/build/Makefile.build
@@ -57,6 +57,8 @@ quiet_cmd_cc_i_c = CPP      $@
 quiet_cmd_cc_s_c = AS       $@
       cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
 
+quiet_cmd_gen = GEN      $@
+
 # Link agregate command
 # If there's nothing to link, create empty $@ object.
 quiet_cmd_ld_multi = LD       $@
diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore
index 812f904..c88d5c5 100644
--- a/tools/perf/.gitignore
+++ b/tools/perf/.gitignore
@@ -28,3 +28,5 @@ config.mak.autogen
 *-flex.*
 *.pyc
 *.pyo
+util/intel-pt-decoder/inat-tables.c
+util/intel-pt-decoder/inat.c
diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 5816a3b..3ae3a8e 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -76,6 +76,12 @@ include config/utilities.mak
 #
 # Define NO_AUXTRACE if you do not want AUX area tracing support
 
+# As per kernel Makefile, avoid funny character set dependencies
+unexport LC_ALL
+LC_COLLATE=C
+LC_NUMERIC=C
+export LC_COLLATE LC_NUMERIC
+
 ifeq ($(srctree),)
 srctree := $(patsubst %/,%,$(dir $(shell pwd)))
 srctree := $(patsubst %/,%,$(dir $(srctree)))
@@ -122,6 +128,7 @@ INSTALL = install
 FLEX    = flex
 BISON   = bison
 STRIP   = strip
+AWK     = awk
 
 LIB_DIR          = $(srctree)/tools/lib/api/
 TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
@@ -272,7 +279,7 @@ strip: $(PROGRAMS) $(OUTPUT)perf
 
 PERF_IN := $(OUTPUT)perf-in.o
 
-export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX
+export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
 build := -f $(srctree)/tools/build/Makefile.build dir=. obj
 
 $(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
@@ -536,7 +543,8 @@ clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
 	$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
 	$(Q)$(RM) .config-detected
 	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
-	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
+	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
+		$(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
 	$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
 	$(python-clean)
 
diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 9d67381..f5f7f87 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1 +1,14 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
+
+inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
+inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
+
+$(OUTPUT)util/intel-pt-decoder/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
+	@$(call echo-cmd,gen)$(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
+
+$(OUTPUT)util/intel-pt-decoder/inat.c:
+	@$(call echo-cmd,gen)cp ../../arch/x86/lib/inat.c $(OUTPUT)util/intel-pt-decoder/inat.c
+
+$(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o: $(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
+
+CFLAGS_intel-pt-insn-decoder.o += -I../../arch/x86/include -I$(OUTPUT)util/intel-pt-decoder -I../../arch/x86/lib -Wno-override-init
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
new file mode 100644
index 0000000..2fa82c5
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
@@ -0,0 +1,246 @@
+/*
+ * intel_pt_insn_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "event.h"
+
+#include <asm/insn.h>
+
+#include "inat.c"
+#include <insn.c>
+
+#include "intel-pt-insn-decoder.h"
+
+/* Based on branch_type() from perf_event_intel_lbr.c */
+static void intel_pt_insn_decoder(struct insn *insn,
+				  struct intel_pt_insn *intel_pt_insn)
+{
+	enum intel_pt_insn_op op = INTEL_PT_OP_OTHER;
+	enum intel_pt_insn_branch branch = INTEL_PT_BR_NO_BRANCH;
+	int ext;
+
+	if (insn_is_avx(insn)) {
+		intel_pt_insn->op = INTEL_PT_OP_OTHER;
+		intel_pt_insn->branch = INTEL_PT_BR_NO_BRANCH;
+		intel_pt_insn->length = insn->length;
+		return;
+	}
+
+	switch (insn->opcode.bytes[0]) {
+	case 0xf:
+		switch (insn->opcode.bytes[1]) {
+		case 0x05: /* syscall */
+		case 0x34: /* sysenter */
+			op = INTEL_PT_OP_SYSCALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x07: /* sysret */
+		case 0x35: /* sysexit */
+			op = INTEL_PT_OP_SYSRET;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x80 ... 0x8f: /* jcc */
+			op = INTEL_PT_OP_JCC;
+			branch = INTEL_PT_BR_CONDITIONAL;
+			break;
+		default:
+			break;
+		}
+		break;
+	case 0x70 ... 0x7f: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xc2: /* near ret */
+	case 0xc3: /* near ret */
+	case 0xca: /* far ret */
+	case 0xcb: /* far ret */
+		op = INTEL_PT_OP_RET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcf: /* iret */
+		op = INTEL_PT_OP_IRET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcc ... 0xce: /* int */
+		op = INTEL_PT_OP_INT;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe8: /* call near rel */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0x9a: /* call far absolute */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe0 ... 0xe2: /* loop */
+		op = INTEL_PT_OP_LOOP;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe3: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe9: /* jmp */
+	case 0xeb: /* jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0xea: /* far jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xff: /* call near absolute, call far absolute ind */
+		ext = (insn->modrm.bytes[0] >> 3) & 0x7;
+		switch (ext) {
+		case 2: /* near ind call */
+		case 3: /* far ind call */
+			op = INTEL_PT_OP_CALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 4:
+		case 5:
+			op = INTEL_PT_OP_JMP;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+
+	intel_pt_insn->op = op;
+	intel_pt_insn->branch = branch;
+	intel_pt_insn->length = insn->length;
+
+	if (branch == INTEL_PT_BR_CONDITIONAL ||
+	    branch == INTEL_PT_BR_UNCONDITIONAL) {
+#if __BYTE_ORDER == __BIG_ENDIAN
+		switch (insn->immediate.nbytes) {
+		case 1:
+			intel_pt_insn->rel = insn->immediate.value;
+			break;
+		case 2:
+			intel_pt_insn->rel =
+					bswap_16((short)insn->immediate.value);
+			break;
+		case 4:
+			intel_pt_insn->rel = bswap_32(insn->immediate.value);
+			break;
+		}
+#else
+		intel_pt_insn->rel = insn->immediate.value;
+#endif
+	}
+}
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn)
+{
+	struct insn insn;
+
+	insn_init(&insn, buf, len, x86_64);
+	insn_get_length(&insn);
+	if (!insn_complete(&insn) || insn.length > len)
+		return -1;
+	intel_pt_insn_decoder(&insn, intel_pt_insn);
+	if (insn.length < INTEL_PT_INSN_DBG_BUF_SZ)
+		memcpy(intel_pt_insn->buf, buf, insn.length);
+	else
+		memcpy(intel_pt_insn->buf, buf, INTEL_PT_INSN_DBG_BUF_SZ);
+	return 0;
+}
+
+const char *branch_name[] = {
+	[INTEL_PT_OP_OTHER]	= "Other",
+	[INTEL_PT_OP_CALL]	= "Call",
+	[INTEL_PT_OP_RET]	= "Ret",
+	[INTEL_PT_OP_JCC]	= "Jcc",
+	[INTEL_PT_OP_JMP]	= "Jmp",
+	[INTEL_PT_OP_LOOP]	= "Loop",
+	[INTEL_PT_OP_IRET]	= "IRet",
+	[INTEL_PT_OP_INT]	= "Int",
+	[INTEL_PT_OP_SYSCALL]	= "Syscall",
+	[INTEL_PT_OP_SYSRET]	= "Sysret",
+};
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op)
+{
+	return branch_name[op];
+}
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len)
+{
+	switch (intel_pt_insn->branch) {
+	case INTEL_PT_BR_CONDITIONAL:
+	case INTEL_PT_BR_UNCONDITIONAL:
+		return snprintf(buf, buf_len, "%s %s%d",
+				intel_pt_insn_name(intel_pt_insn->op),
+				intel_pt_insn->rel > 0 ? "+" : "",
+				intel_pt_insn->rel);
+	case INTEL_PT_BR_NO_BRANCH:
+	case INTEL_PT_BR_INDIRECT:
+		return snprintf(buf, buf_len, "%s",
+				intel_pt_insn_name(intel_pt_insn->op));
+	default:
+		break;
+	}
+	return 0;
+}
+
+size_t intel_pt_insn_max_size(void)
+{
+	return MAX_INSN_SIZE;
+}
+
+int intel_pt_insn_type(enum intel_pt_insn_op op)
+{
+	switch (op) {
+	case INTEL_PT_OP_OTHER:
+		return 0;
+	case INTEL_PT_OP_CALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL;
+	case INTEL_PT_OP_RET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN;
+	case INTEL_PT_OP_JCC:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_JMP:
+		return PERF_IP_FLAG_BRANCH;
+	case INTEL_PT_OP_LOOP:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
+	case INTEL_PT_OP_IRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_INT:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_INTERRUPT;
+	case INTEL_PT_OP_SYSCALL:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+		       PERF_IP_FLAG_SYSCALLRET;
+	case INTEL_PT_OP_SYSRET:
+		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
+		       PERF_IP_FLAG_SYSCALLRET;
+	default:
+		return 0;
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
new file mode 100644
index 0000000..b0adbf3
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
@@ -0,0 +1,65 @@
+/*
+ * intel_pt_insn_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_INSN_DECODER_H__
+#define INCLUDE__INTEL_PT_INSN_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_INSN_DESC_MAX		32
+#define INTEL_PT_INSN_DBG_BUF_SZ	16
+
+enum intel_pt_insn_op {
+	INTEL_PT_OP_OTHER,
+	INTEL_PT_OP_CALL,
+	INTEL_PT_OP_RET,
+	INTEL_PT_OP_JCC,
+	INTEL_PT_OP_JMP,
+	INTEL_PT_OP_LOOP,
+	INTEL_PT_OP_IRET,
+	INTEL_PT_OP_INT,
+	INTEL_PT_OP_SYSCALL,
+	INTEL_PT_OP_SYSRET,
+};
+
+enum intel_pt_insn_branch {
+	INTEL_PT_BR_NO_BRANCH,
+	INTEL_PT_BR_INDIRECT,
+	INTEL_PT_BR_CONDITIONAL,
+	INTEL_PT_BR_UNCONDITIONAL,
+};
+
+struct intel_pt_insn {
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+	unsigned char			buf[INTEL_PT_INSN_DBG_BUF_SZ];
+};
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn);
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op);
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len);
+
+size_t intel_pt_insn_max_size(void);
+
+int intel_pt_insn_type(enum intel_pt_insn_op op);
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 06/17] perf tools: Add Intel PT log
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (4 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 05/17] perf tools: Add Intel PT instruction decoder Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 07/17] perf tools: Add Intel PT decoder Adrian Hunter
                   ` (10 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add a facility to log Intel Processor Trace decoding.  The log is
intended for debugging purposes only.

The log file name is "intel_pt.log" and is opened in the current
directory.  The log contains a record of all packets and
instructions decoded and can get very large (10 MB would be a
small one).

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt-decoder/Build          |   2 +-
 tools/perf/util/intel-pt-decoder/intel-pt-log.c | 155 ++++++++++++++++++++++++
 tools/perf/util/intel-pt-decoder/intel-pt-log.h |  52 ++++++++
 3 files changed, 208 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h

diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index f5f7f87..587321a 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1,4 +1,4 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o
 
 inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
 inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.c b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
new file mode 100644
index 0000000..d09c7d9
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
@@ -0,0 +1,155 @@
+/*
+ * intel_pt_log.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdarg.h>
+#include <stdbool.h>
+#include <string.h>
+
+#include "intel-pt-log.h"
+#include "intel-pt-insn-decoder.h"
+
+#include "intel-pt-pkt-decoder.h"
+
+#define MAX_LOG_NAME 256
+
+static FILE *f;
+static char log_name[MAX_LOG_NAME];
+static bool enable_logging;
+
+void intel_pt_log_enable(void)
+{
+	enable_logging = true;
+}
+
+void intel_pt_log_disable(void)
+{
+	if (f)
+		fflush(f);
+	enable_logging = false;
+}
+
+void intel_pt_log_set_name(const char *name)
+{
+	strncpy(log_name, name, MAX_LOG_NAME - 5);
+	strcat(log_name, ".log");
+}
+
+static void intel_pt_print_data(const unsigned char *buf, int len, uint64_t pos,
+				int indent)
+{
+	int i;
+
+	for (i = 0; i < indent; i++)
+		fprintf(f, " ");
+
+	fprintf(f, "  %08" PRIx64 ": ", pos);
+	for (i = 0; i < len; i++)
+		fprintf(f, " %02x", buf[i]);
+	for (; i < 16; i++)
+		fprintf(f, "   ");
+	fprintf(f, " ");
+}
+
+static void intel_pt_print_no_data(uint64_t pos, int indent)
+{
+	int i;
+
+	for (i = 0; i < indent; i++)
+		fprintf(f, " ");
+
+	fprintf(f, "  %08" PRIx64 ": ", pos);
+	for (i = 0; i < 16; i++)
+		fprintf(f, "   ");
+	fprintf(f, " ");
+}
+
+static int intel_pt_log_open(void)
+{
+	if (!enable_logging)
+		return -1;
+
+	if (f)
+		return 0;
+
+	if (!log_name[0])
+		return -1;
+
+	f = fopen(log_name, "w+");
+	if (!f) {
+		enable_logging = false;
+		return -1;
+	}
+
+	return 0;
+}
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf)
+{
+	char desc[INTEL_PT_PKT_DESC_MAX];
+
+	if (intel_pt_log_open())
+		return;
+
+	intel_pt_print_data(buf, pkt_len, pos, 0);
+	intel_pt_pkt_desc(packet, desc, INTEL_PT_PKT_DESC_MAX);
+	fprintf(f, "%s\n", desc);
+}
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	char desc[INTEL_PT_INSN_DESC_MAX];
+	size_t len = intel_pt_insn->length;
+
+	if (intel_pt_log_open())
+		return;
+
+	if (len > INTEL_PT_INSN_DBG_BUF_SZ)
+		len = INTEL_PT_INSN_DBG_BUF_SZ;
+	intel_pt_print_data(intel_pt_insn->buf, len, ip, 8);
+	if (intel_pt_insn_desc(intel_pt_insn, desc, INTEL_PT_INSN_DESC_MAX) > 0)
+		fprintf(f, "%s\n", desc);
+	else
+		fprintf(f, "Bad instruction!\n");
+}
+
+void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	char desc[INTEL_PT_INSN_DESC_MAX];
+
+	if (intel_pt_log_open())
+		return;
+
+	intel_pt_print_no_data(ip, 8);
+	if (intel_pt_insn_desc(intel_pt_insn, desc, INTEL_PT_INSN_DESC_MAX) > 0)
+		fprintf(f, "%s\n", desc);
+	else
+		fprintf(f, "Bad instruction!\n");
+}
+
+void intel_pt_log(const char *fmt, ...)
+{
+	va_list args;
+
+	if (intel_pt_log_open())
+		return;
+
+	va_start(args, fmt);
+	vfprintf(f, fmt, args);
+	va_end(args);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.h b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
new file mode 100644
index 0000000..db3942f
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
@@ -0,0 +1,52 @@
+/*
+ * intel_pt_log.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_LOG_H__
+#define INCLUDE__INTEL_PT_LOG_H__
+
+#include <stdint.h>
+#include <inttypes.h>
+
+struct intel_pt_pkt;
+
+void intel_pt_log_enable(void);
+void intel_pt_log_disable(void);
+void intel_pt_log_set_name(const char *name);
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf);
+
+struct intel_pt_insn;
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip);
+void intel_pt_log_insn_no_data(struct intel_pt_insn *intel_pt_insn,
+			       uint64_t ip);
+
+__attribute__((format(printf, 1, 2)))
+void intel_pt_log(const char *fmt, ...);
+
+#define x64_fmt "0x%" PRIx64
+
+static inline void intel_pt_log_at(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s at " x64_fmt "\n", msg, u);
+}
+
+static inline void intel_pt_log_to(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s to " x64_fmt "\n", msg, u);
+}
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 07/17] perf tools: Add Intel PT decoder
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (5 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 06/17] perf tools: Add Intel PT log Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 08/17] perf tools: Add Intel PT support Adrian Hunter
                   ` (9 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add support for decoding an Intel Processor Trace.

Intel PT trace data must be 'decoded' which involves walking
the object code and matching the trace data packets.

The decoder requests a buffer of binary data via a get_trace()
call-back, which it decodes using instruction information which
it gets via another call-back walk_insn().

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt-decoder/Build             |    2 +-
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1759 ++++++++++++++++++++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |  102 ++
 3 files changed, 1862 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h

diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
index 587321a..fa12eac 100644
--- a/tools/perf/util/intel-pt-decoder/Build
+++ b/tools/perf/util/intel-pt-decoder/Build
@@ -1,4 +1,4 @@
-libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o
+libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o intel-pt-log.o intel-pt-decoder.o
 
 inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
 inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
new file mode 100644
index 0000000..748a7a0
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -0,0 +1,1759 @@
+/*
+ * intel_pt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include "../cache.h"
+#include "../util.h"
+
+#include "intel-pt-insn-decoder.h"
+#include "intel-pt-pkt-decoder.h"
+#include "intel-pt-decoder.h"
+#include "intel-pt-log.h"
+
+#define INTEL_PT_BLK_SIZE 1024
+
+#define BIT63 (((uint64_t)1 << 63))
+
+#define INTEL_PT_RETURN 1
+
+struct intel_pt_blk {
+	struct intel_pt_blk *prev;
+	uint64_t ip[INTEL_PT_BLK_SIZE];
+};
+
+struct intel_pt_stack {
+	struct intel_pt_blk *blk;
+	struct intel_pt_blk *spare;
+	int pos;
+};
+
+enum intel_pt_pkt_state {
+	INTEL_PT_STATE_NO_PSB,
+	INTEL_PT_STATE_NO_IP,
+	INTEL_PT_STATE_ERR_RESYNC,
+	INTEL_PT_STATE_IN_SYNC,
+	INTEL_PT_STATE_TNT,
+	INTEL_PT_STATE_TIP,
+	INTEL_PT_STATE_TIP_PGD,
+	INTEL_PT_STATE_FUP,
+	INTEL_PT_STATE_FUP_NO_TIP,
+};
+
+#ifdef INTEL_PT_STRICT
+#define INTEL_PT_STATE_ERR1	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_NO_PSB
+#else
+#define INTEL_PT_STATE_ERR1	(decoder->pkt_state)
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_IP
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_ERR_RESYNC
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_IN_SYNC
+#endif
+
+struct intel_pt_decoder {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*walk_insn)(struct intel_pt_insn *intel_pt_insn,
+			 uint64_t *insn_cnt_ptr, uint64_t *ip, uint64_t to_ip,
+			 uint64_t max_insn_cnt, void *data);
+	void *data;
+	struct intel_pt_state state;
+	const unsigned char *buf;
+	size_t len;
+	bool return_compression;
+	bool pge;
+	uint64_t pos;
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint64_t tsc_timestamp;
+	uint64_t ref_timestamp;
+	uint64_t ret_addr;
+	struct intel_pt_stack stack;
+	enum intel_pt_pkt_state pkt_state;
+	struct intel_pt_pkt packet;
+	struct intel_pt_pkt tnt;
+	int pkt_step;
+	int pkt_len;
+	unsigned int cbr;
+	int exec_mode;
+	unsigned int insn_bytes;
+	uint64_t sign_bit;
+	uint64_t sign_bits;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+	uint64_t period_insn_cnt;
+	uint64_t period_mask;
+	uint64_t period_ticks;
+	uint64_t last_masked_timestamp;
+	bool continuous_period;
+	bool overflow;
+	bool set_fup_tx_flags;
+	unsigned int fup_tx_flags;
+	unsigned int tx_flags;
+	uint64_t timestamp_insn_cnt;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[INTEL_PT_PKT_MAX_SZ];
+};
+
+static uint64_t intel_pt_lower_power_of_2(uint64_t x)
+{
+	int i;
+
+	for (i = 0; x != 1; i++)
+		x >>= 1;
+
+	return x << i;
+}
+
+static void intel_pt_setup_period(struct intel_pt_decoder *decoder)
+{
+	if (decoder->period_type == INTEL_PT_PERIOD_TICKS) {
+		uint64_t period;
+
+		period = intel_pt_lower_power_of_2(decoder->period);
+		decoder->period_mask  = ~(period - 1);
+		decoder->period_ticks = period;
+	}
+}
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
+{
+	struct intel_pt_decoder *decoder;
+
+	if (!params->get_trace || !params->walk_insn)
+		return NULL;
+
+	decoder = zalloc(sizeof(struct intel_pt_decoder));
+	if (!decoder)
+		return NULL;
+
+	decoder->get_trace          = params->get_trace;
+	decoder->walk_insn          = params->walk_insn;
+	decoder->data               = params->data;
+	decoder->return_compression = params->return_compression;
+
+	decoder->sign_bit           = (uint64_t)1 << 47;
+	decoder->sign_bits          = ~(((uint64_t)1 << 48) - 1);
+
+	decoder->period             = params->period;
+	decoder->period_type        = params->period_type;
+
+	intel_pt_setup_period(decoder);
+
+	return decoder;
+}
+
+static void intel_pt_pop_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk = stack->blk;
+
+	stack->blk = blk->prev;
+	if (!stack->spare)
+		stack->spare = blk;
+	else
+		free(blk);
+}
+
+static uint64_t intel_pt_pop(struct intel_pt_stack *stack)
+{
+	if (!stack->pos) {
+		if (!stack->blk)
+			return 0;
+		intel_pt_pop_blk(stack);
+		if (!stack->blk)
+			return 0;
+		stack->pos = INTEL_PT_BLK_SIZE;
+	}
+	return stack->blk->ip[--stack->pos];
+}
+
+static int intel_pt_alloc_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk;
+
+	if (stack->spare) {
+		blk = stack->spare;
+		stack->spare = NULL;
+	} else {
+		blk = malloc(sizeof(struct intel_pt_blk));
+		if (!blk)
+			return -ENOMEM;
+	}
+
+	blk->prev = stack->blk;
+	stack->blk = blk;
+	stack->pos = 0;
+	return 0;
+}
+
+static int intel_pt_push(struct intel_pt_stack *stack, uint64_t ip)
+{
+	int err;
+
+	if (!stack->blk || stack->pos == INTEL_PT_BLK_SIZE) {
+		err = intel_pt_alloc_blk(stack);
+		if (err)
+			return err;
+	}
+
+	stack->blk->ip[stack->pos++] = ip;
+	return 0;
+}
+
+static void intel_pt_clear_stack(struct intel_pt_stack *stack)
+{
+	while (stack->blk)
+		intel_pt_pop_blk(stack);
+	stack->pos = 0;
+}
+
+static void intel_pt_free_stack(struct intel_pt_stack *stack)
+{
+	intel_pt_clear_stack(stack);
+	zfree(&stack->blk);
+	zfree(&stack->spare);
+}
+
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder)
+{
+	intel_pt_free_stack(&decoder->stack);
+	free(decoder);
+}
+
+static int intel_pt_ext_err(int code)
+{
+	switch (code) {
+	case -ENOMEM:
+		return INTEL_PT_ERR_NOMEM;
+	case -ENOSYS:
+		return INTEL_PT_ERR_INTERN;
+	case -EBADMSG:
+		return INTEL_PT_ERR_BADPKT;
+	case -ENODATA:
+		return INTEL_PT_ERR_NODATA;
+	case -EILSEQ:
+		return INTEL_PT_ERR_NOINSN;
+	case -ENOENT:
+		return INTEL_PT_ERR_MISMAT;
+	case -EOVERFLOW:
+		return INTEL_PT_ERR_OVR;
+	case -ENOSPC:
+		return INTEL_PT_ERR_LOST;
+	default:
+		return INTEL_PT_ERR_UNK;
+	}
+}
+
+static const char *intel_pt_err_msgs[] = {
+	[INTEL_PT_ERR_NOMEM]  = "Memory allocation failed",
+	[INTEL_PT_ERR_INTERN] = "Internal error",
+	[INTEL_PT_ERR_BADPKT] = "Bad packet",
+	[INTEL_PT_ERR_NODATA] = "No more data",
+	[INTEL_PT_ERR_NOINSN] = "Failed to get instruction",
+	[INTEL_PT_ERR_MISMAT] = "Trace doesn't match instruction",
+	[INTEL_PT_ERR_OVR]    = "Overflow packet",
+	[INTEL_PT_ERR_LOST]   = "Lost trace data",
+	[INTEL_PT_ERR_UNK]    = "Unknown error!",
+};
+
+int intel_pt__strerror(int code, char *buf, size_t buflen)
+{
+	if (code < 1 || code > INTEL_PT_ERR_MAX)
+		code = INTEL_PT_ERR_UNK;
+	strlcpy(buf, intel_pt_err_msgs[code], buflen);
+	return 0;
+}
+
+static uint64_t intel_pt_calc_ip(struct intel_pt_decoder *decoder,
+				 const struct intel_pt_pkt *packet,
+				 uint64_t last_ip)
+{
+	uint64_t ip;
+
+	switch (packet->count) {
+	case 2:
+		ip = (last_ip & (uint64_t)0xffffffffffff0000ULL) |
+		     packet->payload;
+		break;
+	case 4:
+		ip = (last_ip & (uint64_t)0xffffffff00000000ULL) |
+		     packet->payload;
+		break;
+	case 6:
+		ip = packet->payload;
+		break;
+	default:
+		return 0;
+	}
+
+	if (ip & decoder->sign_bit)
+		return ip | decoder->sign_bits;
+
+	return ip;
+}
+
+static inline void intel_pt_set_last_ip(struct intel_pt_decoder *decoder)
+{
+	decoder->last_ip = intel_pt_calc_ip(decoder, &decoder->packet,
+					    decoder->last_ip);
+}
+
+static inline void intel_pt_set_ip(struct intel_pt_decoder *decoder)
+{
+	intel_pt_set_last_ip(decoder);
+	decoder->ip = decoder->last_ip;
+}
+
+static void intel_pt_decoder_log_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log_packet(&decoder->packet, decoder->pkt_len, decoder->pos,
+			    decoder->buf);
+}
+
+static int intel_pt_bug(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Internal error\n");
+	decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+	return -ENOSYS;
+}
+
+static inline void intel_pt_clear_tx_flags(struct intel_pt_decoder *decoder)
+{
+	decoder->tx_flags = 0;
+}
+
+static inline void intel_pt_update_in_tx(struct intel_pt_decoder *decoder)
+{
+	decoder->tx_flags = decoder->packet.payload & INTEL_PT_IN_TX;
+}
+
+static int intel_pt_bad_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	intel_pt_decoder_log_packet(decoder);
+	if (decoder->pkt_state != INTEL_PT_STATE_NO_PSB) {
+		intel_pt_log("ERROR: Bad packet\n");
+		decoder->pkt_state = INTEL_PT_STATE_ERR1;
+	}
+	return -EBADMSG;
+}
+
+static int intel_pt_get_data(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_buffer buffer = { .buf = 0, };
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	intel_pt_log("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		intel_pt_log("No more data\n");
+		return -ENODATA;
+	}
+	if (!buffer.consecutive) {
+		decoder->ip = 0;
+		decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+		decoder->ref_timestamp = buffer.ref_timestamp;
+		decoder->timestamp = 0;
+		decoder->state.trace_nr = buffer.trace_nr;
+		intel_pt_log("Reference timestamp 0x%" PRIx64 "\n",
+			     decoder->ref_timestamp);
+		return -ENOLINK;
+	}
+
+	return 0;
+}
+
+static int intel_pt_get_next_data(struct intel_pt_decoder *decoder)
+{
+	if (!decoder->next_buf)
+		return intel_pt_get_data(decoder);
+
+	decoder->buf = decoder->next_buf;
+	decoder->len = decoder->next_len;
+	decoder->next_buf = 0;
+	decoder->next_len = 0;
+	return 0;
+}
+
+static int intel_pt_get_split_packet(struct intel_pt_decoder *decoder)
+{
+	unsigned char *buf = decoder->temp_buf;
+	size_t old_len, len, n;
+	int ret;
+
+	old_len = decoder->len;
+	len = decoder->len;
+	memcpy(buf, decoder->buf, len);
+
+	ret = intel_pt_get_data(decoder);
+	if (ret) {
+		decoder->pos += old_len;
+		return ret < 0 ? ret : -EINVAL;
+	}
+
+	n = INTEL_PT_PKT_MAX_SZ - len;
+	if (n > decoder->len)
+		n = decoder->len;
+	memcpy(buf + len, decoder->buf, n);
+	len += n;
+
+	ret = intel_pt_get_packet(buf, len, &decoder->packet);
+	if (ret < (int)old_len) {
+		decoder->next_buf = decoder->buf;
+		decoder->next_len = decoder->len;
+		decoder->buf = buf;
+		decoder->len = old_len;
+		return intel_pt_bad_packet(decoder);
+	}
+
+	decoder->next_buf = decoder->buf + (ret - old_len);
+	decoder->next_len = decoder->len - (ret - old_len);
+
+	decoder->buf = buf;
+	decoder->len = ret;
+
+	return ret;
+}
+
+static int intel_pt_get_next_packet(struct intel_pt_decoder *decoder)
+{
+	int ret;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = intel_pt_get_packet(decoder->buf, decoder->len,
+					  &decoder->packet);
+		if (ret == INTEL_PT_NEED_MORE_BYTES &&
+		    decoder->len < INTEL_PT_PKT_MAX_SZ && !decoder->next_buf) {
+			ret = intel_pt_get_split_packet(decoder);
+			if (ret < 0)
+				return ret;
+		}
+		if (ret <= 0)
+			return intel_pt_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+		intel_pt_decoder_log_packet(decoder);
+	} while (decoder->packet.type == INTEL_PT_PAD);
+
+	return 0;
+}
+
+static uint64_t intel_pt_next_period(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp, masked_timestamp;
+
+	timestamp = decoder->timestamp + decoder->timestamp_insn_cnt;
+	masked_timestamp = timestamp & decoder->period_mask;
+	if (decoder->continuous_period) {
+		if (masked_timestamp != decoder->last_masked_timestamp)
+			return 1;
+	} else {
+		timestamp += 1;
+		masked_timestamp = timestamp & decoder->period_mask;
+		if (masked_timestamp != decoder->last_masked_timestamp) {
+			decoder->last_masked_timestamp = masked_timestamp;
+			decoder->continuous_period = true;
+		}
+	}
+	return decoder->period_ticks - (timestamp - masked_timestamp);
+}
+
+static uint64_t intel_pt_next_sample(struct intel_pt_decoder *decoder)
+{
+	switch (decoder->period_type) {
+	case INTEL_PT_PERIOD_INSTRUCTIONS:
+		return decoder->period - decoder->period_insn_cnt;
+	case INTEL_PT_PERIOD_TICKS:
+		return intel_pt_next_period(decoder);
+	case INTEL_PT_PERIOD_NONE:
+	default:
+		return 0;
+	}
+}
+
+static void intel_pt_sample_insn(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp, masked_timestamp;
+
+	switch (decoder->period_type) {
+	case INTEL_PT_PERIOD_INSTRUCTIONS:
+		decoder->period_insn_cnt = 0;
+		break;
+	case INTEL_PT_PERIOD_TICKS:
+		timestamp = decoder->timestamp + decoder->timestamp_insn_cnt;
+		masked_timestamp = timestamp & decoder->period_mask;
+		decoder->last_masked_timestamp = masked_timestamp;
+		break;
+	case INTEL_PT_PERIOD_NONE:
+	default:
+		break;
+	}
+
+	decoder->state.type |= INTEL_PT_INSTRUCTION;
+}
+
+static int intel_pt_walk_insn(struct intel_pt_decoder *decoder,
+			      struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	uint64_t max_insn_cnt, insn_cnt = 0;
+	int err;
+
+	max_insn_cnt = intel_pt_next_sample(decoder);
+
+	err = decoder->walk_insn(intel_pt_insn, &insn_cnt, &decoder->ip, ip,
+				 max_insn_cnt, decoder->data);
+
+	decoder->timestamp_insn_cnt += insn_cnt;
+	decoder->period_insn_cnt += insn_cnt;
+
+	if (err) {
+		decoder->pkt_state = INTEL_PT_STATE_ERR2;
+		intel_pt_log_at("ERROR: Failed to get instruction",
+				decoder->ip);
+		if (err == -ENOENT)
+			return -ENOLINK;
+		return -EILSEQ;
+	}
+
+	if (ip && decoder->ip == ip) {
+		err = -EAGAIN;
+		goto out;
+	}
+
+	if (max_insn_cnt && insn_cnt >= max_insn_cnt)
+		intel_pt_sample_insn(decoder);
+
+	if (intel_pt_insn->branch == INTEL_PT_BR_NO_BRANCH) {
+		decoder->state.type = INTEL_PT_INSTRUCTION;
+		decoder->state.from_ip = decoder->ip;
+		decoder->state.to_ip = 0;
+		decoder->ip += intel_pt_insn->length;
+		err = INTEL_PT_RETURN;
+		goto out;
+	}
+
+	if (intel_pt_insn->op == INTEL_PT_OP_CALL) {
+		/* Zero-length calls are excluded */
+		if (intel_pt_insn->branch != INTEL_PT_BR_UNCONDITIONAL ||
+		    intel_pt_insn->rel) {
+			err = intel_pt_push(&decoder->stack, decoder->ip +
+					    intel_pt_insn->length);
+			if (err)
+				goto out;
+		}
+	} else if (intel_pt_insn->op == INTEL_PT_OP_RET) {
+		decoder->ret_addr = intel_pt_pop(&decoder->stack);
+	}
+
+	if (intel_pt_insn->branch == INTEL_PT_BR_UNCONDITIONAL) {
+		decoder->state.from_ip = decoder->ip;
+		decoder->ip += intel_pt_insn->length +
+				intel_pt_insn->rel;
+		decoder->state.to_ip = decoder->ip;
+		err = INTEL_PT_RETURN;
+	}
+out:
+	decoder->state.insn_op = intel_pt_insn->op;
+	decoder->state.insn_len = intel_pt_insn->length;
+
+	if (decoder->tx_flags & INTEL_PT_IN_TX)
+		decoder->state.flags |= INTEL_PT_IN_TX;
+
+	return err;
+}
+
+static int intel_pt_walk_fup(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	uint64_t ip;
+	int err;
+
+	ip = decoder->last_ip;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, ip);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err == -EAGAIN) {
+			if (decoder->set_fup_tx_flags) {
+				decoder->set_fup_tx_flags = false;
+				decoder->tx_flags = decoder->fup_tx_flags;
+				decoder->state.type = INTEL_PT_TRANSACTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->state.flags = decoder->fup_tx_flags;
+				return 0;
+			}
+			return err;
+		}
+		decoder->set_fup_tx_flags = false;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			intel_pt_log_at("ERROR: Unexpected indirect branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			intel_pt_log_at("ERROR: Unexpected conditional branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		intel_pt_bug(decoder);
+	}
+}
+
+static int intel_pt_walk_tip(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+	if (err == INTEL_PT_RETURN)
+		return 0;
+	if (err)
+		return err;
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+		if (decoder->pkt_state == INTEL_PT_STATE_TIP_PGD) {
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0)
+				decoder->ip = decoder->last_ip;
+		} else {
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				decoder->state.to_ip = decoder->last_ip;
+				decoder->ip = decoder->last_ip;
+			}
+		}
+		return 0;
+	}
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+		intel_pt_log_at("ERROR: Conditional branch when expecting indirect branch",
+				decoder->ip);
+		decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+		return -ENOENT;
+	}
+
+	return intel_pt_bug(decoder);
+}
+
+static int intel_pt_walk_tnt(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.op == INTEL_PT_OP_RET) {
+			if (!decoder->return_compression) {
+				intel_pt_log_at("ERROR: RET when expecting conditional branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!decoder->ret_addr) {
+				intel_pt_log_at("ERROR: Bad RET compression (stack empty)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!(decoder->tnt.payload & BIT63)) {
+				intel_pt_log_at("ERROR: Bad RET compression (TNT=N)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->tnt.payload <<= 1;
+			decoder->state.from_ip = decoder->ip;
+			decoder->ip = decoder->ret_addr;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			/* Handle deferred TIPs */
+			err = intel_pt_get_next_packet(decoder);
+			if (err)
+				return err;
+			if (decoder->packet.type != INTEL_PT_TIP ||
+			    decoder->packet.count == 0) {
+				intel_pt_log_at("ERROR: Missing deferred TIP for indirect branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				decoder->pkt_step = 0;
+				return -ENOENT;
+			}
+			intel_pt_set_last_ip(decoder);
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = decoder->last_ip;
+			decoder->ip = decoder->last_ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			if (decoder->tnt.payload & BIT63) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.from_ip = decoder->ip;
+				decoder->ip += intel_pt_insn.length +
+					       intel_pt_insn.rel;
+				decoder->state.to_ip = decoder->ip;
+				return 0;
+			}
+			/* Instruction sample for a non-taken branch */
+			if (decoder->state.type & INTEL_PT_INSTRUCTION) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.type = INTEL_PT_INSTRUCTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->ip += intel_pt_insn.length;
+				return 0;
+			}
+			decoder->ip += intel_pt_insn.length;
+			if (!decoder->tnt.count)
+				return -EAGAIN;
+			decoder->tnt.payload <<= 1;
+			continue;
+		}
+
+		return intel_pt_bug(decoder);
+	}
+}
+
+static int intel_pt_mode_tsx(struct intel_pt_decoder *decoder, bool *no_tip)
+{
+	unsigned int fup_tx_flags;
+	int err;
+
+	fup_tx_flags = decoder->packet.payload &
+		       (INTEL_PT_IN_TX | INTEL_PT_ABORT_TX);
+	err = intel_pt_get_next_packet(decoder);
+	if (err)
+		return err;
+	if (decoder->packet.type == INTEL_PT_FUP) {
+		decoder->fup_tx_flags = fup_tx_flags;
+		decoder->set_fup_tx_flags = true;
+		if (!(decoder->fup_tx_flags & INTEL_PT_ABORT_TX))
+			*no_tip = true;
+	} else {
+		intel_pt_log_at("ERROR: Missing FUP after MODE.TSX",
+				decoder->pos);
+		intel_pt_update_in_tx(decoder);
+	}
+	return 0;
+}
+
+static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp;
+
+	if (decoder->ref_timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->ref_timestamp & (0xffULL << 56));
+		if (timestamp < decoder->ref_timestamp) {
+			if (decoder->ref_timestamp - timestamp > (1ULL << 55))
+				timestamp += (1ULL << 56);
+		} else {
+			if (timestamp - decoder->ref_timestamp > (1ULL << 55))
+				timestamp -= (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->ref_timestamp = 0;
+		decoder->timestamp_insn_cnt = 0;
+	} else if (decoder->timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->timestamp & (0xffULL << 56));
+		if (timestamp < decoder->timestamp &&
+		    decoder->timestamp - timestamp < 0x100) {
+			intel_pt_log_to("ERROR: Suppressing backwards timestamp",
+					timestamp);
+			timestamp = decoder->timestamp;
+		}
+		while (timestamp < decoder->timestamp) {
+			intel_pt_log_to("Wraparound timestamp", timestamp);
+			timestamp += (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->timestamp_insn_cnt = 0;
+	}
+
+	intel_pt_log_to("Setting timestamp", decoder->timestamp);
+}
+
+static int intel_pt_overflow(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Buffer overflow\n");
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+	decoder->overflow = true;
+	return -EOVERFLOW;
+}
+
+/* Walk PSB+ packets when already in sync. */
+static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_TIP_PGD:
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+		case INTEL_PT_TNT:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSB:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -EAGAIN;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			intel_pt_set_last_ip(decoder);
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	if (decoder->tx_flags & INTEL_PT_ABORT_TX) {
+		decoder->tx_flags = 0;
+		decoder->state.flags &= ~INTEL_PT_IN_TX;
+		decoder->state.flags |= INTEL_PT_ABORT_TX;
+	} else {
+		decoder->state.flags |= INTEL_PT_ASYNC;
+	}
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+		case INTEL_PT_FUP:
+		case INTEL_PT_PSB:
+		case INTEL_PT_TSC:
+		case INTEL_PT_CBR:
+		case INTEL_PT_MODE_TSX:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSBEND:
+			intel_pt_log("ERROR: Missing TIP after FUP\n");
+			decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP_PGD:
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0) {
+				intel_pt_set_ip(decoder);
+				intel_pt_log("Omitting PGD ip " x64_fmt "\n",
+					     decoder->ip);
+			}
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			return 0;
+
+		case INTEL_PT_TIP_PGE:
+			decoder->pge = true;
+			intel_pt_log("Omitting PGE ip " x64_fmt "\n",
+				     decoder->ip);
+			decoder->state.from_ip = 0;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_TIP:
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
+{
+	bool no_tip = false;
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+next:
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+			if (!decoder->packet.count)
+				break;
+			decoder->tnt = decoder->packet;
+			decoder->pkt_state = INTEL_PT_STATE_TNT;
+			err = intel_pt_walk_tnt(decoder);
+			if (err == -EAGAIN)
+				break;
+			return err;
+
+		case INTEL_PT_TIP_PGD:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP_PGD;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_TIP_PGE: {
+			decoder->pge = true;
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero TIP.PGE",
+						decoder->pos);
+				break;
+			}
+			intel_pt_set_ip(decoder);
+			decoder->state.from_ip = 0;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_FUP:
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero FUP",
+						decoder->pos);
+				no_tip = false;
+				break;
+			}
+			intel_pt_set_last_ip(decoder);
+			err = intel_pt_walk_fup(decoder);
+			if (err != -EAGAIN) {
+				if (err)
+					return err;
+				if (no_tip)
+					decoder->pkt_state =
+						INTEL_PT_STATE_FUP_NO_TIP;
+				else
+					decoder->pkt_state = INTEL_PT_STATE_FUP;
+				return 0;
+			}
+			if (no_tip) {
+				no_tip = false;
+				break;
+			}
+			return intel_pt_walk_fup_tip(decoder);
+
+		case INTEL_PT_PSB:
+			intel_pt_clear_stack(&decoder->stack);
+			err = intel_pt_walk_psbend(decoder);
+			if (err == -EAGAIN)
+				goto next;
+			if (err)
+				return err;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			/* MODE_TSX need not be followed by FUP */
+			if (!decoder->pge) {
+				intel_pt_update_in_tx(decoder);
+				break;
+			}
+			err = intel_pt_mode_tsx(decoder, &no_tip);
+			if (err)
+				return err;
+			goto next;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+/* Walk PSB+ packets to get in sync. */
+static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -ENOENT;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0) {
+				uint64_t current_ip = decoder->ip;
+
+				intel_pt_set_ip(decoder);
+				if (current_ip)
+					intel_pt_log_to("Setting IP",
+							decoder->ip);
+			}
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_TNT:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			if (decoder->ip)
+				decoder->pkt_state = INTEL_PT_STATE_ERR4;
+			else
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_PSB:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			decoder->pge = decoder->packet.type != INTEL_PT_TIP_PGD;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0)
+				intel_pt_set_ip(decoder);
+			if (decoder->ip)
+				return 0;
+			break;
+
+		case INTEL_PT_FUP:
+			if (decoder->overflow) {
+				if (decoder->last_ip ||
+				    decoder->packet.count == 6 ||
+				    decoder->packet.count == 0)
+					intel_pt_set_ip(decoder);
+				if (decoder->ip)
+					return 0;
+			}
+			if (decoder->packet.count)
+				intel_pt_set_last_ip(decoder);
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSB:
+			err = intel_pt_walk_psb(decoder);
+			if (err)
+				return err;
+			if (decoder->ip) {
+				/* Do not have a sample */
+				decoder->state.type = 0;
+				return 0;
+			}
+			break;
+
+		case INTEL_PT_TNT:
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_sync_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	intel_pt_log("Scanning for full IP\n");
+	err = intel_pt_walk_to_ip(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	decoder->overflow = false;
+
+	decoder->state.from_ip = 0;
+	decoder->state.to_ip = decoder->ip;
+	intel_pt_log_to("Setting IP", decoder->ip);
+
+	return 0;
+}
+
+static int intel_pt_part_psb(struct intel_pt_decoder *decoder)
+{
+	const unsigned char *end = decoder->buf + decoder->len;
+	size_t i;
+
+	for (i = INTEL_PT_PSB_LEN - 1; i; i--) {
+		if (i > decoder->len)
+			continue;
+		if (!memcmp(end - i, INTEL_PT_PSB_STR, i))
+			return i;
+	}
+	return 0;
+}
+
+static int intel_pt_rest_psb(struct intel_pt_decoder *decoder, int part_psb)
+{
+	size_t rest_psb = INTEL_PT_PSB_LEN - part_psb;
+	const char *psb = INTEL_PT_PSB_STR;
+
+	if (rest_psb > decoder->len ||
+	    memcmp(decoder->buf, psb + part_psb, rest_psb))
+		return 0;
+
+	return rest_psb;
+}
+
+static int intel_pt_get_split_psb(struct intel_pt_decoder *decoder,
+				  int part_psb)
+{
+	int rest_psb, ret;
+
+	decoder->pos += decoder->len;
+	decoder->len = 0;
+
+	ret = intel_pt_get_next_data(decoder);
+	if (ret)
+		return ret;
+
+	rest_psb = intel_pt_rest_psb(decoder, part_psb);
+	if (!rest_psb)
+		return 0;
+
+	decoder->pos -= part_psb;
+	decoder->next_buf = decoder->buf + rest_psb;
+	decoder->next_len = decoder->len - rest_psb;
+	memcpy(decoder->temp_buf, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	decoder->buf = decoder->temp_buf;
+	decoder->len = INTEL_PT_PSB_LEN;
+
+	return 0;
+}
+
+static int intel_pt_scan_for_psb(struct intel_pt_decoder *decoder)
+{
+	unsigned char *next;
+	int ret;
+
+	intel_pt_log("Scanning for PSB\n");
+	while (1) {
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		next = memmem(decoder->buf, decoder->len, INTEL_PT_PSB_STR,
+			      INTEL_PT_PSB_LEN);
+		if (!next) {
+			int part_psb;
+
+			part_psb = intel_pt_part_psb(decoder);
+			if (part_psb) {
+				ret = intel_pt_get_split_psb(decoder, part_psb);
+				if (ret)
+					return ret;
+			} else {
+				decoder->pos += decoder->len;
+				decoder->len = 0;
+			}
+			continue;
+		}
+
+		decoder->pkt_step = next - decoder->buf;
+		return intel_pt_get_next_packet(decoder);
+	}
+}
+
+static int intel_pt_sync(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	decoder->pge = false;
+	decoder->continuous_period = false;
+	decoder->last_ip = 0;
+	decoder->ip = 0;
+	intel_pt_clear_stack(&decoder->stack);
+
+	err = intel_pt_scan_for_psb(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_NO_IP;
+
+	err = intel_pt_walk_psb(decoder);
+	if (err)
+		return err;
+
+	if (decoder->ip) {
+		decoder->state.type = 0; /* Do not have a sample */
+		decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	} else {
+		return intel_pt_sync_ip(decoder);
+	}
+
+	return 0;
+}
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	do {
+		decoder->state.type = INTEL_PT_BRANCH;
+		decoder->state.flags = 0;
+
+		switch (decoder->pkt_state) {
+		case INTEL_PT_STATE_NO_PSB:
+			err = intel_pt_sync(decoder);
+			break;
+		case INTEL_PT_STATE_NO_IP:
+			decoder->last_ip = 0;
+			/* Fall through */
+		case INTEL_PT_STATE_ERR_RESYNC:
+			err = intel_pt_sync_ip(decoder);
+			break;
+		case INTEL_PT_STATE_IN_SYNC:
+			err = intel_pt_walk_trace(decoder);
+			break;
+		case INTEL_PT_STATE_TNT:
+			err = intel_pt_walk_tnt(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_trace(decoder);
+			break;
+		case INTEL_PT_STATE_TIP:
+		case INTEL_PT_STATE_TIP_PGD:
+			err = intel_pt_walk_tip(decoder);
+			break;
+		case INTEL_PT_STATE_FUP:
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			err = intel_pt_walk_fup(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_fup_tip(decoder);
+			else if (!err)
+				decoder->pkt_state = INTEL_PT_STATE_FUP;
+			break;
+		case INTEL_PT_STATE_FUP_NO_TIP:
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			err = intel_pt_walk_fup(decoder);
+			if (err == -EAGAIN)
+				err = intel_pt_walk_trace(decoder);
+			break;
+		default:
+			err = intel_pt_bug(decoder);
+			break;
+		}
+	} while (err == -ENOLINK);
+
+	decoder->state.err = err ? intel_pt_ext_err(err) : 0;
+	decoder->state.timestamp = decoder->timestamp;
+	decoder->state.est_timestamp = decoder->timestamp +
+				       (decoder->timestamp_insn_cnt << 1);
+	decoder->state.cr3 = decoder->cr3;
+
+	if (err)
+		decoder->state.from_ip = decoder->ip;
+
+	return &decoder->state;
+}
+
+static bool intel_pt_at_psb(unsigned char *buf, size_t len)
+{
+	if (len < INTEL_PT_PSB_LEN)
+		return false;
+	return memmem(buf, INTEL_PT_PSB_LEN, INTEL_PT_PSB_STR,
+		      INTEL_PT_PSB_LEN);
+}
+
+/**
+ * intel_pt_next_psb - move buffer pointer to the start of the next PSB packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the next PSB packet if
+ * there is one, otherwise the buffer pointer is unchanged.  If @buf is updated,
+ * @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_next_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	next = memmem(*buf, *len, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_step_psb - move buffer pointer to the start of the following PSB
+ *                     packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the following PSB packet
+ * (skipping the PSB at @buf itself) if there is one, otherwise the buffer
+ * pointer is unchanged.  If @buf is updated, @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_step_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	if (!*len)
+		return false;
+
+	next = memmem(*buf + 1, *len - 1, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_last_psb - find the last PSB packet in a buffer.
+ * @buf: buffer
+ * @len: size of buffer
+ *
+ * This function finds the last PSB in a buffer.
+ *
+ * Return: A pointer to the last PSB in @buf if found, %NULL otherwise.
+ */
+static unsigned char *intel_pt_last_psb(unsigned char *buf, size_t len)
+{
+	const char *n = INTEL_PT_PSB_STR;
+	unsigned char *p;
+	size_t k;
+
+	if (len < INTEL_PT_PSB_LEN)
+		return NULL;
+
+	k = len - INTEL_PT_PSB_LEN + 1;
+	while (1) {
+		p = memrchr(buf, n[0], k);
+		if (!p)
+			return NULL;
+		if (!memcmp(p + 1, n + 1, INTEL_PT_PSB_LEN - 1))
+			return p;
+		k = p - buf;
+		if (!k)
+			return NULL;
+	}
+}
+
+/**
+ * intel_pt_next_tsc - find and return next TSC.
+ * @buf: buffer
+ * @len: size of buffer
+ * @tsc: TSC value returned
+ *
+ * Find a TSC packet in @buf and return the TSC value.  This function assumes
+ * that @buf starts at a PSB and that PSB+ will contain TSC and so stops if a
+ * PSBEND packet is found.
+ *
+ * Return: %true if TSC is found, false otherwise.
+ */
+static bool intel_pt_next_tsc(unsigned char *buf, size_t len, uint64_t *tsc)
+{
+	struct intel_pt_pkt packet;
+	int ret;
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret <= 0)
+			return false;
+		if (packet.type == INTEL_PT_TSC) {
+			*tsc = packet.payload;
+			return true;
+		}
+		if (packet.type == INTEL_PT_PSBEND)
+			return false;
+		buf += ret;
+		len -= ret;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_tsc_cmp - compare 7-byte TSCs.
+ * @tsc1: first TSC to compare
+ * @tsc2: second TSC to compare
+ *
+ * This function compares 7-byte TSC values allowing for the possibility that
+ * TSC wrapped around.  Generally it is not possible to know if TSC has wrapped
+ * around so for that purpose this function assumes the absolute difference is
+ * less than half the maximum difference.
+ *
+ * Return: %-1 if @tsc1 is before @tsc2, %0 if @tsc1 == @tsc2, %1 if @tsc1 is
+ * after @tsc2.
+ */
+static int intel_pt_tsc_cmp(uint64_t tsc1, uint64_t tsc2)
+{
+	const uint64_t halfway = (1ULL << 55);
+
+	if (tsc1 == tsc2)
+		return 0;
+
+	if (tsc1 < tsc2) {
+		if (tsc2 - tsc1 < halfway)
+			return -1;
+		else
+			return 1;
+	} else {
+		if (tsc1 - tsc2 < halfway)
+			return 1;
+		else
+			return -1;
+	}
+}
+
+/**
+ * intel_pt_find_overlap_tsc - determine start of non-overlapped trace data
+ *                             using TSC.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ *
+ * If the trace contains TSC we can look at the last TSC of @buf_a and the
+ * first TSC of @buf_b in order to determine if the buffers overlap, and then
+ * walk forward in @buf_b until a later TSC is found.  A precondition is that
+ * @buf_a and @buf_b are positioned at a PSB.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+static unsigned char *intel_pt_find_overlap_tsc(unsigned char *buf_a,
+						size_t len_a,
+						unsigned char *buf_b,
+						size_t len_b)
+{
+	uint64_t tsc_a, tsc_b;
+	unsigned char *p;
+	size_t len;
+
+	p = intel_pt_last_psb(buf_a, len_a);
+	if (!p)
+		return buf_b; /* No PSB in buf_a => no overlap */
+
+	len = len_a - (p - buf_a);
+	if (!intel_pt_next_tsc(p, len, &tsc_a)) {
+		/* The last PSB+ in buf_a is incomplete, so go back one more */
+		len_a -= len;
+		p = intel_pt_last_psb(buf_a, len_a);
+		if (!p)
+			return buf_b; /* No full PSB+ => assume no overlap */
+		len = len_a - (p - buf_a);
+		if (!intel_pt_next_tsc(p, len, &tsc_a))
+			return buf_b; /* No TSC in buf_a => assume no overlap */
+	}
+
+	while (1) {
+		/* Ignore PSB+ with no TSC */
+		if (intel_pt_next_tsc(buf_b, len_b, &tsc_b) &&
+		    intel_pt_tsc_cmp(tsc_a, tsc_b) < 0)
+			return buf_b; /* tsc_a < tsc_b => no overlap */
+
+		if (!intel_pt_step_psb(&buf_b, &len_b))
+			return buf_b + len_b; /* No PSB in buf_b => no data */
+	}
+}
+
+/**
+ * intel_pt_find_overlap - determine start of non-overlapped trace data.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ * @have_tsc: can use TSC packets to detect overlap
+ *
+ * When trace samples or snapshots are recorded there is the possibility that
+ * the data overlaps.  Note that, for the purposes of decoding, data is only
+ * useful if it begins with a PSB packet.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc)
+{
+	unsigned char *found;
+
+	/* Buffer 'b' must start at PSB so throw away everything before that */
+	if (!intel_pt_next_psb(&buf_b, &len_b))
+		return buf_b + len_b; /* No PSB */
+
+	if (!intel_pt_next_psb(&buf_a, &len_a))
+		return buf_b; /* No overlap */
+
+	if (have_tsc) {
+		found = intel_pt_find_overlap_tsc(buf_a, len_a, buf_b, len_b);
+		if (found)
+			return found;
+	}
+
+	/*
+	 * Buffer 'b' cannot end within buffer 'a' so, for comparison purposes,
+	 * we can ignore the first part of buffer 'a'.
+	 */
+	while (len_b < len_a) {
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+	}
+
+	/* Now len_b >= len_a */
+	if (len_b > len_a) {
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+
+	while (1) {
+		/* Potential overlap so check the bytes */
+		found = memmem(buf_a, len_a, buf_b, len_a);
+		if (found)
+			return buf_b + len_a;
+
+		/* Try again at next PSB in buffer 'a' */
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
new file mode 100644
index 0000000..955263a
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -0,0 +1,102 @@
+/*
+ * intel_pt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_DECODER_H__
+#define INCLUDE__INTEL_PT_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+#include "intel-pt-insn-decoder.h"
+
+#define INTEL_PT_IN_TX		(1 << 0)
+#define INTEL_PT_ABORT_TX	(1 << 1)
+#define INTEL_PT_ASYNC		(1 << 2)
+
+enum intel_pt_sample_type {
+	INTEL_PT_BRANCH		= 1 << 0,
+	INTEL_PT_INSTRUCTION	= 1 << 1,
+	INTEL_PT_TRANSACTION	= 1 << 2,
+};
+
+enum intel_pt_period_type {
+	INTEL_PT_PERIOD_NONE,
+	INTEL_PT_PERIOD_INSTRUCTIONS,
+	INTEL_PT_PERIOD_TICKS,
+};
+
+enum {
+	INTEL_PT_ERR_NOMEM = 1,
+	INTEL_PT_ERR_INTERN,
+	INTEL_PT_ERR_BADPKT,
+	INTEL_PT_ERR_NODATA,
+	INTEL_PT_ERR_NOINSN,
+	INTEL_PT_ERR_MISMAT,
+	INTEL_PT_ERR_OVR,
+	INTEL_PT_ERR_LOST,
+	INTEL_PT_ERR_UNK,
+	INTEL_PT_ERR_MAX,
+};
+
+struct intel_pt_state {
+	enum intel_pt_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint64_t est_timestamp;
+	uint64_t trace_nr;
+	uint32_t flags;
+	enum intel_pt_insn_op insn_op;
+	int insn_len;
+};
+
+struct intel_pt_insn;
+
+struct intel_pt_buffer {
+	const unsigned char *buf;
+	size_t len;
+	bool consecutive;
+	uint64_t ref_timestamp;
+	uint64_t trace_nr;
+};
+
+struct intel_pt_params {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*walk_insn)(struct intel_pt_insn *intel_pt_insn,
+			 uint64_t *insn_cnt_ptr, uint64_t *ip, uint64_t to_ip,
+			 uint64_t max_insn_cnt, void *data);
+	void *data;
+	bool return_compression;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+};
+
+struct intel_pt_decoder;
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params);
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder);
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder);
+
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc);
+
+int intel_pt__strerror(int code, char *buf, size_t buflen);
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (6 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 07/17] perf tools: Add Intel PT decoder Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-06-19 16:04   ` Arnaldo Carvalho de Melo
  2015-05-29 13:33 ` [PATCH V6 09/17] perf tools: Take Intel PT into use Adrian Hunter
                   ` (8 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add support for Intel Processor Trace.

Intel PT support fits within the new auxtrace infrastructure.
Recording is supporting by identifying the Intel PT PMU,
parsing options and setting up events.  Decoding is supported
by queuing up trace data by cpu or thread and then decoding
synchronously delivering synthesized event samples into the
session processing for tools to consume.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/Build      |    2 +
 tools/perf/arch/x86/util/intel-pt.c |  752 ++++++++++++++
 tools/perf/util/Build               |    1 +
 tools/perf/util/intel-pt.c          | 1889 +++++++++++++++++++++++++++++++++++
 tools/perf/util/intel-pt.h          |   51 +
 5 files changed, 2695 insertions(+)
 create mode 100644 tools/perf/arch/x86/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.h

diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index cfbccc4..1396088 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -6,3 +6,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
 
 libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
 libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
+
+libperf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
new file mode 100644
index 0000000..da7d2c1
--- /dev/null
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -0,0 +1,752 @@
+/*
+ * intel_pt.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdbool.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "../../perf.h"
+#include "../../util/session.h"
+#include "../../util/event.h"
+#include "../../util/evlist.h"
+#include "../../util/evsel.h"
+#include "../../util/cpumap.h"
+#include "../../util/parse-options.h"
+#include "../../util/parse-events.h"
+#include "../../util/pmu.h"
+#include "../../util/debug.h"
+#include "../../util/auxtrace.h"
+#include "../../util/tsc.h"
+#include "../../util/intel-pt.h"
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define KiB_MASK(x) (KiB(x) - 1)
+#define MiB_MASK(x) (MiB(x) - 1)
+
+#define INTEL_PT_DEFAULT_SAMPLE_SIZE	KiB(4)
+
+#define INTEL_PT_MAX_SAMPLE_SIZE	KiB(60)
+
+#define INTEL_PT_PSB_PERIOD_NEAR	256
+
+struct intel_pt_snapshot_ref {
+	void *ref_buf;
+	size_t ref_offset;
+	bool wrapped;
+};
+
+struct intel_pt_recording {
+	struct auxtrace_record		itr;
+	struct perf_pmu			*intel_pt_pmu;
+	int				have_sched_switch;
+	struct perf_evlist		*evlist;
+	bool				snapshot_mode;
+	bool				snapshot_init_done;
+	size_t				snapshot_size;
+	size_t				snapshot_ref_buf_size;
+	int				snapshot_ref_cnt;
+	struct intel_pt_snapshot_ref	*snapshot_refs;
+};
+
+static int intel_pt_parse_terms_with_default(struct list_head *formats,
+					     const char *str,
+					     u64 *config)
+{
+	struct list_head *terms;
+	struct perf_event_attr attr = { .size = 0, };
+	int err;
+
+	terms = malloc(sizeof(struct list_head));
+	if (!terms)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(terms);
+
+	err = parse_events_terms(terms, str);
+	if (err)
+		goto out_free;
+
+	attr.config = *config;
+	err = perf_pmu__config_terms(formats, &attr, terms, true, NULL);
+	if (err)
+		goto out_free;
+
+	*config = attr.config;
+out_free:
+	parse_events__free_terms(terms);
+	return err;
+}
+
+static int intel_pt_parse_terms(struct list_head *formats, const char *str,
+				u64 *config)
+{
+	*config = 0;
+	return intel_pt_parse_terms_with_default(formats, str, config);
+}
+
+static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu __maybe_unused,
+				  struct perf_evlist *evlist __maybe_unused)
+{
+	return 256;
+}
+
+static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	u64 config;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &config);
+	return config;
+}
+
+static int intel_pt_parse_snapshot_options(struct auxtrace_record *itr,
+					   struct record_opts *opts,
+					   const char *str)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->auxtrace_snapshot_mode = true;
+	opts->auxtrace_snapshot_size = snapshot_size;
+
+	ptr->snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+struct perf_event_attr *
+intel_pt_pmu_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	struct perf_event_attr *attr;
+
+	attr = zalloc(sizeof(struct perf_event_attr));
+	if (!attr)
+		return NULL;
+
+	attr->config = intel_pt_default_config(intel_pt_pmu);
+
+	intel_pt_pmu->selectable = true;
+
+	return attr;
+}
+
+static size_t intel_pt_info_priv_size(struct auxtrace_record *itr __maybe_unused)
+{
+	return INTEL_PT_AUXTRACE_PRIV_SIZE;
+}
+
+static int intel_pt_info_fill(struct auxtrace_record *itr,
+			      struct perf_session *session,
+			      struct auxtrace_info_event *auxtrace_info,
+			      size_t priv_size)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	struct perf_event_mmap_page *pc;
+	struct perf_tsc_conversion tc = { .time_mult = 0, };
+	bool cap_user_time_zero = false, per_cpu_mmaps;
+	u64 tsc_bit, noretcomp_bit;
+	int err;
+
+	if (priv_size != INTEL_PT_AUXTRACE_PRIV_SIZE)
+		return -EINVAL;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+	intel_pt_parse_terms(&intel_pt_pmu->format, "noretcomp",
+			     &noretcomp_bit);
+
+	if (!session->evlist->nr_mmaps)
+		return -EINVAL;
+
+	pc = session->evlist->mmap[0].base;
+	if (pc) {
+		err = perf_read_tsc_conversion(pc, &tc);
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+		} else {
+			cap_user_time_zero = tc.time_mult != 0;
+		}
+		if (!cap_user_time_zero)
+			ui__warning("Intel Processor Trace: TSC not available\n");
+	}
+
+	per_cpu_mmaps = !cpu_map__empty(session->evlist->cpus);
+
+	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
+	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
+	auxtrace_info->priv[INTEL_PT_TIME_SHIFT] = tc.time_shift;
+	auxtrace_info->priv[INTEL_PT_TIME_MULT] = tc.time_mult;
+	auxtrace_info->priv[INTEL_PT_TIME_ZERO] = tc.time_zero;
+	auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO] = cap_user_time_zero;
+	auxtrace_info->priv[INTEL_PT_TSC_BIT] = tsc_bit;
+	auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT] = noretcomp_bit;
+	auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH] = ptr->have_sched_switch;
+	auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE] = ptr->snapshot_mode;
+	auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS] = per_cpu_mmaps;
+
+	return 0;
+}
+
+static int intel_pt_track_switches(struct perf_evlist *evlist)
+{
+	const char *sched_switch = "sched:sched_switch";
+	struct perf_evsel *evsel;
+	int err;
+
+	if (!perf_evlist__can_select_event(evlist, sched_switch))
+		return -EPERM;
+
+	err = parse_events(evlist, sched_switch, NULL);
+	if (err) {
+		pr_debug2("%s: failed to parse %s, error %d\n",
+			  __func__, sched_switch, err);
+		return err;
+	}
+
+	evsel = perf_evlist__last(evlist);
+
+	perf_evsel__set_sample_bit(evsel, CPU);
+	perf_evsel__set_sample_bit(evsel, TIME);
+
+	evsel->system_wide = true;
+	evsel->no_aux_samples = true;
+	evsel->immediate = true;
+
+	return 0;
+}
+
+static int intel_pt_recording_options(struct auxtrace_record *itr,
+				      struct perf_evlist *evlist,
+				      struct record_opts *opts)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	bool have_timing_info;
+	struct perf_evsel *evsel, *intel_pt_evsel = NULL;
+	const struct cpu_map *cpus = evlist->cpus;
+	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+	u64 tsc_bit;
+
+	ptr->evlist = evlist;
+	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_pt_pmu->type) {
+			if (intel_pt_evsel) {
+				pr_err("There may be only one " INTEL_PT_PMU_NAME " event\n");
+				return -EINVAL;
+			}
+			evsel->attr.freq = 0;
+			evsel->attr.sample_period = 1;
+			intel_pt_evsel = evsel;
+			opts->full_auxtrace = true;
+		}
+	}
+
+	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
+		pr_err("Snapshot mode (-S option) requires " INTEL_PT_PMU_NAME " PMU event (-e " INTEL_PT_PMU_NAME ")\n");
+		return -EINVAL;
+	}
+
+	if (opts->use_clockid) {
+		pr_err("Cannot use clockid (-k option) with " INTEL_PT_PMU_NAME "\n");
+		return -EINVAL;
+	}
+
+	if (!opts->full_auxtrace)
+		return 0;
+
+	/* Set default sizes for snapshot mode */
+	if (opts->auxtrace_snapshot_mode) {
+		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
+
+		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
+			if (privileged) {
+				opts->auxtrace_mmap_pages = MiB(4) / page_size;
+			} else {
+				opts->auxtrace_mmap_pages = KiB(128) / page_size;
+				if (opts->mmap_pages == UINT_MAX)
+					opts->mmap_pages = KiB(256) / page_size;
+			}
+		} else if (!opts->auxtrace_mmap_pages && !privileged &&
+			   opts->mmap_pages == UINT_MAX) {
+			opts->mmap_pages = KiB(256) / page_size;
+		}
+		if (!opts->auxtrace_snapshot_size)
+			opts->auxtrace_snapshot_size =
+				opts->auxtrace_mmap_pages * (size_t)page_size;
+		if (!opts->auxtrace_mmap_pages) {
+			size_t sz = opts->auxtrace_snapshot_size;
+
+			sz = round_up(sz, page_size) / page_size;
+			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
+		}
+		if (opts->auxtrace_snapshot_size >
+				opts->auxtrace_mmap_pages * (size_t)page_size) {
+			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
+			       opts->auxtrace_snapshot_size,
+			       opts->auxtrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
+			return -EINVAL;
+		}
+		pr_debug2("Intel PT snapshot size: %zu\n",
+			  opts->auxtrace_snapshot_size);
+		if (psb_period &&
+		    opts->auxtrace_snapshot_size <= psb_period +
+						  INTEL_PT_PSB_PERIOD_NEAR)
+			ui__warning("Intel PT snapshot size (%zu) may be too small for PSB period (%zu)\n",
+				    opts->auxtrace_snapshot_size, psb_period);
+	}
+
+	/* Set default sizes for full trace mode */
+	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
+		if (privileged) {
+			opts->auxtrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->auxtrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	}
+
+	/* Validate auxtrace_mmap_pages */
+	if (opts->auxtrace_mmap_pages) {
+		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
+		size_t min_sz;
+
+		if (opts->auxtrace_snapshot_mode)
+			min_sz = KiB(4);
+		else
+			min_sz = KiB(8);
+
+		if (sz < min_sz || !is_power_of_2(sz)) {
+			pr_err("Invalid mmap size for Intel Processor Trace: must be at least %zuKiB and a power of 2\n",
+			       min_sz / 1024);
+			return -EINVAL;
+		}
+	}
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+
+	if (opts->full_auxtrace && (intel_pt_evsel->attr.config & tsc_bit))
+		have_timing_info = true;
+	else
+		have_timing_info = false;
+
+	/*
+	 * Per-cpu recording needs sched_switch events to distinguish different
+	 * threads.
+	 */
+	if (have_timing_info && !cpu_map__empty(cpus)) {
+		int err;
+
+		err = intel_pt_track_switches(evlist);
+		if (err == -EPERM)
+			pr_debug2("Unable to select sched:sched_switch\n");
+		else if (err)
+			return err;
+		else
+			ptr->have_sched_switch = 1;
+	}
+
+	if (intel_pt_evsel) {
+		/*
+		 * To obtain the auxtrace buffer file descriptor, the auxtrace
+		 * event must come first.
+		 */
+		perf_evlist__to_front(evlist, intel_pt_evsel);
+		/*
+		 * In the case of per-cpu mmaps, we need the CPU on the
+		 * AUX event.
+		 */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(intel_pt_evsel, CPU);
+	}
+
+	/* Add dummy event to keep tracking */
+	if (opts->full_auxtrace) {
+		struct perf_evsel *tracking_evsel;
+		int err;
+
+		err = parse_events(evlist, "dummy:u", NULL);
+		if (err)
+			return err;
+
+		tracking_evsel = perf_evlist__last(evlist);
+
+		perf_evlist__set_tracking_event(evlist, tracking_evsel);
+
+		tracking_evsel->attr.freq = 0;
+		tracking_evsel->attr.sample_period = 1;
+
+		/* In per-cpu case, always need the time of mmap events etc */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(tracking_evsel, TIME);
+	}
+
+	/*
+	 * Warn the user when we do not have enough information to decode i.e.
+	 * per-cpu with no sched_switch (except workload-only).
+	 */
+	if (!ptr->have_sched_switch && !cpu_map__empty(cpus) &&
+	    !target__none(&opts->target))
+		ui__warning("Intel Processor Trace decoding will not be possible except for kernel tracing!\n");
+
+	return 0;
+}
+
+static int intel_pt_snapshot_start(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__disable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_snapshot_finish(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_alloc_snapshot_refs(struct intel_pt_recording *ptr, int idx)
+{
+	const size_t sz = sizeof(struct intel_pt_snapshot_ref);
+	int cnt = ptr->snapshot_ref_cnt, new_cnt = cnt * 2;
+	struct intel_pt_snapshot_ref *refs;
+
+	if (!new_cnt)
+		new_cnt = 16;
+
+	while (new_cnt <= idx)
+		new_cnt *= 2;
+
+	refs = calloc(new_cnt, sz);
+	if (!refs)
+		return -ENOMEM;
+
+	memcpy(refs, ptr->snapshot_refs, cnt * sz);
+
+	ptr->snapshot_refs = refs;
+	ptr->snapshot_ref_cnt = new_cnt;
+
+	return 0;
+}
+
+static void intel_pt_free_snapshot_refs(struct intel_pt_recording *ptr)
+{
+	int i;
+
+	for (i = 0; i < ptr->snapshot_ref_cnt; i++)
+		zfree(&ptr->snapshot_refs[i].ref_buf);
+	zfree(&ptr->snapshot_refs);
+}
+
+static void intel_pt_recording_free(struct auxtrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+
+	intel_pt_free_snapshot_refs(ptr);
+	free(ptr);
+}
+
+static int intel_pt_alloc_snapshot_ref(struct intel_pt_recording *ptr, int idx,
+				       size_t snapshot_buf_size)
+{
+	size_t ref_buf_size = ptr->snapshot_ref_buf_size;
+	void *ref_buf;
+
+	ref_buf = zalloc(ref_buf_size);
+	if (!ref_buf)
+		return -ENOMEM;
+
+	ptr->snapshot_refs[idx].ref_buf = ref_buf;
+	ptr->snapshot_refs[idx].ref_offset = snapshot_buf_size - ref_buf_size;
+
+	return 0;
+}
+
+static size_t intel_pt_snapshot_ref_buf_size(struct intel_pt_recording *ptr,
+					     size_t snapshot_buf_size)
+{
+	const size_t max_size = 256 * 1024;
+	size_t buf_size = 0, psb_period;
+
+	if (ptr->snapshot_size <= 64 * 1024)
+		return 0;
+
+	psb_period = intel_pt_psb_period(ptr->intel_pt_pmu, ptr->evlist);
+	if (psb_period)
+		buf_size = psb_period * 2;
+
+	if (!buf_size || buf_size > max_size)
+		buf_size = max_size;
+
+	if (buf_size >= snapshot_buf_size)
+		return 0;
+
+	if (buf_size >= ptr->snapshot_size / 2)
+		return 0;
+
+	return buf_size;
+}
+
+static int intel_pt_snapshot_init(struct intel_pt_recording *ptr,
+				  size_t snapshot_buf_size)
+{
+	if (ptr->snapshot_init_done)
+		return 0;
+
+	ptr->snapshot_init_done = true;
+
+	ptr->snapshot_ref_buf_size = intel_pt_snapshot_ref_buf_size(ptr,
+							snapshot_buf_size);
+
+	return 0;
+}
+
+/**
+ * intel_pt_compare_buffers - compare bytes in a buffer to a circular buffer.
+ * @buf1: first buffer
+ * @compare_size: number of bytes to compare
+ * @buf2: second buffer (a circular buffer)
+ * @offs2: offset in second buffer
+ * @buf2_size: size of second buffer
+ *
+ * The comparison allows for the possibility that the bytes to compare in the
+ * circular buffer are not contiguous.  It is assumed that @compare_size <=
+ * @buf2_size.  This function returns %false if the bytes are identical, %true
+ * otherwise.
+ */
+static bool intel_pt_compare_buffers(void *buf1, size_t compare_size,
+				     void *buf2, size_t offs2, size_t buf2_size)
+{
+	size_t end2 = offs2 + compare_size, part_size;
+
+	if (end2 <= buf2_size)
+		return memcmp(buf1, buf2 + offs2, compare_size);
+
+	part_size = end2 - buf2_size;
+	if (memcmp(buf1, buf2 + offs2, part_size))
+		return true;
+
+	compare_size -= part_size;
+
+	return memcmp(buf1 + part_size, buf2, compare_size);
+}
+
+static bool intel_pt_compare_ref(void *ref_buf, size_t ref_offset,
+				 size_t ref_size, size_t buf_size,
+				 void *data, size_t head)
+{
+	size_t ref_end = ref_offset + ref_size;
+
+	if (ref_end > buf_size) {
+		if (head > ref_offset || head < ref_end - buf_size)
+			return true;
+	} else if (head > ref_offset && head < ref_end) {
+		return true;
+	}
+
+	return intel_pt_compare_buffers(ref_buf, ref_size, data, ref_offset,
+					buf_size);
+}
+
+static void intel_pt_copy_ref(void *ref_buf, size_t ref_size, size_t buf_size,
+			      void *data, size_t head)
+{
+	if (head >= ref_size) {
+		memcpy(ref_buf, data + head - ref_size, ref_size);
+	} else {
+		memcpy(ref_buf, data, head);
+		ref_size -= head;
+		memcpy(ref_buf + head, data + buf_size - ref_size, ref_size);
+	}
+}
+
+static bool intel_pt_wrapped(struct intel_pt_recording *ptr, int idx,
+			     struct auxtrace_mmap *mm, unsigned char *data,
+			     u64 head)
+{
+	struct intel_pt_snapshot_ref *ref = &ptr->snapshot_refs[idx];
+	bool wrapped;
+
+	wrapped = intel_pt_compare_ref(ref->ref_buf, ref->ref_offset,
+				       ptr->snapshot_ref_buf_size, mm->len,
+				       data, head);
+
+	intel_pt_copy_ref(ref->ref_buf, ptr->snapshot_ref_buf_size, mm->len,
+			  data, head);
+
+	return wrapped;
+}
+
+static bool intel_pt_first_wrap(u64 *data, size_t buf_size)
+{
+	int i, a, b;
+
+	b = buf_size >> 3;
+	a = b - 512;
+	if (a < 0)
+		a = 0;
+
+	for (i = a; i < b; i++) {
+		if (data[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int intel_pt_find_snapshot(struct auxtrace_record *itr, int idx,
+				  struct auxtrace_mmap *mm, unsigned char *data,
+				  u64 *head, u64 *old)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	bool wrapped;
+	int err;
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head);
+
+	err = intel_pt_snapshot_init(ptr, mm->len);
+	if (err)
+		goto out_err;
+
+	if (idx >= ptr->snapshot_ref_cnt) {
+		err = intel_pt_alloc_snapshot_refs(ptr, idx);
+		if (err)
+			goto out_err;
+	}
+
+	if (ptr->snapshot_ref_buf_size) {
+		if (!ptr->snapshot_refs[idx].ref_buf) {
+			err = intel_pt_alloc_snapshot_ref(ptr, idx, mm->len);
+			if (err)
+				goto out_err;
+		}
+		wrapped = intel_pt_wrapped(ptr, idx, mm, data, *head);
+	} else {
+		wrapped = ptr->snapshot_refs[idx].wrapped;
+		if (!wrapped && intel_pt_first_wrap((u64 *)data, mm->len)) {
+			ptr->snapshot_refs[idx].wrapped = true;
+			wrapped = true;
+		}
+	}
+
+	/*
+	 * In full trace mode 'head' continually increases.  However in snapshot
+	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
+	 * are adjusted to match the full trace case which expects that 'old' is
+	 * always less than 'head'.
+	 */
+	if (wrapped) {
+		*old = *head;
+		*head += mm->len;
+	} else {
+		if (mm->mask)
+			*old &= mm->mask;
+		else
+			*old %= mm->len;
+		if (*old > *head)
+			*head += mm->len;
+	}
+
+	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
+		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
+
+	return 0;
+
+out_err:
+	pr_err("%s: failed, error %d\n", __func__, err);
+	return err;
+}
+
+static u64 intel_pt_reference(struct auxtrace_record *itr __maybe_unused)
+{
+	return rdtsc();
+}
+
+static int intel_pt_read_finish(struct auxtrace_record *itr, int idx)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(ptr->evlist, evsel) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event_idx(ptr->evlist, evsel,
+							     idx);
+	}
+	return -EINVAL;
+}
+
+struct auxtrace_record *intel_pt_recording_init(int *err)
+{
+	struct perf_pmu *intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+	struct intel_pt_recording *ptr;
+
+	if (!intel_pt_pmu)
+		return NULL;
+
+	ptr = zalloc(sizeof(struct intel_pt_recording));
+	if (!ptr) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	ptr->intel_pt_pmu = intel_pt_pmu;
+	ptr->itr.recording_options = intel_pt_recording_options;
+	ptr->itr.info_priv_size = intel_pt_info_priv_size;
+	ptr->itr.info_fill = intel_pt_info_fill;
+	ptr->itr.free = intel_pt_recording_free;
+	ptr->itr.snapshot_start = intel_pt_snapshot_start;
+	ptr->itr.snapshot_finish = intel_pt_snapshot_finish;
+	ptr->itr.find_snapshot = intel_pt_find_snapshot;
+	ptr->itr.parse_snapshot_options = intel_pt_parse_snapshot_options;
+	ptr->itr.reference = intel_pt_reference;
+	ptr->itr.read_finish = intel_pt_read_finish;
+	return &ptr->itr;
+}
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index 86c81f6..ec7ab9d 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -76,6 +76,7 @@ libperf-y += cloexec.o
 libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
+libperf-$(CONFIG_AUXTRACE) += intel-pt.o
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
new file mode 100644
index 0000000..6d66879
--- /dev/null
+++ b/tools/perf/util/intel-pt.c
@@ -0,0 +1,1889 @@
+/*
+ * intel_pt.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <stdio.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+
+#include "../perf.h"
+#include "session.h"
+#include "machine.h"
+#include "tool.h"
+#include "event.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "map.h"
+#include "color.h"
+#include "util.h"
+#include "thread.h"
+#include "thread-stack.h"
+#include "symbol.h"
+#include "callchain.h"
+#include "dso.h"
+#include "debug.h"
+#include "auxtrace.h"
+#include "tsc.h"
+#include "intel-pt.h"
+
+#include "intel-pt-decoder/intel-pt-log.h"
+#include "intel-pt-decoder/intel-pt-decoder.h"
+#include "intel-pt-decoder/intel-pt-insn-decoder.h"
+#include "intel-pt-decoder/intel-pt-pkt-decoder.h"
+
+#define MAX_TIMESTAMP (~0ULL)
+
+struct intel_pt {
+	struct auxtrace auxtrace;
+	struct auxtrace_queues queues;
+	struct auxtrace_heap heap;
+	u32 auxtrace_type;
+	struct perf_session *session;
+	struct machine *machine;
+	struct perf_evsel *switch_evsel;
+	struct thread *unknown_thread;
+	bool timeless_decoding;
+	bool sampling_mode;
+	bool snapshot_mode;
+	bool per_cpu_mmaps;
+	bool have_tsc;
+	bool data_queued;
+	bool est_tsc;
+	bool sync_switch;
+	bool est_tsc_orig;
+	int have_sched_switch;
+	u32 pmu_type;
+	u64 kernel_start;
+	u64 switch_ip;
+	u64 ptss_ip;
+
+	struct perf_tsc_conversion tc;
+	bool cap_user_time_zero;
+
+	struct itrace_synth_opts synth_opts;
+
+	bool sample_instructions;
+	u64 instructions_sample_type;
+	u64 instructions_sample_period;
+	u64 instructions_id;
+
+	bool sample_branches;
+	u32 branches_filter;
+	u64 branches_sample_type;
+	u64 branches_id;
+
+	bool sample_transactions;
+	u64 transactions_sample_type;
+	u64 transactions_id;
+
+	bool synth_needs_swap;
+
+	u64 tsc_bit;
+	u64 noretcomp_bit;
+};
+
+enum switch_state {
+	INTEL_PT_SS_NOT_TRACING,
+	INTEL_PT_SS_UNKNOWN,
+	INTEL_PT_SS_TRACING,
+	INTEL_PT_SS_EXPECTING_SWITCH_EVENT,
+	INTEL_PT_SS_EXPECTING_SWITCH_IP,
+};
+
+struct intel_pt_queue {
+	struct intel_pt *pt;
+	unsigned int queue_nr;
+	struct auxtrace_buffer *buffer;
+	void *decoder;
+	const struct intel_pt_state *state;
+	struct ip_callchain *chain;
+	union perf_event *event_buf;
+	bool on_heap;
+	bool stop;
+	bool step_through_buffers;
+	bool use_buffer_pid_tid;
+	pid_t pid, tid;
+	int cpu;
+	int switch_state;
+	pid_t next_tid;
+	struct thread *thread;
+	bool exclude_kernel;
+	bool have_sample;
+	u64 time;
+	u64 timestamp;
+	u32 flags;
+	u16 insn_len;
+};
+
+static void intel_pt_dump(struct intel_pt *pt __maybe_unused,
+			  unsigned char *buf, size_t len)
+{
+	struct intel_pt_pkt packet;
+	size_t pos = 0;
+	int ret, pkt_len, i;
+	char desc[INTEL_PT_PKT_DESC_MAX];
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color,
+		      ". ... Intel Processor Trace data: size %zu bytes\n",
+		      len);
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret > 0)
+			pkt_len = ret;
+		else
+			pkt_len = 1;
+		printf(".");
+		color_fprintf(stdout, color, "  %08x: ", pos);
+		for (i = 0; i < pkt_len; i++)
+			color_fprintf(stdout, color, " %02x", buf[i]);
+		for (; i < 16; i++)
+			color_fprintf(stdout, color, "   ");
+		if (ret > 0) {
+			ret = intel_pt_pkt_desc(&packet, desc,
+						INTEL_PT_PKT_DESC_MAX);
+			if (ret > 0)
+				color_fprintf(stdout, color, " %s\n", desc);
+		} else {
+			color_fprintf(stdout, color, " Bad packet!\n");
+		}
+		pos += pkt_len;
+		buf += pkt_len;
+		len -= pkt_len;
+	}
+}
+
+static void intel_pt_dump_event(struct intel_pt *pt, unsigned char *buf,
+				size_t len)
+{
+	printf(".\n");
+	intel_pt_dump(pt, buf, len);
+}
+
+static int intel_pt_do_fix_overlap(struct intel_pt *pt, struct auxtrace_buffer *a,
+				   struct auxtrace_buffer *b)
+{
+	void *start;
+
+	start = intel_pt_find_overlap(a->data, a->size, b->data, b->size,
+				      pt->have_tsc);
+	if (!start)
+		return -EINVAL;
+	b->use_size = b->data + b->size - start;
+	b->use_data = start;
+	return 0;
+}
+
+static void intel_pt_use_buffer_pid_tid(struct intel_pt_queue *ptq,
+					struct auxtrace_queue *queue,
+					struct auxtrace_buffer *buffer)
+{
+	if (queue->cpu == -1 && buffer->cpu != -1)
+		ptq->cpu = buffer->cpu;
+
+	ptq->pid = buffer->pid;
+	ptq->tid = buffer->tid;
+
+	intel_pt_log("queue %u cpu %d pid %d tid %d\n",
+		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+
+	ptq->thread = NULL;
+
+	if (ptq->tid != -1) {
+		if (ptq->pid != -1)
+			ptq->thread = machine__findnew_thread(ptq->pt->machine,
+							      ptq->pid,
+							      ptq->tid);
+		else
+			ptq->thread = machine__find_thread(ptq->pt->machine, -1,
+							   ptq->tid);
+	}
+}
+
+/* This function assumes data is processed sequentially only */
+static int intel_pt_get_trace(struct intel_pt_buffer *b, void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct auxtrace_buffer *buffer = ptq->buffer, *old_buffer = buffer;
+	struct auxtrace_queue *queue;
+
+	if (ptq->stop) {
+		b->len = 0;
+		return 0;
+	}
+
+	queue = &ptq->pt->queues.queue_array[ptq->queue_nr];
+
+	buffer = auxtrace_buffer__next(queue, buffer);
+	if (!buffer) {
+		if (old_buffer)
+			auxtrace_buffer__drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	ptq->buffer = buffer;
+
+	if (!buffer->data) {
+		int fd = perf_data_file__fd(ptq->pt->session->file);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (ptq->pt->snapshot_mode && !buffer->consecutive && old_buffer &&
+	    intel_pt_do_fix_overlap(ptq->pt, old_buffer, buffer))
+		return -ENOMEM;
+
+	if (old_buffer)
+		auxtrace_buffer__drop_data(old_buffer);
+
+	if (buffer->use_data) {
+		b->len = buffer->use_size;
+		b->buf = buffer->use_data;
+	} else {
+		b->len = buffer->size;
+		b->buf = buffer->data;
+	}
+	b->ref_timestamp = buffer->reference;
+
+	if (!old_buffer || ptq->pt->sampling_mode || (ptq->pt->snapshot_mode &&
+						      !buffer->consecutive)) {
+		b->consecutive = false;
+		b->trace_nr = buffer->buffer_nr;
+	} else {
+		b->consecutive = true;
+	}
+
+	if (ptq->use_buffer_pid_tid && (ptq->pid != buffer->pid ||
+					ptq->tid != buffer->tid))
+		intel_pt_use_buffer_pid_tid(ptq, queue, buffer);
+
+	if (ptq->step_through_buffers)
+		ptq->stop = true;
+
+	if (!b->len)
+		return intel_pt_get_trace(b, data);
+
+	return 0;
+}
+
+struct intel_pt_cache_entry {
+	struct auxtrace_cache_entry	entry;
+	u64				insn_cnt;
+	u64				byte_cnt;
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+};
+
+static int intel_pt_config_div(const char *var, const char *value, void *data)
+{
+	int *d = data;
+	long val;
+
+	if (!strcmp(var, "intel-pt.cache-divisor")) {
+		val = strtol(value, NULL, 0);
+		if (val > 0 && val <= INT_MAX)
+			*d = val;
+	}
+
+	return 0;
+}
+
+static int intel_pt_cache_divisor(void)
+{
+	static int d;
+
+	if (d)
+		return d;
+
+	perf_config(intel_pt_config_div, &d);
+
+	if (!d)
+		d = 64;
+
+	return d;
+}
+
+static unsigned int intel_pt_cache_size(struct dso *dso,
+					struct machine *machine)
+{
+	off_t size;
+
+	size = dso__data_size(dso, machine);
+	size /= intel_pt_cache_divisor();
+	if (size < 1000)
+		return 10;
+	if (size > (1 << 21))
+		return 21;
+	return 32 - __builtin_clz(size);
+}
+
+static struct auxtrace_cache *intel_pt_cache(struct dso *dso,
+					     struct machine *machine)
+{
+	struct auxtrace_cache *c;
+	unsigned int bits;
+
+	if (dso->auxtrace_cache)
+		return dso->auxtrace_cache;
+
+	bits = intel_pt_cache_size(dso, machine);
+
+	/* Ignoring cache creation failure */
+	c = auxtrace_cache__new(bits, sizeof(struct intel_pt_cache_entry), 200);
+
+	dso->auxtrace_cache = c;
+
+	return c;
+}
+
+static int intel_pt_cache_add(struct dso *dso, struct machine *machine,
+			      u64 offset, u64 insn_cnt, u64 byte_cnt,
+			      struct intel_pt_insn *intel_pt_insn)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+	struct intel_pt_cache_entry *e;
+	int err;
+
+	if (!c)
+		return -ENOMEM;
+
+	e = auxtrace_cache__alloc_entry(c);
+	if (!e)
+		return -ENOMEM;
+
+	e->insn_cnt = insn_cnt;
+	e->byte_cnt = byte_cnt;
+	e->op = intel_pt_insn->op;
+	e->branch = intel_pt_insn->branch;
+	e->length = intel_pt_insn->length;
+	e->rel = intel_pt_insn->rel;
+
+	err = auxtrace_cache__add(c, offset, &e->entry);
+	if (err)
+		auxtrace_cache__free_entry(c, e);
+
+	return err;
+}
+
+static struct intel_pt_cache_entry *
+intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
+{
+	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
+
+	if (!c)
+		return NULL;
+
+	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
+}
+
+static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
+				   uint64_t *insn_cnt_ptr, uint64_t *ip,
+				   uint64_t to_ip, uint64_t max_insn_cnt,
+				   void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct machine *machine = ptq->pt->machine;
+	struct thread *thread;
+	struct addr_location al;
+	unsigned char buf[1024];
+	size_t bufsz;
+	ssize_t len;
+	int x86_64;
+	u8 cpumode;
+	u64 offset, start_offset, start_ip;
+	u64 insn_cnt = 0;
+	bool one_map = true;
+
+	if (to_ip && *ip == to_ip)
+		goto out_no_cache;
+
+	bufsz = intel_pt_insn_max_size();
+
+	if (*ip >= ptq->pt->kernel_start)
+		cpumode = PERF_RECORD_MISC_KERNEL;
+	else
+		cpumode = PERF_RECORD_MISC_USER;
+
+	thread = ptq->thread;
+	if (!thread) {
+		if (cpumode != PERF_RECORD_MISC_KERNEL)
+			return -EINVAL;
+		thread = ptq->pt->unknown_thread;
+	}
+
+	while (1) {
+		thread__find_addr_map(thread, cpumode, MAP__FUNCTION, *ip, &al);
+		if (!al.map || !al.map->dso)
+			return -EINVAL;
+
+		if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
+		    dso__data_status_seen(al.map->dso,
+					  DSO_DATA_STATUS_SEEN_ITRACE))
+			return -ENOENT;
+
+		offset = al.map->map_ip(al.map, *ip);
+
+		if (!to_ip && one_map) {
+			struct intel_pt_cache_entry *e;
+
+			e = intel_pt_cache_lookup(al.map->dso, machine, offset);
+			if (e &&
+			    (!max_insn_cnt || e->insn_cnt <= max_insn_cnt)) {
+				*insn_cnt_ptr = e->insn_cnt;
+				*ip += e->byte_cnt;
+				intel_pt_insn->op = e->op;
+				intel_pt_insn->branch = e->branch;
+				intel_pt_insn->length = e->length;
+				intel_pt_insn->rel = e->rel;
+				intel_pt_log_insn_no_data(intel_pt_insn, *ip);
+				return 0;
+			}
+		}
+
+		start_offset = offset;
+		start_ip = *ip;
+
+		/* Load maps to ensure dso->is_64_bit has been updated */
+		map__load(al.map, machine->symbol_filter);
+
+		x86_64 = al.map->dso->is_64_bit;
+
+		while (1) {
+			len = dso__data_read_offset(al.map->dso, machine,
+						    offset, buf, bufsz);
+			if (len <= 0)
+				return -EINVAL;
+
+			if (intel_pt_get_insn(buf, len, x86_64, intel_pt_insn))
+				return -EINVAL;
+
+			intel_pt_log_insn(intel_pt_insn, *ip);
+
+			insn_cnt += 1;
+
+			if (intel_pt_insn->branch != INTEL_PT_BR_NO_BRANCH)
+				goto out;
+
+			if (max_insn_cnt && insn_cnt >= max_insn_cnt)
+				goto out_no_cache;
+
+			*ip += intel_pt_insn->length;
+
+			if (to_ip && *ip == to_ip)
+				goto out_no_cache;
+
+			if (*ip >= al.map->end)
+				break;
+
+			offset += intel_pt_insn->length;
+		}
+		one_map = false;
+	}
+out:
+	*insn_cnt_ptr = insn_cnt;
+
+	if (!one_map)
+		goto out_no_cache;
+
+	/*
+	 * Didn't lookup in the 'to_ip' case, so do it now to prevent duplicate
+	 * entries.
+	 */
+	if (to_ip) {
+		struct intel_pt_cache_entry *e;
+
+		e = intel_pt_cache_lookup(al.map->dso, machine, start_offset);
+		if (e)
+			return 0;
+	}
+
+	/* Ignore cache errors */
+	intel_pt_cache_add(al.map->dso, machine, start_offset, insn_cnt,
+			   *ip - start_ip, intel_pt_insn);
+
+	return 0;
+
+out_no_cache:
+	*insn_cnt_ptr = insn_cnt;
+	return 0;
+}
+
+static bool intel_pt_get_config(struct intel_pt *pt,
+				struct perf_event_attr *attr, u64 *config)
+{
+	if (attr->type == pt->pmu_type) {
+		if (config)
+			*config = attr->config;
+		return true;
+	}
+
+	return false;
+}
+
+static bool intel_pt_exclude_kernel(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
+		    !evsel->attr.exclude_kernel)
+			return false;
+	}
+	return true;
+}
+
+static bool intel_pt_return_compression(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	u64 config;
+
+	if (!pt->noretcomp_bit)
+		return true;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config) &&
+		    (config & pt->noretcomp_bit))
+			return false;
+	}
+	return true;
+}
+
+static bool intel_pt_timeless_decoding(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	bool timeless_decoding = true;
+	u64 config;
+
+	if (!pt->tsc_bit || !pt->cap_user_time_zero)
+		return true;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (!(evsel->attr.sample_type & PERF_SAMPLE_TIME))
+			return true;
+		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
+			if (config & pt->tsc_bit)
+				timeless_decoding = false;
+			else
+				return true;
+		}
+	}
+	return timeless_decoding;
+}
+
+static bool intel_pt_tracing_kernel(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
+		    !evsel->attr.exclude_kernel)
+			return true;
+	}
+	return false;
+}
+
+static bool intel_pt_have_tsc(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+	bool have_tsc = false;
+	u64 config;
+
+	if (!pt->tsc_bit)
+		return false;
+
+	evlist__for_each(pt->session->evlist, evsel) {
+		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
+			if (config & pt->tsc_bit)
+				have_tsc = true;
+			else
+				return false;
+		}
+	}
+	return have_tsc;
+}
+
+static u64 intel_pt_ns_to_ticks(const struct intel_pt *pt, u64 ns)
+{
+	u64 quot, rem;
+
+	quot = ns / pt->tc.time_mult;
+	rem  = ns % pt->tc.time_mult;
+	return (quot << pt->tc.time_shift) + (rem << pt->tc.time_shift) /
+		pt->tc.time_mult;
+}
+
+static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
+						   unsigned int queue_nr)
+{
+	struct intel_pt_params params = { .get_trace = 0, };
+	struct intel_pt_queue *ptq;
+
+	ptq = zalloc(sizeof(struct intel_pt_queue));
+	if (!ptq)
+		return NULL;
+
+	if (pt->synth_opts.callchain) {
+		size_t sz = sizeof(struct ip_callchain);
+
+		sz += pt->synth_opts.callchain_sz * sizeof(u64);
+		ptq->chain = zalloc(sz);
+		if (!ptq->chain)
+			goto out_free;
+	}
+
+	ptq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
+	if (!ptq->event_buf)
+		goto out_free;
+
+	ptq->pt = pt;
+	ptq->queue_nr = queue_nr;
+	ptq->exclude_kernel = intel_pt_exclude_kernel(pt);
+	ptq->pid = -1;
+	ptq->tid = -1;
+	ptq->cpu = -1;
+	ptq->next_tid = -1;
+
+	params.get_trace = intel_pt_get_trace;
+	params.walk_insn = intel_pt_walk_next_insn;
+	params.data = ptq;
+	params.return_compression = intel_pt_return_compression(pt);
+
+	if (pt->synth_opts.instructions) {
+		if (pt->synth_opts.period) {
+			switch (pt->synth_opts.period_type) {
+			case PERF_ITRACE_PERIOD_INSTRUCTIONS:
+				params.period_type =
+						INTEL_PT_PERIOD_INSTRUCTIONS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_TICKS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_NANOSECS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = intel_pt_ns_to_ticks(pt,
+							pt->synth_opts.period);
+				break;
+			default:
+				break;
+			}
+		}
+
+		if (!params.period) {
+			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
+			params.period = 1000;
+		}
+	}
+
+	ptq->decoder = intel_pt_decoder_new(&params);
+	if (!ptq->decoder)
+		goto out_free;
+
+	return ptq;
+
+out_free:
+	zfree(&ptq->event_buf);
+	zfree(&ptq->chain);
+	free(ptq);
+	return NULL;
+}
+
+static void intel_pt_free_queue(void *priv)
+{
+	struct intel_pt_queue *ptq = priv;
+
+	if (!ptq)
+		return;
+	intel_pt_decoder_free(ptq->decoder);
+	zfree(&ptq->event_buf);
+	zfree(&ptq->chain);
+	free(ptq);
+}
+
+static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
+				     struct auxtrace_queue *queue)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (queue->tid == -1 || pt->have_sched_switch) {
+		ptq->tid = machine__get_current_tid(pt->machine, ptq->cpu);
+		ptq->thread = NULL;
+	}
+
+	if (!ptq->thread && ptq->tid != -1)
+		ptq->thread = machine__find_thread(pt->machine, -1, ptq->tid);
+
+	if (ptq->thread) {
+		ptq->pid = ptq->thread->pid_;
+		if (queue->cpu == -1)
+			ptq->cpu = ptq->thread->cpu;
+	}
+}
+
+static void intel_pt_sample_flags(struct intel_pt_queue *ptq)
+{
+	if (ptq->state->flags & INTEL_PT_ABORT_TX) {
+		ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT;
+	} else if (ptq->state->flags & INTEL_PT_ASYNC) {
+		if (ptq->state->to_ip)
+			ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
+				     PERF_IP_FLAG_ASYNC |
+				     PERF_IP_FLAG_INTERRUPT;
+		else
+			ptq->flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_END;
+		ptq->insn_len = 0;
+	} else {
+		if (ptq->state->from_ip)
+			ptq->flags = intel_pt_insn_type(ptq->state->insn_op);
+		else
+			ptq->flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_BEGIN;
+		if (ptq->state->flags & INTEL_PT_IN_TX)
+			ptq->flags |= PERF_IP_FLAG_IN_TX;
+		ptq->insn_len = ptq->state->insn_len;
+	}
+}
+
+static int intel_pt_setup_queue(struct intel_pt *pt,
+				struct auxtrace_queue *queue,
+				unsigned int queue_nr)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (list_empty(&queue->head))
+		return 0;
+
+	if (!ptq) {
+		ptq = intel_pt_alloc_queue(pt, queue_nr);
+		if (!ptq)
+			return -ENOMEM;
+		queue->priv = ptq;
+
+		if (queue->cpu != -1)
+			ptq->cpu = queue->cpu;
+		ptq->tid = queue->tid;
+
+		if (pt->sampling_mode) {
+			if (pt->timeless_decoding)
+				ptq->step_through_buffers = true;
+			if (pt->timeless_decoding || !pt->have_sched_switch)
+				ptq->use_buffer_pid_tid = true;
+		}
+	}
+
+	if (!ptq->on_heap &&
+	    (!pt->sync_switch ||
+	     ptq->switch_state != INTEL_PT_SS_EXPECTING_SWITCH_EVENT)) {
+		const struct intel_pt_state *state;
+		int ret;
+
+		if (pt->timeless_decoding)
+			return 0;
+
+		intel_pt_log("queue %u getting timestamp\n", queue_nr);
+		intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+			     queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+		while (1) {
+			state = intel_pt_decode(ptq->decoder);
+			if (state->err) {
+				if (state->err == INTEL_PT_ERR_NODATA) {
+					intel_pt_log("queue %u has no timestamp\n",
+						     queue_nr);
+					return 0;
+				}
+				continue;
+			}
+			if (state->timestamp)
+				break;
+		}
+
+		ptq->timestamp = state->timestamp;
+		intel_pt_log("queue %u timestamp 0x%" PRIx64 "\n",
+			     queue_nr, ptq->timestamp);
+		ptq->state = state;
+		ptq->have_sample = true;
+		intel_pt_sample_flags(ptq);
+		ret = auxtrace_heap__add(&pt->heap, queue_nr, ptq->timestamp);
+		if (ret)
+			return ret;
+		ptq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int intel_pt_setup_queues(struct intel_pt *pt)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < pt->queues.nr_queues; i++) {
+		ret = intel_pt_setup_queue(pt, &pt->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int intel_pt_inject_event(union perf_event *event,
+				 struct perf_sample *sample, u64 type,
+				 bool swapped)
+{
+	event->header.size = perf_event__sample_event_size(sample, type, 0);
+	return perf_event__synthesize_sample(event, type, 0, sample, swapped);
+}
+
+static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->branches_id;
+	sample.stream_id = ptq->pt->branches_id;
+	sample.period = 1;
+	sample.cpu = ptq->cpu;
+
+	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
+		return 0;
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->branches_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver branch event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->instructions_id;
+	sample.stream_id = ptq->pt->instructions_id;
+	sample.period = ptq->pt->instructions_sample_period;
+	sample.cpu = ptq->cpu;
+
+	if (pt->synth_opts.callchain) {
+		thread_stack__sample(ptq->thread, ptq->chain,
+				     pt->synth_opts.callchain_sz, sample.ip);
+		sample.callchain = ptq->chain;
+	}
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->instructions_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver instruction event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event *event = ptq->event_buf;
+	struct perf_sample sample = { .ip = 0, };
+
+	event->sample.header.type = PERF_RECORD_SAMPLE;
+	event->sample.header.misc = PERF_RECORD_MISC_USER;
+	event->sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->transactions_id;
+	sample.stream_id = ptq->pt->transactions_id;
+	sample.period = 1;
+	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
+
+	if (pt->synth_opts.callchain) {
+		thread_stack__sample(ptq->thread, ptq->chain,
+				     pt->synth_opts.callchain_sz, sample.ip);
+		sample.callchain = ptq->chain;
+	}
+
+	if (pt->synth_opts.inject) {
+		ret = intel_pt_inject_event(event, &sample,
+					    pt->transactions_sample_type,
+					    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver transaction event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
+				pid_t pid, pid_t tid, u64 ip)
+{
+	union perf_event event;
+	char msg[MAX_AUXTRACE_ERROR_MSG];
+	int err;
+
+	intel_pt__strerror(code, msg, MAX_AUXTRACE_ERROR_MSG);
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     code, cpu, pid, tid, ip, msg);
+
+	err = perf_session__deliver_synth_event(pt->session, &event, NULL);
+	if (err)
+		pr_err("Intel Processor Trace: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static int intel_pt_next_tid(struct intel_pt *pt, struct intel_pt_queue *ptq)
+{
+	struct auxtrace_queue *queue;
+	pid_t tid = ptq->next_tid;
+	int err;
+
+	if (tid == -1)
+		return 0;
+
+	intel_pt_log("switch: cpu %d tid %d\n", ptq->cpu, tid);
+
+	err = machine__set_current_tid(pt->machine, ptq->cpu, -1, tid);
+
+	queue = &pt->queues.queue_array[ptq->queue_nr];
+	intel_pt_set_pid_tid_cpu(pt, queue);
+
+	ptq->next_tid = -1;
+
+	return err;
+}
+
+static inline bool intel_pt_is_switch_ip(struct intel_pt_queue *ptq, u64 ip)
+{
+	struct intel_pt *pt = ptq->pt;
+
+	return ip == pt->switch_ip &&
+	       (ptq->flags & PERF_IP_FLAG_BRANCH) &&
+	       !(ptq->flags & (PERF_IP_FLAG_CONDITIONAL | PERF_IP_FLAG_ASYNC |
+			       PERF_IP_FLAG_INTERRUPT | PERF_IP_FLAG_TX_ABORT));
+}
+
+static int intel_pt_sample(struct intel_pt_queue *ptq)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct intel_pt *pt = ptq->pt;
+	int err;
+
+	if (!ptq->have_sample)
+		return 0;
+
+	ptq->have_sample = false;
+
+	if (pt->sample_instructions &&
+	    (state->type & INTEL_PT_INSTRUCTION)) {
+		err = intel_pt_synth_instruction_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (pt->sample_transactions &&
+	    (state->type & INTEL_PT_TRANSACTION)) {
+		err = intel_pt_synth_transaction_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (!(state->type & INTEL_PT_BRANCH))
+		return 0;
+
+	if (pt->synth_opts.callchain)
+		thread_stack__event(ptq->thread, ptq->flags, state->from_ip,
+				    state->to_ip, ptq->insn_len,
+				    state->trace_nr);
+
+	if (pt->sample_branches) {
+		err = intel_pt_synth_branch_sample(ptq);
+		if (err)
+			return err;
+	}
+
+	if (!pt->sync_switch)
+		return 0;
+
+	if (intel_pt_is_switch_ip(ptq, state->to_ip)) {
+		switch (ptq->switch_state) {
+		case INTEL_PT_SS_UNKNOWN:
+		case INTEL_PT_SS_EXPECTING_SWITCH_IP:
+			err = intel_pt_next_tid(pt, ptq);
+			if (err)
+				return err;
+			ptq->switch_state = INTEL_PT_SS_TRACING;
+			break;
+		default:
+			ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_EVENT;
+			return 1;
+		}
+	} else if (!state->to_ip) {
+		ptq->switch_state = INTEL_PT_SS_NOT_TRACING;
+	} else if (ptq->switch_state == INTEL_PT_SS_NOT_TRACING) {
+		ptq->switch_state = INTEL_PT_SS_UNKNOWN;
+	} else if (ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
+		   state->to_ip == pt->ptss_ip &&
+		   (ptq->flags & PERF_IP_FLAG_CALL)) {
+		ptq->switch_state = INTEL_PT_SS_TRACING;
+	}
+
+	return 0;
+}
+
+static u64 intel_pt_switch_ip(struct machine *machine, u64 *ptss_ip)
+{
+	struct map *map;
+	struct symbol *sym, *start;
+	u64 ip, switch_ip = 0;
+
+	if (ptss_ip)
+		*ptss_ip = 0;
+
+	map = machine__kernel_map(machine, MAP__FUNCTION);
+	if (!map)
+		return 0;
+
+	if (map__load(map, machine->symbol_filter))
+		return 0;
+
+	start = dso__first_symbol(map->dso, MAP__FUNCTION);
+
+	for (sym = start; sym; sym = dso__next_symbol(sym)) {
+		if (sym->binding == STB_GLOBAL &&
+		    !strcmp(sym->name, "__switch_to")) {
+			ip = map->unmap_ip(map, sym->start);
+			if (ip >= map->start && ip < map->end) {
+				switch_ip = ip;
+				break;
+			}
+		}
+	}
+
+	if (!switch_ip || !ptss_ip)
+		return 0;
+
+	for (sym = start; sym; sym = dso__next_symbol(sym)) {
+		if (!strcmp(sym->name, "perf_trace_sched_switch")) {
+			ip = map->unmap_ip(map, sym->start);
+			if (ip >= map->start && ip < map->end) {
+				*ptss_ip = ip;
+				break;
+			}
+		}
+	}
+
+	return switch_ip;
+}
+
+static int intel_pt_run_decoder(struct intel_pt_queue *ptq, u64 *timestamp)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct intel_pt *pt = ptq->pt;
+	int err;
+
+	if (!pt->kernel_start) {
+		pt->kernel_start = machine__kernel_start(pt->machine);
+		if (pt->per_cpu_mmaps && pt->have_sched_switch &&
+		    !pt->timeless_decoding && intel_pt_tracing_kernel(pt) &&
+		    !pt->sampling_mode) {
+			pt->switch_ip = intel_pt_switch_ip(pt->machine,
+							   &pt->ptss_ip);
+			if (pt->switch_ip) {
+				intel_pt_log("switch_ip: %"PRIx64" ptss_ip: %"PRIx64"\n",
+					     pt->switch_ip, pt->ptss_ip);
+				pt->sync_switch = true;
+				pt->est_tsc_orig = pt->est_tsc;
+				pt->est_tsc = false;
+			}
+		}
+	}
+
+	intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+	while (1) {
+		err = intel_pt_sample(ptq);
+		if (err)
+			return err;
+
+		state = intel_pt_decode(ptq->decoder);
+		if (state->err) {
+			if (state->err == INTEL_PT_ERR_NODATA)
+				return 1;
+			if (pt->sync_switch &&
+			    state->from_ip >= pt->kernel_start) {
+				pt->sync_switch = false;
+				pt->est_tsc = pt->est_tsc_orig;
+				intel_pt_next_tid(pt, ptq);
+			}
+			if (pt->synth_opts.errors) {
+				err = intel_pt_synth_error(pt, state->err,
+							   ptq->cpu, ptq->pid,
+							   ptq->tid,
+							   state->from_ip);
+				if (err)
+					return err;
+			}
+			continue;
+		}
+
+		ptq->state = state;
+		ptq->have_sample = true;
+		intel_pt_sample_flags(ptq);
+
+		/* Use estimated TSC upon return to user space */
+		if (pt->est_tsc) {
+			if (state->from_ip >= pt->kernel_start &&
+			    state->to_ip &&
+			    state->to_ip < pt->kernel_start)
+				ptq->timestamp = state->est_timestamp;
+			else if (state->timestamp > ptq->timestamp)
+				ptq->timestamp = state->timestamp;
+		/* Use estimated TSC in unknown switch state */
+		} else if (pt->sync_switch &&
+			   ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
+			   state->to_ip == pt->switch_ip &&
+			   (ptq->flags & PERF_IP_FLAG_CALL) &&
+			   ptq->next_tid == -1) {
+			ptq->timestamp = state->est_timestamp;
+		} else if (state->timestamp > ptq->timestamp) {
+			ptq->timestamp = state->timestamp;
+		}
+
+		if (!pt->timeless_decoding && ptq->timestamp >= *timestamp) {
+			*timestamp = ptq->timestamp;
+			return 0;
+		}
+	}
+	return 0;
+}
+
+static inline int intel_pt_update_queues(struct intel_pt *pt)
+{
+	if (pt->queues.new_data) {
+		pt->queues.new_data = false;
+		return intel_pt_setup_queues(pt);
+	}
+	return 0;
+}
+
+static int intel_pt_process_queues(struct intel_pt *pt, u64 timestamp)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct auxtrace_queue *queue;
+		struct intel_pt_queue *ptq;
+
+		if (!pt->heap.heap_cnt)
+			return 0;
+
+		if (pt->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = pt->heap.heap_array[0].queue_nr;
+		queue = &pt->queues.queue_array[queue_nr];
+		ptq = queue->priv;
+
+		intel_pt_log("queue %u processing 0x%" PRIx64 " to 0x%" PRIx64 "\n",
+			     queue_nr, pt->heap.heap_array[0].ordinal,
+			     timestamp);
+
+		auxtrace_heap__pop(&pt->heap);
+
+		if (pt->heap.heap_cnt) {
+			ts = pt->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		intel_pt_set_pid_tid_cpu(pt, queue);
+
+		ret = intel_pt_run_decoder(ptq, &ts);
+
+		if (ret < 0) {
+			auxtrace_heap__add(&pt->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&pt->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			ptq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_pt_process_timeless_queues(struct intel_pt *pt, pid_t tid,
+					    u64 time_)
+{
+	struct auxtrace_queues *queues = &pt->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &pt->queues.queue_array[i];
+		struct intel_pt_queue *ptq = queue->priv;
+
+		if (ptq && (tid == -1 || ptq->tid == tid)) {
+			ptq->time = time_;
+			intel_pt_set_pid_tid_cpu(pt, queue);
+			intel_pt_run_decoder(ptq, &ts);
+		}
+	}
+	return 0;
+}
+
+static int intel_pt_lost(struct intel_pt *pt, struct perf_sample *sample)
+{
+	return intel_pt_synth_error(pt, INTEL_PT_ERR_LOST, sample->cpu,
+				    sample->pid, sample->tid, 0);
+}
+
+static struct intel_pt_queue *intel_pt_cpu_to_ptq(struct intel_pt *pt, int cpu)
+{
+	unsigned i, j;
+
+	if (cpu < 0 || !pt->queues.nr_queues)
+		return NULL;
+
+	if ((unsigned)cpu >= pt->queues.nr_queues)
+		i = pt->queues.nr_queues - 1;
+	else
+		i = cpu;
+
+	if (pt->queues.queue_array[i].cpu == cpu)
+		return pt->queues.queue_array[i].priv;
+
+	for (j = 0; i > 0; j++) {
+		if (pt->queues.queue_array[--i].cpu == cpu)
+			return pt->queues.queue_array[i].priv;
+	}
+
+	for (; j < pt->queues.nr_queues; j++) {
+		if (pt->queues.queue_array[j].cpu == cpu)
+			return pt->queues.queue_array[j].priv;
+	}
+
+	return NULL;
+}
+
+static int intel_pt_process_switch(struct intel_pt *pt,
+				   struct perf_sample *sample)
+{
+	struct intel_pt_queue *ptq;
+	struct perf_evsel *evsel;
+	pid_t tid;
+	int cpu, err;
+
+	evsel = perf_evlist__id2evsel(pt->session->evlist, sample->id);
+	if (evsel != pt->switch_evsel)
+		return 0;
+
+	tid = perf_evsel__intval(evsel, sample, "next_pid");
+	cpu = sample->cpu;
+
+	intel_pt_log("sched_switch: cpu %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     cpu, tid, sample->time, perf_time_to_tsc(sample->time,
+		     &pt->tc));
+
+	if (!pt->sync_switch)
+		goto out;
+
+	ptq = intel_pt_cpu_to_ptq(pt, cpu);
+	if (!ptq)
+		goto out;
+
+	switch (ptq->switch_state) {
+	case INTEL_PT_SS_NOT_TRACING:
+		ptq->next_tid = -1;
+		break;
+	case INTEL_PT_SS_UNKNOWN:
+	case INTEL_PT_SS_TRACING:
+		ptq->next_tid = tid;
+		ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_IP;
+		return 0;
+	case INTEL_PT_SS_EXPECTING_SWITCH_EVENT:
+		if (!ptq->on_heap) {
+			ptq->timestamp = perf_time_to_tsc(sample->time,
+							  &pt->tc);
+			err = auxtrace_heap__add(&pt->heap, ptq->queue_nr,
+						 ptq->timestamp);
+			if (err)
+				return err;
+			ptq->on_heap = true;
+		}
+		ptq->switch_state = INTEL_PT_SS_TRACING;
+		break;
+	case INTEL_PT_SS_EXPECTING_SWITCH_IP:
+		ptq->next_tid = tid;
+		intel_pt_log("ERROR: cpu %d expecting switch ip\n", cpu);
+		break;
+	default:
+		break;
+	}
+out:
+	return machine__set_current_tid(pt->machine, cpu, -1, tid);
+}
+
+static int intel_pt_process_itrace_start(struct intel_pt *pt,
+					 union perf_event *event,
+					 struct perf_sample *sample)
+{
+	if (!pt->per_cpu_mmaps)
+		return 0;
+
+	intel_pt_log("itrace_start: cpu %d pid %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
+		     sample->cpu, event->itrace_start.pid,
+		     event->itrace_start.tid, sample->time,
+		     perf_time_to_tsc(sample->time, &pt->tc));
+
+	return machine__set_current_tid(pt->machine, sample->cpu,
+					event->itrace_start.pid,
+					event->itrace_start.tid);
+}
+
+static int intel_pt_process_event(struct perf_session *session,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	u64 timestamp;
+	int err = 0;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("Intel Processor Trace requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time)
+		timestamp = perf_time_to_tsc(sample->time, &pt->tc);
+	else
+		timestamp = 0;
+
+	if (timestamp || pt->timeless_decoding) {
+		err = intel_pt_update_queues(pt);
+		if (err)
+			return err;
+	}
+
+	if (pt->timeless_decoding) {
+		if (event->header.type == PERF_RECORD_EXIT) {
+			err = intel_pt_process_timeless_queues(pt,
+							       event->comm.tid,
+							       sample->time);
+		}
+	} else if (timestamp) {
+		err = intel_pt_process_queues(pt, timestamp);
+	}
+	if (err)
+		return err;
+
+	if (event->header.type == PERF_RECORD_AUX &&
+	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
+	    pt->synth_opts.errors)
+		err = intel_pt_lost(pt, sample);
+
+	if (pt->switch_evsel && event->header.type == PERF_RECORD_SAMPLE)
+		err = intel_pt_process_switch(pt, sample);
+	else if (event->header.type == PERF_RECORD_ITRACE_START)
+		err = intel_pt_process_itrace_start(pt, event, sample);
+
+	return err;
+}
+
+static int intel_pt_flush(struct perf_session *session, struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = intel_pt_update_queues(pt);
+	if (ret < 0)
+		return ret;
+
+	if (pt->timeless_decoding)
+		return intel_pt_process_timeless_queues(pt, -1,
+							MAX_TIMESTAMP - 1);
+
+	return intel_pt_process_queues(pt, MAX_TIMESTAMP);
+}
+
+static void intel_pt_free_events(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+	struct auxtrace_queues *queues = &pt->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		intel_pt_free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = NULL;
+	}
+	intel_pt_log_disable();
+	auxtrace_queues__free(queues);
+}
+
+static void intel_pt_free(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+
+	auxtrace_heap__free(&pt->heap);
+	intel_pt_free_events(session);
+	session->auxtrace = NULL;
+	thread__delete(pt->unknown_thread);
+	free(pt);
+}
+
+static int intel_pt_process_auxtrace_event(struct perf_session *session,
+					   union perf_event *event,
+					   struct perf_tool *tool __maybe_unused)
+{
+	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
+					   auxtrace);
+
+	if (pt->sampling_mode)
+		return 0;
+
+	if (!pt->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data_file__fd(session->file);
+		int err;
+
+		if (perf_data_file__is_pipe(session->file)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&pt->queues, session, event,
+						 data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				intel_pt_dump_event(pt, buffer->data,
+						    buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
+		}
+	}
+
+	return 0;
+}
+
+struct intel_pt_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int intel_pt_event_synth(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample __maybe_unused,
+				struct machine *machine __maybe_unused)
+{
+	struct intel_pt_synth *intel_pt_synth =
+			container_of(tool, struct intel_pt_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(intel_pt_synth->session, event,
+						 NULL);
+}
+
+static int intel_pt_synth_event(struct perf_session *session,
+				struct perf_event_attr *attr, u64 id)
+{
+	struct intel_pt_synth intel_pt_synth;
+
+	memset(&intel_pt_synth, 0, sizeof(struct intel_pt_synth));
+	intel_pt_synth.session = session;
+
+	return perf_event__synthesize_attr(&intel_pt_synth.dummy_tool, attr, 1,
+					   &id, intel_pt_event_synth);
+}
+
+static int intel_pt_synth_events(struct intel_pt *pt,
+				 struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == pt->pmu_type && evsel->ids) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("There are no selected events with Intel Processor Trace data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	if (pt->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+	if (!pt->per_cpu_mmaps)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+	if (!id)
+		id = 1;
+
+	if (pt->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		if (pt->synth_opts.period_type == PERF_ITRACE_PERIOD_NANOSECS)
+			attr.sample_period =
+				intel_pt_ns_to_ticks(pt, pt->synth_opts.period);
+		else
+			attr.sample_period = pt->synth_opts.period;
+		pt->instructions_sample_period = attr.sample_period;
+		if (pt->synth_opts.callchain)
+			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'instructions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'instructions' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_instructions = true;
+		pt->instructions_sample_type = attr.sample_type;
+		pt->instructions_id = id;
+		id += 1;
+	}
+
+	if (pt->synth_opts.transactions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		attr.sample_period = 1;
+		if (pt->synth_opts.callchain)
+			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'transactions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'transactions' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_transactions = true;
+		pt->transactions_id = id;
+		id += 1;
+		evlist__for_each(evlist, evsel) {
+			if (evsel->id && evsel->id[0] == pt->transactions_id) {
+				if (evsel->name)
+					zfree(&evsel->name);
+				evsel->name = strdup("transactions");
+				break;
+			}
+		}
+	}
+
+	if (pt->synth_opts.branches) {
+		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		attr.sample_period = 1;
+		attr.sample_type |= PERF_SAMPLE_ADDR;
+		attr.sample_type &= ~(u64)PERF_SAMPLE_CALLCHAIN;
+		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'branches' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_branches = true;
+		pt->branches_sample_type = attr.sample_type;
+		pt->branches_id = id;
+	}
+
+	pt->synth_needs_swap = evsel->needs_swap;
+
+	return 0;
+}
+
+static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	evlist__for_each_reverse(evlist, evsel) {
+		const char *name = perf_evsel__name(evsel);
+
+		if (!strcmp(name, "sched:sched_switch"))
+			return evsel;
+	}
+
+	return NULL;
+}
+
+static const char * const intel_pt_info_fmts[] = {
+	[INTEL_PT_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
+	[INTEL_PT_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
+	[INTEL_PT_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
+	[INTEL_PT_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
+	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
+	[INTEL_PT_TSC_BIT]		= "  TSC bit            %#"PRIx64"\n",
+	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit      %#"PRIx64"\n",
+	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch  %"PRId64"\n",
+	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
+	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps       %"PRId64"\n",
+};
+
+static void intel_pt_print_info(u64 *arr, int start, int finish)
+{
+	int i;
+
+	if (!dump_trace)
+		return;
+
+	for (i = start; i <= finish; i++)
+		fprintf(stdout, intel_pt_info_fmts[i], arr[i]);
+}
+
+int intel_pt_process_auxtrace_info(union perf_event *event,
+				   struct perf_session *session)
+{
+	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
+	size_t min_sz = sizeof(u64) * INTEL_PT_PER_CPU_MMAPS;
+	struct intel_pt *pt;
+	int err;
+
+	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
+					min_sz)
+		return -EINVAL;
+
+	pt = zalloc(sizeof(struct intel_pt));
+	if (!pt)
+		return -ENOMEM;
+
+	err = auxtrace_queues__init(&pt->queues);
+	if (err)
+		goto err_free;
+
+	intel_pt_log_set_name(INTEL_PT_PMU_NAME);
+
+	pt->session = session;
+	pt->machine = &session->machines.host; /* No kvm support */
+	pt->auxtrace_type = auxtrace_info->type;
+	pt->pmu_type = auxtrace_info->priv[INTEL_PT_PMU_TYPE];
+	pt->tc.time_shift = auxtrace_info->priv[INTEL_PT_TIME_SHIFT];
+	pt->tc.time_mult = auxtrace_info->priv[INTEL_PT_TIME_MULT];
+	pt->tc.time_zero = auxtrace_info->priv[INTEL_PT_TIME_ZERO];
+	pt->cap_user_time_zero = auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO];
+	pt->tsc_bit = auxtrace_info->priv[INTEL_PT_TSC_BIT];
+	pt->noretcomp_bit = auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT];
+	pt->have_sched_switch = auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH];
+	pt->snapshot_mode = auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE];
+	pt->per_cpu_mmaps = auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS];
+	intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_PMU_TYPE,
+			    INTEL_PT_PER_CPU_MMAPS);
+
+	pt->timeless_decoding = intel_pt_timeless_decoding(pt);
+	pt->have_tsc = intel_pt_have_tsc(pt);
+	pt->sampling_mode = false;
+	pt->est_tsc = pt->per_cpu_mmaps && !pt->timeless_decoding;
+
+	pt->unknown_thread = thread__new(999999999, 999999999);
+	if (!pt->unknown_thread) {
+		err = -ENOMEM;
+		goto err_free_queues;
+	}
+	err = thread__set_comm(pt->unknown_thread, "unknown", 0);
+	if (err)
+		goto err_delete_thread;
+	if (thread__init_map_groups(pt->unknown_thread, pt->machine)) {
+		err = -ENOMEM;
+		goto err_delete_thread;
+	}
+
+	pt->auxtrace.process_event = intel_pt_process_event;
+	pt->auxtrace.process_auxtrace_event = intel_pt_process_auxtrace_event;
+	pt->auxtrace.flush_events = intel_pt_flush;
+	pt->auxtrace.free_events = intel_pt_free_events;
+	pt->auxtrace.free = intel_pt_free;
+	session->auxtrace = &pt->auxtrace;
+
+	if (dump_trace)
+		return 0;
+
+	if (pt->have_sched_switch == 1) {
+		pt->switch_evsel = intel_pt_find_sched_switch(session->evlist);
+		if (!pt->switch_evsel) {
+			pr_err("%s: missing sched_switch event\n", __func__);
+			goto err_delete_thread;
+		}
+	}
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
+		pt->synth_opts = *session->itrace_synth_opts;
+	} else {
+		itrace_synth_opts__set_default(&pt->synth_opts);
+		if (use_browser != -1) {
+			pt->synth_opts.branches = false;
+			pt->synth_opts.callchain = true;
+		}
+	}
+
+	if (pt->synth_opts.log)
+		intel_pt_log_enable();
+
+	if (pt->synth_opts.calls)
+		pt->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+				       PERF_IP_FLAG_TRACE_END;
+	if (pt->synth_opts.returns)
+		pt->branches_filter |= PERF_IP_FLAG_RETURN |
+				       PERF_IP_FLAG_TRACE_BEGIN;
+
+	if (pt->synth_opts.callchain && !symbol_conf.use_callchain) {
+		symbol_conf.use_callchain = true;
+		if (callchain_register_param(&callchain_param) < 0) {
+			symbol_conf.use_callchain = false;
+			pt->synth_opts.callchain = false;
+		}
+	}
+
+	err = intel_pt_synth_events(pt, session);
+	if (err)
+		goto err_delete_thread;
+
+	err = auxtrace_queues__process_index(&pt->queues, session);
+	if (err)
+		goto err_delete_thread;
+
+	if (pt->queues.populated)
+		pt->data_queued = true;
+
+	if (pt->timeless_decoding)
+		pr_debug2("Intel PT decoding without timestamps\n");
+
+	return 0;
+
+err_delete_thread:
+	thread__delete(pt->unknown_thread);
+err_free_queues:
+	intel_pt_log_disable();
+	auxtrace_queues__free(&pt->queues);
+	session->auxtrace = NULL;
+err_free:
+	free(pt);
+	return err;
+}
diff --git a/tools/perf/util/intel-pt.h b/tools/perf/util/intel-pt.h
new file mode 100644
index 0000000..a1bfe93
--- /dev/null
+++ b/tools/perf/util/intel-pt.h
@@ -0,0 +1,51 @@
+/*
+ * intel_pt.h: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__PERF_INTEL_PT_H__
+#define INCLUDE__PERF_INTEL_PT_H__
+
+#define INTEL_PT_PMU_NAME "intel_pt"
+
+enum {
+	INTEL_PT_PMU_TYPE,
+	INTEL_PT_TIME_SHIFT,
+	INTEL_PT_TIME_MULT,
+	INTEL_PT_TIME_ZERO,
+	INTEL_PT_CAP_USER_TIME_ZERO,
+	INTEL_PT_TSC_BIT,
+	INTEL_PT_NORETCOMP_BIT,
+	INTEL_PT_HAVE_SCHED_SWITCH,
+	INTEL_PT_SNAPSHOT_MODE,
+	INTEL_PT_PER_CPU_MMAPS,
+	INTEL_PT_AUXTRACE_PRIV_MAX,
+};
+
+#define INTEL_PT_AUXTRACE_PRIV_SIZE (INTEL_PT_AUXTRACE_PRIV_MAX * sizeof(u64))
+
+struct auxtrace_record;
+struct perf_tool;
+union perf_event;
+struct perf_session;
+struct perf_event_attr;
+struct perf_pmu;
+
+struct auxtrace_record *intel_pt_recording_init(int *err);
+
+int intel_pt_process_auxtrace_info(union perf_event *event,
+				   struct perf_session *session);
+
+struct perf_event_attr *intel_pt_pmu_default_config(struct perf_pmu *pmu);
+
+#endif
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 09/17] perf tools: Take Intel PT into use
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (7 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 08/17] perf tools: Add Intel PT support Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 10/17] perf tools: Allow auxtrace data alignment Adrian Hunter
                   ` (7 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

To record an AUX area, the weak function auxtrace_record__init() must be
implemented.

Equally to decode an AUX area, the AUX area tracing type must be added
to the perf_event__process_auxtrace_info() function.

This patch makes those two changes plus hooks up default config for the
intel_pt PMU.  Also some brief documentation is provided for using the
tools with intel_pt.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/intel-pt.txt | 588 ++++++++++++++++++++++++++++++++++
 tools/perf/arch/x86/util/Build        |   2 +
 tools/perf/arch/x86/util/auxtrace.c   |  38 +++
 tools/perf/arch/x86/util/pmu.c        |  15 +
 tools/perf/util/auxtrace.c            |   5 +-
 tools/perf/util/pmu.c                 |   4 +-
 6 files changed, 649 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-pt.txt
 create mode 100644 tools/perf/arch/x86/util/auxtrace.c
 create mode 100644 tools/perf/arch/x86/util/pmu.c

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
new file mode 100644
index 0000000..2866b62
--- /dev/null
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -0,0 +1,588 @@
+Intel Processor Trace
+=====================
+
+Overview
+========
+
+Intel Processor Trace (Intel PT) is an extension of Intel Architecture that
+collects information about software execution such as control flow, execution
+modes and timings and formats it into highly compressed binary packets.
+Technical details are documented in the Intel 64 and IA-32 Architectures
+Software Developer Manuals, Chapter 36 Intel Processor Trace.
+
+Intel PT is first supported in Intel Core M and 5th generation Intel Core
+processors that are based on the Intel micro-architecture code name Broadwell.
+
+Trace data is collected by 'perf record' and stored within the perf.data file.
+See below for options to 'perf record'.
+
+Trace data must be 'decoded' which involves walking the object code and matching
+the trace data packets. For example a TNT packet only tells whether a
+conditional branch was taken or not taken, so to make use of that packet the
+decoder must know precisely which instruction was being executed.
+
+Decoding is done on-the-fly.  The decoder outputs samples in the same format as
+samples output by perf hardware events, for example as though the "instructions"
+or "branches" events had been recorded.  Presently 3 tools support this:
+'perf script', 'perf report' and 'perf inject'.  See below for more information
+on using those tools.
+
+The main distinguishing feature of Intel PT is that the decoder can determine
+the exact flow of software execution.  Intel PT can be used to understand why
+and how did software get to a certain point, or behave a certain way.  The
+software does not have to be recompiled, so Intel PT works with debug or release
+builds, however the executed images are needed - which makes use in JIT-compiled
+environments, or with self-modified code, a challenge.  Also symbols need to be
+provided to make sense of addresses.
+
+A limitation of Intel PT is that it produces huge amounts of trace data
+(hundreds of megabytes per second per core) which takes a long time to decode,
+for example two or three orders of magnitude longer than it took to collect.
+Another limitation is the performance impact of tracing, something that will
+vary depending on the use-case and architecture.
+
+
+Quickstart
+==========
+
+It is important to start small.  That is because it is easy to capture vastly
+more data than can possibly be processed.
+
+The simplest thing to do with Intel PT is userspace profiling of small programs.
+Data is captured with 'perf record' e.g. to trace 'ls' userspace-only:
+
+	perf record -e intel_pt//u ls
+
+And profiled with 'perf report' e.g.
+
+	perf report
+
+To also trace kernel space presents a problem, namely kernel self-modifying
+code.  A fairly good kernel image is available in /proc/kcore but to get an
+accurate image a copy of /proc/kcore needs to be made under the same conditions
+as the data capture.  A script perf-with-kcore can do that, but beware that the
+script makes use of 'sudo' to copy /proc/kcore.  If you have perf installed
+locally from the source tree you can do:
+
+	~/libexec/perf-core/perf-with-kcore record pt_ls -e intel_pt// -- ls
+
+which will create a directory named 'pt_ls' and put the perf.data file and
+copies of /proc/kcore, /proc/kallsyms and /proc/modules into it.  Then to use
+'perf report' becomes:
+
+	~/libexec/perf-core/perf-with-kcore report pt_ls
+
+Because samples are synthesized after-the-fact, the sampling period can be
+selected for reporting. e.g. sample every microsecond
+
+	~/libexec/perf-core/perf-with-kcore report pt_ls --itrace=i1usge
+
+See the sections below for more information about the --itrace option.
+
+Beware the smaller the period, the more samples that are produced, and the
+longer it takes to process them.
+
+Also note that the coarseness of Intel PT timing information will start to
+distort the statistical value of the sampling as the sampling period becomes
+smaller.
+
+To represent software control flow, "branches" samples are produced.  By default
+a branch sample is synthesized for every single branch.  To get an idea what
+data is available you can use the 'perf script' tool with no parameters, which
+will list all the samples.
+
+	perf record -e intel_pt//u ls
+	perf script
+
+An interesting field that is not printed by default is 'flags' which can be
+displayed as follows:
+
+	perf script -Fcomm,tid,pid,time,cpu,event,trace,ip,sym,dso,addr,symoff,flags
+
+The flags are "bcrosyiABEx" which stand for branch, call, return, conditional,
+system, asynchronous, interrupt, transaction abort, trace begin, trace end, and
+in transaction, respectively.
+
+While it is possible to create scripts to analyze the data, an alternative
+approach is available to export the data to a postgresql database.  Refer to
+script export-to-postgresql.py for more details, and to script
+call-graph-from-postgresql.py for an example of using the database.
+
+As mentioned above, it is easy to capture too much data.  One way to limit the
+data captured is to use 'snapshot' mode which is explained further below.
+Refer to 'new snapshot option' and 'Intel PT modes of operation' further below.
+
+Another problem that will be experienced is decoder errors.  They can be caused
+by inability to access the executed image, self-modified or JIT-ed code, or the
+inability to match side-band information (such as context switches and mmaps)
+which results in the decoder not knowing what code was executed.
+
+There is also the problem of perf not being able to copy the data fast enough,
+resulting in data lost because the buffer was full.  See 'Buffer handling' below
+for more details.
+
+
+perf record
+===========
+
+new event
+---------
+
+The Intel PT kernel driver creates a new PMU for Intel PT.  PMU events are
+selected by providing the PMU name followed by the "config" separated by slashes.
+An enhancement has been made to allow default "config" e.g. the option
+
+	-e intel_pt//
+
+will use a default config value.  Currently that is the same as
+
+	-e intel_pt/tsc,noretcomp=0/
+
+which is the same as
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+The config terms are listed in /sys/devices/intel_pt/format.  They are bit
+fields within the config member of the struct perf_event_attr which is
+passed to the kernel by the perf_event_open system call.  They correspond to bit
+fields in the IA32_RTIT_CTL MSR.  Here is a list of them and their definitions:
+
+	$ for f in `ls /sys/devices/intel_pt/format`;do
+	> echo $f
+	> cat /sys/devices/intel_pt/format/$f
+	> done
+	noretcomp
+	config:11
+	tsc
+	config:10
+
+Note that the default config must be overridden for each term i.e.
+
+	-e intel_pt/noretcomp=0/
+
+is the same as:
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+So, to disable TSC packets use:
+
+	-e intel_pt/tsc=0/
+
+It is also possible to specify the config value explicitly:
+
+	-e intel_pt/config=0x400/
+
+Note that, as with all events, the event is suffixed with event modifiers:
+
+	u	userspace
+	k	kernel
+	h	hypervisor
+	G	guest
+	H	host
+	p	precise ip
+
+'h', 'G' and 'H' are for virtualization which is not supported by Intel PT.
+'p' is also not relevant to Intel PT.  So only options 'u' and 'k' are
+meaningful for Intel PT.
+
+perf_event_attr is displayed if the -vv option is used e.g.
+
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             6
+	size                             112
+	config                           0x400
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	enable_on_exec                   1
+	sample_id_all                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+
+
+new snapshot option
+-------------------
+
+To select snapshot mode a new option has been added:
+
+	-S
+
+Optionally it can be followed by the snapshot size e.g.
+
+	-S0x100000
+
+The default snapshot size is the auxtrace mmap size.  If neither auxtrace mmap size
+nor snapshot size is specified, then the default is 4MiB for privileged users
+(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
+If an unprivileged user does not specify mmap pages, the mmap pages will be
+reduced as described in the 'new auxtrace mmap size option' section below.
+
+The snapshot size is displayed if the option -vv is used e.g.
+
+	Intel PT snapshot size: %zu
+
+
+new auxtrace mmap size option
+---------------------------
+
+Intel PT buffer size is specified by an addition to the -m option e.g.
+
+	-m,16
+
+selects a buffer size of 16 pages i.e. 64KiB.
+
+Note that the existing functionality of -m is unchanged.  The auxtrace mmap size
+is specified by the optional addition of a comma and the value.
+
+The default auxtrace mmap size for Intel PT is 4MiB/page_size for privileged users
+(or if /proc/sys/kernel/perf_event_paranoid < 0), 128KiB for unprivileged users.
+If an unprivileged user does not specify mmap pages, the mmap pages will be
+reduced from the default 512KiB/page_size to 256KiB/page_size, otherwise the
+user is likely to get an error as they exceed their mlock limit (Max locked
+memory as shown in /proc/self/limits).  Note that perf does not count the first
+512KiB (actually /proc/sys/kernel/perf_event_mlock_kb minus 1 page) per cpu
+against the mlock limit so an unprivileged user is allowed 512KiB per cpu plus
+their mlock limit (which defaults to 64KiB but is not multiplied by the number
+of cpus).
+
+In full-trace mode, powers of two are allowed for buffer size, with a minimum
+size of 2 pages.  In snapshot mode, it is the same but the minimum size is
+1 page.
+
+The mmap size and auxtrace mmap size are displayed if the -vv option is used e.g.
+
+	mmap length 528384
+	auxtrace mmap length 4198400
+
+
+Intel PT modes of operation
+---------------------------
+
+Intel PT can be used in 2 modes:
+	full-trace mode
+	snapshot mode
+
+Full-trace mode traces continuously e.g.
+
+	perf record -e intel_pt//u uname
+
+Snapshot mode captures the available data when a signal is sent e.g.
+
+	perf record -v -e intel_pt//u -S ./loopy 1000000000 &
+	[1] 11435
+	kill -USR2 11435
+	Recording AUX area tracing snapshot
+
+Note that the signal sent is SIGUSR2.
+Note that "Recording AUX area tracing snapshot" is displayed because the -v
+option is used.
+
+The 2 modes cannot be used together.
+
+
+Buffer handling
+---------------
+
+There may be buffer limitations (i.e. single ToPa entry) which means that actual
+buffer sizes are limited to powers of 2 up to 4MiB (MAX_ORDER).  In order to
+provide other sizes, and in particular an arbitrarily large size, multiple
+buffers are logically concatenated.  However an interrupt must be used to switch
+between buffers.  That has two potential problems:
+	a) the interrupt may not be handled in time so that the current buffer
+	becomes full and some trace data is lost.
+	b) the interrupts may slow the system and affect the performance
+	results.
+
+If trace data is lost, the driver sets 'truncated' in the PERF_RECORD_AUX event
+which the tools report as an error.
+
+In full-trace mode, the driver waits for data to be copied out before allowing
+the (logical) buffer to wrap-around.  If data is not copied out quickly enough,
+again 'truncated' is set in the PERF_RECORD_AUX event.  If the driver has to
+wait, the intel_pt event gets disabled.  Because it is difficult to know when
+that happens, perf tools always re-enable the intel_pt event after copying out
+data.
+
+
+Intel PT and build ids
+----------------------
+
+By default "perf record" post-processes the event stream to find all build ids
+for executables for all addresses sampled.  Deliberately, Intel PT is not
+decoded for that purpose (it would take too long).  Instead the build ids for
+all executables encountered (due to mmap, comm or task events) are included
+in the perf.data file.
+
+To see buildids included in the perf.data file use the command:
+
+	perf buildid-list
+
+If the perf.data file contains Intel PT data, that is the same as:
+
+	perf buildid-list --with-hits
+
+
+Snapshot mode and event disabling
+---------------------------------
+
+In order to make a snapshot, the intel_pt event is disabled using an IOCTL,
+namely PERF_EVENT_IOC_DISABLE.  However doing that can also disable the
+collection of side-band information.  In order to prevent that,  a dummy
+software event has been introduced that permits tracking events (like mmaps) to
+continue to be recorded while intel_pt is disabled.  That is important to ensure
+there is complete side-band information to allow the decoding of subsequent
+snapshots.
+
+A test has been created for that.  To find the test:
+
+	perf test list
+	...
+	23: Test using a dummy software event to keep tracking
+
+To run the test:
+
+	perf test 23
+	23: Test using a dummy software event to keep tracking     : Ok
+
+
+perf record modes (nothing new here)
+------------------------------------
+
+perf record essentially operates in one of three modes:
+	per thread
+	per cpu
+	workload only
+
+"per thread" mode is selected by -t or by --per-thread (with -p or -u or just a
+workload).
+"per cpu" is selected by -C or -a.
+"workload only" mode is selected by not using the other options but providing a
+command to run (i.e. the workload).
+
+In per-thread mode an exact list of threads is traced.  There is no inheritance.
+Each thread has its own event buffer.
+
+In per-cpu mode all processes (or processes from the selected cgroup i.e. -G
+option, or processes selected with -p or -u) are traced.  Each cpu has its own
+buffer. Inheritance is allowed.
+
+In workload-only mode, the workload is traced but with per-cpu buffers.
+Inheritance is allowed.  Note that you can now trace a workload in per-thread
+mode by using the --per-thread option.
+
+
+Privileged vs non-privileged users
+----------------------------------
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users
+have memory limits imposed upon them.  That affects what buffer sizes they can
+have as outlined above.
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users are
+not permitted to use tracepoints which means there is insufficient side-band
+information to decode Intel PT in per-cpu mode, and potentially workload-only
+mode too if the workload creates new processes.
+
+Note also, that to use tracepoints, read-access to debugfs is required.  So if
+debugfs is not mounted or the user does not have read-access, it will again not
+be possible to decode Intel PT in per-cpu mode.
+
+
+sched_switch tracepoint
+-----------------------
+
+The sched_switch tracepoint is used to provide side-band data for Intel PT
+decoding.  sched_switch events are automatically added. e.g. the second event
+shown below
+
+	$ perf record -vv -e intel_pt//u uname
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             6
+	size                             112
+	config                           0x400
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	enable_on_exec                   1
+	sample_id_all                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             2
+	size                             112
+	config                           0x108
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|CPU|PERIOD|RAW|IDENTIFIER
+	read_format                      ID
+	inherit                          1
+	sample_id_all                    1
+	exclude_guest                    1
+	------------------------------------------------------------
+	sys_perf_event_open: pid -1  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid -1  cpu 3  group_fd -1  flags 0x8
+	------------------------------------------------------------
+	perf_event_attr:
+	type                             1
+	size                             112
+	config                           0x9
+	{ sample_period, sample_freq }   1
+	sample_type                      IP|TID|TIME|IDENTIFIER
+	read_format                      ID
+	disabled                         1
+	inherit                          1
+	exclude_kernel                   1
+	exclude_hv                       1
+	mmap                             1
+	comm                             1
+	enable_on_exec                   1
+	task                             1
+	sample_id_all                    1
+	mmap2                            1
+	comm_exec                        1
+	------------------------------------------------------------
+	sys_perf_event_open: pid 31104  cpu 0  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 1  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 2  group_fd -1  flags 0x8
+	sys_perf_event_open: pid 31104  cpu 3  group_fd -1  flags 0x8
+	mmap size 528384B
+	AUX area mmap length 4194304
+	perf event ring buffer mmapped per cpu
+	Synthesizing auxtrace information
+	Linux
+	[ perf record: Woken up 1 times to write data ]
+	[ perf record: Captured and wrote 0.042 MB perf.data ]
+
+Note, the sched_switch event is only added if the user is permitted to use it
+and only in per-cpu mode.
+
+Note also, the sched_switch event is only added if TSC packets are requested.
+That is because, in the absence of timing information, the sched_switch events
+cannot be matched against the Intel PT trace.
+
+
+perf script
+===========
+
+By default, perf script will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace.
+
+
+New --itrace option
+-------------------
+
+Having no option is the same as
+
+	--itrace
+
+which, in turn, is the same as
+
+	--itrace=ibxe
+
+The letters are:
+
+	i	synthesize "instructions" events
+	b	synthesize "branches" events
+	x	synthesize "transactions" events
+	c	synthesize branches events (calls only)
+	r	synthesize branches events (returns only)
+	e	synthesize tracing error events
+	d	create a debug log
+	g	synthesize a call chain (use with i or x)
+
+"Instructions" events look like they were recorded by "perf record -e
+instructions".
+
+"Branches" events look like they were recorded by "perf record -e branches". "c"
+and "r" can be combined to get calls and returns.
+
+"Transactions" events correspond to the start or end of transactions. The
+'flags' field can be used in perf script to determine whether the event is a
+tranasaction start, commit or abort.
+
+Error events are new.  They show where the decoder lost the trace.  Error events
+are quite important.  Users must know if what they are seeing is a complete
+picture or not.
+
+The "d" option will cause the creation of a file "intel_pt.log" containing all
+decoded packets and instructions.  Note that this option slows down the decoder
+and that the resulting file may be very large.
+
+In addition, the period of the "instructions" event can be specified. e.g.
+
+	--itrace=i10us
+
+sets the period to 10us i.e. one  instruction sample is synthesized for each 10
+microseconds of trace.  Alternatives to "us" are "ms" (milliseconds),
+"ns" (nanoseconds), "t" (TSC ticks) or "i" (instructions).
+
+"ms", "us" and "ns" are converted to TSC ticks.
+
+The timing information included with Intel PT does not give the time of every
+instruction.  Consequently, for the purpose of sampling, the decoder estimates
+the time since the last timing packet based on 1 tick per instruction.  The time
+on the sample is *not* adjusted and reflects the last known value of TSC.
+
+For Intel PT, the default period is 100us.
+
+Also the call chain size (default 16, max. 1024) for instructions or
+transactions events can be specified. e.g.
+
+	--itrace=ig32
+	--itrace=xg32
+
+To disable trace decoding entirely, use the option --no-itrace.
+
+
+dump option
+-----------
+
+perf script has an option (-D) to "dump" the events i.e. display the binary
+data.
+
+When -D is used, Intel PT packets are displayed.  The packet decoder does not
+pay attention to PSB packets, but just decodes the bytes - so the packets seen
+by the actual decoder may not be identical in places where the data is corrupt.
+One example of that would be when the buffer-switching interrupt has been too
+slow, and the buffer has been filled completely.  In that case, the last packet
+in the buffer might be truncated and immediately followed by a PSB as the trace
+continues in the next buffer.
+
+To disable the display of Intel PT packets, combine the -D option with
+--no-itrace.
+
+
+perf report
+===========
+
+By default, perf report will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace exactly the same as
+perf script, with the exception that the default is --itrace=igxe.
+
+
+perf inject
+===========
+
+perf inject also accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+	perf inject --itrace -i perf.data -o perf.data.new
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index 1396088..a8be9f9 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -1,5 +1,6 @@
 libperf-y += header.o
 libperf-y += tsc.o
+libperf-y += pmu.o
 libperf-y += kvm-stat.o
 
 libperf-$(CONFIG_DWARF) += dwarf-regs.o
@@ -7,4 +8,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
 libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
 libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 
+libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
new file mode 100644
index 0000000..e7654b5
--- /dev/null
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -0,0 +1,38 @@
+/*
+ * auxtrace.c: AUX area tracing support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include "../../util/header.h"
+#include "../../util/auxtrace.h"
+#include "../../util/intel-pt.h"
+
+struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe_unused,
+					      int *err)
+{
+	char buffer[64];
+	int ret;
+
+	*err = 0;
+
+	ret = get_cpuid(buffer, sizeof(buffer));
+	if (ret) {
+		*err = ret;
+		return NULL;
+	}
+
+	if (!strncmp(buffer, "GenuineIntel,", 13))
+		return intel_pt_recording_init(err);
+
+	return NULL;
+}
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
new file mode 100644
index 0000000..fd11cc3
--- /dev/null
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -0,0 +1,15 @@
+#include <string.h>
+
+#include <linux/perf_event.h>
+
+#include "../../util/intel-pt.h"
+#include "../../util/pmu.h"
+
+struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
+{
+#ifdef HAVE_AUXTRACE_SUPPORT
+	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
+		return intel_pt_pmu_default_config(pmu);
+#endif
+	return NULL;
+}
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 734c4d2..8b7d59a 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -47,6 +47,8 @@
 #include "debug.h"
 #include "parse-options.h"
 
+#include "intel-pt.h"
+
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 			struct auxtrace_mmap_params *mp,
 			void *userpg, int fd)
@@ -876,7 +878,7 @@ static bool auxtrace__dont_decode(struct perf_session *session)
 
 int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 				      union perf_event *event,
-				      struct perf_session *session __maybe_unused)
+				      struct perf_session *session)
 {
 	enum auxtrace_type type = event->auxtrace_info.type;
 
@@ -885,6 +887,7 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 
 	switch (type) {
 	case PERF_AUXTRACE_INTEL_PT:
+		return intel_pt_process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 5d3ab7c..fddad8b 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -442,8 +442,8 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	LIST_HEAD(aliases);
 	__u32 type;
 
-	/* No support for intel_bts or intel_pt so disallow them */
-	if (!strcmp(name, "intel_bts") || !strcmp(name, "intel_pt"))
+	/* No support for intel_bts so disallow it */
+	if (!strcmp(name, "intel_bts"))
 		return NULL;
 
 	/*
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 10/17] perf tools: Allow auxtrace data alignment
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (8 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 09/17] perf tools: Take Intel PT into use Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-06-25  7:58   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 11/17] perf tools: Add Intel BTS support Adrian Hunter
                   ` (6 subsequent siblings)
  16 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Allow auxtrace data to be a multiple of something other than page size.
That is needed for BTS where the buffer contains 24-byte records.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/auxtrace.c | 7 +++++++
 tools/perf/util/auxtrace.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 8b7d59a..2d57759 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1186,6 +1186,13 @@ static int __auxtrace_mmap__read(struct auxtrace_mmap *mm,
 		data2 = NULL;
 	}
 
+	if (itr->alignment) {
+		unsigned int unwanted = len1 % itr->alignment;
+
+		len1 -= unwanted;
+		size -= unwanted;
+	}
+
 	/* padding must be written by fn() e.g. record__process_auxtrace() */
 	padding = size & 7;
 	if (padding)
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index ed98743..7d12f33 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -304,6 +304,7 @@ struct auxtrace_record {
 				      const char *str);
 	u64 (*reference)(struct auxtrace_record *itr);
 	int (*read_finish)(struct auxtrace_record *itr, int idx);
+	unsigned int alignment;
 };
 
 #ifdef HAVE_AUXTRACE_SUPPORT
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 11/17] perf tools: Add Intel BTS support
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (9 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 10/17] perf tools: Allow auxtrace data alignment Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 12/17] perf tools: Output sample flags and insn_len from intel_pt Adrian Hunter
                   ` (5 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Intel BTS support fits within the new auxtrace infrastructure.
Recording is supporting by identifying the Intel BTS PMU,
parsing options and setting up events.  Decoding is supported
by queuing up trace data by thread and then decoding
synchronously delivering synthesized event samples into the
session processing for tools to consume.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/intel-bts.txt |  86 ++++
 tools/perf/arch/x86/util/Build         |   1 +
 tools/perf/arch/x86/util/auxtrace.c    |  49 +-
 tools/perf/arch/x86/util/intel-bts.c   | 458 +++++++++++++++++++
 tools/perf/arch/x86/util/pmu.c         |   3 +
 tools/perf/util/Build                  |   1 +
 tools/perf/util/auxtrace.c             |   3 +
 tools/perf/util/auxtrace.h             |   1 +
 tools/perf/util/intel-bts.c            | 793 +++++++++++++++++++++++++++++++++
 tools/perf/util/intel-bts.h            |  43 ++
 tools/perf/util/pmu.c                  |   4 -
 11 files changed, 1436 insertions(+), 6 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-bts.txt
 create mode 100644 tools/perf/arch/x86/util/intel-bts.c
 create mode 100644 tools/perf/util/intel-bts.c
 create mode 100644 tools/perf/util/intel-bts.h

diff --git a/tools/perf/Documentation/intel-bts.txt b/tools/perf/Documentation/intel-bts.txt
new file mode 100644
index 0000000..8bdc93b
--- /dev/null
+++ b/tools/perf/Documentation/intel-bts.txt
@@ -0,0 +1,86 @@
+Intel Branch Trace Store
+========================
+
+Overview
+========
+
+Intel BTS could be regarded as a predecessor to Intel PT and has some
+similarities because it can also identify every branch a program takes.  A
+notable difference is that Intel BTS has no timing information and as a
+consequence the present implementation is limited to per-thread recording.
+
+While decoding Intel BTS does not require walking the object code, the object
+code is still needed to pair up calls and returns correctly, consequently much
+of the Intel PT documentation applies also to Intel BTS.  Refer to the Intel PT
+documentation and consider that the PMU 'intel_bts' can usually be used in
+place of 'intel_pt' in the examples provided, with the proviso that per-thread
+recording must also be stipulated i.e. the --per-thread option for
+'perf record'.
+
+
+perf record
+===========
+
+new event
+---------
+
+The Intel BTS kernel driver creates a new PMU for Intel BTS.  The perf record
+option is:
+
+	-e intel_bts//
+
+Currently Intel BTS is limited to per-thread tracing so the --per-thread option
+is also needed.
+
+
+snapshot option
+---------------
+
+The snapshot option is the same as Intel PT (refer Intel PT documentation).
+
+
+auxtrace mmap size option
+-----------------------
+
+The mmap size option is the same as Intel PT (refer Intel PT documentation).
+
+
+perf script
+===========
+
+By default, perf script will decode trace data found in the perf.data file.
+This can be further controlled by option --itrace.  The --itrace option is
+the same as Intel PT (refer Intel PT documentation) except that neither
+"instructions" events nor "transactions" events (and consequently call
+chains) are supported.
+
+To disable trace decoding entirely, use the option --no-itrace.
+
+
+dump option
+-----------
+
+perf script has an option (-D) to "dump" the events i.e. display the binary
+data.
+
+When -D is used, Intel BTS packets are displayed.
+
+To disable the display of Intel BTS packets, combine the -D option with
+--no-itrace.
+
+
+perf report
+===========
+
+By default, perf report will decode trace data found in the perf.data file.
+This can be further controlled by new option --itrace exactly the same as
+perf script.
+
+
+perf inject
+===========
+
+perf inject also accepts the --itrace option in which case tracing data is
+removed and replaced with the synthesized events. e.g.
+
+	perf inject --itrace -i perf.data -o perf.data.new
diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
index a8be9f9..2c55e1b 100644
--- a/tools/perf/arch/x86/util/Build
+++ b/tools/perf/arch/x86/util/Build
@@ -10,3 +10,4 @@ libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
 
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
+libperf-$(CONFIG_AUXTRACE) += intel-bts.o
diff --git a/tools/perf/arch/x86/util/auxtrace.c b/tools/perf/arch/x86/util/auxtrace.c
index e7654b5..7a78055 100644
--- a/tools/perf/arch/x86/util/auxtrace.c
+++ b/tools/perf/arch/x86/util/auxtrace.c
@@ -13,11 +13,56 @@
  *
  */
 
+#include <stdbool.h>
+
 #include "../../util/header.h"
+#include "../../util/debug.h"
+#include "../../util/pmu.h"
 #include "../../util/auxtrace.h"
 #include "../../util/intel-pt.h"
+#include "../../util/intel-bts.h"
+#include "../../util/evlist.h"
+
+static
+struct auxtrace_record *auxtrace_record__init_intel(struct perf_evlist *evlist,
+						    int *err)
+{
+	struct perf_pmu *intel_pt_pmu;
+	struct perf_pmu *intel_bts_pmu;
+	struct perf_evsel *evsel;
+	bool found_pt = false;
+	bool found_bts = false;
+
+	intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+	intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+
+	if (evlist) {
+		evlist__for_each(evlist, evsel) {
+			if (intel_pt_pmu &&
+			    evsel->attr.type == intel_pt_pmu->type)
+				found_pt = true;
+			if (intel_bts_pmu &&
+			    evsel->attr.type == intel_bts_pmu->type)
+				found_bts = true;
+		}
+	}
+
+	if (found_pt && found_bts) {
+		pr_err("intel_pt and intel_bts may not be used together\n");
+		*err = -EINVAL;
+		return NULL;
+	}
+
+	if (found_pt)
+		return intel_pt_recording_init(err);
+
+	if (found_bts)
+		return intel_bts_recording_init(err);
 
-struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe_unused,
+	return NULL;
+}
+
+struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist,
 					      int *err)
 {
 	char buffer[64];
@@ -32,7 +77,7 @@ struct auxtrace_record *auxtrace_record__init(struct perf_evlist *evlist __maybe
 	}
 
 	if (!strncmp(buffer, "GenuineIntel,", 13))
-		return intel_pt_recording_init(err);
+		return auxtrace_record__init_intel(evlist, err);
 
 	return NULL;
 }
diff --git a/tools/perf/arch/x86/util/intel-bts.c b/tools/perf/arch/x86/util/intel-bts.c
new file mode 100644
index 0000000..9b94ce5
--- /dev/null
+++ b/tools/perf/arch/x86/util/intel-bts.c
@@ -0,0 +1,458 @@
+/*
+ * intel-bts.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "../../util/cpumap.h"
+#include "../../util/evsel.h"
+#include "../../util/evlist.h"
+#include "../../util/session.h"
+#include "../../util/util.h"
+#include "../../util/pmu.h"
+#include "../../util/debug.h"
+#include "../../util/tsc.h"
+#include "../../util/auxtrace.h"
+#include "../../util/intel-bts.h"
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define KiB_MASK(x) (KiB(x) - 1)
+#define MiB_MASK(x) (MiB(x) - 1)
+
+#define INTEL_BTS_DFLT_SAMPLE_SIZE	KiB(4)
+
+#define INTEL_BTS_MAX_SAMPLE_SIZE	KiB(60)
+
+struct intel_bts_snapshot_ref {
+	void	*ref_buf;
+	size_t	ref_offset;
+	bool	wrapped;
+};
+
+struct intel_bts_recording {
+	struct auxtrace_record		itr;
+	struct perf_pmu			*intel_bts_pmu;
+	struct perf_evlist		*evlist;
+	bool				snapshot_mode;
+	size_t				snapshot_size;
+	int				snapshot_ref_cnt;
+	struct intel_bts_snapshot_ref	*snapshot_refs;
+};
+
+struct branch {
+	u64 from;
+	u64 to;
+	u64 misc;
+};
+
+static size_t intel_bts_info_priv_size(struct auxtrace_record *itr __maybe_unused)
+{
+	return INTEL_BTS_AUXTRACE_PRIV_SIZE;
+}
+
+static int intel_bts_info_fill(struct auxtrace_record *itr,
+			       struct perf_session *session,
+			       struct auxtrace_info_event *auxtrace_info,
+			       size_t priv_size)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
+	struct perf_event_mmap_page *pc;
+	struct perf_tsc_conversion tc = { .time_mult = 0, };
+	bool cap_user_time_zero = false;
+	int err;
+
+	if (priv_size != INTEL_BTS_AUXTRACE_PRIV_SIZE)
+		return -EINVAL;
+
+	if (!session->evlist->nr_mmaps)
+		return -EINVAL;
+
+	pc = session->evlist->mmap[0].base;
+	if (pc) {
+		err = perf_read_tsc_conversion(pc, &tc);
+		if (err) {
+			if (err != -EOPNOTSUPP)
+				return err;
+		} else {
+			cap_user_time_zero = tc.time_mult != 0;
+		}
+		if (!cap_user_time_zero)
+			ui__warning("Intel BTS: TSC not available\n");
+	}
+
+	auxtrace_info->type = PERF_AUXTRACE_INTEL_BTS;
+	auxtrace_info->priv[INTEL_BTS_PMU_TYPE] = intel_bts_pmu->type;
+	auxtrace_info->priv[INTEL_BTS_TIME_SHIFT] = tc.time_shift;
+	auxtrace_info->priv[INTEL_BTS_TIME_MULT] = tc.time_mult;
+	auxtrace_info->priv[INTEL_BTS_TIME_ZERO] = tc.time_zero;
+	auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO] = cap_user_time_zero;
+	auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE] = btsr->snapshot_mode;
+
+	return 0;
+}
+
+static int intel_bts_recording_options(struct auxtrace_record *itr,
+				       struct perf_evlist *evlist,
+				       struct record_opts *opts)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_pmu *intel_bts_pmu = btsr->intel_bts_pmu;
+	struct perf_evsel *evsel, *intel_bts_evsel = NULL;
+	const struct cpu_map *cpus = evlist->cpus;
+	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+
+	btsr->evlist = evlist;
+	btsr->snapshot_mode = opts->auxtrace_snapshot_mode;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == intel_bts_pmu->type) {
+			if (intel_bts_evsel) {
+				pr_err("There may be only one " INTEL_BTS_PMU_NAME " event\n");
+				return -EINVAL;
+			}
+			evsel->attr.freq = 0;
+			evsel->attr.sample_period = 1;
+			intel_bts_evsel = evsel;
+			opts->full_auxtrace = true;
+		}
+	}
+
+	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
+		pr_err("Snapshot mode (-S option) requires " INTEL_BTS_PMU_NAME " PMU event (-e " INTEL_BTS_PMU_NAME ")\n");
+		return -EINVAL;
+	}
+
+	if (!opts->full_auxtrace)
+		return 0;
+
+	if (opts->full_auxtrace && !cpu_map__empty(cpus)) {
+		pr_err(INTEL_BTS_PMU_NAME " does not support per-cpu recording\n");
+		return -EINVAL;
+	}
+
+	/* Set default sizes for snapshot mode */
+	if (opts->auxtrace_snapshot_mode) {
+		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
+			if (privileged) {
+				opts->auxtrace_mmap_pages = MiB(4) / page_size;
+			} else {
+				opts->auxtrace_mmap_pages = KiB(128) / page_size;
+				if (opts->mmap_pages == UINT_MAX)
+					opts->mmap_pages = KiB(256) / page_size;
+			}
+		} else if (!opts->auxtrace_mmap_pages && !privileged &&
+			   opts->mmap_pages == UINT_MAX) {
+			opts->mmap_pages = KiB(256) / page_size;
+		}
+		if (!opts->auxtrace_snapshot_size)
+			opts->auxtrace_snapshot_size =
+				opts->auxtrace_mmap_pages * (size_t)page_size;
+		if (!opts->auxtrace_mmap_pages) {
+			size_t sz = opts->auxtrace_snapshot_size;
+
+			sz = round_up(sz, page_size) / page_size;
+			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
+		}
+		if (opts->auxtrace_snapshot_size >
+				opts->auxtrace_mmap_pages * (size_t)page_size) {
+			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
+			       opts->auxtrace_snapshot_size,
+			       opts->auxtrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
+			return -EINVAL;
+		}
+		pr_debug2("Intel BTS snapshot size: %zu\n",
+			  opts->auxtrace_snapshot_size);
+	}
+
+	/* Set default sizes for full trace mode */
+	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
+		if (privileged) {
+			opts->auxtrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->auxtrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	}
+
+	/* Validate auxtrace_mmap_pages */
+	if (opts->auxtrace_mmap_pages) {
+		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
+		size_t min_sz;
+
+		if (opts->auxtrace_snapshot_mode)
+			min_sz = KiB(4);
+		else
+			min_sz = KiB(8);
+
+		if (sz < min_sz || !is_power_of_2(sz)) {
+			pr_err("Invalid mmap size for Intel BTS: must be at least %zuKiB and a power of 2\n",
+			       min_sz / 1024);
+			return -EINVAL;
+		}
+	}
+
+	if (intel_bts_evsel) {
+		/*
+		 * To obtain the auxtrace buffer file descriptor, the auxtrace event
+		 * must come first.
+		 */
+		perf_evlist__to_front(evlist, intel_bts_evsel);
+		/*
+		 * In the case of per-cpu mmaps, we need the CPU on the
+		 * AUX event.
+		 */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(intel_bts_evsel, CPU);
+	}
+
+	/* Add dummy event to keep tracking */
+	if (opts->full_auxtrace) {
+		struct perf_evsel *tracking_evsel;
+		int err;
+
+		err = parse_events(evlist, "dummy:u", NULL);
+		if (err)
+			return err;
+
+		tracking_evsel = perf_evlist__last(evlist);
+
+		perf_evlist__set_tracking_event(evlist, tracking_evsel);
+
+		tracking_evsel->attr.freq = 0;
+		tracking_evsel->attr.sample_period = 1;
+	}
+
+	return 0;
+}
+
+static int intel_bts_parse_snapshot_options(struct auxtrace_record *itr,
+					    struct record_opts *opts,
+					    const char *str)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->auxtrace_snapshot_mode = true;
+	opts->auxtrace_snapshot_size = snapshot_size;
+
+	btsr->snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+static u64 intel_bts_reference(struct auxtrace_record *itr __maybe_unused)
+{
+	return rdtsc();
+}
+
+static int intel_bts_alloc_snapshot_refs(struct intel_bts_recording *btsr,
+					 int idx)
+{
+	const size_t sz = sizeof(struct intel_bts_snapshot_ref);
+	int cnt = btsr->snapshot_ref_cnt, new_cnt = cnt * 2;
+	struct intel_bts_snapshot_ref *refs;
+
+	if (!new_cnt)
+		new_cnt = 16;
+
+	while (new_cnt <= idx)
+		new_cnt *= 2;
+
+	refs = calloc(new_cnt, sz);
+	if (!refs)
+		return -ENOMEM;
+
+	memcpy(refs, btsr->snapshot_refs, cnt * sz);
+
+	btsr->snapshot_refs = refs;
+	btsr->snapshot_ref_cnt = new_cnt;
+
+	return 0;
+}
+
+static void intel_bts_free_snapshot_refs(struct intel_bts_recording *btsr)
+{
+	int i;
+
+	for (i = 0; i < btsr->snapshot_ref_cnt; i++)
+		zfree(&btsr->snapshot_refs[i].ref_buf);
+	zfree(&btsr->snapshot_refs);
+}
+
+static void intel_bts_recording_free(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+
+	intel_bts_free_snapshot_refs(btsr);
+	free(btsr);
+}
+
+static int intel_bts_snapshot_start(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__disable_event(btsr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_bts_snapshot_finish(struct auxtrace_record *itr)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__enable_event(btsr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static bool intel_bts_first_wrap(u64 *data, size_t buf_size)
+{
+	int i, a, b;
+
+	b = buf_size >> 3;
+	a = b - 512;
+	if (a < 0)
+		a = 0;
+
+	for (i = a; i < b; i++) {
+		if (data[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int intel_bts_find_snapshot(struct auxtrace_record *itr, int idx,
+				   struct auxtrace_mmap *mm, unsigned char *data,
+				   u64 *head, u64 *old)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	bool wrapped;
+	int err;
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head);
+
+	if (idx >= btsr->snapshot_ref_cnt) {
+		err = intel_bts_alloc_snapshot_refs(btsr, idx);
+		if (err)
+			goto out_err;
+	}
+
+	wrapped = btsr->snapshot_refs[idx].wrapped;
+	if (!wrapped && intel_bts_first_wrap((u64 *)data, mm->len)) {
+		btsr->snapshot_refs[idx].wrapped = true;
+		wrapped = true;
+	}
+
+	/*
+	 * In full trace mode 'head' continually increases.  However in snapshot
+	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
+	 * are adjusted to match the full trace case which expects that 'old' is
+	 * always less than 'head'.
+	 */
+	if (wrapped) {
+		*old = *head;
+		*head += mm->len;
+	} else {
+		if (mm->mask)
+			*old &= mm->mask;
+		else
+			*old %= mm->len;
+		if (*old > *head)
+			*head += mm->len;
+	}
+
+	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
+		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
+
+	return 0;
+
+out_err:
+	pr_err("%s: failed, error %d\n", __func__, err);
+	return err;
+}
+
+static int intel_bts_read_finish(struct auxtrace_record *itr, int idx)
+{
+	struct intel_bts_recording *btsr =
+			container_of(itr, struct intel_bts_recording, itr);
+	struct perf_evsel *evsel;
+
+	evlist__for_each(btsr->evlist, evsel) {
+		if (evsel->attr.type == btsr->intel_bts_pmu->type)
+			return perf_evlist__enable_event_idx(btsr->evlist,
+							     evsel, idx);
+	}
+	return -EINVAL;
+}
+
+struct auxtrace_record *intel_bts_recording_init(int *err)
+{
+	struct perf_pmu *intel_bts_pmu = perf_pmu__find(INTEL_BTS_PMU_NAME);
+	struct intel_bts_recording *btsr;
+
+	if (!intel_bts_pmu)
+		return NULL;
+
+	btsr = zalloc(sizeof(struct intel_bts_recording));
+	if (!btsr) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	btsr->intel_bts_pmu = intel_bts_pmu;
+	btsr->itr.recording_options = intel_bts_recording_options;
+	btsr->itr.info_priv_size = intel_bts_info_priv_size;
+	btsr->itr.info_fill = intel_bts_info_fill;
+	btsr->itr.free = intel_bts_recording_free;
+	btsr->itr.snapshot_start = intel_bts_snapshot_start;
+	btsr->itr.snapshot_finish = intel_bts_snapshot_finish;
+	btsr->itr.find_snapshot = intel_bts_find_snapshot;
+	btsr->itr.parse_snapshot_options = intel_bts_parse_snapshot_options;
+	btsr->itr.reference = intel_bts_reference;
+	btsr->itr.read_finish = intel_bts_read_finish;
+	btsr->itr.alignment = sizeof(struct branch);
+	return &btsr->itr;
+}
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
index fd11cc3..79fe071 100644
--- a/tools/perf/arch/x86/util/pmu.c
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -3,6 +3,7 @@
 #include <linux/perf_event.h>
 
 #include "../../util/intel-pt.h"
+#include "../../util/intel-bts.h"
 #include "../../util/pmu.h"
 
 struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
@@ -10,6 +11,8 @@ struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu __mayb
 #ifdef HAVE_AUXTRACE_SUPPORT
 	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
 		return intel_pt_pmu_default_config(pmu);
+	if (!strcmp(pmu->name, INTEL_BTS_PMU_NAME))
+		pmu->selectable = true;
 #endif
 	return NULL;
 }
diff --git a/tools/perf/util/Build b/tools/perf/util/Build
index ec7ab9d..4cc1c37 100644
--- a/tools/perf/util/Build
+++ b/tools/perf/util/Build
@@ -77,6 +77,7 @@ libperf-y += thread-stack.o
 libperf-$(CONFIG_AUXTRACE) += auxtrace.o
 libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
 libperf-$(CONFIG_AUXTRACE) += intel-pt.o
+libperf-$(CONFIG_AUXTRACE) += intel-bts.o
 libperf-y += parse-branch-options.o
 
 libperf-$(CONFIG_LIBELF) += symbol-elf.o
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 2d57759..aa93ab1 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -48,6 +48,7 @@
 #include "parse-options.h"
 
 #include "intel-pt.h"
+#include "intel-bts.h"
 
 int auxtrace_mmap__mmap(struct auxtrace_mmap *mm,
 			struct auxtrace_mmap_params *mp,
@@ -888,6 +889,8 @@ int perf_event__process_auxtrace_info(struct perf_tool *tool __maybe_unused,
 	switch (type) {
 	case PERF_AUXTRACE_INTEL_PT:
 		return intel_pt_process_auxtrace_info(event, session);
+	case PERF_AUXTRACE_INTEL_BTS:
+		return intel_bts_process_auxtrace_info(event, session);
 	case PERF_AUXTRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index 7d12f33..bf72b77 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -40,6 +40,7 @@ struct events_stats;
 enum auxtrace_type {
 	PERF_AUXTRACE_UNKNOWN,
 	PERF_AUXTRACE_INTEL_PT,
+	PERF_AUXTRACE_INTEL_BTS,
 };
 
 enum itrace_period_type {
diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
new file mode 100644
index 0000000..48bcbd6
--- /dev/null
+++ b/tools/perf/util/intel-bts.c
@@ -0,0 +1,793 @@
+/*
+ * intel-bts.c: Intel Processor Trace support
+ * Copyright (c) 2013-2015, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#include <endian.h>
+#include <byteswap.h>
+#include <linux/kernel.h>
+#include <linux/types.h>
+#include <linux/bitops.h>
+#include <linux/log2.h>
+
+#include "cpumap.h"
+#include "color.h"
+#include "evsel.h"
+#include "evlist.h"
+#include "machine.h"
+#include "session.h"
+#include "util.h"
+#include "debug.h"
+#include "tsc.h"
+#include "auxtrace.h"
+#include "intel-bts.h"
+
+#define MAX_TIMESTAMP (~0ULL)
+
+#define INTEL_BTS_ERR_NOINSN  5
+#define INTEL_BTS_ERR_LOST    9
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le64_to_cpu bswap_64
+#else
+#define le64_to_cpu
+#endif
+
+struct intel_bts {
+	struct auxtrace			auxtrace;
+	struct auxtrace_queues		queues;
+	struct auxtrace_heap		heap;
+	u32				auxtrace_type;
+	struct perf_session		*session;
+	struct machine			*machine;
+	bool				sampling_mode;
+	bool				snapshot_mode;
+	bool				data_queued;
+	u32				pmu_type;
+	struct perf_tsc_conversion	tc;
+	bool				cap_user_time_zero;
+	struct itrace_synth_opts	synth_opts;
+	bool				sample_branches;
+	u64				branches_sample_type;
+	u64				branches_id;
+	size_t				branches_event_size;
+	bool				synth_needs_swap;
+};
+
+struct intel_bts_queue {
+	struct intel_bts	*bts;
+	unsigned int		queue_nr;
+	struct auxtrace_buffer	*buffer;
+	bool			on_heap;
+	bool			done;
+	pid_t			pid;
+	pid_t			tid;
+	int			cpu;
+	u64			time;
+};
+
+struct branch {
+	u64 from;
+	u64 to;
+	u64 misc;
+};
+
+static void intel_bts_dump(struct intel_bts *bts __maybe_unused,
+			   unsigned char *buf, size_t len)
+{
+	struct branch *branch;
+	size_t i, pos = 0, br_sz = sizeof(struct branch), sz;
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color,
+		      ". ... Intel BTS data: size %zu bytes\n",
+		      len);
+
+	while (len) {
+		if (len >= br_sz)
+			sz = br_sz;
+		else
+			sz = len;
+		printf(".");
+		color_fprintf(stdout, color, "  %08x: ", pos);
+		for (i = 0; i < sz; i++)
+			color_fprintf(stdout, color, " %02x", buf[i]);
+		for (; i < br_sz; i++)
+			color_fprintf(stdout, color, "   ");
+		if (len >= br_sz) {
+			branch = (struct branch *)buf;
+			color_fprintf(stdout, color, " %"PRIx64" -> %"PRIx64" %s\n",
+				      le64_to_cpu(branch->from),
+				      le64_to_cpu(branch->to),
+				      le64_to_cpu(branch->misc) & 0x10 ?
+							"pred" : "miss");
+		} else {
+			color_fprintf(stdout, color, " Bad record!\n");
+		}
+		pos += sz;
+		buf += sz;
+		len -= sz;
+	}
+}
+
+static void intel_bts_dump_event(struct intel_bts *bts, unsigned char *buf,
+				 size_t len)
+{
+	printf(".\n");
+	intel_bts_dump(bts, buf, len);
+}
+
+static int intel_bts_lost(struct intel_bts *bts, struct perf_sample *sample)
+{
+	union perf_event event;
+	int err;
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     INTEL_BTS_ERR_LOST, sample->cpu, sample->pid,
+			     sample->tid, 0, "Lost trace data");
+
+	err = perf_session__deliver_synth_event(bts->session, &event, NULL);
+	if (err)
+		pr_err("Intel BTS: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static struct intel_bts_queue *intel_bts_alloc_queue(struct intel_bts *bts,
+						     unsigned int queue_nr)
+{
+	struct intel_bts_queue *btsq;
+
+	btsq = zalloc(sizeof(struct intel_bts_queue));
+	if (!btsq)
+		return NULL;
+
+	btsq->bts = bts;
+	btsq->queue_nr = queue_nr;
+	btsq->pid = -1;
+	btsq->tid = -1;
+	btsq->cpu = -1;
+
+	return btsq;
+}
+
+static int intel_bts_setup_queue(struct intel_bts *bts,
+				 struct auxtrace_queue *queue,
+				 unsigned int queue_nr)
+{
+	struct intel_bts_queue *btsq = queue->priv;
+
+	if (list_empty(&queue->head))
+		return 0;
+
+	if (!btsq) {
+		btsq = intel_bts_alloc_queue(bts, queue_nr);
+		if (!btsq)
+			return -ENOMEM;
+		queue->priv = btsq;
+
+		if (queue->cpu != -1)
+			btsq->cpu = queue->cpu;
+		btsq->tid = queue->tid;
+	}
+
+	if (bts->sampling_mode)
+		return 0;
+
+	if (!btsq->on_heap && !btsq->buffer) {
+		int ret;
+
+		btsq->buffer = auxtrace_buffer__next(queue, NULL);
+		if (!btsq->buffer)
+			return 0;
+
+		ret = auxtrace_heap__add(&bts->heap, queue_nr,
+					 btsq->buffer->reference);
+		if (ret)
+			return ret;
+		btsq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int intel_bts_setup_queues(struct intel_bts *bts)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < bts->queues.nr_queues; i++) {
+		ret = intel_bts_setup_queue(bts, &bts->queues.queue_array[i],
+					    i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static inline int intel_bts_update_queues(struct intel_bts *bts)
+{
+	if (bts->queues.new_data) {
+		bts->queues.new_data = false;
+		return intel_bts_setup_queues(bts);
+	}
+	return 0;
+}
+
+static unsigned char *intel_bts_find_overlap(unsigned char *buf_a, size_t len_a,
+					     unsigned char *buf_b, size_t len_b)
+{
+	size_t offs, len;
+
+	if (len_a > len_b)
+		offs = len_a - len_b;
+	else
+		offs = 0;
+
+	for (; offs < len_a; offs += sizeof(struct branch)) {
+		len = len_a - offs;
+		if (!memcmp(buf_a + offs, buf_b, len))
+			return buf_b + len;
+	}
+
+	return buf_b;
+}
+
+static int intel_bts_do_fix_overlap(struct auxtrace_queue *queue,
+				    struct auxtrace_buffer *b)
+{
+	struct auxtrace_buffer *a;
+	void *start;
+
+	if (b->list.prev == &queue->head)
+		return 0;
+	a = list_entry(b->list.prev, struct auxtrace_buffer, list);
+	start = intel_bts_find_overlap(a->data, a->size, b->data, b->size);
+	if (!start)
+		return -EINVAL;
+	b->use_size = b->data + b->size - start;
+	b->use_data = start;
+	return 0;
+}
+
+static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
+					 struct branch *branch)
+{
+	int ret;
+	struct intel_bts *bts = btsq->bts;
+	union perf_event event;
+	struct perf_sample sample = { .ip = 0, };
+
+	event.sample.header.type = PERF_RECORD_SAMPLE;
+	event.sample.header.misc = PERF_RECORD_MISC_USER;
+	event.sample.header.size = sizeof(struct perf_event_header);
+
+	sample.ip = le64_to_cpu(branch->from);
+	sample.pid = btsq->pid;
+	sample.tid = btsq->tid;
+	sample.addr = le64_to_cpu(branch->to);
+	sample.id = btsq->bts->branches_id;
+	sample.stream_id = btsq->bts->branches_id;
+	sample.period = 1;
+	sample.cpu = btsq->cpu;
+
+	if (bts->synth_opts.inject) {
+		event.sample.header.size = bts->branches_event_size;
+		ret = perf_event__synthesize_sample(&event,
+						    bts->branches_sample_type,
+						    0, &sample,
+						    bts->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(bts->session, &event, &sample);
+	if (ret)
+		pr_err("Intel BTS: failed to deliver branch event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
+				    struct auxtrace_buffer *buffer)
+{
+	struct branch *branch;
+	size_t sz;
+	int err = 0;
+
+	if (buffer->use_data) {
+		sz = buffer->use_size;
+		branch = buffer->use_data;
+	} else {
+		sz = buffer->size;
+		branch = buffer->data;
+	}
+
+	if (!btsq->bts->sample_branches)
+		return 0;
+
+	while (sz > sizeof(struct branch)) {
+		if (!branch->from && !branch->to)
+			continue;
+		err = intel_bts_synth_branch_sample(btsq, branch);
+		if (err)
+			break;
+		branch += 1;
+		sz -= sizeof(struct branch);
+	}
+	return err;
+}
+
+static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
+{
+	struct auxtrace_buffer *buffer = btsq->buffer;
+	struct auxtrace_queue *queue;
+	int err;
+
+	if (btsq->done)
+		return 1;
+
+	if (btsq->pid == -1) {
+		struct thread *thread;
+
+		thread = machine__find_thread(btsq->bts->machine, -1, btsq->tid);
+		if (thread)
+			btsq->pid = thread->pid_;
+	}
+
+	queue = &btsq->bts->queues.queue_array[btsq->queue_nr];
+
+	if (!buffer)
+		buffer = auxtrace_buffer__next(queue, NULL);
+
+	if (!buffer) {
+		if (!btsq->bts->sampling_mode)
+			btsq->done = 1;
+		return 1;
+	}
+
+	/* Currently there is no support for split buffers */
+	if (buffer->consecutive)
+		return -EINVAL;
+
+	if (!buffer->data) {
+		int fd = perf_data_file__fd(btsq->bts->session->file);
+
+		buffer->data = auxtrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (btsq->bts->snapshot_mode && !buffer->consecutive &&
+	    intel_bts_do_fix_overlap(queue, buffer))
+		return -ENOMEM;
+
+	err = intel_bts_process_buffer(btsq, buffer);
+
+	auxtrace_buffer__drop_data(buffer);
+
+	btsq->buffer = auxtrace_buffer__next(queue, buffer);
+	if (btsq->buffer) {
+		if (timestamp)
+			*timestamp = btsq->buffer->reference;
+	} else {
+		if (!btsq->bts->sampling_mode)
+			btsq->done = 1;
+	}
+
+	return err;
+}
+
+static int intel_bts_flush_queue(struct intel_bts_queue *btsq)
+{
+	u64 ts = 0;
+	int ret;
+
+	while (1) {
+		ret = intel_bts_process_queue(btsq, &ts);
+		if (ret < 0)
+			return ret;
+		if (ret)
+			break;
+	}
+	return 0;
+}
+
+static int intel_bts_process_tid_exit(struct intel_bts *bts, pid_t tid)
+{
+	struct auxtrace_queues *queues = &bts->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct auxtrace_queue *queue = &bts->queues.queue_array[i];
+		struct intel_bts_queue *btsq = queue->priv;
+
+		if (btsq && btsq->tid == tid)
+			return intel_bts_flush_queue(btsq);
+	}
+	return 0;
+}
+
+static int intel_bts_process_queues(struct intel_bts *bts, u64 timestamp)
+{
+	while (1) {
+		unsigned int queue_nr;
+		struct auxtrace_queue *queue;
+		struct intel_bts_queue *btsq;
+		u64 ts = 0;
+		int ret;
+
+		if (!bts->heap.heap_cnt)
+			return 0;
+
+		if (bts->heap.heap_array[0].ordinal > timestamp)
+			return 0;
+
+		queue_nr = bts->heap.heap_array[0].queue_nr;
+		queue = &bts->queues.queue_array[queue_nr];
+		btsq = queue->priv;
+
+		auxtrace_heap__pop(&bts->heap);
+
+		ret = intel_bts_process_queue(btsq, &ts);
+		if (ret < 0) {
+			auxtrace_heap__add(&bts->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = auxtrace_heap__add(&bts->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			btsq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_bts_process_event(struct perf_session *session,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct perf_tool *tool)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	u64 timestamp;
+	int err;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_events) {
+		pr_err("Intel BTS requires ordered events\n");
+		return -EINVAL;
+	}
+
+	if (sample->time)
+		timestamp = perf_time_to_tsc(sample->time, &bts->tc);
+	else
+		timestamp = 0;
+
+	err = intel_bts_update_queues(bts);
+	if (err)
+		return err;
+
+	err = intel_bts_process_queues(bts, timestamp);
+	if (err)
+		return err;
+	if (event->header.type == PERF_RECORD_EXIT) {
+		err = intel_bts_process_tid_exit(bts, event->comm.tid);
+		if (err)
+			return err;
+	}
+
+	if (event->header.type == PERF_RECORD_AUX &&
+	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
+	    bts->synth_opts.errors)
+		err = intel_bts_lost(bts, sample);
+
+	return err;
+}
+
+static int intel_bts_process_auxtrace_event(struct perf_session *session,
+					    union perf_event *event,
+					    struct perf_tool *tool __maybe_unused)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+
+	if (bts->sampling_mode)
+		return 0;
+
+	if (!bts->data_queued) {
+		struct auxtrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data_file__fd(session->file);
+		int err;
+
+		if (perf_data_file__is_pipe(session->file)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = auxtrace_queues__add_event(&bts->queues, session, event,
+						 data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (auxtrace_buffer__get_data(buffer, fd)) {
+				intel_bts_dump_event(bts, buffer->data,
+						     buffer->size);
+				auxtrace_buffer__put_data(buffer);
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int intel_bts_flush(struct perf_session *session __maybe_unused,
+			   struct perf_tool *tool __maybe_unused)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	int ret;
+
+	if (dump_trace || bts->sampling_mode)
+		return 0;
+
+	if (!tool->ordered_events)
+		return -EINVAL;
+
+	ret = intel_bts_update_queues(bts);
+	if (ret < 0)
+		return ret;
+
+	return intel_bts_process_queues(bts, MAX_TIMESTAMP);
+}
+
+static void intel_bts_free_queue(void *priv)
+{
+	struct intel_bts_queue *btsq = priv;
+
+	if (!btsq)
+		return;
+	free(btsq);
+}
+
+static void intel_bts_free_events(struct perf_session *session)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+	struct auxtrace_queues *queues = &bts->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		intel_bts_free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = NULL;
+	}
+	auxtrace_queues__free(queues);
+}
+
+static void intel_bts_free(struct perf_session *session)
+{
+	struct intel_bts *bts = container_of(session->auxtrace, struct intel_bts,
+					     auxtrace);
+
+	auxtrace_heap__free(&bts->heap);
+	intel_bts_free_events(session);
+	session->auxtrace = NULL;
+	free(bts);
+}
+
+struct intel_bts_synth {
+	struct perf_tool dummy_tool;
+	struct perf_session *session;
+};
+
+static int intel_bts_event_synth(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_sample *sample __maybe_unused,
+				 struct machine *machine __maybe_unused)
+{
+	struct intel_bts_synth *intel_bts_synth =
+			container_of(tool, struct intel_bts_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(intel_bts_synth->session,
+						 event, NULL);
+}
+
+static int intel_bts_synth_event(struct perf_session *session,
+				 struct perf_event_attr *attr, u64 id)
+{
+	struct intel_bts_synth intel_bts_synth;
+
+	memset(&intel_bts_synth, 0, sizeof(struct intel_bts_synth));
+	intel_bts_synth.session = session;
+
+	return perf_event__synthesize_attr(&intel_bts_synth.dummy_tool, attr, 1,
+					   &id, intel_bts_event_synth);
+}
+
+static int intel_bts_synth_events(struct intel_bts *bts,
+				  struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	evlist__for_each(evlist, evsel) {
+		if (evsel->attr.type == bts->pmu_type && evsel->ids) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_debug("There are no selected events with Intel BTS data\n");
+		return 0;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+	if (!id)
+		id = 1;
+
+	if (bts->synth_opts.branches) {
+		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		attr.sample_period = 1;
+		attr.sample_type |= PERF_SAMPLE_ADDR;
+		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_bts_synth_event(session, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'branches' event type\n",
+			       __func__);
+			return err;
+		}
+		bts->sample_branches = true;
+		bts->branches_sample_type = attr.sample_type;
+		bts->branches_id = id;
+		/*
+		 * We only use sample types from PERF_SAMPLE_MASK so we can use
+		 * __perf_evsel__sample_size() here.
+		 */
+		bts->branches_event_size = sizeof(struct sample_event) +
+				__perf_evsel__sample_size(attr.sample_type);
+	}
+
+	bts->synth_needs_swap = evsel->needs_swap;
+
+	return 0;
+}
+
+static const char * const intel_bts_info_fmts[] = {
+	[INTEL_BTS_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
+	[INTEL_BTS_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
+	[INTEL_BTS_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
+	[INTEL_BTS_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
+	[INTEL_BTS_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
+	[INTEL_BTS_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
+};
+
+static void intel_bts_print_info(u64 *arr, int start, int finish)
+{
+	int i;
+
+	if (!dump_trace)
+		return;
+
+	for (i = start; i <= finish; i++)
+		fprintf(stdout, intel_bts_info_fmts[i], arr[i]);
+}
+
+u64 intel_bts_auxtrace_info_priv[INTEL_BTS_AUXTRACE_PRIV_SIZE];
+
+int intel_bts_process_auxtrace_info(union perf_event *event,
+				    struct perf_session *session)
+{
+	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
+	size_t min_sz = sizeof(u64) * INTEL_BTS_SNAPSHOT_MODE;
+	struct intel_bts *bts;
+	int err;
+
+	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
+					min_sz)
+		return -EINVAL;
+
+	bts = zalloc(sizeof(struct intel_bts));
+	if (!bts)
+		return -ENOMEM;
+
+	err = auxtrace_queues__init(&bts->queues);
+	if (err)
+		goto err_free;
+
+	bts->session = session;
+	bts->machine = &session->machines.host; /* No kvm support */
+	bts->auxtrace_type = auxtrace_info->type;
+	bts->pmu_type = auxtrace_info->priv[INTEL_BTS_PMU_TYPE];
+	bts->tc.time_shift = auxtrace_info->priv[INTEL_BTS_TIME_SHIFT];
+	bts->tc.time_mult = auxtrace_info->priv[INTEL_BTS_TIME_MULT];
+	bts->tc.time_zero = auxtrace_info->priv[INTEL_BTS_TIME_ZERO];
+	bts->cap_user_time_zero =
+			auxtrace_info->priv[INTEL_BTS_CAP_USER_TIME_ZERO];
+	bts->snapshot_mode = auxtrace_info->priv[INTEL_BTS_SNAPSHOT_MODE];
+
+	bts->sampling_mode = false;
+
+	bts->auxtrace.process_event = intel_bts_process_event;
+	bts->auxtrace.process_auxtrace_event = intel_bts_process_auxtrace_event;
+	bts->auxtrace.flush_events = intel_bts_flush;
+	bts->auxtrace.free_events = intel_bts_free_events;
+	bts->auxtrace.free = intel_bts_free;
+	session->auxtrace = &bts->auxtrace;
+
+	intel_bts_print_info(&auxtrace_info->priv[0], INTEL_BTS_PMU_TYPE,
+			     INTEL_BTS_SNAPSHOT_MODE);
+
+	if (dump_trace)
+		return 0;
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		bts->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&bts->synth_opts);
+
+	err = intel_bts_synth_events(bts, session);
+	if (err)
+		goto err_free_queues;
+
+	err = auxtrace_queues__process_index(&bts->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (bts->queues.populated)
+		bts->data_queued = true;
+
+	return 0;
+
+err_free_queues:
+	auxtrace_queues__free(&bts->queues);
+	session->auxtrace = NULL;
+err_free:
+	free(bts);
+	return err;
+}
diff --git a/tools/perf/util/intel-bts.h b/tools/perf/util/intel-bts.h
new file mode 100644
index 0000000..ca65e21
--- /dev/null
+++ b/tools/perf/util/intel-bts.h
@@ -0,0 +1,43 @@
+/*
+ * intel-bts.h: Intel Processor Trace support
+ * Copyright (c) 2013-2014, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ */
+
+#ifndef INCLUDE__PERF_INTEL_BTS_H__
+#define INCLUDE__PERF_INTEL_BTS_H__
+
+#define INTEL_BTS_PMU_NAME "intel_bts"
+
+enum {
+	INTEL_BTS_PMU_TYPE,
+	INTEL_BTS_TIME_SHIFT,
+	INTEL_BTS_TIME_MULT,
+	INTEL_BTS_TIME_ZERO,
+	INTEL_BTS_CAP_USER_TIME_ZERO,
+	INTEL_BTS_SNAPSHOT_MODE,
+	INTEL_BTS_AUXTRACE_PRIV_MAX,
+};
+
+#define INTEL_BTS_AUXTRACE_PRIV_SIZE (INTEL_BTS_AUXTRACE_PRIV_MAX * sizeof(u64))
+
+struct auxtrace_record;
+struct perf_tool;
+union perf_event;
+struct perf_session;
+
+struct auxtrace_record *intel_bts_recording_init(int *err);
+
+int intel_bts_process_auxtrace_info(union perf_event *event,
+				    struct perf_session *session);
+
+#endif
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index fddad8b..244c66f 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -442,10 +442,6 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	LIST_HEAD(aliases);
 	__u32 type;
 
-	/* No support for intel_bts so disallow it */
-	if (!strcmp(name, "intel_bts"))
-		return NULL;
-
 	/*
 	 * The pmu data we store & need consists of the pmu
 	 * type value and format definitions. Load both right
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 12/17] perf tools: Output sample flags and insn_len from intel_pt
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (10 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 11/17] perf tools: Add Intel BTS support Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 13/17] perf tools: Output sample flags and insn_len from intel_bts Adrian Hunter
                   ` (4 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

intel_pt synthesizes samples.  Fill in the new flags and insn_len
members with instruction information.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 6d66879..9c25bfa 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -876,6 +876,8 @@ static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
 	sample.stream_id = ptq->pt->branches_id;
 	sample.period = 1;
 	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
 
 	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
 		return 0;
@@ -918,6 +920,8 @@ static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
 	sample.stream_id = ptq->pt->instructions_id;
 	sample.period = ptq->pt->instructions_sample_period;
 	sample.cpu = ptq->cpu;
+	sample.flags = ptq->flags;
+	sample.insn_len = ptq->insn_len;
 
 	if (pt->synth_opts.callchain) {
 		thread_stack__sample(ptq->thread, ptq->chain,
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 13/17] perf tools: Output sample flags and insn_len from intel_bts
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (11 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 12/17] perf tools: Output sample flags and insn_len from intel_pt Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 14/17] perf tools: Intel PT to always update thread stack trace number Adrian Hunter
                   ` (3 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

intel_bts synthesizes samples.  Fill in the new flags and insn_len
members with instruction information.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-bts.c | 126 ++++++++++++++++++++++++++++++++++++++++++--
 1 file changed, 122 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 48bcbd6..b068860 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -30,6 +30,7 @@
 #include "debug.h"
 #include "tsc.h"
 #include "auxtrace.h"
+#include "intel-pt-decoder/intel-pt-insn-decoder.h"
 #include "intel-bts.h"
 
 #define MAX_TIMESTAMP (~0ULL)
@@ -58,6 +59,7 @@ struct intel_bts {
 	bool				cap_user_time_zero;
 	struct itrace_synth_opts	synth_opts;
 	bool				sample_branches;
+	u32				branches_filter;
 	u64				branches_sample_type;
 	u64				branches_id;
 	size_t				branches_event_size;
@@ -74,6 +76,8 @@ struct intel_bts_queue {
 	pid_t			tid;
 	int			cpu;
 	u64			time;
+	struct intel_pt_insn	intel_pt_insn;
+	u32			sample_flags;
 };
 
 struct branch {
@@ -281,6 +285,8 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 	sample.stream_id = btsq->bts->branches_id;
 	sample.period = 1;
 	sample.cpu = btsq->cpu;
+	sample.flags = btsq->sample_flags;
+	sample.insn_len = btsq->intel_pt_insn.length;
 
 	if (bts->synth_opts.inject) {
 		event.sample.header.size = bts->branches_event_size;
@@ -300,11 +306,115 @@ static int intel_bts_synth_branch_sample(struct intel_bts_queue *btsq,
 	return ret;
 }
 
+static int intel_bts_get_next_insn(struct intel_bts_queue *btsq, u64 ip)
+{
+	struct machine *machine = btsq->bts->machine;
+	struct thread *thread;
+	struct addr_location al;
+	unsigned char buf[1024];
+	size_t bufsz;
+	ssize_t len;
+	int x86_64;
+	uint8_t cpumode;
+
+	bufsz = intel_pt_insn_max_size();
+
+	if (machine__kernel_ip(machine, ip))
+		cpumode = PERF_RECORD_MISC_KERNEL;
+	else
+		cpumode = PERF_RECORD_MISC_USER;
+
+	thread = machine__find_thread(machine, -1, btsq->tid);
+	if (!thread)
+		return -1;
+
+	thread__find_addr_map(thread, cpumode, MAP__FUNCTION, ip, &al);
+	if (!al.map || !al.map->dso)
+		return -1;
+
+	len = dso__data_read_addr(al.map->dso, al.map, machine, ip, buf, bufsz);
+	if (len <= 0)
+		return -1;
+
+	/* Load maps to ensure dso->is_64_bit has been updated */
+	map__load(al.map, machine->symbol_filter);
+
+	x86_64 = al.map->dso->is_64_bit;
+
+	if (intel_pt_get_insn(buf, len, x86_64, &btsq->intel_pt_insn))
+		return -1;
+
+	return 0;
+}
+
+static int intel_bts_synth_error(struct intel_bts *bts, int cpu, pid_t pid,
+				 pid_t tid, u64 ip)
+{
+	union perf_event event;
+	int err;
+
+	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
+			     INTEL_BTS_ERR_NOINSN, cpu, pid, tid, ip,
+			     "Failed to get instruction");
+
+	err = perf_session__deliver_synth_event(bts->session, &event, NULL);
+	if (err)
+		pr_err("Intel BTS: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static int intel_bts_get_branch_type(struct intel_bts_queue *btsq,
+				     struct branch *branch)
+{
+	int err;
+
+	if (!branch->from) {
+		if (branch->to)
+			btsq->sample_flags = PERF_IP_FLAG_BRANCH |
+					     PERF_IP_FLAG_TRACE_BEGIN;
+		else
+			btsq->sample_flags = 0;
+		btsq->intel_pt_insn.length = 0;
+	} else if (!branch->to) {
+		btsq->sample_flags = PERF_IP_FLAG_BRANCH |
+				     PERF_IP_FLAG_TRACE_END;
+		btsq->intel_pt_insn.length = 0;
+	} else {
+		err = intel_bts_get_next_insn(btsq, branch->from);
+		if (err) {
+			btsq->sample_flags = 0;
+			btsq->intel_pt_insn.length = 0;
+			if (!btsq->bts->synth_opts.errors)
+				return 0;
+			err = intel_bts_synth_error(btsq->bts, btsq->cpu,
+						    btsq->pid, btsq->tid,
+						    branch->from);
+			return err;
+		}
+		btsq->sample_flags = intel_pt_insn_type(btsq->intel_pt_insn.op);
+		/* Check for an async branch into the kernel */
+		if (!machine__kernel_ip(btsq->bts->machine, branch->from) &&
+		    machine__kernel_ip(btsq->bts->machine, branch->to) &&
+		    btsq->sample_flags != (PERF_IP_FLAG_BRANCH |
+					   PERF_IP_FLAG_CALL |
+					   PERF_IP_FLAG_SYSCALLRET))
+			btsq->sample_flags = PERF_IP_FLAG_BRANCH |
+					     PERF_IP_FLAG_CALL |
+					     PERF_IP_FLAG_ASYNC |
+					     PERF_IP_FLAG_INTERRUPT;
+	}
+
+	return 0;
+}
+
 static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
 				    struct auxtrace_buffer *buffer)
 {
 	struct branch *branch;
-	size_t sz;
+	size_t sz, bsz = sizeof(struct branch);
+	u32 filter = btsq->bts->branches_filter;
 	int err = 0;
 
 	if (buffer->use_data) {
@@ -318,14 +428,15 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
 	if (!btsq->bts->sample_branches)
 		return 0;
 
-	while (sz > sizeof(struct branch)) {
+	for (; sz > bsz; branch += 1, sz -= bsz) {
 		if (!branch->from && !branch->to)
 			continue;
+		intel_bts_get_branch_type(btsq, branch);
+		if (filter && !(filter & btsq->sample_flags))
+			continue;
 		err = intel_bts_synth_branch_sample(btsq, branch);
 		if (err)
 			break;
-		branch += 1;
-		sz -= sizeof(struct branch);
 	}
 	return err;
 }
@@ -771,6 +882,13 @@ int intel_bts_process_auxtrace_info(union perf_event *event,
 	else
 		itrace_synth_opts__set_default(&bts->synth_opts);
 
+	if (bts->synth_opts.calls)
+		bts->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
+					PERF_IP_FLAG_TRACE_END;
+	if (bts->synth_opts.returns)
+		bts->branches_filter |= PERF_IP_FLAG_RETURN |
+					PERF_IP_FLAG_TRACE_BEGIN;
+
 	err = intel_bts_synth_events(bts, session);
 	if (err)
 		goto err_free_queues;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 14/17] perf tools: Intel PT to always update thread stack trace number
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (12 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 13/17] perf tools: Output sample flags and insn_len from intel_bts Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 15/17] perf tools: Intel BTS " Adrian Hunter
                   ` (2 subsequent siblings)
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

The enhanced thread stack is used by higher layers but still requires
the trace number.  The trace number is used to distinguish discontinuous
sections of trace (for example from Snapshot mode or Sample mode), which
cause the thread stack to be flushed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-pt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 9c25bfa..5a59fd8 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -265,7 +265,7 @@ static int intel_pt_get_trace(struct intel_pt_buffer *b, void *data)
 	if (!old_buffer || ptq->pt->sampling_mode || (ptq->pt->snapshot_mode &&
 						      !buffer->consecutive)) {
 		b->consecutive = false;
-		b->trace_nr = buffer->buffer_nr;
+		b->trace_nr = buffer->buffer_nr + 1;
 	} else {
 		b->consecutive = true;
 	}
@@ -1075,6 +1075,8 @@ static int intel_pt_sample(struct intel_pt_queue *ptq)
 		thread_stack__event(ptq->thread, ptq->flags, state->from_ip,
 				    state->to_ip, ptq->insn_len,
 				    state->trace_nr);
+	else
+		thread_stack__set_trace_nr(ptq->thread, state->trace_nr);
 
 	if (pt->sample_branches) {
 		err = intel_pt_synth_branch_sample(ptq);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 15/17] perf tools: Intel BTS to always update thread stack trace number
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (13 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 14/17] perf tools: Intel PT to always update thread stack trace number Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-06-19 16:11   ` Arnaldo Carvalho de Melo
  2015-05-29 13:33 ` [PATCH V6 16/17] perf tools: Put itrace options into an asciidoc include Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 17/17] perf tools: Add example call-graph script Adrian Hunter
  16 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

The enhanced thread stack is used by higher layers but still requires
the trace number.  The trace number is used to distinguish discontinuous
sections of trace (for example from Snapshot mode or Sample mode), which
cause the thread stack to be flushed.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-bts.c | 18 ++++++++++++++----
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index b068860..cd7bde3 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -27,6 +27,8 @@
 #include "machine.h"
 #include "session.h"
 #include "util.h"
+#include "thread.h"
+#include "thread-stack.h"
 #include "debug.h"
 #include "tsc.h"
 #include "auxtrace.h"
@@ -443,19 +445,22 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
 
 static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
 {
-	struct auxtrace_buffer *buffer = btsq->buffer;
+	struct auxtrace_buffer *buffer = btsq->buffer, *old_buffer = buffer;
 	struct auxtrace_queue *queue;
+	struct thread *thread;
 	int err;
 
 	if (btsq->done)
 		return 1;
 
 	if (btsq->pid == -1) {
-		struct thread *thread;
-
-		thread = machine__find_thread(btsq->bts->machine, -1, btsq->tid);
+		thread = machine__find_thread(btsq->bts->machine, -1,
+					      btsq->tid);
 		if (thread)
 			btsq->pid = thread->pid_;
+	} else {
+		thread = machine__findnew_thread(btsq->bts->machine, btsq->pid,
+						 btsq->tid);
 	}
 
 	queue = &btsq->bts->queues.queue_array[btsq->queue_nr];
@@ -485,6 +490,11 @@ static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
 	    intel_bts_do_fix_overlap(queue, buffer))
 		return -ENOMEM;
 
+	if (!btsq->bts->synth_opts.callchain && thread &&
+	    (!old_buffer || btsq->bts->sampling_mode ||
+	     (btsq->bts->snapshot_mode && !buffer->consecutive)))
+		thread_stack__set_trace_nr(thread, buffer->buffer_nr + 1);
+
 	err = intel_bts_process_buffer(btsq, buffer);
 
 	auxtrace_buffer__drop_data(buffer);
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 16/17] perf tools: Put itrace options into an asciidoc include
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (14 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 15/17] perf tools: Intel BTS " Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  2015-05-29 13:33 ` [PATCH V6 17/17] perf tools: Add example call-graph script Adrian Hunter
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

perf script, report and inject all have the same itrace options. Put
them into an asciidoc include file.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/itrace.txt      | 22 ++++++++++++++++++++++
 tools/perf/Documentation/perf-inject.txt | 23 +----------------------
 tools/perf/Documentation/perf-report.txt | 23 +----------------------
 tools/perf/Documentation/perf-script.txt | 23 +----------------------
 4 files changed, 25 insertions(+), 66 deletions(-)
 create mode 100644 tools/perf/Documentation/itrace.txt

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
new file mode 100644
index 0000000..2ff9466
--- /dev/null
+++ b/tools/perf/Documentation/itrace.txt
@@ -0,0 +1,22 @@
+		i	synthesize instructions events
+		b	synthesize branches events
+		c	synthesize branches events (calls only)
+		r	synthesize branches events (returns only)
+		x	synthesize transactions events
+		e	synthesize error events
+		d	create a debug log
+		g	synthesize a call chain (use with i or x)
+
+	The default is all events i.e. the same as --itrace=ibxe
+
+	In addition, the period (default 100000) for instructions events
+	can be specified in units of:
+
+		i	instructions
+		t	ticks
+		ms	milliseconds
+		us	microseconds
+		ns	nanoseconds (default)
+
+	Also the call chain size (default 16, max. 1024) for instructions or
+	transactions events can be specified.
diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
index b876ae3..0c721c3 100644
--- a/tools/perf/Documentation/perf-inject.txt
+++ b/tools/perf/Documentation/perf-inject.txt
@@ -48,28 +48,7 @@ OPTIONS
 	Decode Instruction Tracing data, replacing it with synthesized events.
 	Options are:
 
-		i	synthesize instructions events
-		b	synthesize branches events
-		c	synthesize branches events (calls only)
-		r	synthesize branches events (returns only)
-		x	synthesize transactions events
-		e	synthesize error events
-		d	create a debug log
-		g	synthesize a call chain (use with i or x)
-
-	The default is all events i.e. the same as --itrace=ibxe
-
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
-
-		i	instructions
-		t	ticks
-		ms	milliseconds
-		us	microseconds
-		ns	nanoseconds (default)
-
-	Also the call chain size (default 16, max. 1024) for instructions or
-	transactions events can be specified.
+include::itrace.txt[]
 
 SEE ALSO
 --------
diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index c33b69f..6c44928 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -328,28 +328,7 @@ OPTIONS
 --itrace::
 	Options for decoding instruction tracing data. The options are:
 
-		i	synthesize instructions events
-		b	synthesize branches events
-		c	synthesize branches events (calls only)
-		r	synthesize branches events (returns only)
-		x	synthesize transactions events
-		e	synthesize error events
-		d	create a debug log
-		g	synthesize a call chain (use with i or x)
-
-	The default is all events i.e. the same as --itrace=ibxe
-
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
-
-		i	instructions
-		t	ticks
-		ms	milliseconds
-		us	microseconds
-		ns	nanoseconds (default)
-
-	Also the call chain size (default 16, max. 1024) for instructions or
-	transactions events can be specified.
+include::itrace.txt[]
 
 	To disable decoding entirely, use --no-itrace.
 
diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index c82df57..ac9e99a 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -231,28 +231,7 @@ OPTIONS
 --itrace::
 	Options for decoding instruction tracing data. The options are:
 
-		i	synthesize instructions events
-		b	synthesize branches events
-		c	synthesize branches events (calls only)
-		r	synthesize branches events (returns only)
-		x	synthesize transactions events
-		e	synthesize error events
-		d	create a debug log
-		g	synthesize a call chain (use with i or x)
-
-	The default is all events i.e. the same as --itrace=ibxe
-
-	In addition, the period (default 100000) for instructions events
-	can be specified in units of:
-
-		i	instructions
-		t	ticks
-		ms	milliseconds
-		us	microseconds
-		ns	nanoseconds (default)
-
-	Also the call chain size (default 16, max. 1024) for instructions or
-	transactions events can be specified.
+include::itrace.txt[]
 
 	To disable decoding entirely, use --no-itrace.
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [PATCH V6 17/17] perf tools: Add example call-graph script
  2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
                   ` (15 preceding siblings ...)
  2015-05-29 13:33 ` [PATCH V6 16/17] perf tools: Put itrace options into an asciidoc include Adrian Hunter
@ 2015-05-29 13:33 ` Adrian Hunter
  16 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-05-29 13:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Add a script to produce a call-graph from data exported
to a postgresql database and derived from a processor trace
event like intel_pt or intel_bts. Refer to comments in the
scripts call-graph-from-postgresql.py and export-to-postgresql.py
for more details.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 .../scripts/python/call-graph-from-postgresql.py   | 327 +++++++++++++++++++++
 tools/perf/scripts/python/export-to-postgresql.py  |  47 +++
 2 files changed, 374 insertions(+)
 create mode 100644 tools/perf/scripts/python/call-graph-from-postgresql.py

diff --git a/tools/perf/scripts/python/call-graph-from-postgresql.py b/tools/perf/scripts/python/call-graph-from-postgresql.py
new file mode 100644
index 0000000..e78fdc2
--- /dev/null
+++ b/tools/perf/scripts/python/call-graph-from-postgresql.py
@@ -0,0 +1,327 @@
+#!/usr/bin/python2
+# call-graph-from-postgresql.py: create call-graph from postgresql database
+# Copyright (c) 2014, Intel Corporation.
+#
+# This program is free software; you can redistribute it and/or modify it
+# under the terms and conditions of the GNU General Public License,
+# version 2, as published by the Free Software Foundation.
+#
+# This program is distributed in the hope it will be useful, but WITHOUT
+# ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+# FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+# more details.
+
+# To use this script you will need to have exported data using the
+# export-to-postgresql.py script.  Refer to that script for details.
+#
+# Following on from the example in the export-to-postgresql.py script, a
+# call-graph can be displayed for the pt_example database like this:
+#
+#	python tools/perf/scripts/python/call-graph-from-postgresql.py pt_example
+#
+# Note this script supports connecting to remote databases by setting hostname,
+# port, username, password, and dbname e.g.
+#
+#	python tools/perf/scripts/python/call-graph-from-postgresql.py "hostname=myhost username=myuser password=mypassword dbname=pt_example"
+#
+# The result is a GUI window with a tree representing a context-sensitive
+# call-graph.  Expanding a couple of levels of the tree and adjusting column
+# widths to suit will display something like:
+#
+#                                         Call Graph: pt_example
+# Call Path                          Object      Count   Time(ns)  Time(%)  Branch Count   Branch Count(%)
+# v- ls
+#     v- 2638:2638
+#         v- _start                  ld-2.19.so    1     10074071   100.0         211135            100.0
+#           |- unknown               unknown       1        13198     0.1              1              0.0
+#           >- _dl_start             ld-2.19.so    1      1400980    13.9          19637              9.3
+#           >- _d_linit_internal     ld-2.19.so    1       448152     4.4          11094              5.3
+#           v-__libc_start_main@plt  ls            1      8211741    81.5         180397             85.4
+#              >- _dl_fixup          ld-2.19.so    1         7607     0.1            108              0.1
+#              >- __cxa_atexit       libc-2.19.so  1        11737     0.1             10              0.0
+#              >- __libc_csu_init    ls            1        10354     0.1             10              0.0
+#              |- _setjmp            libc-2.19.so  1            0     0.0              4              0.0
+#              v- main               ls            1      8182043    99.6         180254             99.9
+#
+# Points to note:
+#	The top level is a command name (comm)
+#	The next level is a thread (pid:tid)
+#	Subsequent levels are functions
+#	'Count' is the number of calls
+#	'Time' is the elapsed time until the function returns
+#	Percentages are relative to the level above
+#	'Branch Count' is the total number of branches for that function and all
+#       functions that it calls
+
+import sys
+from PySide.QtCore import *
+from PySide.QtGui import *
+from PySide.QtSql import *
+from decimal import *
+
+class TreeItem():
+
+	def __init__(self, db, row, parent_item):
+		self.db = db
+		self.row = row
+		self.parent_item = parent_item
+		self.query_done = False;
+		self.child_count = 0
+		self.child_items = []
+		self.data = ["", "", "", "", "", "", ""]
+		self.comm_id = 0
+		self.thread_id = 0
+		self.call_path_id = 1
+		self.branch_count = 0
+		self.time = 0
+		if not parent_item:
+			self.setUpRoot()
+
+	def setUpRoot(self):
+		self.query_done = True
+		query = QSqlQuery(self.db)
+		ret = query.exec_('SELECT id, comm FROM comms')
+		if not ret:
+			raise Exception("Query failed: " + query.lastError().text())
+		while query.next():
+			if not query.value(0):
+				continue
+			child_item = TreeItem(self.db, self.child_count, self)
+			self.child_items.append(child_item)
+			self.child_count += 1
+			child_item.setUpLevel1(query.value(0), query.value(1))
+
+	def setUpLevel1(self, comm_id, comm):
+		self.query_done = True;
+		self.comm_id = comm_id
+		self.data[0] = comm
+		self.child_items = []
+		self.child_count = 0
+		query = QSqlQuery(self.db)
+		ret = query.exec_('SELECT thread_id, ( SELECT pid FROM threads WHERE id = thread_id ), ( SELECT tid FROM threads WHERE id = thread_id ) FROM comm_threads WHERE comm_id = ' + str(comm_id))
+		if not ret:
+			raise Exception("Query failed: " + query.lastError().text())
+		while query.next():
+			child_item = TreeItem(self.db, self.child_count, self)
+			self.child_items.append(child_item)
+			self.child_count += 1
+			child_item.setUpLevel2(comm_id, query.value(0), query.value(1), query.value(2))
+
+	def setUpLevel2(self, comm_id, thread_id, pid, tid):
+		self.comm_id = comm_id
+		self.thread_id = thread_id
+		self.data[0] = str(pid) + ":" + str(tid)
+
+	def getChildItem(self, row):
+		return self.child_items[row]
+
+	def getParentItem(self):
+		return self.parent_item
+
+	def getRow(self):
+		return self.row
+
+	def timePercent(self, b):
+		if not self.time:
+			return "0.0"
+		x = (b * Decimal(100)) / self.time
+		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
+
+	def branchPercent(self, b):
+		if not self.branch_count:
+			return "0.0"
+		x = (b * Decimal(100)) / self.branch_count
+		return str(x.quantize(Decimal('.1'), rounding=ROUND_HALF_UP))
+
+	def addChild(self, call_path_id, name, dso, count, time, branch_count):
+		child_item = TreeItem(self.db, self.child_count, self)
+		child_item.comm_id = self.comm_id
+		child_item.thread_id = self.thread_id
+		child_item.call_path_id = call_path_id
+		child_item.branch_count = branch_count
+		child_item.time = time
+		child_item.data[0] = name
+		if dso == "[kernel.kallsyms]":
+			dso = "[kernel]"
+		child_item.data[1] = dso
+		child_item.data[2] = str(count)
+		child_item.data[3] = str(time)
+		child_item.data[4] = self.timePercent(time)
+		child_item.data[5] = str(branch_count)
+		child_item.data[6] = self.branchPercent(branch_count)
+		self.child_items.append(child_item)
+		self.child_count += 1
+
+	def selectCalls(self):
+		self.query_done = True;
+		query = QSqlQuery(self.db)
+		ret = query.exec_('SELECT id, call_path_id, branch_count, call_time, return_time, '
+				  '( SELECT name FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ), '
+				  '( SELECT short_name FROM dsos WHERE id = ( SELECT dso_id FROM symbols WHERE id = ( SELECT symbol_id FROM call_paths WHERE id = call_path_id ) ) ), '
+				  '( SELECT ip FROM call_paths where id = call_path_id ) '
+				  'FROM calls WHERE parent_call_path_id = ' + str(self.call_path_id) + ' AND comm_id = ' + str(self.comm_id) + ' AND thread_id = ' + str(self.thread_id) +
+				  'ORDER BY call_path_id')
+		if not ret:
+			raise Exception("Query failed: " + query.lastError().text())
+		last_call_path_id = 0
+		name = ""
+		dso = ""
+		count = 0
+		branch_count = 0
+		total_branch_count = 0
+		time = 0
+		total_time = 0
+		while query.next():
+			if query.value(1) == last_call_path_id:
+				count += 1
+				branch_count += query.value(2)
+				time += query.value(4) - query.value(3)
+			else:
+				if count:
+					self.addChild(last_call_path_id, name, dso, count, time, branch_count)
+				last_call_path_id = query.value(1)
+				name = query.value(5)
+				dso = query.value(6)
+				count = 1
+				total_branch_count += branch_count
+				total_time += time
+				branch_count = query.value(2)
+				time = query.value(4) - query.value(3)
+		if count:
+			self.addChild(last_call_path_id, name, dso, count, time, branch_count)
+		total_branch_count += branch_count
+		total_time += time
+		# Top level does not have time or branch count, so fix that here
+		if total_branch_count > self.branch_count:
+			self.branch_count = total_branch_count
+			if self.branch_count:
+				for child_item in self.child_items:
+					child_item.data[6] = self.branchPercent(child_item.branch_count)
+		if total_time > self.time:
+			self.time = total_time
+			if self.time:
+				for child_item in self.child_items:
+					child_item.data[4] = self.timePercent(child_item.time)
+
+	def childCount(self):
+		if not self.query_done:
+			self.selectCalls()
+		return self.child_count
+
+	def columnCount(self):
+		return 7
+
+	def columnHeader(self, column):
+		headers = ["Call Path", "Object", "Count ", "Time (ns) ", "Time (%) ", "Branch Count ", "Branch Count (%) "]
+		return headers[column]
+
+	def getData(self, column):
+		return self.data[column]
+
+class TreeModel(QAbstractItemModel):
+
+	def __init__(self, db, parent=None):
+		super(TreeModel, self).__init__(parent)
+		self.db = db
+		self.root = TreeItem(db, 0, None)
+
+	def columnCount(self, parent):
+		return self.root.columnCount()
+
+	def rowCount(self, parent):
+		if parent.isValid():
+			parent_item = parent.internalPointer()
+		else:
+			parent_item = self.root
+		return parent_item.childCount()
+
+	def headerData(self, section, orientation, role):
+		if role == Qt.TextAlignmentRole:
+			if section > 1:
+				return Qt.AlignRight
+		if role != Qt.DisplayRole:
+			return None
+		if orientation != Qt.Horizontal:
+			return None
+		return self.root.columnHeader(section)
+
+	def parent(self, child):
+		child_item = child.internalPointer()
+		if child_item is self.root:
+			return QModelIndex()
+		parent_item = child_item.getParentItem()
+		return self.createIndex(parent_item.getRow(), 0, parent_item)
+
+	def index(self, row, column, parent):
+		if parent.isValid():
+			parent_item = parent.internalPointer()
+		else:
+			parent_item = self.root
+		child_item = parent_item.getChildItem(row)
+		return self.createIndex(row, column, child_item)
+
+	def data(self, index, role):
+		if role == Qt.TextAlignmentRole:
+			if index.column() > 1:
+				return Qt.AlignRight
+		if role != Qt.DisplayRole:
+			return None
+		index_item = index.internalPointer()
+		return index_item.getData(index.column())
+
+class MainWindow(QMainWindow):
+
+	def __init__(self, db, dbname, parent=None):
+		super(MainWindow, self).__init__(parent)
+
+		self.setObjectName("MainWindow")
+		self.setWindowTitle("Call Graph: " + dbname)
+		self.move(100, 100)
+		self.resize(800, 600)
+		style = self.style()
+		icon = style.standardIcon(QStyle.SP_MessageBoxInformation)
+		self.setWindowIcon(icon);
+
+		self.model = TreeModel(db)
+
+		self.view = QTreeView()
+		self.view.setModel(self.model)
+
+		self.setCentralWidget(self.view)
+
+if __name__ == '__main__':
+	if (len(sys.argv) < 2):
+		print >> sys.stderr, "Usage is: call-graph-from-postgresql.py <database name>"
+		raise Exception("Too few arguments")
+
+	dbname = sys.argv[1]
+
+	db = QSqlDatabase.addDatabase('QPSQL')
+
+	opts = dbname.split()
+	for opt in opts:
+		if '=' in opt:
+			opt = opt.split('=')
+			if opt[0] == 'hostname':
+				db.setHostName(opt[1])
+			elif opt[0] == 'port':
+				db.setPort(int(opt[1]))
+			elif opt[0] == 'username':
+				db.setUserName(opt[1])
+			elif opt[0] == 'password':
+				db.setPassword(opt[1])
+			elif opt[0] == 'dbname':
+				dbname = opt[1]
+		else:
+			dbname = opt
+
+	db.setDatabaseName(dbname)
+	if not db.open():
+		raise Exception("Failed to open database " + dbname + " error: " + db.lastError().text())
+
+	app = QApplication(sys.argv)
+	window = MainWindow(db, dbname)
+	window.show()
+	err = app.exec_()
+	db.close()
+	sys.exit(err)
diff --git a/tools/perf/scripts/python/export-to-postgresql.py b/tools/perf/scripts/python/export-to-postgresql.py
index 4cdafd8..5e939ea 100644
--- a/tools/perf/scripts/python/export-to-postgresql.py
+++ b/tools/perf/scripts/python/export-to-postgresql.py
@@ -15,6 +15,53 @@ import sys
 import struct
 import datetime
 
+# To use this script you will need to have installed package python-pyside which
+# provides LGPL-licensed Python bindings for Qt.  You will also need the package
+# libqt4-sql-psql for Qt postgresql support.
+#
+# The script assumes postgresql is running on the local machine and that the
+# user has postgresql permissions to create databases. Examples of installing
+# postgresql and adding such a user are:
+#
+# fedora:
+#
+#	$ sudo yum install postgresql postgresql-server
+#	$ sudo su - postgres -c initdb
+#	$ sudo service postgresql start
+#	$ sudo su - postgres
+#	$ createuser <your user id here>
+#	Shall the new role be a superuser? (y/n) y
+#
+# ubuntu:
+#
+#	$ sudo apt-get install postgresql
+#	$ sudo su - postgres
+#	$ createuser <your user id here>
+#	Shall the new role be a superuser? (y/n) y
+#
+# An example of using this script with Intel PT:
+#
+#	$ perf record -e intel_pt//u ls
+#	$ perf script -s ~/libexec/perf-core/scripts/python/export-to-postgresql.py pt_example branches calls
+#	2015-05-29 12:49:23.464364 Creating database...
+#	2015-05-29 12:49:26.281717 Writing to intermediate files...
+#	2015-05-29 12:49:27.190383 Copying to database...
+#	2015-05-29 12:49:28.140451 Removing intermediate files...
+#	2015-05-29 12:49:28.147451 Adding primary keys
+#	2015-05-29 12:49:28.655683 Adding foreign keys
+#	2015-05-29 12:49:29.365350 Done
+#
+# To browse the database, psql can be used e.g.
+#
+#	$ psql pt_example
+#	pt_example=# select * from samples_view where id < 100;
+#	pt_example=# \d+
+#	pt_example=# \d+ samples_view
+#	pt_example=# \q
+#
+# An example of using the database is provided by the script
+# call-graph-from-postgresql.py.  Refer to that script for details.
+
 from PySide.QtSql import *
 
 # Need to access PostgreSQL C library directly to use COPY FROM STDIN
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* [tip:perf/core] perf db-export: Fix thread ref-counting
  2015-05-29 13:33 ` [PATCH V6 01/17] perf db-export: Fix thread ref-counting Adrian Hunter
@ 2015-05-29 18:35   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 47+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-05-29 18:35 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: jolsa, tglx, linux-kernel, adrian.hunter, acme, hpa, mingo

Commit-ID:  427cde3287f2c6349f308d0e22c9223f9ea05ef1
Gitweb:     http://git.kernel.org/tip/427cde3287f2c6349f308d0e22c9223f9ea05ef1
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 29 May 2015 16:33:29 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 29 May 2015 12:43:39 -0300

perf db-export: Fix thread ref-counting

Thread ref-counting was not done for get_main_thread() meaning that
there was a thread__get() from machine__find_thread() that was not being
paired with thread__put(). Fix that.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-2-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/db-export.c | 19 ++++++++++---------
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/tools/perf/util/db-export.c b/tools/perf/util/db-export.c
index eb7a2ac..1c9689e 100644
--- a/tools/perf/util/db-export.c
+++ b/tools/perf/util/db-export.c
@@ -234,7 +234,7 @@ int db_export__symbol(struct db_export *dbe, struct symbol *sym,
 static struct thread *get_main_thread(struct machine *machine, struct thread *thread)
 {
 	if (thread->pid_ == thread->tid)
-		return thread;
+		return thread__get(thread);
 
 	if (thread->pid_ == -1)
 		return NULL;
@@ -308,19 +308,18 @@ int db_export__sample(struct db_export *dbe, union perf_event *event,
 	if (err)
 		return err;
 
-	/* FIXME: check refcounting for get_main_thread, that calls machine__find_thread... */
 	main_thread = get_main_thread(al->machine, thread);
 	if (main_thread)
 		comm = machine__thread_exec_comm(al->machine, main_thread);
 
 	err = db_export__thread(dbe, thread, al->machine, comm);
 	if (err)
-		return err;
+		goto out_put;
 
 	if (comm) {
 		err = db_export__comm(dbe, comm, main_thread);
 		if (err)
-			return err;
+			goto out_put;
 		es.comm_db_id = comm->db_id;
 	}
 
@@ -328,7 +327,7 @@ int db_export__sample(struct db_export *dbe, union perf_event *event,
 
 	err = db_ids_from_al(dbe, al, &es.dso_db_id, &es.sym_db_id, &es.offset);
 	if (err)
-		return err;
+		goto out_put;
 
 	if ((evsel->attr.sample_type & PERF_SAMPLE_ADDR) &&
 	    sample_addr_correlates_sym(&evsel->attr)) {
@@ -338,20 +337,22 @@ int db_export__sample(struct db_export *dbe, union perf_event *event,
 		err = db_ids_from_al(dbe, &addr_al, &es.addr_dso_db_id,
 				     &es.addr_sym_db_id, &es.addr_offset);
 		if (err)
-			return err;
+			goto out_put;
 		if (dbe->crp) {
 			err = thread_stack__process(thread, comm, sample, al,
 						    &addr_al, es.db_id,
 						    dbe->crp);
 			if (err)
-				return err;
+				goto out_put;
 		}
 	}
 
 	if (dbe->export_sample)
-		return dbe->export_sample(dbe, &es);
+		err = dbe->export_sample(dbe, &es);
 
-	return 0;
+out_put:
+	thread__put(main_thread);
+	return err;
 }
 
 static struct {

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 02/17] perf tools: Ensure thread-stack is flushed
  2015-05-29 13:33 ` [PATCH V6 02/17] perf tools: Ensure thread-stack is flushed Adrian Hunter
@ 2015-06-18 21:56   ` Arnaldo Carvalho de Melo
  2015-06-19  5:50     ` Adrian Hunter
  2015-06-19 23:15   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-18 21:56 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, May 29, 2015 at 04:33:30PM +0300, Adrian Hunter escreveu:
> The thread-stack represents a thread's current stack.  When
> a thread exits there can still be many functions on the stack
> e.g. exit() can be called many levels deep, so all the callers
> will never return.  To get that information output, the
> thread-stack must be flushed.
> 
> Previously it was assumed the thread-stack would be flushed
> when the struct thread was deleted.  With thread ref-counting
> it is no longer clear when that will be, if ever. So instead

It'll be when the last reference to that thread is released.

- Arnaldo

> explicitly flush all the thread-stacks at the end of a session.

If after the session ends you have no more need for those thread stacks,
that is the right way to do it.

With tools like 'report', after the session ends we should have all the
unreferenced threads deleted.

Previously they were not being deleted at all, i.e. they were simply
moved to the dead_threads list and sat there because I didn't knew if
some hist_entry, say, had a pointer to it.

So, unless I am missing something, this patch is required irrespective
of thread refcounting, no?

I'm applying it to my work branch where I'm trying to test all this.

- Arnaldo
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/machine.c      | 21 +++++++++++++++++++++
>  tools/perf/util/machine.h      |  3 +++
>  tools/perf/util/session.c      | 20 ++++++++++++++++++++
>  tools/perf/util/thread-stack.c | 18 +++++++++++++-----
>  tools/perf/util/thread-stack.h |  1 +
>  5 files changed, 58 insertions(+), 5 deletions(-)
900380
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 0c0e61c..c0c29b9 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1845,6 +1845,27 @@ int machine__for_each_thread(struct machine *machine,
>  	return rc;
>  }
>  
> +int machines__for_each_thread(struct machines *machines,
> +			      int (*fn)(struct thread *thread, void *p),
> +			      void *priv)
> +{
> +	struct rb_node *nd;
> +	int rc = 0;
> +
> +	rc = machine__for_each_thread(&machines->host, fn, priv);
> +	if (rc != 0)
> +		return rc;
> +
> +	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
> +		struct machine *machine = rb_entry(nd, struct machine, rb_node);
> +
> +		rc = machine__for_each_thread(machine, fn, priv);
> +		if (rc != 0)
> +			return rc;
> +	}
> +	return rc;
> +}
> +
>  int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
>  				  struct target *target, struct thread_map *threads,
>  				  perf_event__handler_t process, bool data_mmap)
> diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
> index c7963c6..6b4a6fb 100644
> --- a/tools/perf/util/machine.h
> +++ b/tools/perf/util/machine.h
> @@ -213,6 +213,9 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
>  int machine__for_each_thread(struct machine *machine,
>  			     int (*fn)(struct thread *thread, void *p),
>  			     void *priv);
> +int machines__for_each_thread(struct machines *machines,
> +			      int (*fn)(struct thread *thread, void *p),
> +			      void *priv);
>  
>  int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
>  				  struct target *target, struct thread_map *threads,
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 39fe09d..b44bb2a 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -16,6 +16,7 @@
>  #include "perf_regs.h"
>  #include "asm/bug.h"
>  #include "auxtrace.h"
> +#include "thread-stack.h"
>  
>  static int perf_session__deliver_event(struct perf_session *session,
>  				       union perf_event *event,
> @@ -1320,6 +1321,19 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
>  	events_stats__auxtrace_error_warn(stats);
>  }
>  
> +static int perf_session__flush_thread_stack(struct thread *thread,
> +					    void *p __maybe_unused)
> +{
> +	return thread_stack__flush(thread);
> +}
> +
> +static int perf_session__flush_thread_stacks(struct perf_session *session)
> +{
> +	return machines__for_each_thread(&session->machines,
> +					 perf_session__flush_thread_stack,
> +					 NULL);
> +}
> +
>  volatile int session_done;
>  
>  static int __perf_session__process_pipe_events(struct perf_session *session)
> @@ -1409,6 +1423,9 @@ done:
>  	if (err)
>  		goto out_err;
>  	err = auxtrace__flush_events(session, tool);
> +	if (err)
> +		goto out_err;
> +	err = perf_session__flush_thread_stacks(session);
>  out_err:
>  	free(buf);
>  	perf_session__warn_about_errors(session);
> @@ -1559,6 +1576,9 @@ out:
>  	if (err)
>  		goto out_err;
>  	err = auxtrace__flush_events(session, tool);
> +	if (err)
> +		goto out_err;
> +	err = perf_session__flush_thread_stacks(session);
>  out_err:
>  	ui_progress__finish();
>  	perf_session__warn_about_errors(session);
> diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
> index 9ed59a4..679688e 100644
> --- a/tools/perf/util/thread-stack.c
> +++ b/tools/perf/util/thread-stack.c
> @@ -219,7 +219,7 @@ static int thread_stack__call_return(struct thread *thread,
>  	return crp->process(&cr, crp->data);
>  }
>  
> -static int thread_stack__flush(struct thread *thread, struct thread_stack *ts)
> +static int __thread_stack__flush(struct thread *thread, struct thread_stack *ts)
>  {
>  	struct call_return_processor *crp = ts->crp;
>  	int err;
> @@ -242,6 +242,14 @@ static int thread_stack__flush(struct thread *thread, struct thread_stack *ts)
>  	return 0;
>  }
>  
> +int thread_stack__flush(struct thread *thread)
> +{
> +	if (thread->ts)
> +		return __thread_stack__flush(thread, thread->ts);
> +
> +	return 0;
> +}
> +
>  int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
>  			u64 to_ip, u16 insn_len, u64 trace_nr)
>  {
> @@ -264,7 +272,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
>  	 */
>  	if (trace_nr != thread->ts->trace_nr) {
>  		if (thread->ts->trace_nr)
> -			thread_stack__flush(thread, thread->ts);
> +			__thread_stack__flush(thread, thread->ts);
>  		thread->ts->trace_nr = trace_nr;
>  	}
>  
> @@ -297,7 +305,7 @@ void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr)
>  
>  	if (trace_nr != thread->ts->trace_nr) {
>  		if (thread->ts->trace_nr)
> -			thread_stack__flush(thread, thread->ts);
> +			__thread_stack__flush(thread, thread->ts);
>  		thread->ts->trace_nr = trace_nr;
>  	}
>  }
> @@ -305,7 +313,7 @@ void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr)
>  void thread_stack__free(struct thread *thread)
>  {
>  	if (thread->ts) {
> -		thread_stack__flush(thread, thread->ts);
> +		__thread_stack__flush(thread, thread->ts);
>  		zfree(&thread->ts->stack);
>  		zfree(&thread->ts);
>  	}
> @@ -689,7 +697,7 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
>  
>  	/* Flush stack on exec */
>  	if (ts->comm != comm && thread->pid_ == thread->tid) {
> -		err = thread_stack__flush(thread, ts);
> +		err = __thread_stack__flush(thread, ts);
>  		if (err)
>  			return err;
>  		ts->comm = comm;
> diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
> index b843bbe..e1528f1 100644
> --- a/tools/perf/util/thread-stack.h
> +++ b/tools/perf/util/thread-stack.h
> @@ -96,6 +96,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
>  void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
>  void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
>  			  size_t sz, u64 ip);
> +int thread_stack__flush(struct thread *thread);
>  void thread_stack__free(struct thread *thread);
>  
>  struct call_return_processor *
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 05/17] perf tools: Add Intel PT instruction decoder
  2015-05-29 13:33 ` [PATCH V6 05/17] perf tools: Add Intel PT instruction decoder Adrian Hunter
@ 2015-06-18 22:29   ` Arnaldo Carvalho de Melo
  2015-06-19 15:44     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-18 22:29 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, May 29, 2015 at 04:33:33PM +0300, Adrian Hunter escreveu:
> Add support for decoding instructions for Intel Processor Trace.  The
> kernel x86 instruction decoder is used for this.

Ok, but we don't access kernel header files directly, and:

[acme@zoo linux]$ find . -name "insn.h"
./arch/x86/include/asm/insn.h
./arch/arm64/include/asm/insn.h
./arch/arm/include/asm/insn.h
[acme@zoo linux]$ find /usr/include -name "insn.h"
[acme@zoo linux]$ 

But I need to look more into this patch to figure out if this is
something generated at build time, etc, but before that I found a
problem:

So:

> +inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
> +inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt

These need to go into tools/perf/MANIFEST, so that:

[acme@zoo linux]$ make help | grep perf
  perf-tar-src-pkg    - Build perf-4.1.0-rc5.tar source tarball
  perf-targz-src-pkg  - Build perf-4.1.0-rc5.tar.gz source tarball
  perf-tarbz2-src-pkg - Build perf-4.1.0-rc5.tar.bz2 source tarball
  perf-tarxz-src-pkg  - Build perf-4.1.0-rc5.tar.xz source tarball
[acme@zoo linux]$ 

Continue to work, in fact, there is a test for that, that will run
when you do the build tests:

make -C tools/perf build-test

It is one of the last to be tested, so you may want to do it directly:

[acme@zoo linux]$ make -C tools/perf -f tests/make tarpkg
make: Entering directory '/home/git/linux/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
make: Leaving directory '/home/git/linux/tools/perf'
[acme@zoo linux]$

After I apply this patch, I get:

Applying: perf tools: Add Intel PT instruction decoder
[tmp.perf/pt 1ab14c4be64b] perf tools: Add Intel PT instruction decoder
 Author: Adrian Hunter <adrian.hunter@intel.com>
 Date: Fri May 29 16:33:33 2015 +0300
 6 files changed, 339 insertions(+), 3 deletions(-)
 rewrite tools/perf/util/intel-pt-decoder/Build (100%)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
[acme@zoo linux]$ fg
bash: fg: current: no such job
[acme@zoo linux]$ make -C tools/perf -f tests/make tarpkg
make: Entering directory '/home/git/linux/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
tests/make:224: recipe for target 'tarpkg' failed
make: *** [tarpkg] Error 2
make: Leaving directory '/home/git/linux/tools/perf'
[acme@zoo linux]$ 

Doing it manually to see what happened:

[acme@zoo linux]$ make perf-tar-src-pkg
  TAR
[acme@zoo linux]$ ls -la perf-4.1.0-rc5.tar 
-rw-rw-r--. 1 acme acme 5027840 Jun 18 19:24 perf-4.1.0-rc5.tar
[acme@zoo linux]$ mv perf-4.1.0-rc5.tar /tmp
[acme@zoo linux]$ cd /tmp
[acme@zoo tmp]$ tar xf perf-4.1.0-rc5.tar 
[acme@zoo tmp]$ cd perf-4.1.0-rc5/
[acme@zoo perf-4.1.0-rc5]$ make -C tools/perf
make: Entering directory '/tmp/perf-4.1.0-rc5/tools/perf'
  BUILD:   Doing 'make -j4' parallel build

Auto-detecting system features:
...                         dwarf: [ on  ]
...                         glibc: [ on  ]
...                          gtk2: [ on  ]
...                      libaudit: [ on  ]
...                        libbfd: [ on  ]
...                        libelf: [ on  ]
...                       libnuma: [ on  ]
...                       libperl: [ on  ]
...                     libpython: [ on  ]
...                      libslang: [ on  ]
...                     libunwind: [ on  ]
...            libdw-dwarf-unwind: [ on  ]
...                          zlib: [ on  ]
...                          lzma: [ on  ]

  CC       util/abspath.o
  CC       fd/array.o
  PERF_VERSION = 4.1.rc5.g1ab14c
  CC       fs/fs.o
  CC       event-parse.o
  LD       fd/libapi-in.o
  CC       event-plugin.o
  CC       fs/debugfs.o
  CC       util/alias.o
  CC       trace-seq.o
<SNIP>
  CC       util/cloexec.o
  CC       util/thread-stack.o
  CC       builtin-kmem.o
  CC       builtin-lock.o
  CC       util/auxtrace.o
  CC       util/intel-pt-decoder/intel-pt-pkt-decoder.o
make[4]: *** No rule to make target '../../arch/x86/tools/gen-insn-attr-x86.awk', needed by 'util/intel-pt-decoder/inat-tables.c'.  Stop.
make[4]: *** Waiting for unfinished jobs....
  GEN      util/intel-pt-decoder/inat.c
cp: cannot stat ‘../../arch/x86/lib/inat.c’: No such file or directory
util/intel-pt-decoder/Build:10: recipe for target 'util/intel-pt-decoder/inat.c' failed
make[4]: *** [util/intel-pt-decoder/inat.c] Error 1
/tmp/perf-4.1.0-rc5/tools/build/Makefile.build:109: recipe for target 'intel-pt-decoder' failed
make[3]: *** [intel-pt-decoder] Error 2
make[3]: *** Waiting for unfinished jobs....
  CC       builtin-kvm.o
  CC       builtin-inject.o
/tmp/perf-4.1.0-rc5/tools/build/Makefile.build:109: recipe for target 'util' failed
make[2]: *** [util] Error 2
Makefile.perf:380: recipe for target 'libperf-in.o' failed
make[1]: *** [libperf-in.o] Error 2
make[1]: *** Waiting for unfinished jobs....
  CC       builtin-mem.o
  CC       builtin-data.o
  CC       builtin-trace.o
<SNIP>
  LD       tests/perf-in.o
  LD       perf-in.o
Makefile:68: recipe for target 'all' failed
make: *** [all] Error 2
make: Leaving directory '/tmp/perf-4.1.0-rc5/tools/perf'
[acme@zoo perf-4.1.0-rc5]$

The patch fixed up wrt some recent changes to .gitignore and the
makefiles is in my git tree at git.kernel.org, branch tmp.perf/pt.

Calling it a day, will continue on this tomorrow.

- Arnaldo

> This essentially provides intel_pt_get_insn() which takes a binary
> buffer, uses the kernel's x86 instruction decoder to get details
> of the instruction and then categorizes it for consumption by
> an Intel PT decoder.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/build/Makefile.build                         |   2 +
>  tools/perf/.gitignore                              |   2 +
>  tools/perf/Makefile.perf                           |  12 +-
>  tools/perf/util/intel-pt-decoder/Build             |  15 +-
>  .../util/intel-pt-decoder/intel-pt-insn-decoder.c  | 246 +++++++++++++++++++++
>  .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |  65 ++++++
>  6 files changed, 339 insertions(+), 3 deletions(-)
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
> 
> diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
> index 10df572..7ad74e4 100644
> --- a/tools/build/Makefile.build
> +++ b/tools/build/Makefile.build
> @@ -57,6 +57,8 @@ quiet_cmd_cc_i_c = CPP      $@
>  quiet_cmd_cc_s_c = AS       $@
>        cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
>  
> +quiet_cmd_gen = GEN      $@
> +
>  # Link agregate command
>  # If there's nothing to link, create empty $@ object.
>  quiet_cmd_ld_multi = LD       $@
> diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore
> index 812f904..c88d5c5 100644
> --- a/tools/perf/.gitignore
> +++ b/tools/perf/.gitignore
> @@ -28,3 +28,5 @@ config.mak.autogen
>  *-flex.*
>  *.pyc
>  *.pyo
> +util/intel-pt-decoder/inat-tables.c
> +util/intel-pt-decoder/inat.c
> diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> index 5816a3b..3ae3a8e 100644
> --- a/tools/perf/Makefile.perf
> +++ b/tools/perf/Makefile.perf
> @@ -76,6 +76,12 @@ include config/utilities.mak
>  #
>  # Define NO_AUXTRACE if you do not want AUX area tracing support
>  
> +# As per kernel Makefile, avoid funny character set dependencies
> +unexport LC_ALL
> +LC_COLLATE=C
> +LC_NUMERIC=C
> +export LC_COLLATE LC_NUMERIC
> +
>  ifeq ($(srctree),)
>  srctree := $(patsubst %/,%,$(dir $(shell pwd)))
>  srctree := $(patsubst %/,%,$(dir $(srctree)))
> @@ -122,6 +128,7 @@ INSTALL = install
>  FLEX    = flex
>  BISON   = bison
>  STRIP   = strip
> +AWK     = awk
>  
>  LIB_DIR          = $(srctree)/tools/lib/api/
>  TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
> @@ -272,7 +279,7 @@ strip: $(PROGRAMS) $(OUTPUT)perf
>  
>  PERF_IN := $(OUTPUT)perf-in.o
>  
> -export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX
> +export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
>  build := -f $(srctree)/tools/build/Makefile.build dir=. obj
>  
>  $(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
> @@ -536,7 +543,8 @@ clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
>  	$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
>  	$(Q)$(RM) .config-detected
>  	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
> -	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
> +	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
> +		$(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
>  	$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
>  	$(python-clean)
>  
> diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
> index 9d67381..f5f7f87 100644
> --- a/tools/perf/util/intel-pt-decoder/Build
> +++ b/tools/perf/util/intel-pt-decoder/Build
> @@ -1 +1,14 @@
> -libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
> +libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
> +
> +inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
> +inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
> +
> +$(OUTPUT)util/intel-pt-decoder/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
> +	@$(call echo-cmd,gen)$(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
> +
> +$(OUTPUT)util/intel-pt-decoder/inat.c:
> +	@$(call echo-cmd,gen)cp ../../arch/x86/lib/inat.c $(OUTPUT)util/intel-pt-decoder/inat.c
> +
> +$(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o: $(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
> +
> +CFLAGS_intel-pt-insn-decoder.o += -I../../arch/x86/include -I$(OUTPUT)util/intel-pt-decoder -I../../arch/x86/lib -Wno-override-init
> diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
> new file mode 100644
> index 0000000..2fa82c5
> --- /dev/null
> +++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
> @@ -0,0 +1,246 @@
> +/*
> + * intel_pt_insn_decoder.c: Intel Processor Trace support
> + * Copyright (c) 2013-2014, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#include <stdio.h>
> +#include <string.h>
> +#include <endian.h>
> +#include <byteswap.h>
> +
> +#include "event.h"
> +
> +#include <asm/insn.h>
> +
> +#include "inat.c"
> +#include <insn.c>
> +
> +#include "intel-pt-insn-decoder.h"
> +
> +/* Based on branch_type() from perf_event_intel_lbr.c */
> +static void intel_pt_insn_decoder(struct insn *insn,
> +				  struct intel_pt_insn *intel_pt_insn)
> +{
> +	enum intel_pt_insn_op op = INTEL_PT_OP_OTHER;
> +	enum intel_pt_insn_branch branch = INTEL_PT_BR_NO_BRANCH;
> +	int ext;
> +
> +	if (insn_is_avx(insn)) {
> +		intel_pt_insn->op = INTEL_PT_OP_OTHER;
> +		intel_pt_insn->branch = INTEL_PT_BR_NO_BRANCH;
> +		intel_pt_insn->length = insn->length;
> +		return;
> +	}
> +
> +	switch (insn->opcode.bytes[0]) {
> +	case 0xf:
> +		switch (insn->opcode.bytes[1]) {
> +		case 0x05: /* syscall */
> +		case 0x34: /* sysenter */
> +			op = INTEL_PT_OP_SYSCALL;
> +			branch = INTEL_PT_BR_INDIRECT;
> +			break;
> +		case 0x07: /* sysret */
> +		case 0x35: /* sysexit */
> +			op = INTEL_PT_OP_SYSRET;
> +			branch = INTEL_PT_BR_INDIRECT;
> +			break;
> +		case 0x80 ... 0x8f: /* jcc */
> +			op = INTEL_PT_OP_JCC;
> +			branch = INTEL_PT_BR_CONDITIONAL;
> +			break;
> +		default:
> +			break;
> +		}
> +		break;
> +	case 0x70 ... 0x7f: /* jcc */
> +		op = INTEL_PT_OP_JCC;
> +		branch = INTEL_PT_BR_CONDITIONAL;
> +		break;
> +	case 0xc2: /* near ret */
> +	case 0xc3: /* near ret */
> +	case 0xca: /* far ret */
> +	case 0xcb: /* far ret */
> +		op = INTEL_PT_OP_RET;
> +		branch = INTEL_PT_BR_INDIRECT;
> +		break;
> +	case 0xcf: /* iret */
> +		op = INTEL_PT_OP_IRET;
> +		branch = INTEL_PT_BR_INDIRECT;
> +		break;
> +	case 0xcc ... 0xce: /* int */
> +		op = INTEL_PT_OP_INT;
> +		branch = INTEL_PT_BR_INDIRECT;
> +		break;
> +	case 0xe8: /* call near rel */
> +		op = INTEL_PT_OP_CALL;
> +		branch = INTEL_PT_BR_UNCONDITIONAL;
> +		break;
> +	case 0x9a: /* call far absolute */
> +		op = INTEL_PT_OP_CALL;
> +		branch = INTEL_PT_BR_INDIRECT;
> +		break;
> +	case 0xe0 ... 0xe2: /* loop */
> +		op = INTEL_PT_OP_LOOP;
> +		branch = INTEL_PT_BR_CONDITIONAL;
> +		break;
> +	case 0xe3: /* jcc */
> +		op = INTEL_PT_OP_JCC;
> +		branch = INTEL_PT_BR_CONDITIONAL;
> +		break;
> +	case 0xe9: /* jmp */
> +	case 0xeb: /* jmp */
> +		op = INTEL_PT_OP_JMP;
> +		branch = INTEL_PT_BR_UNCONDITIONAL;
> +		break;
> +	case 0xea: /* far jmp */
> +		op = INTEL_PT_OP_JMP;
> +		branch = INTEL_PT_BR_INDIRECT;
> +		break;
> +	case 0xff: /* call near absolute, call far absolute ind */
> +		ext = (insn->modrm.bytes[0] >> 3) & 0x7;
> +		switch (ext) {
> +		case 2: /* near ind call */
> +		case 3: /* far ind call */
> +			op = INTEL_PT_OP_CALL;
> +			branch = INTEL_PT_BR_INDIRECT;
> +			break;
> +		case 4:
> +		case 5:
> +			op = INTEL_PT_OP_JMP;
> +			branch = INTEL_PT_BR_INDIRECT;
> +			break;
> +		default:
> +			break;
> +		}
> +		break;
> +	default:
> +		break;
> +	}
> +
> +	intel_pt_insn->op = op;
> +	intel_pt_insn->branch = branch;
> +	intel_pt_insn->length = insn->length;
> +
> +	if (branch == INTEL_PT_BR_CONDITIONAL ||
> +	    branch == INTEL_PT_BR_UNCONDITIONAL) {
> +#if __BYTE_ORDER == __BIG_ENDIAN
> +		switch (insn->immediate.nbytes) {
> +		case 1:
> +			intel_pt_insn->rel = insn->immediate.value;
> +			break;
> +		case 2:
> +			intel_pt_insn->rel =
> +					bswap_16((short)insn->immediate.value);
> +			break;
> +		case 4:
> +			intel_pt_insn->rel = bswap_32(insn->immediate.value);
> +			break;
> +		}
> +#else
> +		intel_pt_insn->rel = insn->immediate.value;
> +#endif
> +	}
> +}
> +
> +int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
> +		      struct intel_pt_insn *intel_pt_insn)
> +{
> +	struct insn insn;
> +
> +	insn_init(&insn, buf, len, x86_64);
> +	insn_get_length(&insn);
> +	if (!insn_complete(&insn) || insn.length > len)
> +		return -1;
> +	intel_pt_insn_decoder(&insn, intel_pt_insn);
> +	if (insn.length < INTEL_PT_INSN_DBG_BUF_SZ)
> +		memcpy(intel_pt_insn->buf, buf, insn.length);
> +	else
> +		memcpy(intel_pt_insn->buf, buf, INTEL_PT_INSN_DBG_BUF_SZ);
> +	return 0;
> +}
> +
> +const char *branch_name[] = {
> +	[INTEL_PT_OP_OTHER]	= "Other",
> +	[INTEL_PT_OP_CALL]	= "Call",
> +	[INTEL_PT_OP_RET]	= "Ret",
> +	[INTEL_PT_OP_JCC]	= "Jcc",
> +	[INTEL_PT_OP_JMP]	= "Jmp",
> +	[INTEL_PT_OP_LOOP]	= "Loop",
> +	[INTEL_PT_OP_IRET]	= "IRet",
> +	[INTEL_PT_OP_INT]	= "Int",
> +	[INTEL_PT_OP_SYSCALL]	= "Syscall",
> +	[INTEL_PT_OP_SYSRET]	= "Sysret",
> +};
> +
> +const char *intel_pt_insn_name(enum intel_pt_insn_op op)
> +{
> +	return branch_name[op];
> +}
> +
> +int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
> +		       size_t buf_len)
> +{
> +	switch (intel_pt_insn->branch) {
> +	case INTEL_PT_BR_CONDITIONAL:
> +	case INTEL_PT_BR_UNCONDITIONAL:
> +		return snprintf(buf, buf_len, "%s %s%d",
> +				intel_pt_insn_name(intel_pt_insn->op),
> +				intel_pt_insn->rel > 0 ? "+" : "",
> +				intel_pt_insn->rel);
> +	case INTEL_PT_BR_NO_BRANCH:
> +	case INTEL_PT_BR_INDIRECT:
> +		return snprintf(buf, buf_len, "%s",
> +				intel_pt_insn_name(intel_pt_insn->op));
> +	default:
> +		break;
> +	}
> +	return 0;
> +}
> +
> +size_t intel_pt_insn_max_size(void)
> +{
> +	return MAX_INSN_SIZE;
> +}
> +
> +int intel_pt_insn_type(enum intel_pt_insn_op op)
> +{
> +	switch (op) {
> +	case INTEL_PT_OP_OTHER:
> +		return 0;
> +	case INTEL_PT_OP_CALL:
> +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL;
> +	case INTEL_PT_OP_RET:
> +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN;
> +	case INTEL_PT_OP_JCC:
> +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
> +	case INTEL_PT_OP_JMP:
> +		return PERF_IP_FLAG_BRANCH;
> +	case INTEL_PT_OP_LOOP:
> +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
> +	case INTEL_PT_OP_IRET:
> +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
> +		       PERF_IP_FLAG_INTERRUPT;
> +	case INTEL_PT_OP_INT:
> +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
> +		       PERF_IP_FLAG_INTERRUPT;
> +	case INTEL_PT_OP_SYSCALL:
> +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
> +		       PERF_IP_FLAG_SYSCALLRET;
> +	case INTEL_PT_OP_SYSRET:
> +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
> +		       PERF_IP_FLAG_SYSCALLRET;
> +	default:
> +		return 0;
> +	}
> +}
> diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
> new file mode 100644
> index 0000000..b0adbf3
> --- /dev/null
> +++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
> @@ -0,0 +1,65 @@
> +/*
> + * intel_pt_insn_decoder.h: Intel Processor Trace support
> + * Copyright (c) 2013-2014, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#ifndef INCLUDE__INTEL_PT_INSN_DECODER_H__
> +#define INCLUDE__INTEL_PT_INSN_DECODER_H__
> +
> +#include <stddef.h>
> +#include <stdint.h>
> +
> +#define INTEL_PT_INSN_DESC_MAX		32
> +#define INTEL_PT_INSN_DBG_BUF_SZ	16
> +
> +enum intel_pt_insn_op {
> +	INTEL_PT_OP_OTHER,
> +	INTEL_PT_OP_CALL,
> +	INTEL_PT_OP_RET,
> +	INTEL_PT_OP_JCC,
> +	INTEL_PT_OP_JMP,
> +	INTEL_PT_OP_LOOP,
> +	INTEL_PT_OP_IRET,
> +	INTEL_PT_OP_INT,
> +	INTEL_PT_OP_SYSCALL,
> +	INTEL_PT_OP_SYSRET,
> +};
> +
> +enum intel_pt_insn_branch {
> +	INTEL_PT_BR_NO_BRANCH,
> +	INTEL_PT_BR_INDIRECT,
> +	INTEL_PT_BR_CONDITIONAL,
> +	INTEL_PT_BR_UNCONDITIONAL,
> +};
> +
> +struct intel_pt_insn {
> +	enum intel_pt_insn_op		op;
> +	enum intel_pt_insn_branch	branch;
> +	int				length;
> +	int32_t				rel;
> +	unsigned char			buf[INTEL_PT_INSN_DBG_BUF_SZ];
> +};
> +
> +int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
> +		      struct intel_pt_insn *intel_pt_insn);
> +
> +const char *intel_pt_insn_name(enum intel_pt_insn_op op);
> +
> +int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
> +		       size_t buf_len);
> +
> +size_t intel_pt_insn_max_size(void);
> +
> +int intel_pt_insn_type(enum intel_pt_insn_op op);
> +
> +#endif
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 02/17] perf tools: Ensure thread-stack is flushed
  2015-06-18 21:56   ` Arnaldo Carvalho de Melo
@ 2015-06-19  5:50     ` Adrian Hunter
  0 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-06-19  5:50 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 19/06/2015 12:56 a.m., Arnaldo Carvalho de Melo wrote:
> Em Fri, May 29, 2015 at 04:33:30PM +0300, Adrian Hunter escreveu:
>> The thread-stack represents a thread's current stack.  When
>> a thread exits there can still be many functions on the stack
>> e.g. exit() can be called many levels deep, so all the callers
>> will never return.  To get that information output, the
>> thread-stack must be flushed.
>>
>> Previously it was assumed the thread-stack would be flushed
>> when the struct thread was deleted.  With thread ref-counting
>> it is no longer clear when that will be, if ever. So instead
>
> It'll be when the last reference to that thread is released.
>
> - Arnaldo
>
>> explicitly flush all the thread-stacks at the end of a session.
>
> If after the session ends you have no more need for those thread stacks,
> that is the right way to do it.
>
> With tools like 'report', after the session ends we should have all the
> unreferenced threads deleted.
>
> Previously they were not being deleted at all, i.e. they were simply
> moved to the dead_threads list and sat there because I didn't knew if
> some hist_entry, say, had a pointer to it.
>
> So, unless I am missing something, this patch is required irrespective
> of thread refcounting, no?

IIRC we used to delete all the dead threads too, but explicit flushing is better in any case.

>
> I'm applying it to my work branch where I'm trying to test all this.

Thank you!

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 05/17] perf tools: Add Intel PT instruction decoder
  2015-06-18 22:29   ` Arnaldo Carvalho de Melo
@ 2015-06-19 15:44     ` Arnaldo Carvalho de Melo
  2015-06-22 12:40       ` Adrian Hunter
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-19 15:44 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Thu, Jun 18, 2015 at 07:29:41PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, May 29, 2015 at 04:33:33PM +0300, Adrian Hunter escreveu:
> > Add support for decoding instructions for Intel Processor Trace.  The
> > kernel x86 instruction decoder is used for this.
> 
> Ok, but we don't access kernel header files directly, and:
> 
> [acme@zoo linux]$ find . -name "insn.h"
> ./arch/x86/include/asm/insn.h
> ./arch/arm64/include/asm/insn.h
> ./arch/arm/include/asm/insn.h
> [acme@zoo linux]$ find /usr/include -name "insn.h"
> [acme@zoo linux]$ 
> 
> But I need to look more into this patch to figure out if this is
> something generated at build time, etc, but before that I found a
> problem:
> 
> So:
> 
> > +inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
> > +inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
> 
> These need to go into tools/perf/MANIFEST, so that:

So, after adding:

diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
index fe50a1b34aa0..4e5662d8c274 100644
--- a/tools/perf/MANIFEST
+++ b/tools/perf/MANIFEST
@@ -58,6 +58,13 @@ include/linux/stringify.h
 lib/hweight.c
 lib/rbtree.c
 include/linux/swab.h
+arch/x86/lib/insn.c
+arch/x86/lib/inat.c
+arch/x86/include/asm/insn.h
+arch/x86/include/asm/inat.h
+arch/x86/include/asm/inat_types.h
+arch/x86/tools/gen-insn-attr-x86.awk
+arch/x86/lib/x86-opcode-map.txt
 arch/*/include/asm/unistd*.h
 arch/*/include/uapi/asm/unistd*.h
 arch/*/include/uapi/asm/perf_regs.h

The test passes:

[acme@zoo linux]$ make -C tools/perf -f tests/make tarpkg && echo Ok
make: Entering directory '/home/git/linux/tools/perf'
- tarpkg: ./tests/perf-targz-src-pkg .
make: Leaving directory '/home/git/linux/tools/perf'
Ok
[acme@zoo linux]$

Merging these changes with this changeset to continue testing...

Also force pushing it to the tmp.perf/pt branch.

- Arnaldo
 
> [acme@zoo linux]$ make help | grep perf
>   perf-tar-src-pkg    - Build perf-4.1.0-rc5.tar source tarball
>   perf-targz-src-pkg  - Build perf-4.1.0-rc5.tar.gz source tarball
>   perf-tarbz2-src-pkg - Build perf-4.1.0-rc5.tar.bz2 source tarball
>   perf-tarxz-src-pkg  - Build perf-4.1.0-rc5.tar.xz source tarball
> [acme@zoo linux]$ 
> 
> Continue to work, in fact, there is a test for that, that will run
> when you do the build tests:
> 
> make -C tools/perf build-test
> 
> It is one of the last to be tested, so you may want to do it directly:
> 
> [acme@zoo linux]$ make -C tools/perf -f tests/make tarpkg
> make: Entering directory '/home/git/linux/tools/perf'
> - tarpkg: ./tests/perf-targz-src-pkg .
> make: Leaving directory '/home/git/linux/tools/perf'
> [acme@zoo linux]$
> 
> After I apply this patch, I get:
> 
> Applying: perf tools: Add Intel PT instruction decoder
> [tmp.perf/pt 1ab14c4be64b] perf tools: Add Intel PT instruction decoder
>  Author: Adrian Hunter <adrian.hunter@intel.com>
>  Date: Fri May 29 16:33:33 2015 +0300
>  6 files changed, 339 insertions(+), 3 deletions(-)
>  rewrite tools/perf/util/intel-pt-decoder/Build (100%)
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
>  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
> [acme@zoo linux]$ fg
> bash: fg: current: no such job
> [acme@zoo linux]$ make -C tools/perf -f tests/make tarpkg
> make: Entering directory '/home/git/linux/tools/perf'
> - tarpkg: ./tests/perf-targz-src-pkg .
> tests/make:224: recipe for target 'tarpkg' failed
> make: *** [tarpkg] Error 2
> make: Leaving directory '/home/git/linux/tools/perf'
> [acme@zoo linux]$ 
> 
> Doing it manually to see what happened:
> 
> [acme@zoo linux]$ make perf-tar-src-pkg
>   TAR
> [acme@zoo linux]$ ls -la perf-4.1.0-rc5.tar 
> -rw-rw-r--. 1 acme acme 5027840 Jun 18 19:24 perf-4.1.0-rc5.tar
> [acme@zoo linux]$ mv perf-4.1.0-rc5.tar /tmp
> [acme@zoo linux]$ cd /tmp
> [acme@zoo tmp]$ tar xf perf-4.1.0-rc5.tar 
> [acme@zoo tmp]$ cd perf-4.1.0-rc5/
> [acme@zoo perf-4.1.0-rc5]$ make -C tools/perf
> make: Entering directory '/tmp/perf-4.1.0-rc5/tools/perf'
>   BUILD:   Doing 'make -j4' parallel build
> 
> Auto-detecting system features:
> ...                         dwarf: [ on  ]
> ...                         glibc: [ on  ]
> ...                          gtk2: [ on  ]
> ...                      libaudit: [ on  ]
> ...                        libbfd: [ on  ]
> ...                        libelf: [ on  ]
> ...                       libnuma: [ on  ]
> ...                       libperl: [ on  ]
> ...                     libpython: [ on  ]
> ...                      libslang: [ on  ]
> ...                     libunwind: [ on  ]
> ...            libdw-dwarf-unwind: [ on  ]
> ...                          zlib: [ on  ]
> ...                          lzma: [ on  ]
> 
>   CC       util/abspath.o
>   CC       fd/array.o
>   PERF_VERSION = 4.1.rc5.g1ab14c
>   CC       fs/fs.o
>   CC       event-parse.o
>   LD       fd/libapi-in.o
>   CC       event-plugin.o
>   CC       fs/debugfs.o
>   CC       util/alias.o
>   CC       trace-seq.o
> <SNIP>
>   CC       util/cloexec.o
>   CC       util/thread-stack.o
>   CC       builtin-kmem.o
>   CC       builtin-lock.o
>   CC       util/auxtrace.o
>   CC       util/intel-pt-decoder/intel-pt-pkt-decoder.o
> make[4]: *** No rule to make target '../../arch/x86/tools/gen-insn-attr-x86.awk', needed by 'util/intel-pt-decoder/inat-tables.c'.  Stop.
> make[4]: *** Waiting for unfinished jobs....
>   GEN      util/intel-pt-decoder/inat.c
> cp: cannot stat ‘../../arch/x86/lib/inat.c’: No such file or directory
> util/intel-pt-decoder/Build:10: recipe for target 'util/intel-pt-decoder/inat.c' failed
> make[4]: *** [util/intel-pt-decoder/inat.c] Error 1
> /tmp/perf-4.1.0-rc5/tools/build/Makefile.build:109: recipe for target 'intel-pt-decoder' failed
> make[3]: *** [intel-pt-decoder] Error 2
> make[3]: *** Waiting for unfinished jobs....
>   CC       builtin-kvm.o
>   CC       builtin-inject.o
> /tmp/perf-4.1.0-rc5/tools/build/Makefile.build:109: recipe for target 'util' failed
> make[2]: *** [util] Error 2
> Makefile.perf:380: recipe for target 'libperf-in.o' failed
> make[1]: *** [libperf-in.o] Error 2
> make[1]: *** Waiting for unfinished jobs....
>   CC       builtin-mem.o
>   CC       builtin-data.o
>   CC       builtin-trace.o
> <SNIP>
>   LD       tests/perf-in.o
>   LD       perf-in.o
> Makefile:68: recipe for target 'all' failed
> make: *** [all] Error 2
> make: Leaving directory '/tmp/perf-4.1.0-rc5/tools/perf'
> [acme@zoo perf-4.1.0-rc5]$
> 
> The patch fixed up wrt some recent changes to .gitignore and the
> makefiles is in my git tree at git.kernel.org, branch tmp.perf/pt.
> 
> Calling it a day, will continue on this tomorrow.
> 
> - Arnaldo
> 
> > This essentially provides intel_pt_get_insn() which takes a binary
> > buffer, uses the kernel's x86 instruction decoder to get details
> > of the instruction and then categorizes it for consumption by
> > an Intel PT decoder.
> > 
> > Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> > ---
> >  tools/build/Makefile.build                         |   2 +
> >  tools/perf/.gitignore                              |   2 +
> >  tools/perf/Makefile.perf                           |  12 +-
> >  tools/perf/util/intel-pt-decoder/Build             |  15 +-
> >  .../util/intel-pt-decoder/intel-pt-insn-decoder.c  | 246 +++++++++++++++++++++
> >  .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |  65 ++++++
> >  6 files changed, 339 insertions(+), 3 deletions(-)
> >  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
> >  create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
> > 
> > diff --git a/tools/build/Makefile.build b/tools/build/Makefile.build
> > index 10df572..7ad74e4 100644
> > --- a/tools/build/Makefile.build
> > +++ b/tools/build/Makefile.build
> > @@ -57,6 +57,8 @@ quiet_cmd_cc_i_c = CPP      $@
> >  quiet_cmd_cc_s_c = AS       $@
> >        cmd_cc_s_c = $(CC) $(c_flags) -S -o $@ $<
> >  
> > +quiet_cmd_gen = GEN      $@
> > +
> >  # Link agregate command
> >  # If there's nothing to link, create empty $@ object.
> >  quiet_cmd_ld_multi = LD       $@
> > diff --git a/tools/perf/.gitignore b/tools/perf/.gitignore
> > index 812f904..c88d5c5 100644
> > --- a/tools/perf/.gitignore
> > +++ b/tools/perf/.gitignore
> > @@ -28,3 +28,5 @@ config.mak.autogen
> >  *-flex.*
> >  *.pyc
> >  *.pyo
> > +util/intel-pt-decoder/inat-tables.c
> > +util/intel-pt-decoder/inat.c
> > diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
> > index 5816a3b..3ae3a8e 100644
> > --- a/tools/perf/Makefile.perf
> > +++ b/tools/perf/Makefile.perf
> > @@ -76,6 +76,12 @@ include config/utilities.mak
> >  #
> >  # Define NO_AUXTRACE if you do not want AUX area tracing support
> >  
> > +# As per kernel Makefile, avoid funny character set dependencies
> > +unexport LC_ALL
> > +LC_COLLATE=C
> > +LC_NUMERIC=C
> > +export LC_COLLATE LC_NUMERIC
> > +
> >  ifeq ($(srctree),)
> >  srctree := $(patsubst %/,%,$(dir $(shell pwd)))
> >  srctree := $(patsubst %/,%,$(dir $(srctree)))
> > @@ -122,6 +128,7 @@ INSTALL = install
> >  FLEX    = flex
> >  BISON   = bison
> >  STRIP   = strip
> > +AWK     = awk
> >  
> >  LIB_DIR          = $(srctree)/tools/lib/api/
> >  TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
> > @@ -272,7 +279,7 @@ strip: $(PROGRAMS) $(OUTPUT)perf
> >  
> >  PERF_IN := $(OUTPUT)perf-in.o
> >  
> > -export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX
> > +export srctree OUTPUT RM CC LD AR CFLAGS V BISON FLEX AWK
> >  build := -f $(srctree)/tools/build/Makefile.build dir=. obj
> >  
> >  $(PERF_IN): $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)common-cmds.h FORCE
> > @@ -536,7 +543,8 @@ clean: $(LIBTRACEEVENT)-clean $(LIBAPI)-clean config-clean
> >  	$(Q)find . -name '*.o' -delete -o -name '\.*.cmd' -delete -o -name '\.*.d' -delete
> >  	$(Q)$(RM) .config-detected
> >  	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf perf-read-vdso32 perf-read-vdsox32
> > -	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
> > +	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)FEATURE-DUMP $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
> > +		$(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
> >  	$(QUIET_SUBDIR0)Documentation $(QUIET_SUBDIR1) clean
> >  	$(python-clean)
> >  
> > diff --git a/tools/perf/util/intel-pt-decoder/Build b/tools/perf/util/intel-pt-decoder/Build
> > index 9d67381..f5f7f87 100644
> > --- a/tools/perf/util/intel-pt-decoder/Build
> > +++ b/tools/perf/util/intel-pt-decoder/Build
> > @@ -1 +1,14 @@
> > -libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o
> > +libperf-$(CONFIG_AUXTRACE) += intel-pt-pkt-decoder.o intel-pt-insn-decoder.o
> > +
> > +inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
> > +inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
> > +
> > +$(OUTPUT)util/intel-pt-decoder/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
> > +	@$(call echo-cmd,gen)$(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
> > +
> > +$(OUTPUT)util/intel-pt-decoder/inat.c:
> > +	@$(call echo-cmd,gen)cp ../../arch/x86/lib/inat.c $(OUTPUT)util/intel-pt-decoder/inat.c
> > +
> > +$(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o: $(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
> > +
> > +CFLAGS_intel-pt-insn-decoder.o += -I../../arch/x86/include -I$(OUTPUT)util/intel-pt-decoder -I../../arch/x86/lib -Wno-override-init
> > diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
> > new file mode 100644
> > index 0000000..2fa82c5
> > --- /dev/null
> > +++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
> > @@ -0,0 +1,246 @@
> > +/*
> > + * intel_pt_insn_decoder.c: Intel Processor Trace support
> > + * Copyright (c) 2013-2014, Intel Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > + * more details.
> > + *
> > + */
> > +
> > +#include <stdio.h>
> > +#include <string.h>
> > +#include <endian.h>
> > +#include <byteswap.h>
> > +
> > +#include "event.h"
> > +
> > +#include <asm/insn.h>
> > +
> > +#include "inat.c"
> > +#include <insn.c>
> > +
> > +#include "intel-pt-insn-decoder.h"
> > +
> > +/* Based on branch_type() from perf_event_intel_lbr.c */
> > +static void intel_pt_insn_decoder(struct insn *insn,
> > +				  struct intel_pt_insn *intel_pt_insn)
> > +{
> > +	enum intel_pt_insn_op op = INTEL_PT_OP_OTHER;
> > +	enum intel_pt_insn_branch branch = INTEL_PT_BR_NO_BRANCH;
> > +	int ext;
> > +
> > +	if (insn_is_avx(insn)) {
> > +		intel_pt_insn->op = INTEL_PT_OP_OTHER;
> > +		intel_pt_insn->branch = INTEL_PT_BR_NO_BRANCH;
> > +		intel_pt_insn->length = insn->length;
> > +		return;
> > +	}
> > +
> > +	switch (insn->opcode.bytes[0]) {
> > +	case 0xf:
> > +		switch (insn->opcode.bytes[1]) {
> > +		case 0x05: /* syscall */
> > +		case 0x34: /* sysenter */
> > +			op = INTEL_PT_OP_SYSCALL;
> > +			branch = INTEL_PT_BR_INDIRECT;
> > +			break;
> > +		case 0x07: /* sysret */
> > +		case 0x35: /* sysexit */
> > +			op = INTEL_PT_OP_SYSRET;
> > +			branch = INTEL_PT_BR_INDIRECT;
> > +			break;
> > +		case 0x80 ... 0x8f: /* jcc */
> > +			op = INTEL_PT_OP_JCC;
> > +			branch = INTEL_PT_BR_CONDITIONAL;
> > +			break;
> > +		default:
> > +			break;
> > +		}
> > +		break;
> > +	case 0x70 ... 0x7f: /* jcc */
> > +		op = INTEL_PT_OP_JCC;
> > +		branch = INTEL_PT_BR_CONDITIONAL;
> > +		break;
> > +	case 0xc2: /* near ret */
> > +	case 0xc3: /* near ret */
> > +	case 0xca: /* far ret */
> > +	case 0xcb: /* far ret */
> > +		op = INTEL_PT_OP_RET;
> > +		branch = INTEL_PT_BR_INDIRECT;
> > +		break;
> > +	case 0xcf: /* iret */
> > +		op = INTEL_PT_OP_IRET;
> > +		branch = INTEL_PT_BR_INDIRECT;
> > +		break;
> > +	case 0xcc ... 0xce: /* int */
> > +		op = INTEL_PT_OP_INT;
> > +		branch = INTEL_PT_BR_INDIRECT;
> > +		break;
> > +	case 0xe8: /* call near rel */
> > +		op = INTEL_PT_OP_CALL;
> > +		branch = INTEL_PT_BR_UNCONDITIONAL;
> > +		break;
> > +	case 0x9a: /* call far absolute */
> > +		op = INTEL_PT_OP_CALL;
> > +		branch = INTEL_PT_BR_INDIRECT;
> > +		break;
> > +	case 0xe0 ... 0xe2: /* loop */
> > +		op = INTEL_PT_OP_LOOP;
> > +		branch = INTEL_PT_BR_CONDITIONAL;
> > +		break;
> > +	case 0xe3: /* jcc */
> > +		op = INTEL_PT_OP_JCC;
> > +		branch = INTEL_PT_BR_CONDITIONAL;
> > +		break;
> > +	case 0xe9: /* jmp */
> > +	case 0xeb: /* jmp */
> > +		op = INTEL_PT_OP_JMP;
> > +		branch = INTEL_PT_BR_UNCONDITIONAL;
> > +		break;
> > +	case 0xea: /* far jmp */
> > +		op = INTEL_PT_OP_JMP;
> > +		branch = INTEL_PT_BR_INDIRECT;
> > +		break;
> > +	case 0xff: /* call near absolute, call far absolute ind */
> > +		ext = (insn->modrm.bytes[0] >> 3) & 0x7;
> > +		switch (ext) {
> > +		case 2: /* near ind call */
> > +		case 3: /* far ind call */
> > +			op = INTEL_PT_OP_CALL;
> > +			branch = INTEL_PT_BR_INDIRECT;
> > +			break;
> > +		case 4:
> > +		case 5:
> > +			op = INTEL_PT_OP_JMP;
> > +			branch = INTEL_PT_BR_INDIRECT;
> > +			break;
> > +		default:
> > +			break;
> > +		}
> > +		break;
> > +	default:
> > +		break;
> > +	}
> > +
> > +	intel_pt_insn->op = op;
> > +	intel_pt_insn->branch = branch;
> > +	intel_pt_insn->length = insn->length;
> > +
> > +	if (branch == INTEL_PT_BR_CONDITIONAL ||
> > +	    branch == INTEL_PT_BR_UNCONDITIONAL) {
> > +#if __BYTE_ORDER == __BIG_ENDIAN
> > +		switch (insn->immediate.nbytes) {
> > +		case 1:
> > +			intel_pt_insn->rel = insn->immediate.value;
> > +			break;
> > +		case 2:
> > +			intel_pt_insn->rel =
> > +					bswap_16((short)insn->immediate.value);
> > +			break;
> > +		case 4:
> > +			intel_pt_insn->rel = bswap_32(insn->immediate.value);
> > +			break;
> > +		}
> > +#else
> > +		intel_pt_insn->rel = insn->immediate.value;
> > +#endif
> > +	}
> > +}
> > +
> > +int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
> > +		      struct intel_pt_insn *intel_pt_insn)
> > +{
> > +	struct insn insn;
> > +
> > +	insn_init(&insn, buf, len, x86_64);
> > +	insn_get_length(&insn);
> > +	if (!insn_complete(&insn) || insn.length > len)
> > +		return -1;
> > +	intel_pt_insn_decoder(&insn, intel_pt_insn);
> > +	if (insn.length < INTEL_PT_INSN_DBG_BUF_SZ)
> > +		memcpy(intel_pt_insn->buf, buf, insn.length);
> > +	else
> > +		memcpy(intel_pt_insn->buf, buf, INTEL_PT_INSN_DBG_BUF_SZ);
> > +	return 0;
> > +}
> > +
> > +const char *branch_name[] = {
> > +	[INTEL_PT_OP_OTHER]	= "Other",
> > +	[INTEL_PT_OP_CALL]	= "Call",
> > +	[INTEL_PT_OP_RET]	= "Ret",
> > +	[INTEL_PT_OP_JCC]	= "Jcc",
> > +	[INTEL_PT_OP_JMP]	= "Jmp",
> > +	[INTEL_PT_OP_LOOP]	= "Loop",
> > +	[INTEL_PT_OP_IRET]	= "IRet",
> > +	[INTEL_PT_OP_INT]	= "Int",
> > +	[INTEL_PT_OP_SYSCALL]	= "Syscall",
> > +	[INTEL_PT_OP_SYSRET]	= "Sysret",
> > +};
> > +
> > +const char *intel_pt_insn_name(enum intel_pt_insn_op op)
> > +{
> > +	return branch_name[op];
> > +}
> > +
> > +int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
> > +		       size_t buf_len)
> > +{
> > +	switch (intel_pt_insn->branch) {
> > +	case INTEL_PT_BR_CONDITIONAL:
> > +	case INTEL_PT_BR_UNCONDITIONAL:
> > +		return snprintf(buf, buf_len, "%s %s%d",
> > +				intel_pt_insn_name(intel_pt_insn->op),
> > +				intel_pt_insn->rel > 0 ? "+" : "",
> > +				intel_pt_insn->rel);
> > +	case INTEL_PT_BR_NO_BRANCH:
> > +	case INTEL_PT_BR_INDIRECT:
> > +		return snprintf(buf, buf_len, "%s",
> > +				intel_pt_insn_name(intel_pt_insn->op));
> > +	default:
> > +		break;
> > +	}
> > +	return 0;
> > +}
> > +
> > +size_t intel_pt_insn_max_size(void)
> > +{
> > +	return MAX_INSN_SIZE;
> > +}
> > +
> > +int intel_pt_insn_type(enum intel_pt_insn_op op)
> > +{
> > +	switch (op) {
> > +	case INTEL_PT_OP_OTHER:
> > +		return 0;
> > +	case INTEL_PT_OP_CALL:
> > +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL;
> > +	case INTEL_PT_OP_RET:
> > +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN;
> > +	case INTEL_PT_OP_JCC:
> > +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
> > +	case INTEL_PT_OP_JMP:
> > +		return PERF_IP_FLAG_BRANCH;
> > +	case INTEL_PT_OP_LOOP:
> > +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CONDITIONAL;
> > +	case INTEL_PT_OP_IRET:
> > +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
> > +		       PERF_IP_FLAG_INTERRUPT;
> > +	case INTEL_PT_OP_INT:
> > +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
> > +		       PERF_IP_FLAG_INTERRUPT;
> > +	case INTEL_PT_OP_SYSCALL:
> > +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
> > +		       PERF_IP_FLAG_SYSCALLRET;
> > +	case INTEL_PT_OP_SYSRET:
> > +		return PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_RETURN |
> > +		       PERF_IP_FLAG_SYSCALLRET;
> > +	default:
> > +		return 0;
> > +	}
> > +}
> > diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
> > new file mode 100644
> > index 0000000..b0adbf3
> > --- /dev/null
> > +++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
> > @@ -0,0 +1,65 @@
> > +/*
> > + * intel_pt_insn_decoder.h: Intel Processor Trace support
> > + * Copyright (c) 2013-2014, Intel Corporation.
> > + *
> > + * This program is free software; you can redistribute it and/or modify it
> > + * under the terms and conditions of the GNU General Public License,
> > + * version 2, as published by the Free Software Foundation.
> > + *
> > + * This program is distributed in the hope it will be useful, but WITHOUT
> > + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> > + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> > + * more details.
> > + *
> > + */
> > +
> > +#ifndef INCLUDE__INTEL_PT_INSN_DECODER_H__
> > +#define INCLUDE__INTEL_PT_INSN_DECODER_H__
> > +
> > +#include <stddef.h>
> > +#include <stdint.h>
> > +
> > +#define INTEL_PT_INSN_DESC_MAX		32
> > +#define INTEL_PT_INSN_DBG_BUF_SZ	16
> > +
> > +enum intel_pt_insn_op {
> > +	INTEL_PT_OP_OTHER,
> > +	INTEL_PT_OP_CALL,
> > +	INTEL_PT_OP_RET,
> > +	INTEL_PT_OP_JCC,
> > +	INTEL_PT_OP_JMP,
> > +	INTEL_PT_OP_LOOP,
> > +	INTEL_PT_OP_IRET,
> > +	INTEL_PT_OP_INT,
> > +	INTEL_PT_OP_SYSCALL,
> > +	INTEL_PT_OP_SYSRET,
> > +};
> > +
> > +enum intel_pt_insn_branch {
> > +	INTEL_PT_BR_NO_BRANCH,
> > +	INTEL_PT_BR_INDIRECT,
> > +	INTEL_PT_BR_CONDITIONAL,
> > +	INTEL_PT_BR_UNCONDITIONAL,
> > +};
> > +
> > +struct intel_pt_insn {
> > +	enum intel_pt_insn_op		op;
> > +	enum intel_pt_insn_branch	branch;
> > +	int				length;
> > +	int32_t				rel;
> > +	unsigned char			buf[INTEL_PT_INSN_DBG_BUF_SZ];
> > +};
> > +
> > +int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
> > +		      struct intel_pt_insn *intel_pt_insn);
> > +
> > +const char *intel_pt_insn_name(enum intel_pt_insn_op op);
> > +
> > +int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
> > +		       size_t buf_len);
> > +
> > +size_t intel_pt_insn_max_size(void);
> > +
> > +int intel_pt_insn_type(enum intel_pt_insn_op op);
> > +
> > +#endif
> > -- 
> > 1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-05-29 13:33 ` [PATCH V6 08/17] perf tools: Add Intel PT support Adrian Hunter
@ 2015-06-19 16:04   ` Arnaldo Carvalho de Melo
  2015-06-19 16:22     ` Arnaldo Carvalho de Melo
  2015-06-19 19:33     ` Adrian Hunter
  0 siblings, 2 replies; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-19 16:04 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, May 29, 2015 at 04:33:36PM +0300, Adrian Hunter escreveu:
> Add support for Intel Processor Trace.
> 
> Intel PT support fits within the new auxtrace infrastructure.
> Recording is supporting by identifying the Intel PT PMU,
> parsing options and setting up events.  Decoding is supported
> by queuing up trace data by cpu or thread and then decoding
> synchronously delivering synthesized event samples into the
> session processing for tools to consume.

So, at this point what commands should I use to test this? I expected to
be able to have some command here, in this changeset log, telling me
that what has been applied so far + this "Add Intel PT support", can be
used in such and such a fashion, obtaining this and that output.

Now I'll go back and look at the cover letter to see what I can do at
this point and with access to a Broadwell class machine.

- Arnaldo
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/arch/x86/util/Build      |    2 +
>  tools/perf/arch/x86/util/intel-pt.c |  752 ++++++++++++++
>  tools/perf/util/Build               |    1 +
>  tools/perf/util/intel-pt.c          | 1889 +++++++++++++++++++++++++++++++++++
>  tools/perf/util/intel-pt.h          |   51 +
>  5 files changed, 2695 insertions(+)
>  create mode 100644 tools/perf/arch/x86/util/intel-pt.c
>  create mode 100644 tools/perf/util/intel-pt.c
>  create mode 100644 tools/perf/util/intel-pt.h
> 
> diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
> index cfbccc4..1396088 100644
> --- a/tools/perf/arch/x86/util/Build
> +++ b/tools/perf/arch/x86/util/Build
> @@ -6,3 +6,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
>  
>  libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
>  libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
> +
> +libperf-$(CONFIG_AUXTRACE) += intel-pt.o
> diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
> new file mode 100644
> index 0000000..da7d2c1
> --- /dev/null
> +++ b/tools/perf/arch/x86/util/intel-pt.c
> @@ -0,0 +1,752 @@
> +/*
> + * intel_pt.c: Intel Processor Trace support
> + * Copyright (c) 2013-2015, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#include <stdbool.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +#include <linux/bitops.h>
> +#include <linux/log2.h>
> +
> +#include "../../perf.h"
> +#include "../../util/session.h"
> +#include "../../util/event.h"
> +#include "../../util/evlist.h"
> +#include "../../util/evsel.h"
> +#include "../../util/cpumap.h"
> +#include "../../util/parse-options.h"
> +#include "../../util/parse-events.h"
> +#include "../../util/pmu.h"
> +#include "../../util/debug.h"
> +#include "../../util/auxtrace.h"
> +#include "../../util/tsc.h"
> +#include "../../util/intel-pt.h"
> +
> +#define KiB(x) ((x) * 1024)
> +#define MiB(x) ((x) * 1024 * 1024)
> +#define KiB_MASK(x) (KiB(x) - 1)
> +#define MiB_MASK(x) (MiB(x) - 1)
> +
> +#define INTEL_PT_DEFAULT_SAMPLE_SIZE	KiB(4)
> +
> +#define INTEL_PT_MAX_SAMPLE_SIZE	KiB(60)
> +
> +#define INTEL_PT_PSB_PERIOD_NEAR	256
> +
> +struct intel_pt_snapshot_ref {
> +	void *ref_buf;
> +	size_t ref_offset;
> +	bool wrapped;
> +};
> +
> +struct intel_pt_recording {
> +	struct auxtrace_record		itr;
> +	struct perf_pmu			*intel_pt_pmu;
> +	int				have_sched_switch;
> +	struct perf_evlist		*evlist;
> +	bool				snapshot_mode;
> +	bool				snapshot_init_done;
> +	size_t				snapshot_size;
> +	size_t				snapshot_ref_buf_size;
> +	int				snapshot_ref_cnt;
> +	struct intel_pt_snapshot_ref	*snapshot_refs;
> +};
> +
> +static int intel_pt_parse_terms_with_default(struct list_head *formats,
> +					     const char *str,
> +					     u64 *config)
> +{
> +	struct list_head *terms;
> +	struct perf_event_attr attr = { .size = 0, };
> +	int err;
> +
> +	terms = malloc(sizeof(struct list_head));
> +	if (!terms)
> +		return -ENOMEM;
> +
> +	INIT_LIST_HEAD(terms);
> +
> +	err = parse_events_terms(terms, str);
> +	if (err)
> +		goto out_free;
> +
> +	attr.config = *config;
> +	err = perf_pmu__config_terms(formats, &attr, terms, true, NULL);
> +	if (err)
> +		goto out_free;
> +
> +	*config = attr.config;
> +out_free:
> +	parse_events__free_terms(terms);
> +	return err;
> +}
> +
> +static int intel_pt_parse_terms(struct list_head *formats, const char *str,
> +				u64 *config)
> +{
> +	*config = 0;
> +	return intel_pt_parse_terms_with_default(formats, str, config);
> +}
> +
> +static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu __maybe_unused,
> +				  struct perf_evlist *evlist __maybe_unused)
> +{
> +	return 256;
> +}
> +
> +static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
> +{
> +	u64 config;
> +
> +	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &config);
> +	return config;
> +}
> +
> +static int intel_pt_parse_snapshot_options(struct auxtrace_record *itr,
> +					   struct record_opts *opts,
> +					   const char *str)
> +{
> +	struct intel_pt_recording *ptr =
> +			container_of(itr, struct intel_pt_recording, itr);
> +	unsigned long long snapshot_size = 0;
> +	char *endptr;
> +
> +	if (str) {
> +		snapshot_size = strtoull(str, &endptr, 0);
> +		if (*endptr || snapshot_size > SIZE_MAX)
> +			return -1;
> +	}
> +
> +	opts->auxtrace_snapshot_mode = true;
> +	opts->auxtrace_snapshot_size = snapshot_size;
> +
> +	ptr->snapshot_size = snapshot_size;
> +
> +	return 0;
> +}
> +
> +struct perf_event_attr *
> +intel_pt_pmu_default_config(struct perf_pmu *intel_pt_pmu)
> +{
> +	struct perf_event_attr *attr;
> +
> +	attr = zalloc(sizeof(struct perf_event_attr));
> +	if (!attr)
> +		return NULL;
> +
> +	attr->config = intel_pt_default_config(intel_pt_pmu);
> +
> +	intel_pt_pmu->selectable = true;
> +
> +	return attr;
> +}
> +
> +static size_t intel_pt_info_priv_size(struct auxtrace_record *itr __maybe_unused)
> +{
> +	return INTEL_PT_AUXTRACE_PRIV_SIZE;
> +}
> +
> +static int intel_pt_info_fill(struct auxtrace_record *itr,
> +			      struct perf_session *session,
> +			      struct auxtrace_info_event *auxtrace_info,
> +			      size_t priv_size)
> +{
> +	struct intel_pt_recording *ptr =
> +			container_of(itr, struct intel_pt_recording, itr);
> +	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
> +	struct perf_event_mmap_page *pc;
> +	struct perf_tsc_conversion tc = { .time_mult = 0, };
> +	bool cap_user_time_zero = false, per_cpu_mmaps;
> +	u64 tsc_bit, noretcomp_bit;
> +	int err;
> +
> +	if (priv_size != INTEL_PT_AUXTRACE_PRIV_SIZE)
> +		return -EINVAL;
> +
> +	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
> +	intel_pt_parse_terms(&intel_pt_pmu->format, "noretcomp",
> +			     &noretcomp_bit);
> +
> +	if (!session->evlist->nr_mmaps)
> +		return -EINVAL;
> +
> +	pc = session->evlist->mmap[0].base;
> +	if (pc) {
> +		err = perf_read_tsc_conversion(pc, &tc);
> +		if (err) {
> +			if (err != -EOPNOTSUPP)
> +				return err;
> +		} else {
> +			cap_user_time_zero = tc.time_mult != 0;
> +		}
> +		if (!cap_user_time_zero)
> +			ui__warning("Intel Processor Trace: TSC not available\n");
> +	}
> +
> +	per_cpu_mmaps = !cpu_map__empty(session->evlist->cpus);
> +
> +	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
> +	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
> +	auxtrace_info->priv[INTEL_PT_TIME_SHIFT] = tc.time_shift;
> +	auxtrace_info->priv[INTEL_PT_TIME_MULT] = tc.time_mult;
> +	auxtrace_info->priv[INTEL_PT_TIME_ZERO] = tc.time_zero;
> +	auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO] = cap_user_time_zero;
> +	auxtrace_info->priv[INTEL_PT_TSC_BIT] = tsc_bit;
> +	auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT] = noretcomp_bit;
> +	auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH] = ptr->have_sched_switch;
> +	auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE] = ptr->snapshot_mode;
> +	auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS] = per_cpu_mmaps;
> +
> +	return 0;
> +}
> +
> +static int intel_pt_track_switches(struct perf_evlist *evlist)
> +{
> +	const char *sched_switch = "sched:sched_switch";
> +	struct perf_evsel *evsel;
> +	int err;
> +
> +	if (!perf_evlist__can_select_event(evlist, sched_switch))
> +		return -EPERM;
> +
> +	err = parse_events(evlist, sched_switch, NULL);
> +	if (err) {
> +		pr_debug2("%s: failed to parse %s, error %d\n",
> +			  __func__, sched_switch, err);
> +		return err;
> +	}
> +
> +	evsel = perf_evlist__last(evlist);
> +
> +	perf_evsel__set_sample_bit(evsel, CPU);
> +	perf_evsel__set_sample_bit(evsel, TIME);
> +
> +	evsel->system_wide = true;
> +	evsel->no_aux_samples = true;
> +	evsel->immediate = true;
> +
> +	return 0;
> +}
> +
> +static int intel_pt_recording_options(struct auxtrace_record *itr,
> +				      struct perf_evlist *evlist,
> +				      struct record_opts *opts)
> +{
> +	struct intel_pt_recording *ptr =
> +			container_of(itr, struct intel_pt_recording, itr);
> +	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
> +	bool have_timing_info;
> +	struct perf_evsel *evsel, *intel_pt_evsel = NULL;
> +	const struct cpu_map *cpus = evlist->cpus;
> +	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
> +	u64 tsc_bit;
> +
> +	ptr->evlist = evlist;
> +	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
> +
> +	evlist__for_each(evlist, evsel) {
> +		if (evsel->attr.type == intel_pt_pmu->type) {
> +			if (intel_pt_evsel) {
> +				pr_err("There may be only one " INTEL_PT_PMU_NAME " event\n");
> +				return -EINVAL;
> +			}
> +			evsel->attr.freq = 0;
> +			evsel->attr.sample_period = 1;
> +			intel_pt_evsel = evsel;
> +			opts->full_auxtrace = true;
> +		}
> +	}
> +
> +	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
> +		pr_err("Snapshot mode (-S option) requires " INTEL_PT_PMU_NAME " PMU event (-e " INTEL_PT_PMU_NAME ")\n");
> +		return -EINVAL;
> +	}
> +
> +	if (opts->use_clockid) {
> +		pr_err("Cannot use clockid (-k option) with " INTEL_PT_PMU_NAME "\n");
> +		return -EINVAL;
> +	}
> +
> +	if (!opts->full_auxtrace)
> +		return 0;
> +
> +	/* Set default sizes for snapshot mode */
> +	if (opts->auxtrace_snapshot_mode) {
> +		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
> +
> +		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
> +			if (privileged) {
> +				opts->auxtrace_mmap_pages = MiB(4) / page_size;
> +			} else {
> +				opts->auxtrace_mmap_pages = KiB(128) / page_size;
> +				if (opts->mmap_pages == UINT_MAX)
> +					opts->mmap_pages = KiB(256) / page_size;
> +			}
> +		} else if (!opts->auxtrace_mmap_pages && !privileged &&
> +			   opts->mmap_pages == UINT_MAX) {
> +			opts->mmap_pages = KiB(256) / page_size;
> +		}
> +		if (!opts->auxtrace_snapshot_size)
> +			opts->auxtrace_snapshot_size =
> +				opts->auxtrace_mmap_pages * (size_t)page_size;
> +		if (!opts->auxtrace_mmap_pages) {
> +			size_t sz = opts->auxtrace_snapshot_size;
> +
> +			sz = round_up(sz, page_size) / page_size;
> +			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
> +		}
> +		if (opts->auxtrace_snapshot_size >
> +				opts->auxtrace_mmap_pages * (size_t)page_size) {
> +			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
> +			       opts->auxtrace_snapshot_size,
> +			       opts->auxtrace_mmap_pages * (size_t)page_size);
> +			return -EINVAL;
> +		}
> +		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
> +			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
> +			return -EINVAL;
> +		}
> +		pr_debug2("Intel PT snapshot size: %zu\n",
> +			  opts->auxtrace_snapshot_size);
> +		if (psb_period &&
> +		    opts->auxtrace_snapshot_size <= psb_period +
> +						  INTEL_PT_PSB_PERIOD_NEAR)
> +			ui__warning("Intel PT snapshot size (%zu) may be too small for PSB period (%zu)\n",
> +				    opts->auxtrace_snapshot_size, psb_period);
> +	}
> +
> +	/* Set default sizes for full trace mode */
> +	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
> +		if (privileged) {
> +			opts->auxtrace_mmap_pages = MiB(4) / page_size;
> +		} else {
> +			opts->auxtrace_mmap_pages = KiB(128) / page_size;
> +			if (opts->mmap_pages == UINT_MAX)
> +				opts->mmap_pages = KiB(256) / page_size;
> +		}
> +	}
> +
> +	/* Validate auxtrace_mmap_pages */
> +	if (opts->auxtrace_mmap_pages) {
> +		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
> +		size_t min_sz;
> +
> +		if (opts->auxtrace_snapshot_mode)
> +			min_sz = KiB(4);
> +		else
> +			min_sz = KiB(8);
> +
> +		if (sz < min_sz || !is_power_of_2(sz)) {
> +			pr_err("Invalid mmap size for Intel Processor Trace: must be at least %zuKiB and a power of 2\n",
> +			       min_sz / 1024);
> +			return -EINVAL;
> +		}
> +	}
> +
> +	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
> +
> +	if (opts->full_auxtrace && (intel_pt_evsel->attr.config & tsc_bit))
> +		have_timing_info = true;
> +	else
> +		have_timing_info = false;
> +
> +	/*
> +	 * Per-cpu recording needs sched_switch events to distinguish different
> +	 * threads.
> +	 */
> +	if (have_timing_info && !cpu_map__empty(cpus)) {
> +		int err;
> +
> +		err = intel_pt_track_switches(evlist);
> +		if (err == -EPERM)
> +			pr_debug2("Unable to select sched:sched_switch\n");
> +		else if (err)
> +			return err;
> +		else
> +			ptr->have_sched_switch = 1;
> +	}
> +
> +	if (intel_pt_evsel) {
> +		/*
> +		 * To obtain the auxtrace buffer file descriptor, the auxtrace
> +		 * event must come first.
> +		 */
> +		perf_evlist__to_front(evlist, intel_pt_evsel);
> +		/*
> +		 * In the case of per-cpu mmaps, we need the CPU on the
> +		 * AUX event.
> +		 */
> +		if (!cpu_map__empty(cpus))
> +			perf_evsel__set_sample_bit(intel_pt_evsel, CPU);
> +	}
> +
> +	/* Add dummy event to keep tracking */
> +	if (opts->full_auxtrace) {
> +		struct perf_evsel *tracking_evsel;
> +		int err;
> +
> +		err = parse_events(evlist, "dummy:u", NULL);
> +		if (err)
> +			return err;
> +
> +		tracking_evsel = perf_evlist__last(evlist);
> +
> +		perf_evlist__set_tracking_event(evlist, tracking_evsel);
> +
> +		tracking_evsel->attr.freq = 0;
> +		tracking_evsel->attr.sample_period = 1;
> +
> +		/* In per-cpu case, always need the time of mmap events etc */
> +		if (!cpu_map__empty(cpus))
> +			perf_evsel__set_sample_bit(tracking_evsel, TIME);
> +	}
> +
> +	/*
> +	 * Warn the user when we do not have enough information to decode i.e.
> +	 * per-cpu with no sched_switch (except workload-only).
> +	 */
> +	if (!ptr->have_sched_switch && !cpu_map__empty(cpus) &&
> +	    !target__none(&opts->target))
> +		ui__warning("Intel Processor Trace decoding will not be possible except for kernel tracing!\n");
> +
> +	return 0;
> +}
> +
> +static int intel_pt_snapshot_start(struct auxtrace_record *itr)
> +{
> +	struct intel_pt_recording *ptr =
> +			container_of(itr, struct intel_pt_recording, itr);
> +	struct perf_evsel *evsel;
> +
> +	evlist__for_each(ptr->evlist, evsel) {
> +		if (evsel->attr.type == ptr->intel_pt_pmu->type)
> +			return perf_evlist__disable_event(ptr->evlist, evsel);
> +	}
> +	return -EINVAL;
> +}
> +
> +static int intel_pt_snapshot_finish(struct auxtrace_record *itr)
> +{
> +	struct intel_pt_recording *ptr =
> +			container_of(itr, struct intel_pt_recording, itr);
> +	struct perf_evsel *evsel;
> +
> +	evlist__for_each(ptr->evlist, evsel) {
> +		if (evsel->attr.type == ptr->intel_pt_pmu->type)
> +			return perf_evlist__enable_event(ptr->evlist, evsel);
> +	}
> +	return -EINVAL;
> +}
> +
> +static int intel_pt_alloc_snapshot_refs(struct intel_pt_recording *ptr, int idx)
> +{
> +	const size_t sz = sizeof(struct intel_pt_snapshot_ref);
> +	int cnt = ptr->snapshot_ref_cnt, new_cnt = cnt * 2;
> +	struct intel_pt_snapshot_ref *refs;
> +
> +	if (!new_cnt)
> +		new_cnt = 16;
> +
> +	while (new_cnt <= idx)
> +		new_cnt *= 2;
> +
> +	refs = calloc(new_cnt, sz);
> +	if (!refs)
> +		return -ENOMEM;
> +
> +	memcpy(refs, ptr->snapshot_refs, cnt * sz);
> +
> +	ptr->snapshot_refs = refs;
> +	ptr->snapshot_ref_cnt = new_cnt;
> +
> +	return 0;
> +}
> +
> +static void intel_pt_free_snapshot_refs(struct intel_pt_recording *ptr)
> +{
> +	int i;
> +
> +	for (i = 0; i < ptr->snapshot_ref_cnt; i++)
> +		zfree(&ptr->snapshot_refs[i].ref_buf);
> +	zfree(&ptr->snapshot_refs);
> +}
> +
> +static void intel_pt_recording_free(struct auxtrace_record *itr)
> +{
> +	struct intel_pt_recording *ptr =
> +			container_of(itr, struct intel_pt_recording, itr);
> +
> +	intel_pt_free_snapshot_refs(ptr);
> +	free(ptr);
> +}
> +
> +static int intel_pt_alloc_snapshot_ref(struct intel_pt_recording *ptr, int idx,
> +				       size_t snapshot_buf_size)
> +{
> +	size_t ref_buf_size = ptr->snapshot_ref_buf_size;
> +	void *ref_buf;
> +
> +	ref_buf = zalloc(ref_buf_size);
> +	if (!ref_buf)
> +		return -ENOMEM;
> +
> +	ptr->snapshot_refs[idx].ref_buf = ref_buf;
> +	ptr->snapshot_refs[idx].ref_offset = snapshot_buf_size - ref_buf_size;
> +
> +	return 0;
> +}
> +
> +static size_t intel_pt_snapshot_ref_buf_size(struct intel_pt_recording *ptr,
> +					     size_t snapshot_buf_size)
> +{
> +	const size_t max_size = 256 * 1024;
> +	size_t buf_size = 0, psb_period;
> +
> +	if (ptr->snapshot_size <= 64 * 1024)
> +		return 0;
> +
> +	psb_period = intel_pt_psb_period(ptr->intel_pt_pmu, ptr->evlist);
> +	if (psb_period)
> +		buf_size = psb_period * 2;
> +
> +	if (!buf_size || buf_size > max_size)
> +		buf_size = max_size;
> +
> +	if (buf_size >= snapshot_buf_size)
> +		return 0;
> +
> +	if (buf_size >= ptr->snapshot_size / 2)
> +		return 0;
> +
> +	return buf_size;
> +}
> +
> +static int intel_pt_snapshot_init(struct intel_pt_recording *ptr,
> +				  size_t snapshot_buf_size)
> +{
> +	if (ptr->snapshot_init_done)
> +		return 0;
> +
> +	ptr->snapshot_init_done = true;
> +
> +	ptr->snapshot_ref_buf_size = intel_pt_snapshot_ref_buf_size(ptr,
> +							snapshot_buf_size);
> +
> +	return 0;
> +}
> +
> +/**
> + * intel_pt_compare_buffers - compare bytes in a buffer to a circular buffer.
> + * @buf1: first buffer
> + * @compare_size: number of bytes to compare
> + * @buf2: second buffer (a circular buffer)
> + * @offs2: offset in second buffer
> + * @buf2_size: size of second buffer
> + *
> + * The comparison allows for the possibility that the bytes to compare in the
> + * circular buffer are not contiguous.  It is assumed that @compare_size <=
> + * @buf2_size.  This function returns %false if the bytes are identical, %true
> + * otherwise.
> + */
> +static bool intel_pt_compare_buffers(void *buf1, size_t compare_size,
> +				     void *buf2, size_t offs2, size_t buf2_size)
> +{
> +	size_t end2 = offs2 + compare_size, part_size;
> +
> +	if (end2 <= buf2_size)
> +		return memcmp(buf1, buf2 + offs2, compare_size);
> +
> +	part_size = end2 - buf2_size;
> +	if (memcmp(buf1, buf2 + offs2, part_size))
> +		return true;
> +
> +	compare_size -= part_size;
> +
> +	return memcmp(buf1 + part_size, buf2, compare_size);
> +}
> +
> +static bool intel_pt_compare_ref(void *ref_buf, size_t ref_offset,
> +				 size_t ref_size, size_t buf_size,
> +				 void *data, size_t head)
> +{
> +	size_t ref_end = ref_offset + ref_size;
> +
> +	if (ref_end > buf_size) {
> +		if (head > ref_offset || head < ref_end - buf_size)
> +			return true;
> +	} else if (head > ref_offset && head < ref_end) {
> +		return true;
> +	}
> +
> +	return intel_pt_compare_buffers(ref_buf, ref_size, data, ref_offset,
> +					buf_size);
> +}
> +
> +static void intel_pt_copy_ref(void *ref_buf, size_t ref_size, size_t buf_size,
> +			      void *data, size_t head)
> +{
> +	if (head >= ref_size) {
> +		memcpy(ref_buf, data + head - ref_size, ref_size);
> +	} else {
> +		memcpy(ref_buf, data, head);
> +		ref_size -= head;
> +		memcpy(ref_buf + head, data + buf_size - ref_size, ref_size);
> +	}
> +}
> +
> +static bool intel_pt_wrapped(struct intel_pt_recording *ptr, int idx,
> +			     struct auxtrace_mmap *mm, unsigned char *data,
> +			     u64 head)
> +{
> +	struct intel_pt_snapshot_ref *ref = &ptr->snapshot_refs[idx];
> +	bool wrapped;
> +
> +	wrapped = intel_pt_compare_ref(ref->ref_buf, ref->ref_offset,
> +				       ptr->snapshot_ref_buf_size, mm->len,
> +				       data, head);
> +
> +	intel_pt_copy_ref(ref->ref_buf, ptr->snapshot_ref_buf_size, mm->len,
> +			  data, head);
> +
> +	return wrapped;
> +}
> +
> +static bool intel_pt_first_wrap(u64 *data, size_t buf_size)
> +{
> +	int i, a, b;
> +
> +	b = buf_size >> 3;
> +	a = b - 512;
> +	if (a < 0)
> +		a = 0;
> +
> +	for (i = a; i < b; i++) {
> +		if (data[i])
> +			return true;
> +	}
> +
> +	return false;
> +}
> +
> +static int intel_pt_find_snapshot(struct auxtrace_record *itr, int idx,
> +				  struct auxtrace_mmap *mm, unsigned char *data,
> +				  u64 *head, u64 *old)
> +{
> +	struct intel_pt_recording *ptr =
> +			container_of(itr, struct intel_pt_recording, itr);
> +	bool wrapped;
> +	int err;
> +
> +	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
> +		  __func__, idx, (size_t)*old, (size_t)*head);
> +
> +	err = intel_pt_snapshot_init(ptr, mm->len);
> +	if (err)
> +		goto out_err;
> +
> +	if (idx >= ptr->snapshot_ref_cnt) {
> +		err = intel_pt_alloc_snapshot_refs(ptr, idx);
> +		if (err)
> +			goto out_err;
> +	}
> +
> +	if (ptr->snapshot_ref_buf_size) {
> +		if (!ptr->snapshot_refs[idx].ref_buf) {
> +			err = intel_pt_alloc_snapshot_ref(ptr, idx, mm->len);
> +			if (err)
> +				goto out_err;
> +		}
> +		wrapped = intel_pt_wrapped(ptr, idx, mm, data, *head);
> +	} else {
> +		wrapped = ptr->snapshot_refs[idx].wrapped;
> +		if (!wrapped && intel_pt_first_wrap((u64 *)data, mm->len)) {
> +			ptr->snapshot_refs[idx].wrapped = true;
> +			wrapped = true;
> +		}
> +	}
> +
> +	/*
> +	 * In full trace mode 'head' continually increases.  However in snapshot
> +	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
> +	 * are adjusted to match the full trace case which expects that 'old' is
> +	 * always less than 'head'.
> +	 */
> +	if (wrapped) {
> +		*old = *head;
> +		*head += mm->len;
> +	} else {
> +		if (mm->mask)
> +			*old &= mm->mask;
> +		else
> +			*old %= mm->len;
> +		if (*old > *head)
> +			*head += mm->len;
> +	}
> +
> +	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
> +		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
> +
> +	return 0;
> +
> +out_err:
> +	pr_err("%s: failed, error %d\n", __func__, err);
> +	return err;
> +}
> +
> +static u64 intel_pt_reference(struct auxtrace_record *itr __maybe_unused)
> +{
> +	return rdtsc();
> +}
> +
> +static int intel_pt_read_finish(struct auxtrace_record *itr, int idx)
> +{
> +	struct intel_pt_recording *ptr =
> +			container_of(itr, struct intel_pt_recording, itr);
> +	struct perf_evsel *evsel;
> +
> +	evlist__for_each(ptr->evlist, evsel) {
> +		if (evsel->attr.type == ptr->intel_pt_pmu->type)
> +			return perf_evlist__enable_event_idx(ptr->evlist, evsel,
> +							     idx);
> +	}
> +	return -EINVAL;
> +}
> +
> +struct auxtrace_record *intel_pt_recording_init(int *err)
> +{
> +	struct perf_pmu *intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
> +	struct intel_pt_recording *ptr;
> +
> +	if (!intel_pt_pmu)
> +		return NULL;
> +
> +	ptr = zalloc(sizeof(struct intel_pt_recording));
> +	if (!ptr) {
> +		*err = -ENOMEM;
> +		return NULL;
> +	}
> +
> +	ptr->intel_pt_pmu = intel_pt_pmu;
> +	ptr->itr.recording_options = intel_pt_recording_options;
> +	ptr->itr.info_priv_size = intel_pt_info_priv_size;
> +	ptr->itr.info_fill = intel_pt_info_fill;
> +	ptr->itr.free = intel_pt_recording_free;
> +	ptr->itr.snapshot_start = intel_pt_snapshot_start;
> +	ptr->itr.snapshot_finish = intel_pt_snapshot_finish;
> +	ptr->itr.find_snapshot = intel_pt_find_snapshot;
> +	ptr->itr.parse_snapshot_options = intel_pt_parse_snapshot_options;
> +	ptr->itr.reference = intel_pt_reference;
> +	ptr->itr.read_finish = intel_pt_read_finish;
> +	return &ptr->itr;
> +}
> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
> index 86c81f6..ec7ab9d 100644
> --- a/tools/perf/util/Build
> +++ b/tools/perf/util/Build
> @@ -76,6 +76,7 @@ libperf-y += cloexec.o
>  libperf-y += thread-stack.o
>  libperf-$(CONFIG_AUXTRACE) += auxtrace.o
>  libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
> +libperf-$(CONFIG_AUXTRACE) += intel-pt.o
>  libperf-y += parse-branch-options.o
>  
>  libperf-$(CONFIG_LIBELF) += symbol-elf.o
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> new file mode 100644
> index 0000000..6d66879
> --- /dev/null
> +++ b/tools/perf/util/intel-pt.c
> @@ -0,0 +1,1889 @@
> +/*
> + * intel_pt.c: Intel Processor Trace support
> + * Copyright (c) 2013-2015, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#include <stdio.h>
> +#include <stdbool.h>
> +#include <errno.h>
> +#include <linux/kernel.h>
> +#include <linux/types.h>
> +
> +#include "../perf.h"
> +#include "session.h"
> +#include "machine.h"
> +#include "tool.h"
> +#include "event.h"
> +#include "evlist.h"
> +#include "evsel.h"
> +#include "map.h"
> +#include "color.h"
> +#include "util.h"
> +#include "thread.h"
> +#include "thread-stack.h"
> +#include "symbol.h"
> +#include "callchain.h"
> +#include "dso.h"
> +#include "debug.h"
> +#include "auxtrace.h"
> +#include "tsc.h"
> +#include "intel-pt.h"
> +
> +#include "intel-pt-decoder/intel-pt-log.h"
> +#include "intel-pt-decoder/intel-pt-decoder.h"
> +#include "intel-pt-decoder/intel-pt-insn-decoder.h"
> +#include "intel-pt-decoder/intel-pt-pkt-decoder.h"
> +
> +#define MAX_TIMESTAMP (~0ULL)
> +
> +struct intel_pt {
> +	struct auxtrace auxtrace;
> +	struct auxtrace_queues queues;
> +	struct auxtrace_heap heap;
> +	u32 auxtrace_type;
> +	struct perf_session *session;
> +	struct machine *machine;
> +	struct perf_evsel *switch_evsel;
> +	struct thread *unknown_thread;
> +	bool timeless_decoding;
> +	bool sampling_mode;
> +	bool snapshot_mode;
> +	bool per_cpu_mmaps;
> +	bool have_tsc;
> +	bool data_queued;
> +	bool est_tsc;
> +	bool sync_switch;
> +	bool est_tsc_orig;
> +	int have_sched_switch;
> +	u32 pmu_type;
> +	u64 kernel_start;
> +	u64 switch_ip;
> +	u64 ptss_ip;
> +
> +	struct perf_tsc_conversion tc;
> +	bool cap_user_time_zero;
> +
> +	struct itrace_synth_opts synth_opts;
> +
> +	bool sample_instructions;
> +	u64 instructions_sample_type;
> +	u64 instructions_sample_period;
> +	u64 instructions_id;
> +
> +	bool sample_branches;
> +	u32 branches_filter;
> +	u64 branches_sample_type;
> +	u64 branches_id;
> +
> +	bool sample_transactions;
> +	u64 transactions_sample_type;
> +	u64 transactions_id;
> +
> +	bool synth_needs_swap;
> +
> +	u64 tsc_bit;
> +	u64 noretcomp_bit;
> +};
> +
> +enum switch_state {
> +	INTEL_PT_SS_NOT_TRACING,
> +	INTEL_PT_SS_UNKNOWN,
> +	INTEL_PT_SS_TRACING,
> +	INTEL_PT_SS_EXPECTING_SWITCH_EVENT,
> +	INTEL_PT_SS_EXPECTING_SWITCH_IP,
> +};
> +
> +struct intel_pt_queue {
> +	struct intel_pt *pt;
> +	unsigned int queue_nr;
> +	struct auxtrace_buffer *buffer;
> +	void *decoder;
> +	const struct intel_pt_state *state;
> +	struct ip_callchain *chain;
> +	union perf_event *event_buf;
> +	bool on_heap;
> +	bool stop;
> +	bool step_through_buffers;
> +	bool use_buffer_pid_tid;
> +	pid_t pid, tid;
> +	int cpu;
> +	int switch_state;
> +	pid_t next_tid;
> +	struct thread *thread;
> +	bool exclude_kernel;
> +	bool have_sample;
> +	u64 time;
> +	u64 timestamp;
> +	u32 flags;
> +	u16 insn_len;
> +};
> +
> +static void intel_pt_dump(struct intel_pt *pt __maybe_unused,
> +			  unsigned char *buf, size_t len)
> +{
> +	struct intel_pt_pkt packet;
> +	size_t pos = 0;
> +	int ret, pkt_len, i;
> +	char desc[INTEL_PT_PKT_DESC_MAX];
> +	const char *color = PERF_COLOR_BLUE;
> +
> +	color_fprintf(stdout, color,
> +		      ". ... Intel Processor Trace data: size %zu bytes\n",
> +		      len);
> +
> +	while (len) {
> +		ret = intel_pt_get_packet(buf, len, &packet);
> +		if (ret > 0)
> +			pkt_len = ret;
> +		else
> +			pkt_len = 1;
> +		printf(".");
> +		color_fprintf(stdout, color, "  %08x: ", pos);
> +		for (i = 0; i < pkt_len; i++)
> +			color_fprintf(stdout, color, " %02x", buf[i]);
> +		for (; i < 16; i++)
> +			color_fprintf(stdout, color, "   ");
> +		if (ret > 0) {
> +			ret = intel_pt_pkt_desc(&packet, desc,
> +						INTEL_PT_PKT_DESC_MAX);
> +			if (ret > 0)
> +				color_fprintf(stdout, color, " %s\n", desc);
> +		} else {
> +			color_fprintf(stdout, color, " Bad packet!\n");
> +		}
> +		pos += pkt_len;
> +		buf += pkt_len;
> +		len -= pkt_len;
> +	}
> +}
> +
> +static void intel_pt_dump_event(struct intel_pt *pt, unsigned char *buf,
> +				size_t len)
> +{
> +	printf(".\n");
> +	intel_pt_dump(pt, buf, len);
> +}
> +
> +static int intel_pt_do_fix_overlap(struct intel_pt *pt, struct auxtrace_buffer *a,
> +				   struct auxtrace_buffer *b)
> +{
> +	void *start;
> +
> +	start = intel_pt_find_overlap(a->data, a->size, b->data, b->size,
> +				      pt->have_tsc);
> +	if (!start)
> +		return -EINVAL;
> +	b->use_size = b->data + b->size - start;
> +	b->use_data = start;
> +	return 0;
> +}
> +
> +static void intel_pt_use_buffer_pid_tid(struct intel_pt_queue *ptq,
> +					struct auxtrace_queue *queue,
> +					struct auxtrace_buffer *buffer)
> +{
> +	if (queue->cpu == -1 && buffer->cpu != -1)
> +		ptq->cpu = buffer->cpu;
> +
> +	ptq->pid = buffer->pid;
> +	ptq->tid = buffer->tid;
> +
> +	intel_pt_log("queue %u cpu %d pid %d tid %d\n",
> +		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
> +
> +	ptq->thread = NULL;
> +
> +	if (ptq->tid != -1) {
> +		if (ptq->pid != -1)
> +			ptq->thread = machine__findnew_thread(ptq->pt->machine,
> +							      ptq->pid,
> +							      ptq->tid);
> +		else
> +			ptq->thread = machine__find_thread(ptq->pt->machine, -1,
> +							   ptq->tid);
> +	}
> +}
> +
> +/* This function assumes data is processed sequentially only */
> +static int intel_pt_get_trace(struct intel_pt_buffer *b, void *data)
> +{
> +	struct intel_pt_queue *ptq = data;
> +	struct auxtrace_buffer *buffer = ptq->buffer, *old_buffer = buffer;
> +	struct auxtrace_queue *queue;
> +
> +	if (ptq->stop) {
> +		b->len = 0;
> +		return 0;
> +	}
> +
> +	queue = &ptq->pt->queues.queue_array[ptq->queue_nr];
> +
> +	buffer = auxtrace_buffer__next(queue, buffer);
> +	if (!buffer) {
> +		if (old_buffer)
> +			auxtrace_buffer__drop_data(old_buffer);
> +		b->len = 0;
> +		return 0;
> +	}
> +
> +	ptq->buffer = buffer;
> +
> +	if (!buffer->data) {
> +		int fd = perf_data_file__fd(ptq->pt->session->file);
> +
> +		buffer->data = auxtrace_buffer__get_data(buffer, fd);
> +		if (!buffer->data)
> +			return -ENOMEM;
> +	}
> +
> +	if (ptq->pt->snapshot_mode && !buffer->consecutive && old_buffer &&
> +	    intel_pt_do_fix_overlap(ptq->pt, old_buffer, buffer))
> +		return -ENOMEM;
> +
> +	if (old_buffer)
> +		auxtrace_buffer__drop_data(old_buffer);
> +
> +	if (buffer->use_data) {
> +		b->len = buffer->use_size;
> +		b->buf = buffer->use_data;
> +	} else {
> +		b->len = buffer->size;
> +		b->buf = buffer->data;
> +	}
> +	b->ref_timestamp = buffer->reference;
> +
> +	if (!old_buffer || ptq->pt->sampling_mode || (ptq->pt->snapshot_mode &&
> +						      !buffer->consecutive)) {
> +		b->consecutive = false;
> +		b->trace_nr = buffer->buffer_nr;
> +	} else {
> +		b->consecutive = true;
> +	}
> +
> +	if (ptq->use_buffer_pid_tid && (ptq->pid != buffer->pid ||
> +					ptq->tid != buffer->tid))
> +		intel_pt_use_buffer_pid_tid(ptq, queue, buffer);
> +
> +	if (ptq->step_through_buffers)
> +		ptq->stop = true;
> +
> +	if (!b->len)
> +		return intel_pt_get_trace(b, data);
> +
> +	return 0;
> +}
> +
> +struct intel_pt_cache_entry {
> +	struct auxtrace_cache_entry	entry;
> +	u64				insn_cnt;
> +	u64				byte_cnt;
> +	enum intel_pt_insn_op		op;
> +	enum intel_pt_insn_branch	branch;
> +	int				length;
> +	int32_t				rel;
> +};
> +
> +static int intel_pt_config_div(const char *var, const char *value, void *data)
> +{
> +	int *d = data;
> +	long val;
> +
> +	if (!strcmp(var, "intel-pt.cache-divisor")) {
> +		val = strtol(value, NULL, 0);
> +		if (val > 0 && val <= INT_MAX)
> +			*d = val;
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_pt_cache_divisor(void)
> +{
> +	static int d;
> +
> +	if (d)
> +		return d;
> +
> +	perf_config(intel_pt_config_div, &d);
> +
> +	if (!d)
> +		d = 64;
> +
> +	return d;
> +}
> +
> +static unsigned int intel_pt_cache_size(struct dso *dso,
> +					struct machine *machine)
> +{
> +	off_t size;
> +
> +	size = dso__data_size(dso, machine);
> +	size /= intel_pt_cache_divisor();
> +	if (size < 1000)
> +		return 10;
> +	if (size > (1 << 21))
> +		return 21;
> +	return 32 - __builtin_clz(size);
> +}
> +
> +static struct auxtrace_cache *intel_pt_cache(struct dso *dso,
> +					     struct machine *machine)
> +{
> +	struct auxtrace_cache *c;
> +	unsigned int bits;
> +
> +	if (dso->auxtrace_cache)
> +		return dso->auxtrace_cache;
> +
> +	bits = intel_pt_cache_size(dso, machine);
> +
> +	/* Ignoring cache creation failure */
> +	c = auxtrace_cache__new(bits, sizeof(struct intel_pt_cache_entry), 200);
> +
> +	dso->auxtrace_cache = c;
> +
> +	return c;
> +}
> +
> +static int intel_pt_cache_add(struct dso *dso, struct machine *machine,
> +			      u64 offset, u64 insn_cnt, u64 byte_cnt,
> +			      struct intel_pt_insn *intel_pt_insn)
> +{
> +	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
> +	struct intel_pt_cache_entry *e;
> +	int err;
> +
> +	if (!c)
> +		return -ENOMEM;
> +
> +	e = auxtrace_cache__alloc_entry(c);
> +	if (!e)
> +		return -ENOMEM;
> +
> +	e->insn_cnt = insn_cnt;
> +	e->byte_cnt = byte_cnt;
> +	e->op = intel_pt_insn->op;
> +	e->branch = intel_pt_insn->branch;
> +	e->length = intel_pt_insn->length;
> +	e->rel = intel_pt_insn->rel;
> +
> +	err = auxtrace_cache__add(c, offset, &e->entry);
> +	if (err)
> +		auxtrace_cache__free_entry(c, e);
> +
> +	return err;
> +}
> +
> +static struct intel_pt_cache_entry *
> +intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
> +{
> +	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
> +
> +	if (!c)
> +		return NULL;
> +
> +	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
> +}
> +
> +static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
> +				   uint64_t *insn_cnt_ptr, uint64_t *ip,
> +				   uint64_t to_ip, uint64_t max_insn_cnt,
> +				   void *data)
> +{
> +	struct intel_pt_queue *ptq = data;
> +	struct machine *machine = ptq->pt->machine;
> +	struct thread *thread;
> +	struct addr_location al;
> +	unsigned char buf[1024];
> +	size_t bufsz;
> +	ssize_t len;
> +	int x86_64;
> +	u8 cpumode;
> +	u64 offset, start_offset, start_ip;
> +	u64 insn_cnt = 0;
> +	bool one_map = true;
> +
> +	if (to_ip && *ip == to_ip)
> +		goto out_no_cache;
> +
> +	bufsz = intel_pt_insn_max_size();
> +
> +	if (*ip >= ptq->pt->kernel_start)
> +		cpumode = PERF_RECORD_MISC_KERNEL;
> +	else
> +		cpumode = PERF_RECORD_MISC_USER;
> +
> +	thread = ptq->thread;
> +	if (!thread) {
> +		if (cpumode != PERF_RECORD_MISC_KERNEL)
> +			return -EINVAL;
> +		thread = ptq->pt->unknown_thread;
> +	}
> +
> +	while (1) {
> +		thread__find_addr_map(thread, cpumode, MAP__FUNCTION, *ip, &al);
> +		if (!al.map || !al.map->dso)
> +			return -EINVAL;
> +
> +		if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
> +		    dso__data_status_seen(al.map->dso,
> +					  DSO_DATA_STATUS_SEEN_ITRACE))
> +			return -ENOENT;
> +
> +		offset = al.map->map_ip(al.map, *ip);
> +
> +		if (!to_ip && one_map) {
> +			struct intel_pt_cache_entry *e;
> +
> +			e = intel_pt_cache_lookup(al.map->dso, machine, offset);
> +			if (e &&
> +			    (!max_insn_cnt || e->insn_cnt <= max_insn_cnt)) {
> +				*insn_cnt_ptr = e->insn_cnt;
> +				*ip += e->byte_cnt;
> +				intel_pt_insn->op = e->op;
> +				intel_pt_insn->branch = e->branch;
> +				intel_pt_insn->length = e->length;
> +				intel_pt_insn->rel = e->rel;
> +				intel_pt_log_insn_no_data(intel_pt_insn, *ip);
> +				return 0;
> +			}
> +		}
> +
> +		start_offset = offset;
> +		start_ip = *ip;
> +
> +		/* Load maps to ensure dso->is_64_bit has been updated */
> +		map__load(al.map, machine->symbol_filter);
> +
> +		x86_64 = al.map->dso->is_64_bit;
> +
> +		while (1) {
> +			len = dso__data_read_offset(al.map->dso, machine,
> +						    offset, buf, bufsz);
> +			if (len <= 0)
> +				return -EINVAL;
> +
> +			if (intel_pt_get_insn(buf, len, x86_64, intel_pt_insn))
> +				return -EINVAL;
> +
> +			intel_pt_log_insn(intel_pt_insn, *ip);
> +
> +			insn_cnt += 1;
> +
> +			if (intel_pt_insn->branch != INTEL_PT_BR_NO_BRANCH)
> +				goto out;
> +
> +			if (max_insn_cnt && insn_cnt >= max_insn_cnt)
> +				goto out_no_cache;
> +
> +			*ip += intel_pt_insn->length;
> +
> +			if (to_ip && *ip == to_ip)
> +				goto out_no_cache;
> +
> +			if (*ip >= al.map->end)
> +				break;
> +
> +			offset += intel_pt_insn->length;
> +		}
> +		one_map = false;
> +	}
> +out:
> +	*insn_cnt_ptr = insn_cnt;
> +
> +	if (!one_map)
> +		goto out_no_cache;
> +
> +	/*
> +	 * Didn't lookup in the 'to_ip' case, so do it now to prevent duplicate
> +	 * entries.
> +	 */
> +	if (to_ip) {
> +		struct intel_pt_cache_entry *e;
> +
> +		e = intel_pt_cache_lookup(al.map->dso, machine, start_offset);
> +		if (e)
> +			return 0;
> +	}
> +
> +	/* Ignore cache errors */
> +	intel_pt_cache_add(al.map->dso, machine, start_offset, insn_cnt,
> +			   *ip - start_ip, intel_pt_insn);
> +
> +	return 0;
> +
> +out_no_cache:
> +	*insn_cnt_ptr = insn_cnt;
> +	return 0;
> +}
> +
> +static bool intel_pt_get_config(struct intel_pt *pt,
> +				struct perf_event_attr *attr, u64 *config)
> +{
> +	if (attr->type == pt->pmu_type) {
> +		if (config)
> +			*config = attr->config;
> +		return true;
> +	}
> +
> +	return false;
> +}
> +
> +static bool intel_pt_exclude_kernel(struct intel_pt *pt)
> +{
> +	struct perf_evsel *evsel;
> +
> +	evlist__for_each(pt->session->evlist, evsel) {
> +		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
> +		    !evsel->attr.exclude_kernel)
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static bool intel_pt_return_compression(struct intel_pt *pt)
> +{
> +	struct perf_evsel *evsel;
> +	u64 config;
> +
> +	if (!pt->noretcomp_bit)
> +		return true;
> +
> +	evlist__for_each(pt->session->evlist, evsel) {
> +		if (intel_pt_get_config(pt, &evsel->attr, &config) &&
> +		    (config & pt->noretcomp_bit))
> +			return false;
> +	}
> +	return true;
> +}
> +
> +static bool intel_pt_timeless_decoding(struct intel_pt *pt)
> +{
> +	struct perf_evsel *evsel;
> +	bool timeless_decoding = true;
> +	u64 config;
> +
> +	if (!pt->tsc_bit || !pt->cap_user_time_zero)
> +		return true;
> +
> +	evlist__for_each(pt->session->evlist, evsel) {
> +		if (!(evsel->attr.sample_type & PERF_SAMPLE_TIME))
> +			return true;
> +		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
> +			if (config & pt->tsc_bit)
> +				timeless_decoding = false;
> +			else
> +				return true;
> +		}
> +	}
> +	return timeless_decoding;
> +}
> +
> +static bool intel_pt_tracing_kernel(struct intel_pt *pt)
> +{
> +	struct perf_evsel *evsel;
> +
> +	evlist__for_each(pt->session->evlist, evsel) {
> +		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
> +		    !evsel->attr.exclude_kernel)
> +			return true;
> +	}
> +	return false;
> +}
> +
> +static bool intel_pt_have_tsc(struct intel_pt *pt)
> +{
> +	struct perf_evsel *evsel;
> +	bool have_tsc = false;
> +	u64 config;
> +
> +	if (!pt->tsc_bit)
> +		return false;
> +
> +	evlist__for_each(pt->session->evlist, evsel) {
> +		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
> +			if (config & pt->tsc_bit)
> +				have_tsc = true;
> +			else
> +				return false;
> +		}
> +	}
> +	return have_tsc;
> +}
> +
> +static u64 intel_pt_ns_to_ticks(const struct intel_pt *pt, u64 ns)
> +{
> +	u64 quot, rem;
> +
> +	quot = ns / pt->tc.time_mult;
> +	rem  = ns % pt->tc.time_mult;
> +	return (quot << pt->tc.time_shift) + (rem << pt->tc.time_shift) /
> +		pt->tc.time_mult;
> +}
> +
> +static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
> +						   unsigned int queue_nr)
> +{
> +	struct intel_pt_params params = { .get_trace = 0, };
> +	struct intel_pt_queue *ptq;
> +
> +	ptq = zalloc(sizeof(struct intel_pt_queue));
> +	if (!ptq)
> +		return NULL;
> +
> +	if (pt->synth_opts.callchain) {
> +		size_t sz = sizeof(struct ip_callchain);
> +
> +		sz += pt->synth_opts.callchain_sz * sizeof(u64);
> +		ptq->chain = zalloc(sz);
> +		if (!ptq->chain)
> +			goto out_free;
> +	}
> +
> +	ptq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
> +	if (!ptq->event_buf)
> +		goto out_free;
> +
> +	ptq->pt = pt;
> +	ptq->queue_nr = queue_nr;
> +	ptq->exclude_kernel = intel_pt_exclude_kernel(pt);
> +	ptq->pid = -1;
> +	ptq->tid = -1;
> +	ptq->cpu = -1;
> +	ptq->next_tid = -1;
> +
> +	params.get_trace = intel_pt_get_trace;
> +	params.walk_insn = intel_pt_walk_next_insn;
> +	params.data = ptq;
> +	params.return_compression = intel_pt_return_compression(pt);
> +
> +	if (pt->synth_opts.instructions) {
> +		if (pt->synth_opts.period) {
> +			switch (pt->synth_opts.period_type) {
> +			case PERF_ITRACE_PERIOD_INSTRUCTIONS:
> +				params.period_type =
> +						INTEL_PT_PERIOD_INSTRUCTIONS;
> +				params.period = pt->synth_opts.period;
> +				break;
> +			case PERF_ITRACE_PERIOD_TICKS:
> +				params.period_type = INTEL_PT_PERIOD_TICKS;
> +				params.period = pt->synth_opts.period;
> +				break;
> +			case PERF_ITRACE_PERIOD_NANOSECS:
> +				params.period_type = INTEL_PT_PERIOD_TICKS;
> +				params.period = intel_pt_ns_to_ticks(pt,
> +							pt->synth_opts.period);
> +				break;
> +			default:
> +				break;
> +			}
> +		}
> +
> +		if (!params.period) {
> +			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
> +			params.period = 1000;
> +		}
> +	}
> +
> +	ptq->decoder = intel_pt_decoder_new(&params);
> +	if (!ptq->decoder)
> +		goto out_free;
> +
> +	return ptq;
> +
> +out_free:
> +	zfree(&ptq->event_buf);
> +	zfree(&ptq->chain);
> +	free(ptq);
> +	return NULL;
> +}
> +
> +static void intel_pt_free_queue(void *priv)
> +{
> +	struct intel_pt_queue *ptq = priv;
> +
> +	if (!ptq)
> +		return;
> +	intel_pt_decoder_free(ptq->decoder);
> +	zfree(&ptq->event_buf);
> +	zfree(&ptq->chain);
> +	free(ptq);
> +}
> +
> +static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
> +				     struct auxtrace_queue *queue)
> +{
> +	struct intel_pt_queue *ptq = queue->priv;
> +
> +	if (queue->tid == -1 || pt->have_sched_switch) {
> +		ptq->tid = machine__get_current_tid(pt->machine, ptq->cpu);
> +		ptq->thread = NULL;
> +	}
> +
> +	if (!ptq->thread && ptq->tid != -1)
> +		ptq->thread = machine__find_thread(pt->machine, -1, ptq->tid);
> +
> +	if (ptq->thread) {
> +		ptq->pid = ptq->thread->pid_;
> +		if (queue->cpu == -1)
> +			ptq->cpu = ptq->thread->cpu;
> +	}
> +}
> +
> +static void intel_pt_sample_flags(struct intel_pt_queue *ptq)
> +{
> +	if (ptq->state->flags & INTEL_PT_ABORT_TX) {
> +		ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT;
> +	} else if (ptq->state->flags & INTEL_PT_ASYNC) {
> +		if (ptq->state->to_ip)
> +			ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
> +				     PERF_IP_FLAG_ASYNC |
> +				     PERF_IP_FLAG_INTERRUPT;
> +		else
> +			ptq->flags = PERF_IP_FLAG_BRANCH |
> +				     PERF_IP_FLAG_TRACE_END;
> +		ptq->insn_len = 0;
> +	} else {
> +		if (ptq->state->from_ip)
> +			ptq->flags = intel_pt_insn_type(ptq->state->insn_op);
> +		else
> +			ptq->flags = PERF_IP_FLAG_BRANCH |
> +				     PERF_IP_FLAG_TRACE_BEGIN;
> +		if (ptq->state->flags & INTEL_PT_IN_TX)
> +			ptq->flags |= PERF_IP_FLAG_IN_TX;
> +		ptq->insn_len = ptq->state->insn_len;
> +	}
> +}
> +
> +static int intel_pt_setup_queue(struct intel_pt *pt,
> +				struct auxtrace_queue *queue,
> +				unsigned int queue_nr)
> +{
> +	struct intel_pt_queue *ptq = queue->priv;
> +
> +	if (list_empty(&queue->head))
> +		return 0;
> +
> +	if (!ptq) {
> +		ptq = intel_pt_alloc_queue(pt, queue_nr);
> +		if (!ptq)
> +			return -ENOMEM;
> +		queue->priv = ptq;
> +
> +		if (queue->cpu != -1)
> +			ptq->cpu = queue->cpu;
> +		ptq->tid = queue->tid;
> +
> +		if (pt->sampling_mode) {
> +			if (pt->timeless_decoding)
> +				ptq->step_through_buffers = true;
> +			if (pt->timeless_decoding || !pt->have_sched_switch)
> +				ptq->use_buffer_pid_tid = true;
> +		}
> +	}
> +
> +	if (!ptq->on_heap &&
> +	    (!pt->sync_switch ||
> +	     ptq->switch_state != INTEL_PT_SS_EXPECTING_SWITCH_EVENT)) {
> +		const struct intel_pt_state *state;
> +		int ret;
> +
> +		if (pt->timeless_decoding)
> +			return 0;
> +
> +		intel_pt_log("queue %u getting timestamp\n", queue_nr);
> +		intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
> +			     queue_nr, ptq->cpu, ptq->pid, ptq->tid);
> +		while (1) {
> +			state = intel_pt_decode(ptq->decoder);
> +			if (state->err) {
> +				if (state->err == INTEL_PT_ERR_NODATA) {
> +					intel_pt_log("queue %u has no timestamp\n",
> +						     queue_nr);
> +					return 0;
> +				}
> +				continue;
> +			}
> +			if (state->timestamp)
> +				break;
> +		}
> +
> +		ptq->timestamp = state->timestamp;
> +		intel_pt_log("queue %u timestamp 0x%" PRIx64 "\n",
> +			     queue_nr, ptq->timestamp);
> +		ptq->state = state;
> +		ptq->have_sample = true;
> +		intel_pt_sample_flags(ptq);
> +		ret = auxtrace_heap__add(&pt->heap, queue_nr, ptq->timestamp);
> +		if (ret)
> +			return ret;
> +		ptq->on_heap = true;
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_pt_setup_queues(struct intel_pt *pt)
> +{
> +	unsigned int i;
> +	int ret;
> +
> +	for (i = 0; i < pt->queues.nr_queues; i++) {
> +		ret = intel_pt_setup_queue(pt, &pt->queues.queue_array[i], i);
> +		if (ret)
> +			return ret;
> +	}
> +	return 0;
> +}
> +
> +static int intel_pt_inject_event(union perf_event *event,
> +				 struct perf_sample *sample, u64 type,
> +				 bool swapped)
> +{
> +	event->header.size = perf_event__sample_event_size(sample, type, 0);
> +	return perf_event__synthesize_sample(event, type, 0, sample, swapped);
> +}
> +
> +static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
> +{
> +	int ret;
> +	struct intel_pt *pt = ptq->pt;
> +	union perf_event *event = ptq->event_buf;
> +	struct perf_sample sample = { .ip = 0, };
> +
> +	event->sample.header.type = PERF_RECORD_SAMPLE;
> +	event->sample.header.misc = PERF_RECORD_MISC_USER;
> +	event->sample.header.size = sizeof(struct perf_event_header);
> +
> +	if (!pt->timeless_decoding)
> +		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
> +
> +	sample.ip = ptq->state->from_ip;
> +	sample.pid = ptq->pid;
> +	sample.tid = ptq->tid;
> +	sample.addr = ptq->state->to_ip;
> +	sample.id = ptq->pt->branches_id;
> +	sample.stream_id = ptq->pt->branches_id;
> +	sample.period = 1;
> +	sample.cpu = ptq->cpu;
> +
> +	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
> +		return 0;
> +
> +	if (pt->synth_opts.inject) {
> +		ret = intel_pt_inject_event(event, &sample,
> +					    pt->branches_sample_type,
> +					    pt->synth_needs_swap);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
> +	if (ret)
> +		pr_err("Intel Processor Trace: failed to deliver branch event, error %d\n",
> +		       ret);
> +
> +	return ret;
> +}
> +
> +static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
> +{
> +	int ret;
> +	struct intel_pt *pt = ptq->pt;
> +	union perf_event *event = ptq->event_buf;
> +	struct perf_sample sample = { .ip = 0, };
> +
> +	event->sample.header.type = PERF_RECORD_SAMPLE;
> +	event->sample.header.misc = PERF_RECORD_MISC_USER;
> +	event->sample.header.size = sizeof(struct perf_event_header);
> +
> +	if (!pt->timeless_decoding)
> +		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
> +
> +	sample.ip = ptq->state->from_ip;
> +	sample.pid = ptq->pid;
> +	sample.tid = ptq->tid;
> +	sample.addr = ptq->state->to_ip;
> +	sample.id = ptq->pt->instructions_id;
> +	sample.stream_id = ptq->pt->instructions_id;
> +	sample.period = ptq->pt->instructions_sample_period;
> +	sample.cpu = ptq->cpu;
> +
> +	if (pt->synth_opts.callchain) {
> +		thread_stack__sample(ptq->thread, ptq->chain,
> +				     pt->synth_opts.callchain_sz, sample.ip);
> +		sample.callchain = ptq->chain;
> +	}
> +
> +	if (pt->synth_opts.inject) {
> +		ret = intel_pt_inject_event(event, &sample,
> +					    pt->instructions_sample_type,
> +					    pt->synth_needs_swap);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
> +	if (ret)
> +		pr_err("Intel Processor Trace: failed to deliver instruction event, error %d\n",
> +		       ret);
> +
> +	return ret;
> +}
> +
> +static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
> +{
> +	int ret;
> +	struct intel_pt *pt = ptq->pt;
> +	union perf_event *event = ptq->event_buf;
> +	struct perf_sample sample = { .ip = 0, };
> +
> +	event->sample.header.type = PERF_RECORD_SAMPLE;
> +	event->sample.header.misc = PERF_RECORD_MISC_USER;
> +	event->sample.header.size = sizeof(struct perf_event_header);
> +
> +	if (!pt->timeless_decoding)
> +		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
> +
> +	sample.ip = ptq->state->from_ip;
> +	sample.pid = ptq->pid;
> +	sample.tid = ptq->tid;
> +	sample.addr = ptq->state->to_ip;
> +	sample.id = ptq->pt->transactions_id;
> +	sample.stream_id = ptq->pt->transactions_id;
> +	sample.period = 1;
> +	sample.cpu = ptq->cpu;
> +	sample.flags = ptq->flags;
> +	sample.insn_len = ptq->insn_len;
> +
> +	if (pt->synth_opts.callchain) {
> +		thread_stack__sample(ptq->thread, ptq->chain,
> +				     pt->synth_opts.callchain_sz, sample.ip);
> +		sample.callchain = ptq->chain;
> +	}
> +
> +	if (pt->synth_opts.inject) {
> +		ret = intel_pt_inject_event(event, &sample,
> +					    pt->transactions_sample_type,
> +					    pt->synth_needs_swap);
> +		if (ret)
> +			return ret;
> +	}
> +
> +	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
> +	if (ret)
> +		pr_err("Intel Processor Trace: failed to deliver transaction event, error %d\n",
> +		       ret);
> +
> +	return ret;
> +}
> +
> +static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
> +				pid_t pid, pid_t tid, u64 ip)
> +{
> +	union perf_event event;
> +	char msg[MAX_AUXTRACE_ERROR_MSG];
> +	int err;
> +
> +	intel_pt__strerror(code, msg, MAX_AUXTRACE_ERROR_MSG);
> +
> +	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
> +			     code, cpu, pid, tid, ip, msg);
> +
> +	err = perf_session__deliver_synth_event(pt->session, &event, NULL);
> +	if (err)
> +		pr_err("Intel Processor Trace: failed to deliver error event, error %d\n",
> +		       err);
> +
> +	return err;
> +}
> +
> +static int intel_pt_next_tid(struct intel_pt *pt, struct intel_pt_queue *ptq)
> +{
> +	struct auxtrace_queue *queue;
> +	pid_t tid = ptq->next_tid;
> +	int err;
> +
> +	if (tid == -1)
> +		return 0;
> +
> +	intel_pt_log("switch: cpu %d tid %d\n", ptq->cpu, tid);
> +
> +	err = machine__set_current_tid(pt->machine, ptq->cpu, -1, tid);
> +
> +	queue = &pt->queues.queue_array[ptq->queue_nr];
> +	intel_pt_set_pid_tid_cpu(pt, queue);
> +
> +	ptq->next_tid = -1;
> +
> +	return err;
> +}
> +
> +static inline bool intel_pt_is_switch_ip(struct intel_pt_queue *ptq, u64 ip)
> +{
> +	struct intel_pt *pt = ptq->pt;
> +
> +	return ip == pt->switch_ip &&
> +	       (ptq->flags & PERF_IP_FLAG_BRANCH) &&
> +	       !(ptq->flags & (PERF_IP_FLAG_CONDITIONAL | PERF_IP_FLAG_ASYNC |
> +			       PERF_IP_FLAG_INTERRUPT | PERF_IP_FLAG_TX_ABORT));
> +}
> +
> +static int intel_pt_sample(struct intel_pt_queue *ptq)
> +{
> +	const struct intel_pt_state *state = ptq->state;
> +	struct intel_pt *pt = ptq->pt;
> +	int err;
> +
> +	if (!ptq->have_sample)
> +		return 0;
> +
> +	ptq->have_sample = false;
> +
> +	if (pt->sample_instructions &&
> +	    (state->type & INTEL_PT_INSTRUCTION)) {
> +		err = intel_pt_synth_instruction_sample(ptq);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (pt->sample_transactions &&
> +	    (state->type & INTEL_PT_TRANSACTION)) {
> +		err = intel_pt_synth_transaction_sample(ptq);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (!(state->type & INTEL_PT_BRANCH))
> +		return 0;
> +
> +	if (pt->synth_opts.callchain)
> +		thread_stack__event(ptq->thread, ptq->flags, state->from_ip,
> +				    state->to_ip, ptq->insn_len,
> +				    state->trace_nr);
> +
> +	if (pt->sample_branches) {
> +		err = intel_pt_synth_branch_sample(ptq);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (!pt->sync_switch)
> +		return 0;
> +
> +	if (intel_pt_is_switch_ip(ptq, state->to_ip)) {
> +		switch (ptq->switch_state) {
> +		case INTEL_PT_SS_UNKNOWN:
> +		case INTEL_PT_SS_EXPECTING_SWITCH_IP:
> +			err = intel_pt_next_tid(pt, ptq);
> +			if (err)
> +				return err;
> +			ptq->switch_state = INTEL_PT_SS_TRACING;
> +			break;
> +		default:
> +			ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_EVENT;
> +			return 1;
> +		}
> +	} else if (!state->to_ip) {
> +		ptq->switch_state = INTEL_PT_SS_NOT_TRACING;
> +	} else if (ptq->switch_state == INTEL_PT_SS_NOT_TRACING) {
> +		ptq->switch_state = INTEL_PT_SS_UNKNOWN;
> +	} else if (ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
> +		   state->to_ip == pt->ptss_ip &&
> +		   (ptq->flags & PERF_IP_FLAG_CALL)) {
> +		ptq->switch_state = INTEL_PT_SS_TRACING;
> +	}
> +
> +	return 0;
> +}
> +
> +static u64 intel_pt_switch_ip(struct machine *machine, u64 *ptss_ip)
> +{
> +	struct map *map;
> +	struct symbol *sym, *start;
> +	u64 ip, switch_ip = 0;
> +
> +	if (ptss_ip)
> +		*ptss_ip = 0;
> +
> +	map = machine__kernel_map(machine, MAP__FUNCTION);
> +	if (!map)
> +		return 0;
> +
> +	if (map__load(map, machine->symbol_filter))
> +		return 0;
> +
> +	start = dso__first_symbol(map->dso, MAP__FUNCTION);
> +
> +	for (sym = start; sym; sym = dso__next_symbol(sym)) {
> +		if (sym->binding == STB_GLOBAL &&
> +		    !strcmp(sym->name, "__switch_to")) {
> +			ip = map->unmap_ip(map, sym->start);
> +			if (ip >= map->start && ip < map->end) {
> +				switch_ip = ip;
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (!switch_ip || !ptss_ip)
> +		return 0;
> +
> +	for (sym = start; sym; sym = dso__next_symbol(sym)) {
> +		if (!strcmp(sym->name, "perf_trace_sched_switch")) {
> +			ip = map->unmap_ip(map, sym->start);
> +			if (ip >= map->start && ip < map->end) {
> +				*ptss_ip = ip;
> +				break;
> +			}
> +		}
> +	}
> +
> +	return switch_ip;
> +}
> +
> +static int intel_pt_run_decoder(struct intel_pt_queue *ptq, u64 *timestamp)
> +{
> +	const struct intel_pt_state *state = ptq->state;
> +	struct intel_pt *pt = ptq->pt;
> +	int err;
> +
> +	if (!pt->kernel_start) {
> +		pt->kernel_start = machine__kernel_start(pt->machine);
> +		if (pt->per_cpu_mmaps && pt->have_sched_switch &&
> +		    !pt->timeless_decoding && intel_pt_tracing_kernel(pt) &&
> +		    !pt->sampling_mode) {
> +			pt->switch_ip = intel_pt_switch_ip(pt->machine,
> +							   &pt->ptss_ip);
> +			if (pt->switch_ip) {
> +				intel_pt_log("switch_ip: %"PRIx64" ptss_ip: %"PRIx64"\n",
> +					     pt->switch_ip, pt->ptss_ip);
> +				pt->sync_switch = true;
> +				pt->est_tsc_orig = pt->est_tsc;
> +				pt->est_tsc = false;
> +			}
> +		}
> +	}
> +
> +	intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
> +		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
> +	while (1) {
> +		err = intel_pt_sample(ptq);
> +		if (err)
> +			return err;
> +
> +		state = intel_pt_decode(ptq->decoder);
> +		if (state->err) {
> +			if (state->err == INTEL_PT_ERR_NODATA)
> +				return 1;
> +			if (pt->sync_switch &&
> +			    state->from_ip >= pt->kernel_start) {
> +				pt->sync_switch = false;
> +				pt->est_tsc = pt->est_tsc_orig;
> +				intel_pt_next_tid(pt, ptq);
> +			}
> +			if (pt->synth_opts.errors) {
> +				err = intel_pt_synth_error(pt, state->err,
> +							   ptq->cpu, ptq->pid,
> +							   ptq->tid,
> +							   state->from_ip);
> +				if (err)
> +					return err;
> +			}
> +			continue;
> +		}
> +
> +		ptq->state = state;
> +		ptq->have_sample = true;
> +		intel_pt_sample_flags(ptq);
> +
> +		/* Use estimated TSC upon return to user space */
> +		if (pt->est_tsc) {
> +			if (state->from_ip >= pt->kernel_start &&
> +			    state->to_ip &&
> +			    state->to_ip < pt->kernel_start)
> +				ptq->timestamp = state->est_timestamp;
> +			else if (state->timestamp > ptq->timestamp)
> +				ptq->timestamp = state->timestamp;
> +		/* Use estimated TSC in unknown switch state */
> +		} else if (pt->sync_switch &&
> +			   ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
> +			   state->to_ip == pt->switch_ip &&
> +			   (ptq->flags & PERF_IP_FLAG_CALL) &&
> +			   ptq->next_tid == -1) {
> +			ptq->timestamp = state->est_timestamp;
> +		} else if (state->timestamp > ptq->timestamp) {
> +			ptq->timestamp = state->timestamp;
> +		}
> +
> +		if (!pt->timeless_decoding && ptq->timestamp >= *timestamp) {
> +			*timestamp = ptq->timestamp;
> +			return 0;
> +		}
> +	}
> +	return 0;
> +}
> +
> +static inline int intel_pt_update_queues(struct intel_pt *pt)
> +{
> +	if (pt->queues.new_data) {
> +		pt->queues.new_data = false;
> +		return intel_pt_setup_queues(pt);
> +	}
> +	return 0;
> +}
> +
> +static int intel_pt_process_queues(struct intel_pt *pt, u64 timestamp)
> +{
> +	unsigned int queue_nr;
> +	u64 ts;
> +	int ret;
> +
> +	while (1) {
> +		struct auxtrace_queue *queue;
> +		struct intel_pt_queue *ptq;
> +
> +		if (!pt->heap.heap_cnt)
> +			return 0;
> +
> +		if (pt->heap.heap_array[0].ordinal >= timestamp)
> +			return 0;
> +
> +		queue_nr = pt->heap.heap_array[0].queue_nr;
> +		queue = &pt->queues.queue_array[queue_nr];
> +		ptq = queue->priv;
> +
> +		intel_pt_log("queue %u processing 0x%" PRIx64 " to 0x%" PRIx64 "\n",
> +			     queue_nr, pt->heap.heap_array[0].ordinal,
> +			     timestamp);
> +
> +		auxtrace_heap__pop(&pt->heap);
> +
> +		if (pt->heap.heap_cnt) {
> +			ts = pt->heap.heap_array[0].ordinal + 1;
> +			if (ts > timestamp)
> +				ts = timestamp;
> +		} else {
> +			ts = timestamp;
> +		}
> +
> +		intel_pt_set_pid_tid_cpu(pt, queue);
> +
> +		ret = intel_pt_run_decoder(ptq, &ts);
> +
> +		if (ret < 0) {
> +			auxtrace_heap__add(&pt->heap, queue_nr, ts);
> +			return ret;
> +		}
> +
> +		if (!ret) {
> +			ret = auxtrace_heap__add(&pt->heap, queue_nr, ts);
> +			if (ret < 0)
> +				return ret;
> +		} else {
> +			ptq->on_heap = false;
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int intel_pt_process_timeless_queues(struct intel_pt *pt, pid_t tid,
> +					    u64 time_)
> +{
> +	struct auxtrace_queues *queues = &pt->queues;
> +	unsigned int i;
> +	u64 ts = 0;
> +
> +	for (i = 0; i < queues->nr_queues; i++) {
> +		struct auxtrace_queue *queue = &pt->queues.queue_array[i];
> +		struct intel_pt_queue *ptq = queue->priv;
> +
> +		if (ptq && (tid == -1 || ptq->tid == tid)) {
> +			ptq->time = time_;
> +			intel_pt_set_pid_tid_cpu(pt, queue);
> +			intel_pt_run_decoder(ptq, &ts);
> +		}
> +	}
> +	return 0;
> +}
> +
> +static int intel_pt_lost(struct intel_pt *pt, struct perf_sample *sample)
> +{
> +	return intel_pt_synth_error(pt, INTEL_PT_ERR_LOST, sample->cpu,
> +				    sample->pid, sample->tid, 0);
> +}
> +
> +static struct intel_pt_queue *intel_pt_cpu_to_ptq(struct intel_pt *pt, int cpu)
> +{
> +	unsigned i, j;
> +
> +	if (cpu < 0 || !pt->queues.nr_queues)
> +		return NULL;
> +
> +	if ((unsigned)cpu >= pt->queues.nr_queues)
> +		i = pt->queues.nr_queues - 1;
> +	else
> +		i = cpu;
> +
> +	if (pt->queues.queue_array[i].cpu == cpu)
> +		return pt->queues.queue_array[i].priv;
> +
> +	for (j = 0; i > 0; j++) {
> +		if (pt->queues.queue_array[--i].cpu == cpu)
> +			return pt->queues.queue_array[i].priv;
> +	}
> +
> +	for (; j < pt->queues.nr_queues; j++) {
> +		if (pt->queues.queue_array[j].cpu == cpu)
> +			return pt->queues.queue_array[j].priv;
> +	}
> +
> +	return NULL;
> +}
> +
> +static int intel_pt_process_switch(struct intel_pt *pt,
> +				   struct perf_sample *sample)
> +{
> +	struct intel_pt_queue *ptq;
> +	struct perf_evsel *evsel;
> +	pid_t tid;
> +	int cpu, err;
> +
> +	evsel = perf_evlist__id2evsel(pt->session->evlist, sample->id);
> +	if (evsel != pt->switch_evsel)
> +		return 0;
> +
> +	tid = perf_evsel__intval(evsel, sample, "next_pid");
> +	cpu = sample->cpu;
> +
> +	intel_pt_log("sched_switch: cpu %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
> +		     cpu, tid, sample->time, perf_time_to_tsc(sample->time,
> +		     &pt->tc));
> +
> +	if (!pt->sync_switch)
> +		goto out;
> +
> +	ptq = intel_pt_cpu_to_ptq(pt, cpu);
> +	if (!ptq)
> +		goto out;
> +
> +	switch (ptq->switch_state) {
> +	case INTEL_PT_SS_NOT_TRACING:
> +		ptq->next_tid = -1;
> +		break;
> +	case INTEL_PT_SS_UNKNOWN:
> +	case INTEL_PT_SS_TRACING:
> +		ptq->next_tid = tid;
> +		ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_IP;
> +		return 0;
> +	case INTEL_PT_SS_EXPECTING_SWITCH_EVENT:
> +		if (!ptq->on_heap) {
> +			ptq->timestamp = perf_time_to_tsc(sample->time,
> +							  &pt->tc);
> +			err = auxtrace_heap__add(&pt->heap, ptq->queue_nr,
> +						 ptq->timestamp);
> +			if (err)
> +				return err;
> +			ptq->on_heap = true;
> +		}
> +		ptq->switch_state = INTEL_PT_SS_TRACING;
> +		break;
> +	case INTEL_PT_SS_EXPECTING_SWITCH_IP:
> +		ptq->next_tid = tid;
> +		intel_pt_log("ERROR: cpu %d expecting switch ip\n", cpu);
> +		break;
> +	default:
> +		break;
> +	}
> +out:
> +	return machine__set_current_tid(pt->machine, cpu, -1, tid);
> +}
> +
> +static int intel_pt_process_itrace_start(struct intel_pt *pt,
> +					 union perf_event *event,
> +					 struct perf_sample *sample)
> +{
> +	if (!pt->per_cpu_mmaps)
> +		return 0;
> +
> +	intel_pt_log("itrace_start: cpu %d pid %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
> +		     sample->cpu, event->itrace_start.pid,
> +		     event->itrace_start.tid, sample->time,
> +		     perf_time_to_tsc(sample->time, &pt->tc));
> +
> +	return machine__set_current_tid(pt->machine, sample->cpu,
> +					event->itrace_start.pid,
> +					event->itrace_start.tid);
> +}
> +
> +static int intel_pt_process_event(struct perf_session *session,
> +				  union perf_event *event,
> +				  struct perf_sample *sample,
> +				  struct perf_tool *tool)
> +{
> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
> +					   auxtrace);
> +	u64 timestamp;
> +	int err = 0;
> +
> +	if (dump_trace)
> +		return 0;
> +
> +	if (!tool->ordered_events) {
> +		pr_err("Intel Processor Trace requires ordered events\n");
> +		return -EINVAL;
> +	}
> +
> +	if (sample->time)
> +		timestamp = perf_time_to_tsc(sample->time, &pt->tc);
> +	else
> +		timestamp = 0;
> +
> +	if (timestamp || pt->timeless_decoding) {
> +		err = intel_pt_update_queues(pt);
> +		if (err)
> +			return err;
> +	}
> +
> +	if (pt->timeless_decoding) {
> +		if (event->header.type == PERF_RECORD_EXIT) {
> +			err = intel_pt_process_timeless_queues(pt,
> +							       event->comm.tid,
> +							       sample->time);
> +		}
> +	} else if (timestamp) {
> +		err = intel_pt_process_queues(pt, timestamp);
> +	}
> +	if (err)
> +		return err;
> +
> +	if (event->header.type == PERF_RECORD_AUX &&
> +	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
> +	    pt->synth_opts.errors)
> +		err = intel_pt_lost(pt, sample);
> +
> +	if (pt->switch_evsel && event->header.type == PERF_RECORD_SAMPLE)
> +		err = intel_pt_process_switch(pt, sample);
> +	else if (event->header.type == PERF_RECORD_ITRACE_START)
> +		err = intel_pt_process_itrace_start(pt, event, sample);
> +
> +	return err;
> +}
> +
> +static int intel_pt_flush(struct perf_session *session, struct perf_tool *tool)
> +{
> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
> +					   auxtrace);
> +	int ret;
> +
> +	if (dump_trace)
> +		return 0;
> +
> +	if (!tool->ordered_events)
> +		return -EINVAL;
> +
> +	ret = intel_pt_update_queues(pt);
> +	if (ret < 0)
> +		return ret;
> +
> +	if (pt->timeless_decoding)
> +		return intel_pt_process_timeless_queues(pt, -1,
> +							MAX_TIMESTAMP - 1);
> +
> +	return intel_pt_process_queues(pt, MAX_TIMESTAMP);
> +}
> +
> +static void intel_pt_free_events(struct perf_session *session)
> +{
> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
> +					   auxtrace);
> +	struct auxtrace_queues *queues = &pt->queues;
> +	unsigned int i;
> +
> +	for (i = 0; i < queues->nr_queues; i++) {
> +		intel_pt_free_queue(queues->queue_array[i].priv);
> +		queues->queue_array[i].priv = NULL;
> +	}
> +	intel_pt_log_disable();
> +	auxtrace_queues__free(queues);
> +}
> +
> +static void intel_pt_free(struct perf_session *session)
> +{
> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
> +					   auxtrace);
> +
> +	auxtrace_heap__free(&pt->heap);
> +	intel_pt_free_events(session);
> +	session->auxtrace = NULL;
> +	thread__delete(pt->unknown_thread);
> +	free(pt);
> +}
> +
> +static int intel_pt_process_auxtrace_event(struct perf_session *session,
> +					   union perf_event *event,
> +					   struct perf_tool *tool __maybe_unused)
> +{
> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
> +					   auxtrace);
> +
> +	if (pt->sampling_mode)
> +		return 0;
> +
> +	if (!pt->data_queued) {
> +		struct auxtrace_buffer *buffer;
> +		off_t data_offset;
> +		int fd = perf_data_file__fd(session->file);
> +		int err;
> +
> +		if (perf_data_file__is_pipe(session->file)) {
> +			data_offset = 0;
> +		} else {
> +			data_offset = lseek(fd, 0, SEEK_CUR);
> +			if (data_offset == -1)
> +				return -errno;
> +		}
> +
> +		err = auxtrace_queues__add_event(&pt->queues, session, event,
> +						 data_offset, &buffer);
> +		if (err)
> +			return err;
> +
> +		/* Dump here now we have copied a piped trace out of the pipe */
> +		if (dump_trace) {
> +			if (auxtrace_buffer__get_data(buffer, fd)) {
> +				intel_pt_dump_event(pt, buffer->data,
> +						    buffer->size);
> +				auxtrace_buffer__put_data(buffer);
> +			}
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +struct intel_pt_synth {
> +	struct perf_tool dummy_tool;
> +	struct perf_session *session;
> +};
> +
> +static int intel_pt_event_synth(struct perf_tool *tool,
> +				union perf_event *event,
> +				struct perf_sample *sample __maybe_unused,
> +				struct machine *machine __maybe_unused)
> +{
> +	struct intel_pt_synth *intel_pt_synth =
> +			container_of(tool, struct intel_pt_synth, dummy_tool);
> +
> +	return perf_session__deliver_synth_event(intel_pt_synth->session, event,
> +						 NULL);
> +}
> +
> +static int intel_pt_synth_event(struct perf_session *session,
> +				struct perf_event_attr *attr, u64 id)
> +{
> +	struct intel_pt_synth intel_pt_synth;
> +
> +	memset(&intel_pt_synth, 0, sizeof(struct intel_pt_synth));
> +	intel_pt_synth.session = session;
> +
> +	return perf_event__synthesize_attr(&intel_pt_synth.dummy_tool, attr, 1,
> +					   &id, intel_pt_event_synth);
> +}
> +
> +static int intel_pt_synth_events(struct intel_pt *pt,
> +				 struct perf_session *session)
> +{
> +	struct perf_evlist *evlist = session->evlist;
> +	struct perf_evsel *evsel;
> +	struct perf_event_attr attr;
> +	bool found = false;
> +	u64 id;
> +	int err;
> +
> +	evlist__for_each(evlist, evsel) {
> +		if (evsel->attr.type == pt->pmu_type && evsel->ids) {
> +			found = true;
> +			break;
> +		}
> +	}
> +
> +	if (!found) {
> +		pr_debug("There are no selected events with Intel Processor Trace data\n");
> +		return 0;
> +	}
> +
> +	memset(&attr, 0, sizeof(struct perf_event_attr));
> +	attr.size = sizeof(struct perf_event_attr);
> +	attr.type = PERF_TYPE_HARDWARE;
> +	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
> +	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
> +			    PERF_SAMPLE_PERIOD;
> +	if (pt->timeless_decoding)
> +		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
> +	else
> +		attr.sample_type |= PERF_SAMPLE_TIME;
> +	if (!pt->per_cpu_mmaps)
> +		attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
> +	attr.exclude_user = evsel->attr.exclude_user;
> +	attr.exclude_kernel = evsel->attr.exclude_kernel;
> +	attr.exclude_hv = evsel->attr.exclude_hv;
> +	attr.exclude_host = evsel->attr.exclude_host;
> +	attr.exclude_guest = evsel->attr.exclude_guest;
> +	attr.sample_id_all = evsel->attr.sample_id_all;
> +	attr.read_format = evsel->attr.read_format;
> +
> +	id = evsel->id[0] + 1000000000;
> +	if (!id)
> +		id = 1;
> +
> +	if (pt->synth_opts.instructions) {
> +		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
> +		if (pt->synth_opts.period_type == PERF_ITRACE_PERIOD_NANOSECS)
> +			attr.sample_period =
> +				intel_pt_ns_to_ticks(pt, pt->synth_opts.period);
> +		else
> +			attr.sample_period = pt->synth_opts.period;
> +		pt->instructions_sample_period = attr.sample_period;
> +		if (pt->synth_opts.callchain)
> +			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
> +		pr_debug("Synthesizing 'instructions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
> +			 id, (u64)attr.sample_type);
> +		err = intel_pt_synth_event(session, &attr, id);
> +		if (err) {
> +			pr_err("%s: failed to synthesize 'instructions' event type\n",
> +			       __func__);
> +			return err;
> +		}
> +		pt->sample_instructions = true;
> +		pt->instructions_sample_type = attr.sample_type;
> +		pt->instructions_id = id;
> +		id += 1;
> +	}
> +
> +	if (pt->synth_opts.transactions) {
> +		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
> +		attr.sample_period = 1;
> +		if (pt->synth_opts.callchain)
> +			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
> +		pr_debug("Synthesizing 'transactions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
> +			 id, (u64)attr.sample_type);
> +		err = intel_pt_synth_event(session, &attr, id);
> +		if (err) {
> +			pr_err("%s: failed to synthesize 'transactions' event type\n",
> +			       __func__);
> +			return err;
> +		}
> +		pt->sample_transactions = true;
> +		pt->transactions_id = id;
> +		id += 1;
> +		evlist__for_each(evlist, evsel) {
> +			if (evsel->id && evsel->id[0] == pt->transactions_id) {
> +				if (evsel->name)
> +					zfree(&evsel->name);
> +				evsel->name = strdup("transactions");
> +				break;
> +			}
> +		}
> +	}
> +
> +	if (pt->synth_opts.branches) {
> +		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
> +		attr.sample_period = 1;
> +		attr.sample_type |= PERF_SAMPLE_ADDR;
> +		attr.sample_type &= ~(u64)PERF_SAMPLE_CALLCHAIN;
> +		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
> +			 id, (u64)attr.sample_type);
> +		err = intel_pt_synth_event(session, &attr, id);
> +		if (err) {
> +			pr_err("%s: failed to synthesize 'branches' event type\n",
> +			       __func__);
> +			return err;
> +		}
> +		pt->sample_branches = true;
> +		pt->branches_sample_type = attr.sample_type;
> +		pt->branches_id = id;
> +	}
> +
> +	pt->synth_needs_swap = evsel->needs_swap;
> +
> +	return 0;
> +}
> +
> +static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
> +{
> +	struct perf_evsel *evsel;
> +
> +	evlist__for_each_reverse(evlist, evsel) {
> +		const char *name = perf_evsel__name(evsel);
> +
> +		if (!strcmp(name, "sched:sched_switch"))
> +			return evsel;
> +	}
> +
> +	return NULL;
> +}
> +
> +static const char * const intel_pt_info_fmts[] = {
> +	[INTEL_PT_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
> +	[INTEL_PT_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
> +	[INTEL_PT_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
> +	[INTEL_PT_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
> +	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
> +	[INTEL_PT_TSC_BIT]		= "  TSC bit            %#"PRIx64"\n",
> +	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit      %#"PRIx64"\n",
> +	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch  %"PRId64"\n",
> +	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
> +	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps       %"PRId64"\n",
> +};
> +
> +static void intel_pt_print_info(u64 *arr, int start, int finish)
> +{
> +	int i;
> +
> +	if (!dump_trace)
> +		return;
> +
> +	for (i = start; i <= finish; i++)
> +		fprintf(stdout, intel_pt_info_fmts[i], arr[i]);
> +}
> +
> +int intel_pt_process_auxtrace_info(union perf_event *event,
> +				   struct perf_session *session)
> +{
> +	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
> +	size_t min_sz = sizeof(u64) * INTEL_PT_PER_CPU_MMAPS;
> +	struct intel_pt *pt;
> +	int err;
> +
> +	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
> +					min_sz)
> +		return -EINVAL;
> +
> +	pt = zalloc(sizeof(struct intel_pt));
> +	if (!pt)
> +		return -ENOMEM;
> +
> +	err = auxtrace_queues__init(&pt->queues);
> +	if (err)
> +		goto err_free;
> +
> +	intel_pt_log_set_name(INTEL_PT_PMU_NAME);
> +
> +	pt->session = session;
> +	pt->machine = &session->machines.host; /* No kvm support */
> +	pt->auxtrace_type = auxtrace_info->type;
> +	pt->pmu_type = auxtrace_info->priv[INTEL_PT_PMU_TYPE];
> +	pt->tc.time_shift = auxtrace_info->priv[INTEL_PT_TIME_SHIFT];
> +	pt->tc.time_mult = auxtrace_info->priv[INTEL_PT_TIME_MULT];
> +	pt->tc.time_zero = auxtrace_info->priv[INTEL_PT_TIME_ZERO];
> +	pt->cap_user_time_zero = auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO];
> +	pt->tsc_bit = auxtrace_info->priv[INTEL_PT_TSC_BIT];
> +	pt->noretcomp_bit = auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT];
> +	pt->have_sched_switch = auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH];
> +	pt->snapshot_mode = auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE];
> +	pt->per_cpu_mmaps = auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS];
> +	intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_PMU_TYPE,
> +			    INTEL_PT_PER_CPU_MMAPS);
> +
> +	pt->timeless_decoding = intel_pt_timeless_decoding(pt);
> +	pt->have_tsc = intel_pt_have_tsc(pt);
> +	pt->sampling_mode = false;
> +	pt->est_tsc = pt->per_cpu_mmaps && !pt->timeless_decoding;
> +
> +	pt->unknown_thread = thread__new(999999999, 999999999);
> +	if (!pt->unknown_thread) {
> +		err = -ENOMEM;
> +		goto err_free_queues;
> +	}
> +	err = thread__set_comm(pt->unknown_thread, "unknown", 0);
> +	if (err)
> +		goto err_delete_thread;
> +	if (thread__init_map_groups(pt->unknown_thread, pt->machine)) {
> +		err = -ENOMEM;
> +		goto err_delete_thread;
> +	}
> +
> +	pt->auxtrace.process_event = intel_pt_process_event;
> +	pt->auxtrace.process_auxtrace_event = intel_pt_process_auxtrace_event;
> +	pt->auxtrace.flush_events = intel_pt_flush;
> +	pt->auxtrace.free_events = intel_pt_free_events;
> +	pt->auxtrace.free = intel_pt_free;
> +	session->auxtrace = &pt->auxtrace;
> +
> +	if (dump_trace)
> +		return 0;
> +
> +	if (pt->have_sched_switch == 1) {
> +		pt->switch_evsel = intel_pt_find_sched_switch(session->evlist);
> +		if (!pt->switch_evsel) {
> +			pr_err("%s: missing sched_switch event\n", __func__);
> +			goto err_delete_thread;
> +		}
> +	}
> +
> +	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
> +		pt->synth_opts = *session->itrace_synth_opts;
> +	} else {
> +		itrace_synth_opts__set_default(&pt->synth_opts);
> +		if (use_browser != -1) {
> +			pt->synth_opts.branches = false;
> +			pt->synth_opts.callchain = true;
> +		}
> +	}
> +
> +	if (pt->synth_opts.log)
> +		intel_pt_log_enable();
> +
> +	if (pt->synth_opts.calls)
> +		pt->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
> +				       PERF_IP_FLAG_TRACE_END;
> +	if (pt->synth_opts.returns)
> +		pt->branches_filter |= PERF_IP_FLAG_RETURN |
> +				       PERF_IP_FLAG_TRACE_BEGIN;
> +
> +	if (pt->synth_opts.callchain && !symbol_conf.use_callchain) {
> +		symbol_conf.use_callchain = true;
> +		if (callchain_register_param(&callchain_param) < 0) {
> +			symbol_conf.use_callchain = false;
> +			pt->synth_opts.callchain = false;
> +		}
> +	}
> +
> +	err = intel_pt_synth_events(pt, session);
> +	if (err)
> +		goto err_delete_thread;
> +
> +	err = auxtrace_queues__process_index(&pt->queues, session);
> +	if (err)
> +		goto err_delete_thread;
> +
> +	if (pt->queues.populated)
> +		pt->data_queued = true;
> +
> +	if (pt->timeless_decoding)
> +		pr_debug2("Intel PT decoding without timestamps\n");
> +
> +	return 0;
> +
> +err_delete_thread:
> +	thread__delete(pt->unknown_thread);
> +err_free_queues:
> +	intel_pt_log_disable();
> +	auxtrace_queues__free(&pt->queues);
> +	session->auxtrace = NULL;
> +err_free:
> +	free(pt);
> +	return err;
> +}
> diff --git a/tools/perf/util/intel-pt.h b/tools/perf/util/intel-pt.h
> new file mode 100644
> index 0000000..a1bfe93
> --- /dev/null
> +++ b/tools/perf/util/intel-pt.h
> @@ -0,0 +1,51 @@
> +/*
> + * intel_pt.h: Intel Processor Trace support
> + * Copyright (c) 2013-2015, Intel Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify it
> + * under the terms and conditions of the GNU General Public License,
> + * version 2, as published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope it will be useful, but WITHOUT
> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
> + * more details.
> + *
> + */
> +
> +#ifndef INCLUDE__PERF_INTEL_PT_H__
> +#define INCLUDE__PERF_INTEL_PT_H__
> +
> +#define INTEL_PT_PMU_NAME "intel_pt"
> +
> +enum {
> +	INTEL_PT_PMU_TYPE,
> +	INTEL_PT_TIME_SHIFT,
> +	INTEL_PT_TIME_MULT,
> +	INTEL_PT_TIME_ZERO,
> +	INTEL_PT_CAP_USER_TIME_ZERO,
> +	INTEL_PT_TSC_BIT,
> +	INTEL_PT_NORETCOMP_BIT,
> +	INTEL_PT_HAVE_SCHED_SWITCH,
> +	INTEL_PT_SNAPSHOT_MODE,
> +	INTEL_PT_PER_CPU_MMAPS,
> +	INTEL_PT_AUXTRACE_PRIV_MAX,
> +};
> +
> +#define INTEL_PT_AUXTRACE_PRIV_SIZE (INTEL_PT_AUXTRACE_PRIV_MAX * sizeof(u64))
> +
> +struct auxtrace_record;
> +struct perf_tool;
> +union perf_event;
> +struct perf_session;
> +struct perf_event_attr;
> +struct perf_pmu;
> +
> +struct auxtrace_record *intel_pt_recording_init(int *err);
> +
> +int intel_pt_process_auxtrace_info(union perf_event *event,
> +				   struct perf_session *session);
> +
> +struct perf_event_attr *intel_pt_pmu_default_config(struct perf_pmu *pmu);
> +
> +#endif
> -- 
> 1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 15/17] perf tools: Intel BTS to always update thread stack trace number
  2015-05-29 13:33 ` [PATCH V6 15/17] perf tools: Intel BTS " Adrian Hunter
@ 2015-06-19 16:11   ` Arnaldo Carvalho de Melo
  2015-06-22 12:38     ` Adrian Hunter
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-19 16:11 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, May 29, 2015 at 04:33:43PM +0300, Adrian Hunter escreveu:
> The enhanced thread stack is used by higher layers but still requires
> the trace number.  The trace number is used to distinguish discontinuous
> sections of trace (for example from Snapshot mode or Sample mode), which
> cause the thread stack to be flushed.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/intel-bts.c | 18 ++++++++++++++----
>  1 file changed, 14 insertions(+), 4 deletions(-)
> 
> diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
> index b068860..cd7bde3 100644
> --- a/tools/perf/util/intel-bts.c
> +++ b/tools/perf/util/intel-bts.c
> @@ -27,6 +27,8 @@
>  #include "machine.h"
>  #include "session.h"
>  #include "util.h"
> +#include "thread.h"
> +#include "thread-stack.h"
>  #include "debug.h"
>  #include "tsc.h"
>  #include "auxtrace.h"
> @@ -443,19 +445,22 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
>  
>  static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
>  {
> -	struct auxtrace_buffer *buffer = btsq->buffer;
> +	struct auxtrace_buffer *buffer = btsq->buffer, *old_buffer = buffer;
>  	struct auxtrace_queue *queue;
> +	struct thread *thread;
>  	int err;
>  
>  	if (btsq->done)
>  		return 1;
>  
>  	if (btsq->pid == -1) {
> -		struct thread *thread;
> -
> -		thread = machine__find_thread(btsq->bts->machine, -1, btsq->tid);
> +		thread = machine__find_thread(btsq->bts->machine, -1,
> +					      btsq->tid);
>  		if (thread)
>  			btsq->pid = thread->pid_;
> +	} else {
> +		thread = machine__findnew_thread(btsq->bts->machine, btsq->pid,
> +						 btsq->tid);

Humm, so what will be done with the reference count you got from
machine__findnew_thread()? You have to drop it when you're done with
using this thread.

>  	}
>  
>  	queue = &btsq->bts->queues.queue_array[btsq->queue_nr];
> @@ -485,6 +490,11 @@ static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
>  	    intel_bts_do_fix_overlap(queue, buffer))
>  		return -ENOMEM;
>  
> +	if (!btsq->bts->synth_opts.callchain && thread &&
> +	    (!old_buffer || btsq->bts->sampling_mode ||
> +	     (btsq->bts->snapshot_mode && !buffer->consecutive)))
> +		thread_stack__set_trace_nr(thread, buffer->buffer_nr + 1);
> +
>  	err = intel_bts_process_buffer(btsq, buffer);
>  
>  	auxtrace_buffer__drop_data(buffer);
> -- 
> 1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-19 16:04   ` Arnaldo Carvalho de Melo
@ 2015-06-19 16:22     ` Arnaldo Carvalho de Melo
  2015-06-19 19:33     ` Adrian Hunter
  1 sibling, 0 replies; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-19 16:22 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jun 19, 2015 at 01:04:51PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, May 29, 2015 at 04:33:36PM +0300, Adrian Hunter escreveu:
> > Add support for Intel Processor Trace.
> > 
> > Intel PT support fits within the new auxtrace infrastructure.
> > Recording is supporting by identifying the Intel PT PMU,
> > parsing options and setting up events.  Decoding is supported
> > by queuing up trace data by cpu or thread and then decoding
> > synchronously delivering synthesized event samples into the
> > session processing for tools to consume.
> 
> So, at this point what commands should I use to test this? I expected to
> be able to have some command here, in this changeset log, telling me
> that what has been applied so far + this "Add Intel PT support", can be
> used in such and such a fashion, obtaining this and that output.
> 
> Now I'll go back and look at the cover letter to see what I can do at
> this point and with access to a Broadwell class machine.

Trying to test something at this point:

[root@perf4 ~]# find /sys -name "*intel_pt*"
/sys/bus/event_source/devices/intel_pt
/sys/devices/intel_pt
[root@perf4 ~]# perf record -e intel_pt//u ls
invalid or unsupported event: 'intel_pt//u'
Run 'perf list' for a list of valid events

 usage: perf record [<options>] [<command>]
    or: perf record [<options>] -- <command> [<options>]

    -e, --event <event>   event selector. use 'perf list' to list
available events
[root@perf4 ~]# uname -a
Linux perf4 4.1.0-rc8 #1 SMP Tue Jun 16 19:47:25 EDT 2015 x86_64 x86_64 x86_64 GNU/Linux
[root@perf4 ~]#

The kernel seems to have the intel_pt stuff and what I have in the
tooling side is at my tmp.perf/pt branch, will continue after lunch.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-19 16:04   ` Arnaldo Carvalho de Melo
  2015-06-19 16:22     ` Arnaldo Carvalho de Melo
@ 2015-06-19 19:33     ` Adrian Hunter
  2015-06-19 19:41       ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-06-19 19:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa



On 19/06/2015 7:04 p.m., Arnaldo Carvalho de Melo wrote:
> Em Fri, May 29, 2015 at 04:33:36PM +0300, Adrian Hunter escreveu:
>> Add support for Intel Processor Trace.
>>
>> Intel PT support fits within the new auxtrace infrastructure.
>> Recording is supporting by identifying the Intel PT PMU,
>> parsing options and setting up events.  Decoding is supported
>> by queuing up trace data by cpu or thread and then decoding
>> synchronously delivering synthesized event samples into the
>> session processing for tools to consume.
>
> So, at this point what commands should I use to test this? I expected to
> be able to have some command here, in this changeset log, telling me
> that what has been applied so far + this "Add Intel PT support", can be
> used in such and such a fashion, obtaining this and that output.
>
> Now I'll go back and look at the cover letter to see what I can do at
> this point and with access to a Broadwell class machine.

Actually you need the next patch "perf tools: Take Intel PT into use" to do anything.

>
> - Arnaldo
>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>   tools/perf/arch/x86/util/Build      |    2 +
>>   tools/perf/arch/x86/util/intel-pt.c |  752 ++++++++++++++
>>   tools/perf/util/Build               |    1 +
>>   tools/perf/util/intel-pt.c          | 1889 +++++++++++++++++++++++++++++++++++
>>   tools/perf/util/intel-pt.h          |   51 +
>>   5 files changed, 2695 insertions(+)
>>   create mode 100644 tools/perf/arch/x86/util/intel-pt.c
>>   create mode 100644 tools/perf/util/intel-pt.c
>>   create mode 100644 tools/perf/util/intel-pt.h
>>
>> diff --git a/tools/perf/arch/x86/util/Build b/tools/perf/arch/x86/util/Build
>> index cfbccc4..1396088 100644
>> --- a/tools/perf/arch/x86/util/Build
>> +++ b/tools/perf/arch/x86/util/Build
>> @@ -6,3 +6,5 @@ libperf-$(CONFIG_DWARF) += dwarf-regs.o
>>
>>   libperf-$(CONFIG_LIBUNWIND)          += unwind-libunwind.o
>>   libperf-$(CONFIG_LIBDW_DWARF_UNWIND) += unwind-libdw.o
>> +
>> +libperf-$(CONFIG_AUXTRACE) += intel-pt.o
>> diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
>> new file mode 100644
>> index 0000000..da7d2c1
>> --- /dev/null
>> +++ b/tools/perf/arch/x86/util/intel-pt.c
>> @@ -0,0 +1,752 @@
>> +/*
>> + * intel_pt.c: Intel Processor Trace support
>> + * Copyright (c) 2013-2015, Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + */
>> +
>> +#include <stdbool.h>
>> +#include <linux/kernel.h>
>> +#include <linux/types.h>
>> +#include <linux/bitops.h>
>> +#include <linux/log2.h>
>> +
>> +#include "../../perf.h"
>> +#include "../../util/session.h"
>> +#include "../../util/event.h"
>> +#include "../../util/evlist.h"
>> +#include "../../util/evsel.h"
>> +#include "../../util/cpumap.h"
>> +#include "../../util/parse-options.h"
>> +#include "../../util/parse-events.h"
>> +#include "../../util/pmu.h"
>> +#include "../../util/debug.h"
>> +#include "../../util/auxtrace.h"
>> +#include "../../util/tsc.h"
>> +#include "../../util/intel-pt.h"
>> +
>> +#define KiB(x) ((x) * 1024)
>> +#define MiB(x) ((x) * 1024 * 1024)
>> +#define KiB_MASK(x) (KiB(x) - 1)
>> +#define MiB_MASK(x) (MiB(x) - 1)
>> +
>> +#define INTEL_PT_DEFAULT_SAMPLE_SIZE	KiB(4)
>> +
>> +#define INTEL_PT_MAX_SAMPLE_SIZE	KiB(60)
>> +
>> +#define INTEL_PT_PSB_PERIOD_NEAR	256
>> +
>> +struct intel_pt_snapshot_ref {
>> +	void *ref_buf;
>> +	size_t ref_offset;
>> +	bool wrapped;
>> +};
>> +
>> +struct intel_pt_recording {
>> +	struct auxtrace_record		itr;
>> +	struct perf_pmu			*intel_pt_pmu;
>> +	int				have_sched_switch;
>> +	struct perf_evlist		*evlist;
>> +	bool				snapshot_mode;
>> +	bool				snapshot_init_done;
>> +	size_t				snapshot_size;
>> +	size_t				snapshot_ref_buf_size;
>> +	int				snapshot_ref_cnt;
>> +	struct intel_pt_snapshot_ref	*snapshot_refs;
>> +};
>> +
>> +static int intel_pt_parse_terms_with_default(struct list_head *formats,
>> +					     const char *str,
>> +					     u64 *config)
>> +{
>> +	struct list_head *terms;
>> +	struct perf_event_attr attr = { .size = 0, };
>> +	int err;
>> +
>> +	terms = malloc(sizeof(struct list_head));
>> +	if (!terms)
>> +		return -ENOMEM;
>> +
>> +	INIT_LIST_HEAD(terms);
>> +
>> +	err = parse_events_terms(terms, str);
>> +	if (err)
>> +		goto out_free;
>> +
>> +	attr.config = *config;
>> +	err = perf_pmu__config_terms(formats, &attr, terms, true, NULL);
>> +	if (err)
>> +		goto out_free;
>> +
>> +	*config = attr.config;
>> +out_free:
>> +	parse_events__free_terms(terms);
>> +	return err;
>> +}
>> +
>> +static int intel_pt_parse_terms(struct list_head *formats, const char *str,
>> +				u64 *config)
>> +{
>> +	*config = 0;
>> +	return intel_pt_parse_terms_with_default(formats, str, config);
>> +}
>> +
>> +static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu __maybe_unused,
>> +				  struct perf_evlist *evlist __maybe_unused)
>> +{
>> +	return 256;
>> +}
>> +
>> +static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
>> +{
>> +	u64 config;
>> +
>> +	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &config);
>> +	return config;
>> +}
>> +
>> +static int intel_pt_parse_snapshot_options(struct auxtrace_record *itr,
>> +					   struct record_opts *opts,
>> +					   const char *str)
>> +{
>> +	struct intel_pt_recording *ptr =
>> +			container_of(itr, struct intel_pt_recording, itr);
>> +	unsigned long long snapshot_size = 0;
>> +	char *endptr;
>> +
>> +	if (str) {
>> +		snapshot_size = strtoull(str, &endptr, 0);
>> +		if (*endptr || snapshot_size > SIZE_MAX)
>> +			return -1;
>> +	}
>> +
>> +	opts->auxtrace_snapshot_mode = true;
>> +	opts->auxtrace_snapshot_size = snapshot_size;
>> +
>> +	ptr->snapshot_size = snapshot_size;
>> +
>> +	return 0;
>> +}
>> +
>> +struct perf_event_attr *
>> +intel_pt_pmu_default_config(struct perf_pmu *intel_pt_pmu)
>> +{
>> +	struct perf_event_attr *attr;
>> +
>> +	attr = zalloc(sizeof(struct perf_event_attr));
>> +	if (!attr)
>> +		return NULL;
>> +
>> +	attr->config = intel_pt_default_config(intel_pt_pmu);
>> +
>> +	intel_pt_pmu->selectable = true;
>> +
>> +	return attr;
>> +}
>> +
>> +static size_t intel_pt_info_priv_size(struct auxtrace_record *itr __maybe_unused)
>> +{
>> +	return INTEL_PT_AUXTRACE_PRIV_SIZE;
>> +}
>> +
>> +static int intel_pt_info_fill(struct auxtrace_record *itr,
>> +			      struct perf_session *session,
>> +			      struct auxtrace_info_event *auxtrace_info,
>> +			      size_t priv_size)
>> +{
>> +	struct intel_pt_recording *ptr =
>> +			container_of(itr, struct intel_pt_recording, itr);
>> +	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
>> +	struct perf_event_mmap_page *pc;
>> +	struct perf_tsc_conversion tc = { .time_mult = 0, };
>> +	bool cap_user_time_zero = false, per_cpu_mmaps;
>> +	u64 tsc_bit, noretcomp_bit;
>> +	int err;
>> +
>> +	if (priv_size != INTEL_PT_AUXTRACE_PRIV_SIZE)
>> +		return -EINVAL;
>> +
>> +	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
>> +	intel_pt_parse_terms(&intel_pt_pmu->format, "noretcomp",
>> +			     &noretcomp_bit);
>> +
>> +	if (!session->evlist->nr_mmaps)
>> +		return -EINVAL;
>> +
>> +	pc = session->evlist->mmap[0].base;
>> +	if (pc) {
>> +		err = perf_read_tsc_conversion(pc, &tc);
>> +		if (err) {
>> +			if (err != -EOPNOTSUPP)
>> +				return err;
>> +		} else {
>> +			cap_user_time_zero = tc.time_mult != 0;
>> +		}
>> +		if (!cap_user_time_zero)
>> +			ui__warning("Intel Processor Trace: TSC not available\n");
>> +	}
>> +
>> +	per_cpu_mmaps = !cpu_map__empty(session->evlist->cpus);
>> +
>> +	auxtrace_info->type = PERF_AUXTRACE_INTEL_PT;
>> +	auxtrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
>> +	auxtrace_info->priv[INTEL_PT_TIME_SHIFT] = tc.time_shift;
>> +	auxtrace_info->priv[INTEL_PT_TIME_MULT] = tc.time_mult;
>> +	auxtrace_info->priv[INTEL_PT_TIME_ZERO] = tc.time_zero;
>> +	auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO] = cap_user_time_zero;
>> +	auxtrace_info->priv[INTEL_PT_TSC_BIT] = tsc_bit;
>> +	auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT] = noretcomp_bit;
>> +	auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH] = ptr->have_sched_switch;
>> +	auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE] = ptr->snapshot_mode;
>> +	auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS] = per_cpu_mmaps;
>> +
>> +	return 0;
>> +}
>> +
>> +static int intel_pt_track_switches(struct perf_evlist *evlist)
>> +{
>> +	const char *sched_switch = "sched:sched_switch";
>> +	struct perf_evsel *evsel;
>> +	int err;
>> +
>> +	if (!perf_evlist__can_select_event(evlist, sched_switch))
>> +		return -EPERM;
>> +
>> +	err = parse_events(evlist, sched_switch, NULL);
>> +	if (err) {
>> +		pr_debug2("%s: failed to parse %s, error %d\n",
>> +			  __func__, sched_switch, err);
>> +		return err;
>> +	}
>> +
>> +	evsel = perf_evlist__last(evlist);
>> +
>> +	perf_evsel__set_sample_bit(evsel, CPU);
>> +	perf_evsel__set_sample_bit(evsel, TIME);
>> +
>> +	evsel->system_wide = true;
>> +	evsel->no_aux_samples = true;
>> +	evsel->immediate = true;
>> +
>> +	return 0;
>> +}
>> +
>> +static int intel_pt_recording_options(struct auxtrace_record *itr,
>> +				      struct perf_evlist *evlist,
>> +				      struct record_opts *opts)
>> +{
>> +	struct intel_pt_recording *ptr =
>> +			container_of(itr, struct intel_pt_recording, itr);
>> +	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
>> +	bool have_timing_info;
>> +	struct perf_evsel *evsel, *intel_pt_evsel = NULL;
>> +	const struct cpu_map *cpus = evlist->cpus;
>> +	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
>> +	u64 tsc_bit;
>> +
>> +	ptr->evlist = evlist;
>> +	ptr->snapshot_mode = opts->auxtrace_snapshot_mode;
>> +
>> +	evlist__for_each(evlist, evsel) {
>> +		if (evsel->attr.type == intel_pt_pmu->type) {
>> +			if (intel_pt_evsel) {
>> +				pr_err("There may be only one " INTEL_PT_PMU_NAME " event\n");
>> +				return -EINVAL;
>> +			}
>> +			evsel->attr.freq = 0;
>> +			evsel->attr.sample_period = 1;
>> +			intel_pt_evsel = evsel;
>> +			opts->full_auxtrace = true;
>> +		}
>> +	}
>> +
>> +	if (opts->auxtrace_snapshot_mode && !opts->full_auxtrace) {
>> +		pr_err("Snapshot mode (-S option) requires " INTEL_PT_PMU_NAME " PMU event (-e " INTEL_PT_PMU_NAME ")\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (opts->use_clockid) {
>> +		pr_err("Cannot use clockid (-k option) with " INTEL_PT_PMU_NAME "\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (!opts->full_auxtrace)
>> +		return 0;
>> +
>> +	/* Set default sizes for snapshot mode */
>> +	if (opts->auxtrace_snapshot_mode) {
>> +		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
>> +
>> +		if (!opts->auxtrace_snapshot_size && !opts->auxtrace_mmap_pages) {
>> +			if (privileged) {
>> +				opts->auxtrace_mmap_pages = MiB(4) / page_size;
>> +			} else {
>> +				opts->auxtrace_mmap_pages = KiB(128) / page_size;
>> +				if (opts->mmap_pages == UINT_MAX)
>> +					opts->mmap_pages = KiB(256) / page_size;
>> +			}
>> +		} else if (!opts->auxtrace_mmap_pages && !privileged &&
>> +			   opts->mmap_pages == UINT_MAX) {
>> +			opts->mmap_pages = KiB(256) / page_size;
>> +		}
>> +		if (!opts->auxtrace_snapshot_size)
>> +			opts->auxtrace_snapshot_size =
>> +				opts->auxtrace_mmap_pages * (size_t)page_size;
>> +		if (!opts->auxtrace_mmap_pages) {
>> +			size_t sz = opts->auxtrace_snapshot_size;
>> +
>> +			sz = round_up(sz, page_size) / page_size;
>> +			opts->auxtrace_mmap_pages = roundup_pow_of_two(sz);
>> +		}
>> +		if (opts->auxtrace_snapshot_size >
>> +				opts->auxtrace_mmap_pages * (size_t)page_size) {
>> +			pr_err("Snapshot size %zu must not be greater than AUX area tracing mmap size %zu\n",
>> +			       opts->auxtrace_snapshot_size,
>> +			       opts->auxtrace_mmap_pages * (size_t)page_size);
>> +			return -EINVAL;
>> +		}
>> +		if (!opts->auxtrace_snapshot_size || !opts->auxtrace_mmap_pages) {
>> +			pr_err("Failed to calculate default snapshot size and/or AUX area tracing mmap pages\n");
>> +			return -EINVAL;
>> +		}
>> +		pr_debug2("Intel PT snapshot size: %zu\n",
>> +			  opts->auxtrace_snapshot_size);
>> +		if (psb_period &&
>> +		    opts->auxtrace_snapshot_size <= psb_period +
>> +						  INTEL_PT_PSB_PERIOD_NEAR)
>> +			ui__warning("Intel PT snapshot size (%zu) may be too small for PSB period (%zu)\n",
>> +				    opts->auxtrace_snapshot_size, psb_period);
>> +	}
>> +
>> +	/* Set default sizes for full trace mode */
>> +	if (opts->full_auxtrace && !opts->auxtrace_mmap_pages) {
>> +		if (privileged) {
>> +			opts->auxtrace_mmap_pages = MiB(4) / page_size;
>> +		} else {
>> +			opts->auxtrace_mmap_pages = KiB(128) / page_size;
>> +			if (opts->mmap_pages == UINT_MAX)
>> +				opts->mmap_pages = KiB(256) / page_size;
>> +		}
>> +	}
>> +
>> +	/* Validate auxtrace_mmap_pages */
>> +	if (opts->auxtrace_mmap_pages) {
>> +		size_t sz = opts->auxtrace_mmap_pages * (size_t)page_size;
>> +		size_t min_sz;
>> +
>> +		if (opts->auxtrace_snapshot_mode)
>> +			min_sz = KiB(4);
>> +		else
>> +			min_sz = KiB(8);
>> +
>> +		if (sz < min_sz || !is_power_of_2(sz)) {
>> +			pr_err("Invalid mmap size for Intel Processor Trace: must be at least %zuKiB and a power of 2\n",
>> +			       min_sz / 1024);
>> +			return -EINVAL;
>> +		}
>> +	}
>> +
>> +	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
>> +
>> +	if (opts->full_auxtrace && (intel_pt_evsel->attr.config & tsc_bit))
>> +		have_timing_info = true;
>> +	else
>> +		have_timing_info = false;
>> +
>> +	/*
>> +	 * Per-cpu recording needs sched_switch events to distinguish different
>> +	 * threads.
>> +	 */
>> +	if (have_timing_info && !cpu_map__empty(cpus)) {
>> +		int err;
>> +
>> +		err = intel_pt_track_switches(evlist);
>> +		if (err == -EPERM)
>> +			pr_debug2("Unable to select sched:sched_switch\n");
>> +		else if (err)
>> +			return err;
>> +		else
>> +			ptr->have_sched_switch = 1;
>> +	}
>> +
>> +	if (intel_pt_evsel) {
>> +		/*
>> +		 * To obtain the auxtrace buffer file descriptor, the auxtrace
>> +		 * event must come first.
>> +		 */
>> +		perf_evlist__to_front(evlist, intel_pt_evsel);
>> +		/*
>> +		 * In the case of per-cpu mmaps, we need the CPU on the
>> +		 * AUX event.
>> +		 */
>> +		if (!cpu_map__empty(cpus))
>> +			perf_evsel__set_sample_bit(intel_pt_evsel, CPU);
>> +	}
>> +
>> +	/* Add dummy event to keep tracking */
>> +	if (opts->full_auxtrace) {
>> +		struct perf_evsel *tracking_evsel;
>> +		int err;
>> +
>> +		err = parse_events(evlist, "dummy:u", NULL);
>> +		if (err)
>> +			return err;
>> +
>> +		tracking_evsel = perf_evlist__last(evlist);
>> +
>> +		perf_evlist__set_tracking_event(evlist, tracking_evsel);
>> +
>> +		tracking_evsel->attr.freq = 0;
>> +		tracking_evsel->attr.sample_period = 1;
>> +
>> +		/* In per-cpu case, always need the time of mmap events etc */
>> +		if (!cpu_map__empty(cpus))
>> +			perf_evsel__set_sample_bit(tracking_evsel, TIME);
>> +	}
>> +
>> +	/*
>> +	 * Warn the user when we do not have enough information to decode i.e.
>> +	 * per-cpu with no sched_switch (except workload-only).
>> +	 */
>> +	if (!ptr->have_sched_switch && !cpu_map__empty(cpus) &&
>> +	    !target__none(&opts->target))
>> +		ui__warning("Intel Processor Trace decoding will not be possible except for kernel tracing!\n");
>> +
>> +	return 0;
>> +}
>> +
>> +static int intel_pt_snapshot_start(struct auxtrace_record *itr)
>> +{
>> +	struct intel_pt_recording *ptr =
>> +			container_of(itr, struct intel_pt_recording, itr);
>> +	struct perf_evsel *evsel;
>> +
>> +	evlist__for_each(ptr->evlist, evsel) {
>> +		if (evsel->attr.type == ptr->intel_pt_pmu->type)
>> +			return perf_evlist__disable_event(ptr->evlist, evsel);
>> +	}
>> +	return -EINVAL;
>> +}
>> +
>> +static int intel_pt_snapshot_finish(struct auxtrace_record *itr)
>> +{
>> +	struct intel_pt_recording *ptr =
>> +			container_of(itr, struct intel_pt_recording, itr);
>> +	struct perf_evsel *evsel;
>> +
>> +	evlist__for_each(ptr->evlist, evsel) {
>> +		if (evsel->attr.type == ptr->intel_pt_pmu->type)
>> +			return perf_evlist__enable_event(ptr->evlist, evsel);
>> +	}
>> +	return -EINVAL;
>> +}
>> +
>> +static int intel_pt_alloc_snapshot_refs(struct intel_pt_recording *ptr, int idx)
>> +{
>> +	const size_t sz = sizeof(struct intel_pt_snapshot_ref);
>> +	int cnt = ptr->snapshot_ref_cnt, new_cnt = cnt * 2;
>> +	struct intel_pt_snapshot_ref *refs;
>> +
>> +	if (!new_cnt)
>> +		new_cnt = 16;
>> +
>> +	while (new_cnt <= idx)
>> +		new_cnt *= 2;
>> +
>> +	refs = calloc(new_cnt, sz);
>> +	if (!refs)
>> +		return -ENOMEM;
>> +
>> +	memcpy(refs, ptr->snapshot_refs, cnt * sz);
>> +
>> +	ptr->snapshot_refs = refs;
>> +	ptr->snapshot_ref_cnt = new_cnt;
>> +
>> +	return 0;
>> +}
>> +
>> +static void intel_pt_free_snapshot_refs(struct intel_pt_recording *ptr)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < ptr->snapshot_ref_cnt; i++)
>> +		zfree(&ptr->snapshot_refs[i].ref_buf);
>> +	zfree(&ptr->snapshot_refs);
>> +}
>> +
>> +static void intel_pt_recording_free(struct auxtrace_record *itr)
>> +{
>> +	struct intel_pt_recording *ptr =
>> +			container_of(itr, struct intel_pt_recording, itr);
>> +
>> +	intel_pt_free_snapshot_refs(ptr);
>> +	free(ptr);
>> +}
>> +
>> +static int intel_pt_alloc_snapshot_ref(struct intel_pt_recording *ptr, int idx,
>> +				       size_t snapshot_buf_size)
>> +{
>> +	size_t ref_buf_size = ptr->snapshot_ref_buf_size;
>> +	void *ref_buf;
>> +
>> +	ref_buf = zalloc(ref_buf_size);
>> +	if (!ref_buf)
>> +		return -ENOMEM;
>> +
>> +	ptr->snapshot_refs[idx].ref_buf = ref_buf;
>> +	ptr->snapshot_refs[idx].ref_offset = snapshot_buf_size - ref_buf_size;
>> +
>> +	return 0;
>> +}
>> +
>> +static size_t intel_pt_snapshot_ref_buf_size(struct intel_pt_recording *ptr,
>> +					     size_t snapshot_buf_size)
>> +{
>> +	const size_t max_size = 256 * 1024;
>> +	size_t buf_size = 0, psb_period;
>> +
>> +	if (ptr->snapshot_size <= 64 * 1024)
>> +		return 0;
>> +
>> +	psb_period = intel_pt_psb_period(ptr->intel_pt_pmu, ptr->evlist);
>> +	if (psb_period)
>> +		buf_size = psb_period * 2;
>> +
>> +	if (!buf_size || buf_size > max_size)
>> +		buf_size = max_size;
>> +
>> +	if (buf_size >= snapshot_buf_size)
>> +		return 0;
>> +
>> +	if (buf_size >= ptr->snapshot_size / 2)
>> +		return 0;
>> +
>> +	return buf_size;
>> +}
>> +
>> +static int intel_pt_snapshot_init(struct intel_pt_recording *ptr,
>> +				  size_t snapshot_buf_size)
>> +{
>> +	if (ptr->snapshot_init_done)
>> +		return 0;
>> +
>> +	ptr->snapshot_init_done = true;
>> +
>> +	ptr->snapshot_ref_buf_size = intel_pt_snapshot_ref_buf_size(ptr,
>> +							snapshot_buf_size);
>> +
>> +	return 0;
>> +}
>> +
>> +/**
>> + * intel_pt_compare_buffers - compare bytes in a buffer to a circular buffer.
>> + * @buf1: first buffer
>> + * @compare_size: number of bytes to compare
>> + * @buf2: second buffer (a circular buffer)
>> + * @offs2: offset in second buffer
>> + * @buf2_size: size of second buffer
>> + *
>> + * The comparison allows for the possibility that the bytes to compare in the
>> + * circular buffer are not contiguous.  It is assumed that @compare_size <=
>> + * @buf2_size.  This function returns %false if the bytes are identical, %true
>> + * otherwise.
>> + */
>> +static bool intel_pt_compare_buffers(void *buf1, size_t compare_size,
>> +				     void *buf2, size_t offs2, size_t buf2_size)
>> +{
>> +	size_t end2 = offs2 + compare_size, part_size;
>> +
>> +	if (end2 <= buf2_size)
>> +		return memcmp(buf1, buf2 + offs2, compare_size);
>> +
>> +	part_size = end2 - buf2_size;
>> +	if (memcmp(buf1, buf2 + offs2, part_size))
>> +		return true;
>> +
>> +	compare_size -= part_size;
>> +
>> +	return memcmp(buf1 + part_size, buf2, compare_size);
>> +}
>> +
>> +static bool intel_pt_compare_ref(void *ref_buf, size_t ref_offset,
>> +				 size_t ref_size, size_t buf_size,
>> +				 void *data, size_t head)
>> +{
>> +	size_t ref_end = ref_offset + ref_size;
>> +
>> +	if (ref_end > buf_size) {
>> +		if (head > ref_offset || head < ref_end - buf_size)
>> +			return true;
>> +	} else if (head > ref_offset && head < ref_end) {
>> +		return true;
>> +	}
>> +
>> +	return intel_pt_compare_buffers(ref_buf, ref_size, data, ref_offset,
>> +					buf_size);
>> +}
>> +
>> +static void intel_pt_copy_ref(void *ref_buf, size_t ref_size, size_t buf_size,
>> +			      void *data, size_t head)
>> +{
>> +	if (head >= ref_size) {
>> +		memcpy(ref_buf, data + head - ref_size, ref_size);
>> +	} else {
>> +		memcpy(ref_buf, data, head);
>> +		ref_size -= head;
>> +		memcpy(ref_buf + head, data + buf_size - ref_size, ref_size);
>> +	}
>> +}
>> +
>> +static bool intel_pt_wrapped(struct intel_pt_recording *ptr, int idx,
>> +			     struct auxtrace_mmap *mm, unsigned char *data,
>> +			     u64 head)
>> +{
>> +	struct intel_pt_snapshot_ref *ref = &ptr->snapshot_refs[idx];
>> +	bool wrapped;
>> +
>> +	wrapped = intel_pt_compare_ref(ref->ref_buf, ref->ref_offset,
>> +				       ptr->snapshot_ref_buf_size, mm->len,
>> +				       data, head);
>> +
>> +	intel_pt_copy_ref(ref->ref_buf, ptr->snapshot_ref_buf_size, mm->len,
>> +			  data, head);
>> +
>> +	return wrapped;
>> +}
>> +
>> +static bool intel_pt_first_wrap(u64 *data, size_t buf_size)
>> +{
>> +	int i, a, b;
>> +
>> +	b = buf_size >> 3;
>> +	a = b - 512;
>> +	if (a < 0)
>> +		a = 0;
>> +
>> +	for (i = a; i < b; i++) {
>> +		if (data[i])
>> +			return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +static int intel_pt_find_snapshot(struct auxtrace_record *itr, int idx,
>> +				  struct auxtrace_mmap *mm, unsigned char *data,
>> +				  u64 *head, u64 *old)
>> +{
>> +	struct intel_pt_recording *ptr =
>> +			container_of(itr, struct intel_pt_recording, itr);
>> +	bool wrapped;
>> +	int err;
>> +
>> +	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
>> +		  __func__, idx, (size_t)*old, (size_t)*head);
>> +
>> +	err = intel_pt_snapshot_init(ptr, mm->len);
>> +	if (err)
>> +		goto out_err;
>> +
>> +	if (idx >= ptr->snapshot_ref_cnt) {
>> +		err = intel_pt_alloc_snapshot_refs(ptr, idx);
>> +		if (err)
>> +			goto out_err;
>> +	}
>> +
>> +	if (ptr->snapshot_ref_buf_size) {
>> +		if (!ptr->snapshot_refs[idx].ref_buf) {
>> +			err = intel_pt_alloc_snapshot_ref(ptr, idx, mm->len);
>> +			if (err)
>> +				goto out_err;
>> +		}
>> +		wrapped = intel_pt_wrapped(ptr, idx, mm, data, *head);
>> +	} else {
>> +		wrapped = ptr->snapshot_refs[idx].wrapped;
>> +		if (!wrapped && intel_pt_first_wrap((u64 *)data, mm->len)) {
>> +			ptr->snapshot_refs[idx].wrapped = true;
>> +			wrapped = true;
>> +		}
>> +	}
>> +
>> +	/*
>> +	 * In full trace mode 'head' continually increases.  However in snapshot
>> +	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
>> +	 * are adjusted to match the full trace case which expects that 'old' is
>> +	 * always less than 'head'.
>> +	 */
>> +	if (wrapped) {
>> +		*old = *head;
>> +		*head += mm->len;
>> +	} else {
>> +		if (mm->mask)
>> +			*old &= mm->mask;
>> +		else
>> +			*old %= mm->len;
>> +		if (*old > *head)
>> +			*head += mm->len;
>> +	}
>> +
>> +	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
>> +		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
>> +
>> +	return 0;
>> +
>> +out_err:
>> +	pr_err("%s: failed, error %d\n", __func__, err);
>> +	return err;
>> +}
>> +
>> +static u64 intel_pt_reference(struct auxtrace_record *itr __maybe_unused)
>> +{
>> +	return rdtsc();
>> +}
>> +
>> +static int intel_pt_read_finish(struct auxtrace_record *itr, int idx)
>> +{
>> +	struct intel_pt_recording *ptr =
>> +			container_of(itr, struct intel_pt_recording, itr);
>> +	struct perf_evsel *evsel;
>> +
>> +	evlist__for_each(ptr->evlist, evsel) {
>> +		if (evsel->attr.type == ptr->intel_pt_pmu->type)
>> +			return perf_evlist__enable_event_idx(ptr->evlist, evsel,
>> +							     idx);
>> +	}
>> +	return -EINVAL;
>> +}
>> +
>> +struct auxtrace_record *intel_pt_recording_init(int *err)
>> +{
>> +	struct perf_pmu *intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
>> +	struct intel_pt_recording *ptr;
>> +
>> +	if (!intel_pt_pmu)
>> +		return NULL;
>> +
>> +	ptr = zalloc(sizeof(struct intel_pt_recording));
>> +	if (!ptr) {
>> +		*err = -ENOMEM;
>> +		return NULL;
>> +	}
>> +
>> +	ptr->intel_pt_pmu = intel_pt_pmu;
>> +	ptr->itr.recording_options = intel_pt_recording_options;
>> +	ptr->itr.info_priv_size = intel_pt_info_priv_size;
>> +	ptr->itr.info_fill = intel_pt_info_fill;
>> +	ptr->itr.free = intel_pt_recording_free;
>> +	ptr->itr.snapshot_start = intel_pt_snapshot_start;
>> +	ptr->itr.snapshot_finish = intel_pt_snapshot_finish;
>> +	ptr->itr.find_snapshot = intel_pt_find_snapshot;
>> +	ptr->itr.parse_snapshot_options = intel_pt_parse_snapshot_options;
>> +	ptr->itr.reference = intel_pt_reference;
>> +	ptr->itr.read_finish = intel_pt_read_finish;
>> +	return &ptr->itr;
>> +}
>> diff --git a/tools/perf/util/Build b/tools/perf/util/Build
>> index 86c81f6..ec7ab9d 100644
>> --- a/tools/perf/util/Build
>> +++ b/tools/perf/util/Build
>> @@ -76,6 +76,7 @@ libperf-y += cloexec.o
>>   libperf-y += thread-stack.o
>>   libperf-$(CONFIG_AUXTRACE) += auxtrace.o
>>   libperf-$(CONFIG_AUXTRACE) += intel-pt-decoder/
>> +libperf-$(CONFIG_AUXTRACE) += intel-pt.o
>>   libperf-y += parse-branch-options.o
>>
>>   libperf-$(CONFIG_LIBELF) += symbol-elf.o
>> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
>> new file mode 100644
>> index 0000000..6d66879
>> --- /dev/null
>> +++ b/tools/perf/util/intel-pt.c
>> @@ -0,0 +1,1889 @@
>> +/*
>> + * intel_pt.c: Intel Processor Trace support
>> + * Copyright (c) 2013-2015, Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + */
>> +
>> +#include <stdio.h>
>> +#include <stdbool.h>
>> +#include <errno.h>
>> +#include <linux/kernel.h>
>> +#include <linux/types.h>
>> +
>> +#include "../perf.h"
>> +#include "session.h"
>> +#include "machine.h"
>> +#include "tool.h"
>> +#include "event.h"
>> +#include "evlist.h"
>> +#include "evsel.h"
>> +#include "map.h"
>> +#include "color.h"
>> +#include "util.h"
>> +#include "thread.h"
>> +#include "thread-stack.h"
>> +#include "symbol.h"
>> +#include "callchain.h"
>> +#include "dso.h"
>> +#include "debug.h"
>> +#include "auxtrace.h"
>> +#include "tsc.h"
>> +#include "intel-pt.h"
>> +
>> +#include "intel-pt-decoder/intel-pt-log.h"
>> +#include "intel-pt-decoder/intel-pt-decoder.h"
>> +#include "intel-pt-decoder/intel-pt-insn-decoder.h"
>> +#include "intel-pt-decoder/intel-pt-pkt-decoder.h"
>> +
>> +#define MAX_TIMESTAMP (~0ULL)
>> +
>> +struct intel_pt {
>> +	struct auxtrace auxtrace;
>> +	struct auxtrace_queues queues;
>> +	struct auxtrace_heap heap;
>> +	u32 auxtrace_type;
>> +	struct perf_session *session;
>> +	struct machine *machine;
>> +	struct perf_evsel *switch_evsel;
>> +	struct thread *unknown_thread;
>> +	bool timeless_decoding;
>> +	bool sampling_mode;
>> +	bool snapshot_mode;
>> +	bool per_cpu_mmaps;
>> +	bool have_tsc;
>> +	bool data_queued;
>> +	bool est_tsc;
>> +	bool sync_switch;
>> +	bool est_tsc_orig;
>> +	int have_sched_switch;
>> +	u32 pmu_type;
>> +	u64 kernel_start;
>> +	u64 switch_ip;
>> +	u64 ptss_ip;
>> +
>> +	struct perf_tsc_conversion tc;
>> +	bool cap_user_time_zero;
>> +
>> +	struct itrace_synth_opts synth_opts;
>> +
>> +	bool sample_instructions;
>> +	u64 instructions_sample_type;
>> +	u64 instructions_sample_period;
>> +	u64 instructions_id;
>> +
>> +	bool sample_branches;
>> +	u32 branches_filter;
>> +	u64 branches_sample_type;
>> +	u64 branches_id;
>> +
>> +	bool sample_transactions;
>> +	u64 transactions_sample_type;
>> +	u64 transactions_id;
>> +
>> +	bool synth_needs_swap;
>> +
>> +	u64 tsc_bit;
>> +	u64 noretcomp_bit;
>> +};
>> +
>> +enum switch_state {
>> +	INTEL_PT_SS_NOT_TRACING,
>> +	INTEL_PT_SS_UNKNOWN,
>> +	INTEL_PT_SS_TRACING,
>> +	INTEL_PT_SS_EXPECTING_SWITCH_EVENT,
>> +	INTEL_PT_SS_EXPECTING_SWITCH_IP,
>> +};
>> +
>> +struct intel_pt_queue {
>> +	struct intel_pt *pt;
>> +	unsigned int queue_nr;
>> +	struct auxtrace_buffer *buffer;
>> +	void *decoder;
>> +	const struct intel_pt_state *state;
>> +	struct ip_callchain *chain;
>> +	union perf_event *event_buf;
>> +	bool on_heap;
>> +	bool stop;
>> +	bool step_through_buffers;
>> +	bool use_buffer_pid_tid;
>> +	pid_t pid, tid;
>> +	int cpu;
>> +	int switch_state;
>> +	pid_t next_tid;
>> +	struct thread *thread;
>> +	bool exclude_kernel;
>> +	bool have_sample;
>> +	u64 time;
>> +	u64 timestamp;
>> +	u32 flags;
>> +	u16 insn_len;
>> +};
>> +
>> +static void intel_pt_dump(struct intel_pt *pt __maybe_unused,
>> +			  unsigned char *buf, size_t len)
>> +{
>> +	struct intel_pt_pkt packet;
>> +	size_t pos = 0;
>> +	int ret, pkt_len, i;
>> +	char desc[INTEL_PT_PKT_DESC_MAX];
>> +	const char *color = PERF_COLOR_BLUE;
>> +
>> +	color_fprintf(stdout, color,
>> +		      ". ... Intel Processor Trace data: size %zu bytes\n",
>> +		      len);
>> +
>> +	while (len) {
>> +		ret = intel_pt_get_packet(buf, len, &packet);
>> +		if (ret > 0)
>> +			pkt_len = ret;
>> +		else
>> +			pkt_len = 1;
>> +		printf(".");
>> +		color_fprintf(stdout, color, "  %08x: ", pos);
>> +		for (i = 0; i < pkt_len; i++)
>> +			color_fprintf(stdout, color, " %02x", buf[i]);
>> +		for (; i < 16; i++)
>> +			color_fprintf(stdout, color, "   ");
>> +		if (ret > 0) {
>> +			ret = intel_pt_pkt_desc(&packet, desc,
>> +						INTEL_PT_PKT_DESC_MAX);
>> +			if (ret > 0)
>> +				color_fprintf(stdout, color, " %s\n", desc);
>> +		} else {
>> +			color_fprintf(stdout, color, " Bad packet!\n");
>> +		}
>> +		pos += pkt_len;
>> +		buf += pkt_len;
>> +		len -= pkt_len;
>> +	}
>> +}
>> +
>> +static void intel_pt_dump_event(struct intel_pt *pt, unsigned char *buf,
>> +				size_t len)
>> +{
>> +	printf(".\n");
>> +	intel_pt_dump(pt, buf, len);
>> +}
>> +
>> +static int intel_pt_do_fix_overlap(struct intel_pt *pt, struct auxtrace_buffer *a,
>> +				   struct auxtrace_buffer *b)
>> +{
>> +	void *start;
>> +
>> +	start = intel_pt_find_overlap(a->data, a->size, b->data, b->size,
>> +				      pt->have_tsc);
>> +	if (!start)
>> +		return -EINVAL;
>> +	b->use_size = b->data + b->size - start;
>> +	b->use_data = start;
>> +	return 0;
>> +}
>> +
>> +static void intel_pt_use_buffer_pid_tid(struct intel_pt_queue *ptq,
>> +					struct auxtrace_queue *queue,
>> +					struct auxtrace_buffer *buffer)
>> +{
>> +	if (queue->cpu == -1 && buffer->cpu != -1)
>> +		ptq->cpu = buffer->cpu;
>> +
>> +	ptq->pid = buffer->pid;
>> +	ptq->tid = buffer->tid;
>> +
>> +	intel_pt_log("queue %u cpu %d pid %d tid %d\n",
>> +		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
>> +
>> +	ptq->thread = NULL;
>> +
>> +	if (ptq->tid != -1) {
>> +		if (ptq->pid != -1)
>> +			ptq->thread = machine__findnew_thread(ptq->pt->machine,
>> +							      ptq->pid,
>> +							      ptq->tid);
>> +		else
>> +			ptq->thread = machine__find_thread(ptq->pt->machine, -1,
>> +							   ptq->tid);
>> +	}
>> +}
>> +
>> +/* This function assumes data is processed sequentially only */
>> +static int intel_pt_get_trace(struct intel_pt_buffer *b, void *data)
>> +{
>> +	struct intel_pt_queue *ptq = data;
>> +	struct auxtrace_buffer *buffer = ptq->buffer, *old_buffer = buffer;
>> +	struct auxtrace_queue *queue;
>> +
>> +	if (ptq->stop) {
>> +		b->len = 0;
>> +		return 0;
>> +	}
>> +
>> +	queue = &ptq->pt->queues.queue_array[ptq->queue_nr];
>> +
>> +	buffer = auxtrace_buffer__next(queue, buffer);
>> +	if (!buffer) {
>> +		if (old_buffer)
>> +			auxtrace_buffer__drop_data(old_buffer);
>> +		b->len = 0;
>> +		return 0;
>> +	}
>> +
>> +	ptq->buffer = buffer;
>> +
>> +	if (!buffer->data) {
>> +		int fd = perf_data_file__fd(ptq->pt->session->file);
>> +
>> +		buffer->data = auxtrace_buffer__get_data(buffer, fd);
>> +		if (!buffer->data)
>> +			return -ENOMEM;
>> +	}
>> +
>> +	if (ptq->pt->snapshot_mode && !buffer->consecutive && old_buffer &&
>> +	    intel_pt_do_fix_overlap(ptq->pt, old_buffer, buffer))
>> +		return -ENOMEM;
>> +
>> +	if (old_buffer)
>> +		auxtrace_buffer__drop_data(old_buffer);
>> +
>> +	if (buffer->use_data) {
>> +		b->len = buffer->use_size;
>> +		b->buf = buffer->use_data;
>> +	} else {
>> +		b->len = buffer->size;
>> +		b->buf = buffer->data;
>> +	}
>> +	b->ref_timestamp = buffer->reference;
>> +
>> +	if (!old_buffer || ptq->pt->sampling_mode || (ptq->pt->snapshot_mode &&
>> +						      !buffer->consecutive)) {
>> +		b->consecutive = false;
>> +		b->trace_nr = buffer->buffer_nr;
>> +	} else {
>> +		b->consecutive = true;
>> +	}
>> +
>> +	if (ptq->use_buffer_pid_tid && (ptq->pid != buffer->pid ||
>> +					ptq->tid != buffer->tid))
>> +		intel_pt_use_buffer_pid_tid(ptq, queue, buffer);
>> +
>> +	if (ptq->step_through_buffers)
>> +		ptq->stop = true;
>> +
>> +	if (!b->len)
>> +		return intel_pt_get_trace(b, data);
>> +
>> +	return 0;
>> +}
>> +
>> +struct intel_pt_cache_entry {
>> +	struct auxtrace_cache_entry	entry;
>> +	u64				insn_cnt;
>> +	u64				byte_cnt;
>> +	enum intel_pt_insn_op		op;
>> +	enum intel_pt_insn_branch	branch;
>> +	int				length;
>> +	int32_t				rel;
>> +};
>> +
>> +static int intel_pt_config_div(const char *var, const char *value, void *data)
>> +{
>> +	int *d = data;
>> +	long val;
>> +
>> +	if (!strcmp(var, "intel-pt.cache-divisor")) {
>> +		val = strtol(value, NULL, 0);
>> +		if (val > 0 && val <= INT_MAX)
>> +			*d = val;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int intel_pt_cache_divisor(void)
>> +{
>> +	static int d;
>> +
>> +	if (d)
>> +		return d;
>> +
>> +	perf_config(intel_pt_config_div, &d);
>> +
>> +	if (!d)
>> +		d = 64;
>> +
>> +	return d;
>> +}
>> +
>> +static unsigned int intel_pt_cache_size(struct dso *dso,
>> +					struct machine *machine)
>> +{
>> +	off_t size;
>> +
>> +	size = dso__data_size(dso, machine);
>> +	size /= intel_pt_cache_divisor();
>> +	if (size < 1000)
>> +		return 10;
>> +	if (size > (1 << 21))
>> +		return 21;
>> +	return 32 - __builtin_clz(size);
>> +}
>> +
>> +static struct auxtrace_cache *intel_pt_cache(struct dso *dso,
>> +					     struct machine *machine)
>> +{
>> +	struct auxtrace_cache *c;
>> +	unsigned int bits;
>> +
>> +	if (dso->auxtrace_cache)
>> +		return dso->auxtrace_cache;
>> +
>> +	bits = intel_pt_cache_size(dso, machine);
>> +
>> +	/* Ignoring cache creation failure */
>> +	c = auxtrace_cache__new(bits, sizeof(struct intel_pt_cache_entry), 200);
>> +
>> +	dso->auxtrace_cache = c;
>> +
>> +	return c;
>> +}
>> +
>> +static int intel_pt_cache_add(struct dso *dso, struct machine *machine,
>> +			      u64 offset, u64 insn_cnt, u64 byte_cnt,
>> +			      struct intel_pt_insn *intel_pt_insn)
>> +{
>> +	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
>> +	struct intel_pt_cache_entry *e;
>> +	int err;
>> +
>> +	if (!c)
>> +		return -ENOMEM;
>> +
>> +	e = auxtrace_cache__alloc_entry(c);
>> +	if (!e)
>> +		return -ENOMEM;
>> +
>> +	e->insn_cnt = insn_cnt;
>> +	e->byte_cnt = byte_cnt;
>> +	e->op = intel_pt_insn->op;
>> +	e->branch = intel_pt_insn->branch;
>> +	e->length = intel_pt_insn->length;
>> +	e->rel = intel_pt_insn->rel;
>> +
>> +	err = auxtrace_cache__add(c, offset, &e->entry);
>> +	if (err)
>> +		auxtrace_cache__free_entry(c, e);
>> +
>> +	return err;
>> +}
>> +
>> +static struct intel_pt_cache_entry *
>> +intel_pt_cache_lookup(struct dso *dso, struct machine *machine, u64 offset)
>> +{
>> +	struct auxtrace_cache *c = intel_pt_cache(dso, machine);
>> +
>> +	if (!c)
>> +		return NULL;
>> +
>> +	return auxtrace_cache__lookup(dso->auxtrace_cache, offset);
>> +}
>> +
>> +static int intel_pt_walk_next_insn(struct intel_pt_insn *intel_pt_insn,
>> +				   uint64_t *insn_cnt_ptr, uint64_t *ip,
>> +				   uint64_t to_ip, uint64_t max_insn_cnt,
>> +				   void *data)
>> +{
>> +	struct intel_pt_queue *ptq = data;
>> +	struct machine *machine = ptq->pt->machine;
>> +	struct thread *thread;
>> +	struct addr_location al;
>> +	unsigned char buf[1024];
>> +	size_t bufsz;
>> +	ssize_t len;
>> +	int x86_64;
>> +	u8 cpumode;
>> +	u64 offset, start_offset, start_ip;
>> +	u64 insn_cnt = 0;
>> +	bool one_map = true;
>> +
>> +	if (to_ip && *ip == to_ip)
>> +		goto out_no_cache;
>> +
>> +	bufsz = intel_pt_insn_max_size();
>> +
>> +	if (*ip >= ptq->pt->kernel_start)
>> +		cpumode = PERF_RECORD_MISC_KERNEL;
>> +	else
>> +		cpumode = PERF_RECORD_MISC_USER;
>> +
>> +	thread = ptq->thread;
>> +	if (!thread) {
>> +		if (cpumode != PERF_RECORD_MISC_KERNEL)
>> +			return -EINVAL;
>> +		thread = ptq->pt->unknown_thread;
>> +	}
>> +
>> +	while (1) {
>> +		thread__find_addr_map(thread, cpumode, MAP__FUNCTION, *ip, &al);
>> +		if (!al.map || !al.map->dso)
>> +			return -EINVAL;
>> +
>> +		if (al.map->dso->data.status == DSO_DATA_STATUS_ERROR &&
>> +		    dso__data_status_seen(al.map->dso,
>> +					  DSO_DATA_STATUS_SEEN_ITRACE))
>> +			return -ENOENT;
>> +
>> +		offset = al.map->map_ip(al.map, *ip);
>> +
>> +		if (!to_ip && one_map) {
>> +			struct intel_pt_cache_entry *e;
>> +
>> +			e = intel_pt_cache_lookup(al.map->dso, machine, offset);
>> +			if (e &&
>> +			    (!max_insn_cnt || e->insn_cnt <= max_insn_cnt)) {
>> +				*insn_cnt_ptr = e->insn_cnt;
>> +				*ip += e->byte_cnt;
>> +				intel_pt_insn->op = e->op;
>> +				intel_pt_insn->branch = e->branch;
>> +				intel_pt_insn->length = e->length;
>> +				intel_pt_insn->rel = e->rel;
>> +				intel_pt_log_insn_no_data(intel_pt_insn, *ip);
>> +				return 0;
>> +			}
>> +		}
>> +
>> +		start_offset = offset;
>> +		start_ip = *ip;
>> +
>> +		/* Load maps to ensure dso->is_64_bit has been updated */
>> +		map__load(al.map, machine->symbol_filter);
>> +
>> +		x86_64 = al.map->dso->is_64_bit;
>> +
>> +		while (1) {
>> +			len = dso__data_read_offset(al.map->dso, machine,
>> +						    offset, buf, bufsz);
>> +			if (len <= 0)
>> +				return -EINVAL;
>> +
>> +			if (intel_pt_get_insn(buf, len, x86_64, intel_pt_insn))
>> +				return -EINVAL;
>> +
>> +			intel_pt_log_insn(intel_pt_insn, *ip);
>> +
>> +			insn_cnt += 1;
>> +
>> +			if (intel_pt_insn->branch != INTEL_PT_BR_NO_BRANCH)
>> +				goto out;
>> +
>> +			if (max_insn_cnt && insn_cnt >= max_insn_cnt)
>> +				goto out_no_cache;
>> +
>> +			*ip += intel_pt_insn->length;
>> +
>> +			if (to_ip && *ip == to_ip)
>> +				goto out_no_cache;
>> +
>> +			if (*ip >= al.map->end)
>> +				break;
>> +
>> +			offset += intel_pt_insn->length;
>> +		}
>> +		one_map = false;
>> +	}
>> +out:
>> +	*insn_cnt_ptr = insn_cnt;
>> +
>> +	if (!one_map)
>> +		goto out_no_cache;
>> +
>> +	/*
>> +	 * Didn't lookup in the 'to_ip' case, so do it now to prevent duplicate
>> +	 * entries.
>> +	 */
>> +	if (to_ip) {
>> +		struct intel_pt_cache_entry *e;
>> +
>> +		e = intel_pt_cache_lookup(al.map->dso, machine, start_offset);
>> +		if (e)
>> +			return 0;
>> +	}
>> +
>> +	/* Ignore cache errors */
>> +	intel_pt_cache_add(al.map->dso, machine, start_offset, insn_cnt,
>> +			   *ip - start_ip, intel_pt_insn);
>> +
>> +	return 0;
>> +
>> +out_no_cache:
>> +	*insn_cnt_ptr = insn_cnt;
>> +	return 0;
>> +}
>> +
>> +static bool intel_pt_get_config(struct intel_pt *pt,
>> +				struct perf_event_attr *attr, u64 *config)
>> +{
>> +	if (attr->type == pt->pmu_type) {
>> +		if (config)
>> +			*config = attr->config;
>> +		return true;
>> +	}
>> +
>> +	return false;
>> +}
>> +
>> +static bool intel_pt_exclude_kernel(struct intel_pt *pt)
>> +{
>> +	struct perf_evsel *evsel;
>> +
>> +	evlist__for_each(pt->session->evlist, evsel) {
>> +		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
>> +		    !evsel->attr.exclude_kernel)
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +
>> +static bool intel_pt_return_compression(struct intel_pt *pt)
>> +{
>> +	struct perf_evsel *evsel;
>> +	u64 config;
>> +
>> +	if (!pt->noretcomp_bit)
>> +		return true;
>> +
>> +	evlist__for_each(pt->session->evlist, evsel) {
>> +		if (intel_pt_get_config(pt, &evsel->attr, &config) &&
>> +		    (config & pt->noretcomp_bit))
>> +			return false;
>> +	}
>> +	return true;
>> +}
>> +
>> +static bool intel_pt_timeless_decoding(struct intel_pt *pt)
>> +{
>> +	struct perf_evsel *evsel;
>> +	bool timeless_decoding = true;
>> +	u64 config;
>> +
>> +	if (!pt->tsc_bit || !pt->cap_user_time_zero)
>> +		return true;
>> +
>> +	evlist__for_each(pt->session->evlist, evsel) {
>> +		if (!(evsel->attr.sample_type & PERF_SAMPLE_TIME))
>> +			return true;
>> +		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
>> +			if (config & pt->tsc_bit)
>> +				timeless_decoding = false;
>> +			else
>> +				return true;
>> +		}
>> +	}
>> +	return timeless_decoding;
>> +}
>> +
>> +static bool intel_pt_tracing_kernel(struct intel_pt *pt)
>> +{
>> +	struct perf_evsel *evsel;
>> +
>> +	evlist__for_each(pt->session->evlist, evsel) {
>> +		if (intel_pt_get_config(pt, &evsel->attr, NULL) &&
>> +		    !evsel->attr.exclude_kernel)
>> +			return true;
>> +	}
>> +	return false;
>> +}
>> +
>> +static bool intel_pt_have_tsc(struct intel_pt *pt)
>> +{
>> +	struct perf_evsel *evsel;
>> +	bool have_tsc = false;
>> +	u64 config;
>> +
>> +	if (!pt->tsc_bit)
>> +		return false;
>> +
>> +	evlist__for_each(pt->session->evlist, evsel) {
>> +		if (intel_pt_get_config(pt, &evsel->attr, &config)) {
>> +			if (config & pt->tsc_bit)
>> +				have_tsc = true;
>> +			else
>> +				return false;
>> +		}
>> +	}
>> +	return have_tsc;
>> +}
>> +
>> +static u64 intel_pt_ns_to_ticks(const struct intel_pt *pt, u64 ns)
>> +{
>> +	u64 quot, rem;
>> +
>> +	quot = ns / pt->tc.time_mult;
>> +	rem  = ns % pt->tc.time_mult;
>> +	return (quot << pt->tc.time_shift) + (rem << pt->tc.time_shift) /
>> +		pt->tc.time_mult;
>> +}
>> +
>> +static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
>> +						   unsigned int queue_nr)
>> +{
>> +	struct intel_pt_params params = { .get_trace = 0, };
>> +	struct intel_pt_queue *ptq;
>> +
>> +	ptq = zalloc(sizeof(struct intel_pt_queue));
>> +	if (!ptq)
>> +		return NULL;
>> +
>> +	if (pt->synth_opts.callchain) {
>> +		size_t sz = sizeof(struct ip_callchain);
>> +
>> +		sz += pt->synth_opts.callchain_sz * sizeof(u64);
>> +		ptq->chain = zalloc(sz);
>> +		if (!ptq->chain)
>> +			goto out_free;
>> +	}
>> +
>> +	ptq->event_buf = malloc(PERF_SAMPLE_MAX_SIZE);
>> +	if (!ptq->event_buf)
>> +		goto out_free;
>> +
>> +	ptq->pt = pt;
>> +	ptq->queue_nr = queue_nr;
>> +	ptq->exclude_kernel = intel_pt_exclude_kernel(pt);
>> +	ptq->pid = -1;
>> +	ptq->tid = -1;
>> +	ptq->cpu = -1;
>> +	ptq->next_tid = -1;
>> +
>> +	params.get_trace = intel_pt_get_trace;
>> +	params.walk_insn = intel_pt_walk_next_insn;
>> +	params.data = ptq;
>> +	params.return_compression = intel_pt_return_compression(pt);
>> +
>> +	if (pt->synth_opts.instructions) {
>> +		if (pt->synth_opts.period) {
>> +			switch (pt->synth_opts.period_type) {
>> +			case PERF_ITRACE_PERIOD_INSTRUCTIONS:
>> +				params.period_type =
>> +						INTEL_PT_PERIOD_INSTRUCTIONS;
>> +				params.period = pt->synth_opts.period;
>> +				break;
>> +			case PERF_ITRACE_PERIOD_TICKS:
>> +				params.period_type = INTEL_PT_PERIOD_TICKS;
>> +				params.period = pt->synth_opts.period;
>> +				break;
>> +			case PERF_ITRACE_PERIOD_NANOSECS:
>> +				params.period_type = INTEL_PT_PERIOD_TICKS;
>> +				params.period = intel_pt_ns_to_ticks(pt,
>> +							pt->synth_opts.period);
>> +				break;
>> +			default:
>> +				break;
>> +			}
>> +		}
>> +
>> +		if (!params.period) {
>> +			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
>> +			params.period = 1000;
>> +		}
>> +	}
>> +
>> +	ptq->decoder = intel_pt_decoder_new(&params);
>> +	if (!ptq->decoder)
>> +		goto out_free;
>> +
>> +	return ptq;
>> +
>> +out_free:
>> +	zfree(&ptq->event_buf);
>> +	zfree(&ptq->chain);
>> +	free(ptq);
>> +	return NULL;
>> +}
>> +
>> +static void intel_pt_free_queue(void *priv)
>> +{
>> +	struct intel_pt_queue *ptq = priv;
>> +
>> +	if (!ptq)
>> +		return;
>> +	intel_pt_decoder_free(ptq->decoder);
>> +	zfree(&ptq->event_buf);
>> +	zfree(&ptq->chain);
>> +	free(ptq);
>> +}
>> +
>> +static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
>> +				     struct auxtrace_queue *queue)
>> +{
>> +	struct intel_pt_queue *ptq = queue->priv;
>> +
>> +	if (queue->tid == -1 || pt->have_sched_switch) {
>> +		ptq->tid = machine__get_current_tid(pt->machine, ptq->cpu);
>> +		ptq->thread = NULL;
>> +	}
>> +
>> +	if (!ptq->thread && ptq->tid != -1)
>> +		ptq->thread = machine__find_thread(pt->machine, -1, ptq->tid);
>> +
>> +	if (ptq->thread) {
>> +		ptq->pid = ptq->thread->pid_;
>> +		if (queue->cpu == -1)
>> +			ptq->cpu = ptq->thread->cpu;
>> +	}
>> +}
>> +
>> +static void intel_pt_sample_flags(struct intel_pt_queue *ptq)
>> +{
>> +	if (ptq->state->flags & INTEL_PT_ABORT_TX) {
>> +		ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_TX_ABORT;
>> +	} else if (ptq->state->flags & INTEL_PT_ASYNC) {
>> +		if (ptq->state->to_ip)
>> +			ptq->flags = PERF_IP_FLAG_BRANCH | PERF_IP_FLAG_CALL |
>> +				     PERF_IP_FLAG_ASYNC |
>> +				     PERF_IP_FLAG_INTERRUPT;
>> +		else
>> +			ptq->flags = PERF_IP_FLAG_BRANCH |
>> +				     PERF_IP_FLAG_TRACE_END;
>> +		ptq->insn_len = 0;
>> +	} else {
>> +		if (ptq->state->from_ip)
>> +			ptq->flags = intel_pt_insn_type(ptq->state->insn_op);
>> +		else
>> +			ptq->flags = PERF_IP_FLAG_BRANCH |
>> +				     PERF_IP_FLAG_TRACE_BEGIN;
>> +		if (ptq->state->flags & INTEL_PT_IN_TX)
>> +			ptq->flags |= PERF_IP_FLAG_IN_TX;
>> +		ptq->insn_len = ptq->state->insn_len;
>> +	}
>> +}
>> +
>> +static int intel_pt_setup_queue(struct intel_pt *pt,
>> +				struct auxtrace_queue *queue,
>> +				unsigned int queue_nr)
>> +{
>> +	struct intel_pt_queue *ptq = queue->priv;
>> +
>> +	if (list_empty(&queue->head))
>> +		return 0;
>> +
>> +	if (!ptq) {
>> +		ptq = intel_pt_alloc_queue(pt, queue_nr);
>> +		if (!ptq)
>> +			return -ENOMEM;
>> +		queue->priv = ptq;
>> +
>> +		if (queue->cpu != -1)
>> +			ptq->cpu = queue->cpu;
>> +		ptq->tid = queue->tid;
>> +
>> +		if (pt->sampling_mode) {
>> +			if (pt->timeless_decoding)
>> +				ptq->step_through_buffers = true;
>> +			if (pt->timeless_decoding || !pt->have_sched_switch)
>> +				ptq->use_buffer_pid_tid = true;
>> +		}
>> +	}
>> +
>> +	if (!ptq->on_heap &&
>> +	    (!pt->sync_switch ||
>> +	     ptq->switch_state != INTEL_PT_SS_EXPECTING_SWITCH_EVENT)) {
>> +		const struct intel_pt_state *state;
>> +		int ret;
>> +
>> +		if (pt->timeless_decoding)
>> +			return 0;
>> +
>> +		intel_pt_log("queue %u getting timestamp\n", queue_nr);
>> +		intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
>> +			     queue_nr, ptq->cpu, ptq->pid, ptq->tid);
>> +		while (1) {
>> +			state = intel_pt_decode(ptq->decoder);
>> +			if (state->err) {
>> +				if (state->err == INTEL_PT_ERR_NODATA) {
>> +					intel_pt_log("queue %u has no timestamp\n",
>> +						     queue_nr);
>> +					return 0;
>> +				}
>> +				continue;
>> +			}
>> +			if (state->timestamp)
>> +				break;
>> +		}
>> +
>> +		ptq->timestamp = state->timestamp;
>> +		intel_pt_log("queue %u timestamp 0x%" PRIx64 "\n",
>> +			     queue_nr, ptq->timestamp);
>> +		ptq->state = state;
>> +		ptq->have_sample = true;
>> +		intel_pt_sample_flags(ptq);
>> +		ret = auxtrace_heap__add(&pt->heap, queue_nr, ptq->timestamp);
>> +		if (ret)
>> +			return ret;
>> +		ptq->on_heap = true;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int intel_pt_setup_queues(struct intel_pt *pt)
>> +{
>> +	unsigned int i;
>> +	int ret;
>> +
>> +	for (i = 0; i < pt->queues.nr_queues; i++) {
>> +		ret = intel_pt_setup_queue(pt, &pt->queues.queue_array[i], i);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +	return 0;
>> +}
>> +
>> +static int intel_pt_inject_event(union perf_event *event,
>> +				 struct perf_sample *sample, u64 type,
>> +				 bool swapped)
>> +{
>> +	event->header.size = perf_event__sample_event_size(sample, type, 0);
>> +	return perf_event__synthesize_sample(event, type, 0, sample, swapped);
>> +}
>> +
>> +static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq)
>> +{
>> +	int ret;
>> +	struct intel_pt *pt = ptq->pt;
>> +	union perf_event *event = ptq->event_buf;
>> +	struct perf_sample sample = { .ip = 0, };
>> +
>> +	event->sample.header.type = PERF_RECORD_SAMPLE;
>> +	event->sample.header.misc = PERF_RECORD_MISC_USER;
>> +	event->sample.header.size = sizeof(struct perf_event_header);
>> +
>> +	if (!pt->timeless_decoding)
>> +		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
>> +
>> +	sample.ip = ptq->state->from_ip;
>> +	sample.pid = ptq->pid;
>> +	sample.tid = ptq->tid;
>> +	sample.addr = ptq->state->to_ip;
>> +	sample.id = ptq->pt->branches_id;
>> +	sample.stream_id = ptq->pt->branches_id;
>> +	sample.period = 1;
>> +	sample.cpu = ptq->cpu;
>> +
>> +	if (pt->branches_filter && !(pt->branches_filter & ptq->flags))
>> +		return 0;
>> +
>> +	if (pt->synth_opts.inject) {
>> +		ret = intel_pt_inject_event(event, &sample,
>> +					    pt->branches_sample_type,
>> +					    pt->synth_needs_swap);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
>> +	if (ret)
>> +		pr_err("Intel Processor Trace: failed to deliver branch event, error %d\n",
>> +		       ret);
>> +
>> +	return ret;
>> +}
>> +
>> +static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq)
>> +{
>> +	int ret;
>> +	struct intel_pt *pt = ptq->pt;
>> +	union perf_event *event = ptq->event_buf;
>> +	struct perf_sample sample = { .ip = 0, };
>> +
>> +	event->sample.header.type = PERF_RECORD_SAMPLE;
>> +	event->sample.header.misc = PERF_RECORD_MISC_USER;
>> +	event->sample.header.size = sizeof(struct perf_event_header);
>> +
>> +	if (!pt->timeless_decoding)
>> +		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
>> +
>> +	sample.ip = ptq->state->from_ip;
>> +	sample.pid = ptq->pid;
>> +	sample.tid = ptq->tid;
>> +	sample.addr = ptq->state->to_ip;
>> +	sample.id = ptq->pt->instructions_id;
>> +	sample.stream_id = ptq->pt->instructions_id;
>> +	sample.period = ptq->pt->instructions_sample_period;
>> +	sample.cpu = ptq->cpu;
>> +
>> +	if (pt->synth_opts.callchain) {
>> +		thread_stack__sample(ptq->thread, ptq->chain,
>> +				     pt->synth_opts.callchain_sz, sample.ip);
>> +		sample.callchain = ptq->chain;
>> +	}
>> +
>> +	if (pt->synth_opts.inject) {
>> +		ret = intel_pt_inject_event(event, &sample,
>> +					    pt->instructions_sample_type,
>> +					    pt->synth_needs_swap);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
>> +	if (ret)
>> +		pr_err("Intel Processor Trace: failed to deliver instruction event, error %d\n",
>> +		       ret);
>> +
>> +	return ret;
>> +}
>> +
>> +static int intel_pt_synth_transaction_sample(struct intel_pt_queue *ptq)
>> +{
>> +	int ret;
>> +	struct intel_pt *pt = ptq->pt;
>> +	union perf_event *event = ptq->event_buf;
>> +	struct perf_sample sample = { .ip = 0, };
>> +
>> +	event->sample.header.type = PERF_RECORD_SAMPLE;
>> +	event->sample.header.misc = PERF_RECORD_MISC_USER;
>> +	event->sample.header.size = sizeof(struct perf_event_header);
>> +
>> +	if (!pt->timeless_decoding)
>> +		sample.time = tsc_to_perf_time(ptq->timestamp, &pt->tc);
>> +
>> +	sample.ip = ptq->state->from_ip;
>> +	sample.pid = ptq->pid;
>> +	sample.tid = ptq->tid;
>> +	sample.addr = ptq->state->to_ip;
>> +	sample.id = ptq->pt->transactions_id;
>> +	sample.stream_id = ptq->pt->transactions_id;
>> +	sample.period = 1;
>> +	sample.cpu = ptq->cpu;
>> +	sample.flags = ptq->flags;
>> +	sample.insn_len = ptq->insn_len;
>> +
>> +	if (pt->synth_opts.callchain) {
>> +		thread_stack__sample(ptq->thread, ptq->chain,
>> +				     pt->synth_opts.callchain_sz, sample.ip);
>> +		sample.callchain = ptq->chain;
>> +	}
>> +
>> +	if (pt->synth_opts.inject) {
>> +		ret = intel_pt_inject_event(event, &sample,
>> +					    pt->transactions_sample_type,
>> +					    pt->synth_needs_swap);
>> +		if (ret)
>> +			return ret;
>> +	}
>> +
>> +	ret = perf_session__deliver_synth_event(pt->session, event, &sample);
>> +	if (ret)
>> +		pr_err("Intel Processor Trace: failed to deliver transaction event, error %d\n",
>> +		       ret);
>> +
>> +	return ret;
>> +}
>> +
>> +static int intel_pt_synth_error(struct intel_pt *pt, int code, int cpu,
>> +				pid_t pid, pid_t tid, u64 ip)
>> +{
>> +	union perf_event event;
>> +	char msg[MAX_AUXTRACE_ERROR_MSG];
>> +	int err;
>> +
>> +	intel_pt__strerror(code, msg, MAX_AUXTRACE_ERROR_MSG);
>> +
>> +	auxtrace_synth_error(&event.auxtrace_error, PERF_AUXTRACE_ERROR_ITRACE,
>> +			     code, cpu, pid, tid, ip, msg);
>> +
>> +	err = perf_session__deliver_synth_event(pt->session, &event, NULL);
>> +	if (err)
>> +		pr_err("Intel Processor Trace: failed to deliver error event, error %d\n",
>> +		       err);
>> +
>> +	return err;
>> +}
>> +
>> +static int intel_pt_next_tid(struct intel_pt *pt, struct intel_pt_queue *ptq)
>> +{
>> +	struct auxtrace_queue *queue;
>> +	pid_t tid = ptq->next_tid;
>> +	int err;
>> +
>> +	if (tid == -1)
>> +		return 0;
>> +
>> +	intel_pt_log("switch: cpu %d tid %d\n", ptq->cpu, tid);
>> +
>> +	err = machine__set_current_tid(pt->machine, ptq->cpu, -1, tid);
>> +
>> +	queue = &pt->queues.queue_array[ptq->queue_nr];
>> +	intel_pt_set_pid_tid_cpu(pt, queue);
>> +
>> +	ptq->next_tid = -1;
>> +
>> +	return err;
>> +}
>> +
>> +static inline bool intel_pt_is_switch_ip(struct intel_pt_queue *ptq, u64 ip)
>> +{
>> +	struct intel_pt *pt = ptq->pt;
>> +
>> +	return ip == pt->switch_ip &&
>> +	       (ptq->flags & PERF_IP_FLAG_BRANCH) &&
>> +	       !(ptq->flags & (PERF_IP_FLAG_CONDITIONAL | PERF_IP_FLAG_ASYNC |
>> +			       PERF_IP_FLAG_INTERRUPT | PERF_IP_FLAG_TX_ABORT));
>> +}
>> +
>> +static int intel_pt_sample(struct intel_pt_queue *ptq)
>> +{
>> +	const struct intel_pt_state *state = ptq->state;
>> +	struct intel_pt *pt = ptq->pt;
>> +	int err;
>> +
>> +	if (!ptq->have_sample)
>> +		return 0;
>> +
>> +	ptq->have_sample = false;
>> +
>> +	if (pt->sample_instructions &&
>> +	    (state->type & INTEL_PT_INSTRUCTION)) {
>> +		err = intel_pt_synth_instruction_sample(ptq);
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	if (pt->sample_transactions &&
>> +	    (state->type & INTEL_PT_TRANSACTION)) {
>> +		err = intel_pt_synth_transaction_sample(ptq);
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	if (!(state->type & INTEL_PT_BRANCH))
>> +		return 0;
>> +
>> +	if (pt->synth_opts.callchain)
>> +		thread_stack__event(ptq->thread, ptq->flags, state->from_ip,
>> +				    state->to_ip, ptq->insn_len,
>> +				    state->trace_nr);
>> +
>> +	if (pt->sample_branches) {
>> +		err = intel_pt_synth_branch_sample(ptq);
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	if (!pt->sync_switch)
>> +		return 0;
>> +
>> +	if (intel_pt_is_switch_ip(ptq, state->to_ip)) {
>> +		switch (ptq->switch_state) {
>> +		case INTEL_PT_SS_UNKNOWN:
>> +		case INTEL_PT_SS_EXPECTING_SWITCH_IP:
>> +			err = intel_pt_next_tid(pt, ptq);
>> +			if (err)
>> +				return err;
>> +			ptq->switch_state = INTEL_PT_SS_TRACING;
>> +			break;
>> +		default:
>> +			ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_EVENT;
>> +			return 1;
>> +		}
>> +	} else if (!state->to_ip) {
>> +		ptq->switch_state = INTEL_PT_SS_NOT_TRACING;
>> +	} else if (ptq->switch_state == INTEL_PT_SS_NOT_TRACING) {
>> +		ptq->switch_state = INTEL_PT_SS_UNKNOWN;
>> +	} else if (ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
>> +		   state->to_ip == pt->ptss_ip &&
>> +		   (ptq->flags & PERF_IP_FLAG_CALL)) {
>> +		ptq->switch_state = INTEL_PT_SS_TRACING;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static u64 intel_pt_switch_ip(struct machine *machine, u64 *ptss_ip)
>> +{
>> +	struct map *map;
>> +	struct symbol *sym, *start;
>> +	u64 ip, switch_ip = 0;
>> +
>> +	if (ptss_ip)
>> +		*ptss_ip = 0;
>> +
>> +	map = machine__kernel_map(machine, MAP__FUNCTION);
>> +	if (!map)
>> +		return 0;
>> +
>> +	if (map__load(map, machine->symbol_filter))
>> +		return 0;
>> +
>> +	start = dso__first_symbol(map->dso, MAP__FUNCTION);
>> +
>> +	for (sym = start; sym; sym = dso__next_symbol(sym)) {
>> +		if (sym->binding == STB_GLOBAL &&
>> +		    !strcmp(sym->name, "__switch_to")) {
>> +			ip = map->unmap_ip(map, sym->start);
>> +			if (ip >= map->start && ip < map->end) {
>> +				switch_ip = ip;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (!switch_ip || !ptss_ip)
>> +		return 0;
>> +
>> +	for (sym = start; sym; sym = dso__next_symbol(sym)) {
>> +		if (!strcmp(sym->name, "perf_trace_sched_switch")) {
>> +			ip = map->unmap_ip(map, sym->start);
>> +			if (ip >= map->start && ip < map->end) {
>> +				*ptss_ip = ip;
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	return switch_ip;
>> +}
>> +
>> +static int intel_pt_run_decoder(struct intel_pt_queue *ptq, u64 *timestamp)
>> +{
>> +	const struct intel_pt_state *state = ptq->state;
>> +	struct intel_pt *pt = ptq->pt;
>> +	int err;
>> +
>> +	if (!pt->kernel_start) {
>> +		pt->kernel_start = machine__kernel_start(pt->machine);
>> +		if (pt->per_cpu_mmaps && pt->have_sched_switch &&
>> +		    !pt->timeless_decoding && intel_pt_tracing_kernel(pt) &&
>> +		    !pt->sampling_mode) {
>> +			pt->switch_ip = intel_pt_switch_ip(pt->machine,
>> +							   &pt->ptss_ip);
>> +			if (pt->switch_ip) {
>> +				intel_pt_log("switch_ip: %"PRIx64" ptss_ip: %"PRIx64"\n",
>> +					     pt->switch_ip, pt->ptss_ip);
>> +				pt->sync_switch = true;
>> +				pt->est_tsc_orig = pt->est_tsc;
>> +				pt->est_tsc = false;
>> +			}
>> +		}
>> +	}
>> +
>> +	intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
>> +		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
>> +	while (1) {
>> +		err = intel_pt_sample(ptq);
>> +		if (err)
>> +			return err;
>> +
>> +		state = intel_pt_decode(ptq->decoder);
>> +		if (state->err) {
>> +			if (state->err == INTEL_PT_ERR_NODATA)
>> +				return 1;
>> +			if (pt->sync_switch &&
>> +			    state->from_ip >= pt->kernel_start) {
>> +				pt->sync_switch = false;
>> +				pt->est_tsc = pt->est_tsc_orig;
>> +				intel_pt_next_tid(pt, ptq);
>> +			}
>> +			if (pt->synth_opts.errors) {
>> +				err = intel_pt_synth_error(pt, state->err,
>> +							   ptq->cpu, ptq->pid,
>> +							   ptq->tid,
>> +							   state->from_ip);
>> +				if (err)
>> +					return err;
>> +			}
>> +			continue;
>> +		}
>> +
>> +		ptq->state = state;
>> +		ptq->have_sample = true;
>> +		intel_pt_sample_flags(ptq);
>> +
>> +		/* Use estimated TSC upon return to user space */
>> +		if (pt->est_tsc) {
>> +			if (state->from_ip >= pt->kernel_start &&
>> +			    state->to_ip &&
>> +			    state->to_ip < pt->kernel_start)
>> +				ptq->timestamp = state->est_timestamp;
>> +			else if (state->timestamp > ptq->timestamp)
>> +				ptq->timestamp = state->timestamp;
>> +		/* Use estimated TSC in unknown switch state */
>> +		} else if (pt->sync_switch &&
>> +			   ptq->switch_state == INTEL_PT_SS_UNKNOWN &&
>> +			   state->to_ip == pt->switch_ip &&
>> +			   (ptq->flags & PERF_IP_FLAG_CALL) &&
>> +			   ptq->next_tid == -1) {
>> +			ptq->timestamp = state->est_timestamp;
>> +		} else if (state->timestamp > ptq->timestamp) {
>> +			ptq->timestamp = state->timestamp;
>> +		}
>> +
>> +		if (!pt->timeless_decoding && ptq->timestamp >= *timestamp) {
>> +			*timestamp = ptq->timestamp;
>> +			return 0;
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +static inline int intel_pt_update_queues(struct intel_pt *pt)
>> +{
>> +	if (pt->queues.new_data) {
>> +		pt->queues.new_data = false;
>> +		return intel_pt_setup_queues(pt);
>> +	}
>> +	return 0;
>> +}
>> +
>> +static int intel_pt_process_queues(struct intel_pt *pt, u64 timestamp)
>> +{
>> +	unsigned int queue_nr;
>> +	u64 ts;
>> +	int ret;
>> +
>> +	while (1) {
>> +		struct auxtrace_queue *queue;
>> +		struct intel_pt_queue *ptq;
>> +
>> +		if (!pt->heap.heap_cnt)
>> +			return 0;
>> +
>> +		if (pt->heap.heap_array[0].ordinal >= timestamp)
>> +			return 0;
>> +
>> +		queue_nr = pt->heap.heap_array[0].queue_nr;
>> +		queue = &pt->queues.queue_array[queue_nr];
>> +		ptq = queue->priv;
>> +
>> +		intel_pt_log("queue %u processing 0x%" PRIx64 " to 0x%" PRIx64 "\n",
>> +			     queue_nr, pt->heap.heap_array[0].ordinal,
>> +			     timestamp);
>> +
>> +		auxtrace_heap__pop(&pt->heap);
>> +
>> +		if (pt->heap.heap_cnt) {
>> +			ts = pt->heap.heap_array[0].ordinal + 1;
>> +			if (ts > timestamp)
>> +				ts = timestamp;
>> +		} else {
>> +			ts = timestamp;
>> +		}
>> +
>> +		intel_pt_set_pid_tid_cpu(pt, queue);
>> +
>> +		ret = intel_pt_run_decoder(ptq, &ts);
>> +
>> +		if (ret < 0) {
>> +			auxtrace_heap__add(&pt->heap, queue_nr, ts);
>> +			return ret;
>> +		}
>> +
>> +		if (!ret) {
>> +			ret = auxtrace_heap__add(&pt->heap, queue_nr, ts);
>> +			if (ret < 0)
>> +				return ret;
>> +		} else {
>> +			ptq->on_heap = false;
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int intel_pt_process_timeless_queues(struct intel_pt *pt, pid_t tid,
>> +					    u64 time_)
>> +{
>> +	struct auxtrace_queues *queues = &pt->queues;
>> +	unsigned int i;
>> +	u64 ts = 0;
>> +
>> +	for (i = 0; i < queues->nr_queues; i++) {
>> +		struct auxtrace_queue *queue = &pt->queues.queue_array[i];
>> +		struct intel_pt_queue *ptq = queue->priv;
>> +
>> +		if (ptq && (tid == -1 || ptq->tid == tid)) {
>> +			ptq->time = time_;
>> +			intel_pt_set_pid_tid_cpu(pt, queue);
>> +			intel_pt_run_decoder(ptq, &ts);
>> +		}
>> +	}
>> +	return 0;
>> +}
>> +
>> +static int intel_pt_lost(struct intel_pt *pt, struct perf_sample *sample)
>> +{
>> +	return intel_pt_synth_error(pt, INTEL_PT_ERR_LOST, sample->cpu,
>> +				    sample->pid, sample->tid, 0);
>> +}
>> +
>> +static struct intel_pt_queue *intel_pt_cpu_to_ptq(struct intel_pt *pt, int cpu)
>> +{
>> +	unsigned i, j;
>> +
>> +	if (cpu < 0 || !pt->queues.nr_queues)
>> +		return NULL;
>> +
>> +	if ((unsigned)cpu >= pt->queues.nr_queues)
>> +		i = pt->queues.nr_queues - 1;
>> +	else
>> +		i = cpu;
>> +
>> +	if (pt->queues.queue_array[i].cpu == cpu)
>> +		return pt->queues.queue_array[i].priv;
>> +
>> +	for (j = 0; i > 0; j++) {
>> +		if (pt->queues.queue_array[--i].cpu == cpu)
>> +			return pt->queues.queue_array[i].priv;
>> +	}
>> +
>> +	for (; j < pt->queues.nr_queues; j++) {
>> +		if (pt->queues.queue_array[j].cpu == cpu)
>> +			return pt->queues.queue_array[j].priv;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static int intel_pt_process_switch(struct intel_pt *pt,
>> +				   struct perf_sample *sample)
>> +{
>> +	struct intel_pt_queue *ptq;
>> +	struct perf_evsel *evsel;
>> +	pid_t tid;
>> +	int cpu, err;
>> +
>> +	evsel = perf_evlist__id2evsel(pt->session->evlist, sample->id);
>> +	if (evsel != pt->switch_evsel)
>> +		return 0;
>> +
>> +	tid = perf_evsel__intval(evsel, sample, "next_pid");
>> +	cpu = sample->cpu;
>> +
>> +	intel_pt_log("sched_switch: cpu %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
>> +		     cpu, tid, sample->time, perf_time_to_tsc(sample->time,
>> +		     &pt->tc));
>> +
>> +	if (!pt->sync_switch)
>> +		goto out;
>> +
>> +	ptq = intel_pt_cpu_to_ptq(pt, cpu);
>> +	if (!ptq)
>> +		goto out;
>> +
>> +	switch (ptq->switch_state) {
>> +	case INTEL_PT_SS_NOT_TRACING:
>> +		ptq->next_tid = -1;
>> +		break;
>> +	case INTEL_PT_SS_UNKNOWN:
>> +	case INTEL_PT_SS_TRACING:
>> +		ptq->next_tid = tid;
>> +		ptq->switch_state = INTEL_PT_SS_EXPECTING_SWITCH_IP;
>> +		return 0;
>> +	case INTEL_PT_SS_EXPECTING_SWITCH_EVENT:
>> +		if (!ptq->on_heap) {
>> +			ptq->timestamp = perf_time_to_tsc(sample->time,
>> +							  &pt->tc);
>> +			err = auxtrace_heap__add(&pt->heap, ptq->queue_nr,
>> +						 ptq->timestamp);
>> +			if (err)
>> +				return err;
>> +			ptq->on_heap = true;
>> +		}
>> +		ptq->switch_state = INTEL_PT_SS_TRACING;
>> +		break;
>> +	case INTEL_PT_SS_EXPECTING_SWITCH_IP:
>> +		ptq->next_tid = tid;
>> +		intel_pt_log("ERROR: cpu %d expecting switch ip\n", cpu);
>> +		break;
>> +	default:
>> +		break;
>> +	}
>> +out:
>> +	return machine__set_current_tid(pt->machine, cpu, -1, tid);
>> +}
>> +
>> +static int intel_pt_process_itrace_start(struct intel_pt *pt,
>> +					 union perf_event *event,
>> +					 struct perf_sample *sample)
>> +{
>> +	if (!pt->per_cpu_mmaps)
>> +		return 0;
>> +
>> +	intel_pt_log("itrace_start: cpu %d pid %d tid %d time %"PRIu64" tsc %#"PRIx64"\n",
>> +		     sample->cpu, event->itrace_start.pid,
>> +		     event->itrace_start.tid, sample->time,
>> +		     perf_time_to_tsc(sample->time, &pt->tc));
>> +
>> +	return machine__set_current_tid(pt->machine, sample->cpu,
>> +					event->itrace_start.pid,
>> +					event->itrace_start.tid);
>> +}
>> +
>> +static int intel_pt_process_event(struct perf_session *session,
>> +				  union perf_event *event,
>> +				  struct perf_sample *sample,
>> +				  struct perf_tool *tool)
>> +{
>> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
>> +					   auxtrace);
>> +	u64 timestamp;
>> +	int err = 0;
>> +
>> +	if (dump_trace)
>> +		return 0;
>> +
>> +	if (!tool->ordered_events) {
>> +		pr_err("Intel Processor Trace requires ordered events\n");
>> +		return -EINVAL;
>> +	}
>> +
>> +	if (sample->time)
>> +		timestamp = perf_time_to_tsc(sample->time, &pt->tc);
>> +	else
>> +		timestamp = 0;
>> +
>> +	if (timestamp || pt->timeless_decoding) {
>> +		err = intel_pt_update_queues(pt);
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	if (pt->timeless_decoding) {
>> +		if (event->header.type == PERF_RECORD_EXIT) {
>> +			err = intel_pt_process_timeless_queues(pt,
>> +							       event->comm.tid,
>> +							       sample->time);
>> +		}
>> +	} else if (timestamp) {
>> +		err = intel_pt_process_queues(pt, timestamp);
>> +	}
>> +	if (err)
>> +		return err;
>> +
>> +	if (event->header.type == PERF_RECORD_AUX &&
>> +	    (event->aux.flags & PERF_AUX_FLAG_TRUNCATED) &&
>> +	    pt->synth_opts.errors)
>> +		err = intel_pt_lost(pt, sample);
>> +
>> +	if (pt->switch_evsel && event->header.type == PERF_RECORD_SAMPLE)
>> +		err = intel_pt_process_switch(pt, sample);
>> +	else if (event->header.type == PERF_RECORD_ITRACE_START)
>> +		err = intel_pt_process_itrace_start(pt, event, sample);
>> +
>> +	return err;
>> +}
>> +
>> +static int intel_pt_flush(struct perf_session *session, struct perf_tool *tool)
>> +{
>> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
>> +					   auxtrace);
>> +	int ret;
>> +
>> +	if (dump_trace)
>> +		return 0;
>> +
>> +	if (!tool->ordered_events)
>> +		return -EINVAL;
>> +
>> +	ret = intel_pt_update_queues(pt);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	if (pt->timeless_decoding)
>> +		return intel_pt_process_timeless_queues(pt, -1,
>> +							MAX_TIMESTAMP - 1);
>> +
>> +	return intel_pt_process_queues(pt, MAX_TIMESTAMP);
>> +}
>> +
>> +static void intel_pt_free_events(struct perf_session *session)
>> +{
>> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
>> +					   auxtrace);
>> +	struct auxtrace_queues *queues = &pt->queues;
>> +	unsigned int i;
>> +
>> +	for (i = 0; i < queues->nr_queues; i++) {
>> +		intel_pt_free_queue(queues->queue_array[i].priv);
>> +		queues->queue_array[i].priv = NULL;
>> +	}
>> +	intel_pt_log_disable();
>> +	auxtrace_queues__free(queues);
>> +}
>> +
>> +static void intel_pt_free(struct perf_session *session)
>> +{
>> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
>> +					   auxtrace);
>> +
>> +	auxtrace_heap__free(&pt->heap);
>> +	intel_pt_free_events(session);
>> +	session->auxtrace = NULL;
>> +	thread__delete(pt->unknown_thread);
>> +	free(pt);
>> +}
>> +
>> +static int intel_pt_process_auxtrace_event(struct perf_session *session,
>> +					   union perf_event *event,
>> +					   struct perf_tool *tool __maybe_unused)
>> +{
>> +	struct intel_pt *pt = container_of(session->auxtrace, struct intel_pt,
>> +					   auxtrace);
>> +
>> +	if (pt->sampling_mode)
>> +		return 0;
>> +
>> +	if (!pt->data_queued) {
>> +		struct auxtrace_buffer *buffer;
>> +		off_t data_offset;
>> +		int fd = perf_data_file__fd(session->file);
>> +		int err;
>> +
>> +		if (perf_data_file__is_pipe(session->file)) {
>> +			data_offset = 0;
>> +		} else {
>> +			data_offset = lseek(fd, 0, SEEK_CUR);
>> +			if (data_offset == -1)
>> +				return -errno;
>> +		}
>> +
>> +		err = auxtrace_queues__add_event(&pt->queues, session, event,
>> +						 data_offset, &buffer);
>> +		if (err)
>> +			return err;
>> +
>> +		/* Dump here now we have copied a piped trace out of the pipe */
>> +		if (dump_trace) {
>> +			if (auxtrace_buffer__get_data(buffer, fd)) {
>> +				intel_pt_dump_event(pt, buffer->data,
>> +						    buffer->size);
>> +				auxtrace_buffer__put_data(buffer);
>> +			}
>> +		}
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +struct intel_pt_synth {
>> +	struct perf_tool dummy_tool;
>> +	struct perf_session *session;
>> +};
>> +
>> +static int intel_pt_event_synth(struct perf_tool *tool,
>> +				union perf_event *event,
>> +				struct perf_sample *sample __maybe_unused,
>> +				struct machine *machine __maybe_unused)
>> +{
>> +	struct intel_pt_synth *intel_pt_synth =
>> +			container_of(tool, struct intel_pt_synth, dummy_tool);
>> +
>> +	return perf_session__deliver_synth_event(intel_pt_synth->session, event,
>> +						 NULL);
>> +}
>> +
>> +static int intel_pt_synth_event(struct perf_session *session,
>> +				struct perf_event_attr *attr, u64 id)
>> +{
>> +	struct intel_pt_synth intel_pt_synth;
>> +
>> +	memset(&intel_pt_synth, 0, sizeof(struct intel_pt_synth));
>> +	intel_pt_synth.session = session;
>> +
>> +	return perf_event__synthesize_attr(&intel_pt_synth.dummy_tool, attr, 1,
>> +					   &id, intel_pt_event_synth);
>> +}
>> +
>> +static int intel_pt_synth_events(struct intel_pt *pt,
>> +				 struct perf_session *session)
>> +{
>> +	struct perf_evlist *evlist = session->evlist;
>> +	struct perf_evsel *evsel;
>> +	struct perf_event_attr attr;
>> +	bool found = false;
>> +	u64 id;
>> +	int err;
>> +
>> +	evlist__for_each(evlist, evsel) {
>> +		if (evsel->attr.type == pt->pmu_type && evsel->ids) {
>> +			found = true;
>> +			break;
>> +		}
>> +	}
>> +
>> +	if (!found) {
>> +		pr_debug("There are no selected events with Intel Processor Trace data\n");
>> +		return 0;
>> +	}
>> +
>> +	memset(&attr, 0, sizeof(struct perf_event_attr));
>> +	attr.size = sizeof(struct perf_event_attr);
>> +	attr.type = PERF_TYPE_HARDWARE;
>> +	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
>> +	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
>> +			    PERF_SAMPLE_PERIOD;
>> +	if (pt->timeless_decoding)
>> +		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
>> +	else
>> +		attr.sample_type |= PERF_SAMPLE_TIME;
>> +	if (!pt->per_cpu_mmaps)
>> +		attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
>> +	attr.exclude_user = evsel->attr.exclude_user;
>> +	attr.exclude_kernel = evsel->attr.exclude_kernel;
>> +	attr.exclude_hv = evsel->attr.exclude_hv;
>> +	attr.exclude_host = evsel->attr.exclude_host;
>> +	attr.exclude_guest = evsel->attr.exclude_guest;
>> +	attr.sample_id_all = evsel->attr.sample_id_all;
>> +	attr.read_format = evsel->attr.read_format;
>> +
>> +	id = evsel->id[0] + 1000000000;
>> +	if (!id)
>> +		id = 1;
>> +
>> +	if (pt->synth_opts.instructions) {
>> +		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
>> +		if (pt->synth_opts.period_type == PERF_ITRACE_PERIOD_NANOSECS)
>> +			attr.sample_period =
>> +				intel_pt_ns_to_ticks(pt, pt->synth_opts.period);
>> +		else
>> +			attr.sample_period = pt->synth_opts.period;
>> +		pt->instructions_sample_period = attr.sample_period;
>> +		if (pt->synth_opts.callchain)
>> +			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
>> +		pr_debug("Synthesizing 'instructions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
>> +			 id, (u64)attr.sample_type);
>> +		err = intel_pt_synth_event(session, &attr, id);
>> +		if (err) {
>> +			pr_err("%s: failed to synthesize 'instructions' event type\n",
>> +			       __func__);
>> +			return err;
>> +		}
>> +		pt->sample_instructions = true;
>> +		pt->instructions_sample_type = attr.sample_type;
>> +		pt->instructions_id = id;
>> +		id += 1;
>> +	}
>> +
>> +	if (pt->synth_opts.transactions) {
>> +		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
>> +		attr.sample_period = 1;
>> +		if (pt->synth_opts.callchain)
>> +			attr.sample_type |= PERF_SAMPLE_CALLCHAIN;
>> +		pr_debug("Synthesizing 'transactions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
>> +			 id, (u64)attr.sample_type);
>> +		err = intel_pt_synth_event(session, &attr, id);
>> +		if (err) {
>> +			pr_err("%s: failed to synthesize 'transactions' event type\n",
>> +			       __func__);
>> +			return err;
>> +		}
>> +		pt->sample_transactions = true;
>> +		pt->transactions_id = id;
>> +		id += 1;
>> +		evlist__for_each(evlist, evsel) {
>> +			if (evsel->id && evsel->id[0] == pt->transactions_id) {
>> +				if (evsel->name)
>> +					zfree(&evsel->name);
>> +				evsel->name = strdup("transactions");
>> +				break;
>> +			}
>> +		}
>> +	}
>> +
>> +	if (pt->synth_opts.branches) {
>> +		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
>> +		attr.sample_period = 1;
>> +		attr.sample_type |= PERF_SAMPLE_ADDR;
>> +		attr.sample_type &= ~(u64)PERF_SAMPLE_CALLCHAIN;
>> +		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
>> +			 id, (u64)attr.sample_type);
>> +		err = intel_pt_synth_event(session, &attr, id);
>> +		if (err) {
>> +			pr_err("%s: failed to synthesize 'branches' event type\n",
>> +			       __func__);
>> +			return err;
>> +		}
>> +		pt->sample_branches = true;
>> +		pt->branches_sample_type = attr.sample_type;
>> +		pt->branches_id = id;
>> +	}
>> +
>> +	pt->synth_needs_swap = evsel->needs_swap;
>> +
>> +	return 0;
>> +}
>> +
>> +static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
>> +{
>> +	struct perf_evsel *evsel;
>> +
>> +	evlist__for_each_reverse(evlist, evsel) {
>> +		const char *name = perf_evsel__name(evsel);
>> +
>> +		if (!strcmp(name, "sched:sched_switch"))
>> +			return evsel;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static const char * const intel_pt_info_fmts[] = {
>> +	[INTEL_PT_PMU_TYPE]		= "  PMU Type           %"PRId64"\n",
>> +	[INTEL_PT_TIME_SHIFT]		= "  Time Shift         %"PRIu64"\n",
>> +	[INTEL_PT_TIME_MULT]		= "  Time Muliplier     %"PRIu64"\n",
>> +	[INTEL_PT_TIME_ZERO]		= "  Time Zero          %"PRIu64"\n",
>> +	[INTEL_PT_CAP_USER_TIME_ZERO]	= "  Cap Time Zero      %"PRId64"\n",
>> +	[INTEL_PT_TSC_BIT]		= "  TSC bit            %#"PRIx64"\n",
>> +	[INTEL_PT_NORETCOMP_BIT]	= "  NoRETComp bit      %#"PRIx64"\n",
>> +	[INTEL_PT_HAVE_SCHED_SWITCH]	= "  Have sched_switch  %"PRId64"\n",
>> +	[INTEL_PT_SNAPSHOT_MODE]	= "  Snapshot mode      %"PRId64"\n",
>> +	[INTEL_PT_PER_CPU_MMAPS]	= "  Per-cpu maps       %"PRId64"\n",
>> +};
>> +
>> +static void intel_pt_print_info(u64 *arr, int start, int finish)
>> +{
>> +	int i;
>> +
>> +	if (!dump_trace)
>> +		return;
>> +
>> +	for (i = start; i <= finish; i++)
>> +		fprintf(stdout, intel_pt_info_fmts[i], arr[i]);
>> +}
>> +
>> +int intel_pt_process_auxtrace_info(union perf_event *event,
>> +				   struct perf_session *session)
>> +{
>> +	struct auxtrace_info_event *auxtrace_info = &event->auxtrace_info;
>> +	size_t min_sz = sizeof(u64) * INTEL_PT_PER_CPU_MMAPS;
>> +	struct intel_pt *pt;
>> +	int err;
>> +
>> +	if (auxtrace_info->header.size < sizeof(struct auxtrace_info_event) +
>> +					min_sz)
>> +		return -EINVAL;
>> +
>> +	pt = zalloc(sizeof(struct intel_pt));
>> +	if (!pt)
>> +		return -ENOMEM;
>> +
>> +	err = auxtrace_queues__init(&pt->queues);
>> +	if (err)
>> +		goto err_free;
>> +
>> +	intel_pt_log_set_name(INTEL_PT_PMU_NAME);
>> +
>> +	pt->session = session;
>> +	pt->machine = &session->machines.host; /* No kvm support */
>> +	pt->auxtrace_type = auxtrace_info->type;
>> +	pt->pmu_type = auxtrace_info->priv[INTEL_PT_PMU_TYPE];
>> +	pt->tc.time_shift = auxtrace_info->priv[INTEL_PT_TIME_SHIFT];
>> +	pt->tc.time_mult = auxtrace_info->priv[INTEL_PT_TIME_MULT];
>> +	pt->tc.time_zero = auxtrace_info->priv[INTEL_PT_TIME_ZERO];
>> +	pt->cap_user_time_zero = auxtrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO];
>> +	pt->tsc_bit = auxtrace_info->priv[INTEL_PT_TSC_BIT];
>> +	pt->noretcomp_bit = auxtrace_info->priv[INTEL_PT_NORETCOMP_BIT];
>> +	pt->have_sched_switch = auxtrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH];
>> +	pt->snapshot_mode = auxtrace_info->priv[INTEL_PT_SNAPSHOT_MODE];
>> +	pt->per_cpu_mmaps = auxtrace_info->priv[INTEL_PT_PER_CPU_MMAPS];
>> +	intel_pt_print_info(&auxtrace_info->priv[0], INTEL_PT_PMU_TYPE,
>> +			    INTEL_PT_PER_CPU_MMAPS);
>> +
>> +	pt->timeless_decoding = intel_pt_timeless_decoding(pt);
>> +	pt->have_tsc = intel_pt_have_tsc(pt);
>> +	pt->sampling_mode = false;
>> +	pt->est_tsc = pt->per_cpu_mmaps && !pt->timeless_decoding;
>> +
>> +	pt->unknown_thread = thread__new(999999999, 999999999);
>> +	if (!pt->unknown_thread) {
>> +		err = -ENOMEM;
>> +		goto err_free_queues;
>> +	}
>> +	err = thread__set_comm(pt->unknown_thread, "unknown", 0);
>> +	if (err)
>> +		goto err_delete_thread;
>> +	if (thread__init_map_groups(pt->unknown_thread, pt->machine)) {
>> +		err = -ENOMEM;
>> +		goto err_delete_thread;
>> +	}
>> +
>> +	pt->auxtrace.process_event = intel_pt_process_event;
>> +	pt->auxtrace.process_auxtrace_event = intel_pt_process_auxtrace_event;
>> +	pt->auxtrace.flush_events = intel_pt_flush;
>> +	pt->auxtrace.free_events = intel_pt_free_events;
>> +	pt->auxtrace.free = intel_pt_free;
>> +	session->auxtrace = &pt->auxtrace;
>> +
>> +	if (dump_trace)
>> +		return 0;
>> +
>> +	if (pt->have_sched_switch == 1) {
>> +		pt->switch_evsel = intel_pt_find_sched_switch(session->evlist);
>> +		if (!pt->switch_evsel) {
>> +			pr_err("%s: missing sched_switch event\n", __func__);
>> +			goto err_delete_thread;
>> +		}
>> +	}
>> +
>> +	if (session->itrace_synth_opts && session->itrace_synth_opts->set) {
>> +		pt->synth_opts = *session->itrace_synth_opts;
>> +	} else {
>> +		itrace_synth_opts__set_default(&pt->synth_opts);
>> +		if (use_browser != -1) {
>> +			pt->synth_opts.branches = false;
>> +			pt->synth_opts.callchain = true;
>> +		}
>> +	}
>> +
>> +	if (pt->synth_opts.log)
>> +		intel_pt_log_enable();
>> +
>> +	if (pt->synth_opts.calls)
>> +		pt->branches_filter |= PERF_IP_FLAG_CALL | PERF_IP_FLAG_ASYNC |
>> +				       PERF_IP_FLAG_TRACE_END;
>> +	if (pt->synth_opts.returns)
>> +		pt->branches_filter |= PERF_IP_FLAG_RETURN |
>> +				       PERF_IP_FLAG_TRACE_BEGIN;
>> +
>> +	if (pt->synth_opts.callchain && !symbol_conf.use_callchain) {
>> +		symbol_conf.use_callchain = true;
>> +		if (callchain_register_param(&callchain_param) < 0) {
>> +			symbol_conf.use_callchain = false;
>> +			pt->synth_opts.callchain = false;
>> +		}
>> +	}
>> +
>> +	err = intel_pt_synth_events(pt, session);
>> +	if (err)
>> +		goto err_delete_thread;
>> +
>> +	err = auxtrace_queues__process_index(&pt->queues, session);
>> +	if (err)
>> +		goto err_delete_thread;
>> +
>> +	if (pt->queues.populated)
>> +		pt->data_queued = true;
>> +
>> +	if (pt->timeless_decoding)
>> +		pr_debug2("Intel PT decoding without timestamps\n");
>> +
>> +	return 0;
>> +
>> +err_delete_thread:
>> +	thread__delete(pt->unknown_thread);
>> +err_free_queues:
>> +	intel_pt_log_disable();
>> +	auxtrace_queues__free(&pt->queues);
>> +	session->auxtrace = NULL;
>> +err_free:
>> +	free(pt);
>> +	return err;
>> +}
>> diff --git a/tools/perf/util/intel-pt.h b/tools/perf/util/intel-pt.h
>> new file mode 100644
>> index 0000000..a1bfe93
>> --- /dev/null
>> +++ b/tools/perf/util/intel-pt.h
>> @@ -0,0 +1,51 @@
>> +/*
>> + * intel_pt.h: Intel Processor Trace support
>> + * Copyright (c) 2013-2015, Intel Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope it will be useful, but WITHOUT
>> + * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
>> + * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
>> + * more details.
>> + *
>> + */
>> +
>> +#ifndef INCLUDE__PERF_INTEL_PT_H__
>> +#define INCLUDE__PERF_INTEL_PT_H__
>> +
>> +#define INTEL_PT_PMU_NAME "intel_pt"
>> +
>> +enum {
>> +	INTEL_PT_PMU_TYPE,
>> +	INTEL_PT_TIME_SHIFT,
>> +	INTEL_PT_TIME_MULT,
>> +	INTEL_PT_TIME_ZERO,
>> +	INTEL_PT_CAP_USER_TIME_ZERO,
>> +	INTEL_PT_TSC_BIT,
>> +	INTEL_PT_NORETCOMP_BIT,
>> +	INTEL_PT_HAVE_SCHED_SWITCH,
>> +	INTEL_PT_SNAPSHOT_MODE,
>> +	INTEL_PT_PER_CPU_MMAPS,
>> +	INTEL_PT_AUXTRACE_PRIV_MAX,
>> +};
>> +
>> +#define INTEL_PT_AUXTRACE_PRIV_SIZE (INTEL_PT_AUXTRACE_PRIV_MAX * sizeof(u64))
>> +
>> +struct auxtrace_record;
>> +struct perf_tool;
>> +union perf_event;
>> +struct perf_session;
>> +struct perf_event_attr;
>> +struct perf_pmu;
>> +
>> +struct auxtrace_record *intel_pt_recording_init(int *err);
>> +
>> +int intel_pt_process_auxtrace_info(union perf_event *event,
>> +				   struct perf_session *session);
>> +
>> +struct perf_event_attr *intel_pt_pmu_default_config(struct perf_pmu *pmu);
>> +
>> +#endif
>> --
>> 1.9.1
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-19 19:33     ` Adrian Hunter
@ 2015-06-19 19:41       ` Arnaldo Carvalho de Melo
  2015-06-22 18:24         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-19 19:41 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jun 19, 2015 at 10:33:43PM +0300, Adrian Hunter escreveu:
> On 19/06/2015 7:04 p.m., Arnaldo Carvalho de Melo wrote:
> >Em Fri, May 29, 2015 at 04:33:36PM +0300, Adrian Hunter escreveu:
> >>Add support for Intel Processor Trace.

> >>Intel PT support fits within the new auxtrace infrastructure.
> >>Recording is supporting by identifying the Intel PT PMU,
> >>parsing options and setting up events.  Decoding is supported
> >>by queuing up trace data by cpu or thread and then decoding
> >>synchronously delivering synthesized event samples into the
> >>session processing for tools to consume.

> >So, at this point what commands should I use to test this? I expected to
> >be able to have some command here, in this changeset log, telling me
> >that what has been applied so far + this "Add Intel PT support", can be
> >used in such and such a fashion, obtaining this and that output.

> >Now I'll go back and look at the cover letter to see what I can do at
> >this point and with access to a Broadwell class machine.

> Actually you need the next patch "perf tools: Take Intel PT into use" to do anything.

Yeah, saw that, the title of this patch fooled me into thinking that
Intel PT support was added :-)

Anyway, stopping for a moment to push stuff ready to Ingo, will get back
to this after that.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [tip:perf/core] perf tools: Ensure thread-stack is flushed
  2015-05-29 13:33 ` [PATCH V6 02/17] perf tools: Ensure thread-stack is flushed Adrian Hunter
  2015-06-18 21:56   ` Arnaldo Carvalho de Melo
@ 2015-06-19 23:15   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 47+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-06-19 23:15 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: hpa, acme, jolsa, mingo, tglx, linux-kernel, adrian.hunter

Commit-ID:  a5499b37197ab4b5fed101370df7ccadacbb4340
Gitweb:     http://git.kernel.org/tip/a5499b37197ab4b5fed101370df7ccadacbb4340
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 29 May 2015 16:33:30 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 19 Jun 2015 16:03:33 -0300

perf tools: Ensure thread-stack is flushed

The thread-stack represents a thread's current stack.  When a thread
exits there can still be many functions on the stack e.g. exit() can be
called many levels deep, so all the callers will never return.  To get
that information output, the thread-stack must be flushed.

Previously it was assumed the thread-stack would be flushed when the
struct thread was deleted.  With thread ref-counting it is no longer
clear when that will be, if ever. So instead explicitly flush all the
thread-stacks at the end of a session.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-3-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/machine.c      | 21 +++++++++++++++++++++
 tools/perf/util/machine.h      |  3 +++
 tools/perf/util/session.c      | 20 ++++++++++++++++++++
 tools/perf/util/thread-stack.c | 18 +++++++++++++-----
 tools/perf/util/thread-stack.h |  1 +
 5 files changed, 58 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 132e357..8b3b193 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1890,6 +1890,27 @@ int machine__for_each_thread(struct machine *machine,
 	return rc;
 }
 
+int machines__for_each_thread(struct machines *machines,
+			      int (*fn)(struct thread *thread, void *p),
+			      void *priv)
+{
+	struct rb_node *nd;
+	int rc = 0;
+
+	rc = machine__for_each_thread(&machines->host, fn, priv);
+	if (rc != 0)
+		return rc;
+
+	for (nd = rb_first(&machines->guests); nd; nd = rb_next(nd)) {
+		struct machine *machine = rb_entry(nd, struct machine, rb_node);
+
+		rc = machine__for_each_thread(machine, fn, priv);
+		if (rc != 0)
+			return rc;
+	}
+	return rc;
+}
+
 int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
 				  struct target *target, struct thread_map *threads,
 				  perf_event__handler_t process, bool data_mmap)
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index ca267c4..cea62f6 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -216,6 +216,9 @@ size_t machine__fprintf_vmlinux_path(struct machine *machine, FILE *fp);
 int machine__for_each_thread(struct machine *machine,
 			     int (*fn)(struct thread *thread, void *p),
 			     void *priv);
+int machines__for_each_thread(struct machines *machines,
+			      int (*fn)(struct thread *thread, void *p),
+			      void *priv);
 
 int __machine__synthesize_threads(struct machine *machine, struct perf_tool *tool,
 				  struct target *target, struct thread_map *threads,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index e1cd17c..c371336 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -16,6 +16,7 @@
 #include "perf_regs.h"
 #include "asm/bug.h"
 #include "auxtrace.h"
+#include "thread-stack.h"
 
 static int perf_session__deliver_event(struct perf_session *session,
 				       union perf_event *event,
@@ -1361,6 +1362,19 @@ static void perf_session__warn_about_errors(const struct perf_session *session)
 	events_stats__auxtrace_error_warn(stats);
 }
 
+static int perf_session__flush_thread_stack(struct thread *thread,
+					    void *p __maybe_unused)
+{
+	return thread_stack__flush(thread);
+}
+
+static int perf_session__flush_thread_stacks(struct perf_session *session)
+{
+	return machines__for_each_thread(&session->machines,
+					 perf_session__flush_thread_stack,
+					 NULL);
+}
+
 volatile int session_done;
 
 static int __perf_session__process_pipe_events(struct perf_session *session)
@@ -1450,6 +1464,9 @@ done:
 	if (err)
 		goto out_err;
 	err = auxtrace__flush_events(session, tool);
+	if (err)
+		goto out_err;
+	err = perf_session__flush_thread_stacks(session);
 out_err:
 	free(buf);
 	perf_session__warn_about_errors(session);
@@ -1600,6 +1617,9 @@ out:
 	if (err)
 		goto out_err;
 	err = auxtrace__flush_events(session, tool);
+	if (err)
+		goto out_err;
+	err = perf_session__flush_thread_stacks(session);
 out_err:
 	ui_progress__finish();
 	perf_session__warn_about_errors(session);
diff --git a/tools/perf/util/thread-stack.c b/tools/perf/util/thread-stack.c
index 9ed59a4..679688e 100644
--- a/tools/perf/util/thread-stack.c
+++ b/tools/perf/util/thread-stack.c
@@ -219,7 +219,7 @@ static int thread_stack__call_return(struct thread *thread,
 	return crp->process(&cr, crp->data);
 }
 
-static int thread_stack__flush(struct thread *thread, struct thread_stack *ts)
+static int __thread_stack__flush(struct thread *thread, struct thread_stack *ts)
 {
 	struct call_return_processor *crp = ts->crp;
 	int err;
@@ -242,6 +242,14 @@ static int thread_stack__flush(struct thread *thread, struct thread_stack *ts)
 	return 0;
 }
 
+int thread_stack__flush(struct thread *thread)
+{
+	if (thread->ts)
+		return __thread_stack__flush(thread, thread->ts);
+
+	return 0;
+}
+
 int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
 			u64 to_ip, u16 insn_len, u64 trace_nr)
 {
@@ -264,7 +272,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
 	 */
 	if (trace_nr != thread->ts->trace_nr) {
 		if (thread->ts->trace_nr)
-			thread_stack__flush(thread, thread->ts);
+			__thread_stack__flush(thread, thread->ts);
 		thread->ts->trace_nr = trace_nr;
 	}
 
@@ -297,7 +305,7 @@ void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr)
 
 	if (trace_nr != thread->ts->trace_nr) {
 		if (thread->ts->trace_nr)
-			thread_stack__flush(thread, thread->ts);
+			__thread_stack__flush(thread, thread->ts);
 		thread->ts->trace_nr = trace_nr;
 	}
 }
@@ -305,7 +313,7 @@ void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr)
 void thread_stack__free(struct thread *thread)
 {
 	if (thread->ts) {
-		thread_stack__flush(thread, thread->ts);
+		__thread_stack__flush(thread, thread->ts);
 		zfree(&thread->ts->stack);
 		zfree(&thread->ts);
 	}
@@ -689,7 +697,7 @@ int thread_stack__process(struct thread *thread, struct comm *comm,
 
 	/* Flush stack on exec */
 	if (ts->comm != comm && thread->pid_ == thread->tid) {
-		err = thread_stack__flush(thread, ts);
+		err = __thread_stack__flush(thread, ts);
 		if (err)
 			return err;
 		ts->comm = comm;
diff --git a/tools/perf/util/thread-stack.h b/tools/perf/util/thread-stack.h
index b843bbe..e1528f1 100644
--- a/tools/perf/util/thread-stack.h
+++ b/tools/perf/util/thread-stack.h
@@ -96,6 +96,7 @@ int thread_stack__event(struct thread *thread, u32 flags, u64 from_ip,
 void thread_stack__set_trace_nr(struct thread *thread, u64 trace_nr);
 void thread_stack__sample(struct thread *thread, struct ip_callchain *chain,
 			  size_t sz, u64 ip);
+int thread_stack__flush(struct thread *thread);
 void thread_stack__free(struct thread *thread);
 
 struct call_return_processor *
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 15/17] perf tools: Intel BTS to always update thread stack trace number
  2015-06-19 16:11   ` Arnaldo Carvalho de Melo
@ 2015-06-22 12:38     ` Adrian Hunter
  2015-06-22 14:33       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-06-22 12:38 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 19/06/15 19:11, Arnaldo Carvalho de Melo wrote:
> Em Fri, May 29, 2015 at 04:33:43PM +0300, Adrian Hunter escreveu:
>> The enhanced thread stack is used by higher layers but still requires
>> the trace number.  The trace number is used to distinguish discontinuous
>> sections of trace (for example from Snapshot mode or Sample mode), which
>> cause the thread stack to be flushed.
>>
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/util/intel-bts.c | 18 ++++++++++++++----
>>  1 file changed, 14 insertions(+), 4 deletions(-)
>>
>> diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
>> index b068860..cd7bde3 100644
>> --- a/tools/perf/util/intel-bts.c
>> +++ b/tools/perf/util/intel-bts.c
>> @@ -27,6 +27,8 @@
>>  #include "machine.h"
>>  #include "session.h"
>>  #include "util.h"
>> +#include "thread.h"
>> +#include "thread-stack.h"
>>  #include "debug.h"
>>  #include "tsc.h"
>>  #include "auxtrace.h"
>> @@ -443,19 +445,22 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
>>  
>>  static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
>>  {
>> -	struct auxtrace_buffer *buffer = btsq->buffer;
>> +	struct auxtrace_buffer *buffer = btsq->buffer, *old_buffer = buffer;
>>  	struct auxtrace_queue *queue;
>> +	struct thread *thread;
>>  	int err;
>>  
>>  	if (btsq->done)
>>  		return 1;
>>  
>>  	if (btsq->pid == -1) {
>> -		struct thread *thread;
>> -
>> -		thread = machine__find_thread(btsq->bts->machine, -1, btsq->tid);
>> +		thread = machine__find_thread(btsq->bts->machine, -1,
>> +					      btsq->tid);
>>  		if (thread)
>>  			btsq->pid = thread->pid_;
>> +	} else {
>> +		thread = machine__findnew_thread(btsq->bts->machine, btsq->pid,
>> +						 btsq->tid);
> 
> Humm, so what will be done with the reference count you got from
> machine__findnew_thread()? You have to drop it when you're done with
> using this thread.
> 

Thought I fixed that. Went looking and, yes, the chunks got lost when
rolling V6 of the patches. Anyway here are the fixes as a separate patch.

From: Adrian Hunter <adrian.hunter@intel.com>
Date: Mon, 22 Jun 2015 15:02:04 +0300
Subject: [PATCH] perf tools: Fix missing thread__put()s

Processing for Intel BTS and Intel PT are using machine__find_thread() and
machine__findnew_thread() which increase the struct thread reference count.
Add missing thread__put()s when finished with that struct thread reference.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/intel-bts.c | 36 ++++++++++++++++++++++++------------
 tools/perf/util/intel-pt.c  |  5 +++--
 2 files changed, 27 insertions(+), 14 deletions(-)

diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index cd7bde33b635..dce99cfb1309 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -318,6 +318,7 @@ static int intel_bts_get_next_insn(struct
intel_bts_queue *btsq, u64 ip)
 	ssize_t len;
 	int x86_64;
 	uint8_t cpumode;
+	int err = -1;

 	bufsz = intel_pt_insn_max_size();

@@ -332,11 +333,11 @@ static int intel_bts_get_next_insn(struct
intel_bts_queue *btsq, u64 ip)

 	thread__find_addr_map(thread, cpumode, MAP__FUNCTION, ip, &al);
 	if (!al.map || !al.map->dso)
-		return -1;
+		goto out_put;

 	len = dso__data_read_addr(al.map->dso, al.map, machine, ip, buf, bufsz);
 	if (len <= 0)
-		return -1;
+		goto out_put;

 	/* Load maps to ensure dso->is_64_bit has been updated */
 	map__load(al.map, machine->symbol_filter);
@@ -344,9 +345,12 @@ static int intel_bts_get_next_insn(struct
intel_bts_queue *btsq, u64 ip)
 	x86_64 = al.map->dso->is_64_bit;

 	if (intel_pt_get_insn(buf, len, x86_64, &btsq->intel_pt_insn))
-		return -1;
+		goto out_put;

-	return 0;
+	err = 0;
+out_put:
+	thread__put(thread);
+	return err;
 }

 static int intel_bts_synth_error(struct intel_bts *bts, int cpu, pid_t pid,
@@ -471,24 +475,31 @@ static int intel_bts_process_queue(struct
intel_bts_queue *btsq, u64 *timestamp)
 	if (!buffer) {
 		if (!btsq->bts->sampling_mode)
 			btsq->done = 1;
-		return 1;
+		err = 1;
+		goto out_put;
 	}

 	/* Currently there is no support for split buffers */
-	if (buffer->consecutive)
-		return -EINVAL;
+	if (buffer->consecutive) {
+		err = -EINVAL;
+		goto out_put;
+	}

 	if (!buffer->data) {
 		int fd = perf_data_file__fd(btsq->bts->session->file);

 		buffer->data = auxtrace_buffer__get_data(buffer, fd);
-		if (!buffer->data)
-			return -ENOMEM;
+		if (!buffer->data) {
+			err = -ENOMEM;
+			goto out_put;
+		}
 	}

 	if (btsq->bts->snapshot_mode && !buffer->consecutive &&
-	    intel_bts_do_fix_overlap(queue, buffer))
-		return -ENOMEM;
+	    intel_bts_do_fix_overlap(queue, buffer)) {
+		err = -ENOMEM;
+		goto out_put;
+	}

 	if (!btsq->bts->synth_opts.callchain && thread &&
 	    (!old_buffer || btsq->bts->sampling_mode ||
@@ -507,7 +518,8 @@ static int intel_bts_process_queue(struct
intel_bts_queue *btsq, u64 *timestamp)
 		if (!btsq->bts->sampling_mode)
 			btsq->done = 1;
 	}
-
+out_put:
+	thread__put(thread);
 	return err;
 }

diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 751c43a1fbcc..8c8559615666 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -200,7 +200,7 @@ static void intel_pt_use_buffer_pid_tid(struct
intel_pt_queue *ptq,
 	intel_pt_log("queue %u cpu %d pid %d tid %d\n",
 		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);

-	ptq->thread = NULL;
+	thread__zput(ptq->thread);

 	if (ptq->tid != -1) {
 		if (ptq->pid != -1)
@@ -713,6 +713,7 @@ static void intel_pt_free_queue(void *priv)

 	if (!ptq)
 		return;
+	thread__zput(ptq->thread);
 	intel_pt_decoder_free(ptq->decoder);
 	zfree(&ptq->event_buf);
 	zfree(&ptq->chain);
@@ -726,7 +727,7 @@ static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,

 	if (queue->tid == -1 || pt->have_sched_switch) {
 		ptq->tid = machine__get_current_tid(pt->machine, ptq->cpu);
-		ptq->thread = NULL;
+		thread__zput(ptq->thread);
 	}

 	if (!ptq->thread && ptq->tid != -1)
-- 
1.9.1


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 05/17] perf tools: Add Intel PT instruction decoder
  2015-06-19 15:44     ` Arnaldo Carvalho de Melo
@ 2015-06-22 12:40       ` Adrian Hunter
  0 siblings, 0 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-06-22 12:40 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 19/06/15 18:44, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jun 18, 2015 at 07:29:41PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Fri, May 29, 2015 at 04:33:33PM +0300, Adrian Hunter escreveu:
>>> Add support for decoding instructions for Intel Processor Trace.  The
>>> kernel x86 instruction decoder is used for this.
>>
>> Ok, but we don't access kernel header files directly, and:
>>
>> [acme@zoo linux]$ find . -name "insn.h"
>> ./arch/x86/include/asm/insn.h
>> ./arch/arm64/include/asm/insn.h
>> ./arch/arm/include/asm/insn.h
>> [acme@zoo linux]$ find /usr/include -name "insn.h"
>> [acme@zoo linux]$ 
>>
>> But I need to look more into this patch to figure out if this is
>> something generated at build time, etc, but before that I found a
>> problem:
>>
>> So:
>>
>>> +inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
>>> +inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
>>
>> These need to go into tools/perf/MANIFEST, so that:
> 
> So, after adding:
> 
> diff --git a/tools/perf/MANIFEST b/tools/perf/MANIFEST
> index fe50a1b34aa0..4e5662d8c274 100644
> --- a/tools/perf/MANIFEST
> +++ b/tools/perf/MANIFEST
> @@ -58,6 +58,13 @@ include/linux/stringify.h
>  lib/hweight.c
>  lib/rbtree.c
>  include/linux/swab.h
> +arch/x86/lib/insn.c
> +arch/x86/lib/inat.c
> +arch/x86/include/asm/insn.h
> +arch/x86/include/asm/inat.h
> +arch/x86/include/asm/inat_types.h
> +arch/x86/tools/gen-insn-attr-x86.awk
> +arch/x86/lib/x86-opcode-map.txt
>  arch/*/include/asm/unistd*.h
>  arch/*/include/uapi/asm/unistd*.h
>  arch/*/include/uapi/asm/perf_regs.h
> 
> The test passes:
> 
> [acme@zoo linux]$ make -C tools/perf -f tests/make tarpkg && echo Ok
> make: Entering directory '/home/git/linux/tools/perf'
> - tarpkg: ./tests/perf-targz-src-pkg .
> make: Leaving directory '/home/git/linux/tools/perf'
> Ok
> [acme@zoo linux]$
> 
> Merging these changes with this changeset to continue testing...

Thank you! :-)

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 15/17] perf tools: Intel BTS to always update thread stack trace number
  2015-06-22 12:38     ` Adrian Hunter
@ 2015-06-22 14:33       ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-22 14:33 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Mon, Jun 22, 2015 at 03:38:35PM +0300, Adrian Hunter escreveu:
> On 19/06/15 19:11, Arnaldo Carvalho de Melo wrote:
> > Em Fri, May 29, 2015 at 04:33:43PM +0300, Adrian Hunter escreveu:
> >> The enhanced thread stack is used by higher layers but still requires
> >> the trace number.  The trace number is used to distinguish discontinuous
> >> sections of trace (for example from Snapshot mode or Sample mode), which
> >> cause the thread stack to be flushed.
> >>
> >> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> >> ---
> >>  tools/perf/util/intel-bts.c | 18 ++++++++++++++----
> >>  1 file changed, 14 insertions(+), 4 deletions(-)
> >>
> >> diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
> >> index b068860..cd7bde3 100644
> >> --- a/tools/perf/util/intel-bts.c
> >> +++ b/tools/perf/util/intel-bts.c
> >> @@ -27,6 +27,8 @@
> >>  #include "machine.h"
> >>  #include "session.h"
> >>  #include "util.h"
> >> +#include "thread.h"
> >> +#include "thread-stack.h"
> >>  #include "debug.h"
> >>  #include "tsc.h"
> >>  #include "auxtrace.h"
> >> @@ -443,19 +445,22 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
> >>  
> >>  static int intel_bts_process_queue(struct intel_bts_queue *btsq, u64 *timestamp)
> >>  {
> >> -	struct auxtrace_buffer *buffer = btsq->buffer;
> >> +	struct auxtrace_buffer *buffer = btsq->buffer, *old_buffer = buffer;
> >>  	struct auxtrace_queue *queue;
> >> +	struct thread *thread;
> >>  	int err;
> >>  
> >>  	if (btsq->done)
> >>  		return 1;
> >>  
> >>  	if (btsq->pid == -1) {
> >> -		struct thread *thread;
> >> -
> >> -		thread = machine__find_thread(btsq->bts->machine, -1, btsq->tid);
> >> +		thread = machine__find_thread(btsq->bts->machine, -1,
> >> +					      btsq->tid);
> >>  		if (thread)
> >>  			btsq->pid = thread->pid_;
> >> +	} else {
> >> +		thread = machine__findnew_thread(btsq->bts->machine, btsq->pid,
> >> +						 btsq->tid);
> > 
> > Humm, so what will be done with the reference count you got from
> > machine__findnew_thread()? You have to drop it when you're done with
> > using this thread.
> > 
> 
> Thought I fixed that. Went looking and, yes, the chunks got lost when
> rolling V6 of the patches. Anyway here are the fixes as a separate patch.

Thanks, I'll try and fold that into that patch, to keep bisect happy, as
refcount bugs sometimes are hard to find, I guess I'll even try writing
a generic perf based refcounting debugging tool at some point...

- Arnaldo
 
> From: Adrian Hunter <adrian.hunter@intel.com>
> Date: Mon, 22 Jun 2015 15:02:04 +0300
> Subject: [PATCH] perf tools: Fix missing thread__put()s
> 
> Processing for Intel BTS and Intel PT are using machine__find_thread() and
> machine__findnew_thread() which increase the struct thread reference count.
> Add missing thread__put()s when finished with that struct thread reference.
> 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/intel-bts.c | 36 ++++++++++++++++++++++++------------
>  tools/perf/util/intel-pt.c  |  5 +++--
>  2 files changed, 27 insertions(+), 14 deletions(-)
> 
> diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
> index cd7bde33b635..dce99cfb1309 100644
> --- a/tools/perf/util/intel-bts.c
> +++ b/tools/perf/util/intel-bts.c
> @@ -318,6 +318,7 @@ static int intel_bts_get_next_insn(struct
> intel_bts_queue *btsq, u64 ip)
>  	ssize_t len;
>  	int x86_64;
>  	uint8_t cpumode;
> +	int err = -1;
> 
>  	bufsz = intel_pt_insn_max_size();
> 
> @@ -332,11 +333,11 @@ static int intel_bts_get_next_insn(struct
> intel_bts_queue *btsq, u64 ip)
> 
>  	thread__find_addr_map(thread, cpumode, MAP__FUNCTION, ip, &al);
>  	if (!al.map || !al.map->dso)
> -		return -1;
> +		goto out_put;
> 
>  	len = dso__data_read_addr(al.map->dso, al.map, machine, ip, buf, bufsz);
>  	if (len <= 0)
> -		return -1;
> +		goto out_put;
> 
>  	/* Load maps to ensure dso->is_64_bit has been updated */
>  	map__load(al.map, machine->symbol_filter);
> @@ -344,9 +345,12 @@ static int intel_bts_get_next_insn(struct
> intel_bts_queue *btsq, u64 ip)
>  	x86_64 = al.map->dso->is_64_bit;
> 
>  	if (intel_pt_get_insn(buf, len, x86_64, &btsq->intel_pt_insn))
> -		return -1;
> +		goto out_put;
> 
> -	return 0;
> +	err = 0;
> +out_put:
> +	thread__put(thread);
> +	return err;
>  }
> 
>  static int intel_bts_synth_error(struct intel_bts *bts, int cpu, pid_t pid,
> @@ -471,24 +475,31 @@ static int intel_bts_process_queue(struct
> intel_bts_queue *btsq, u64 *timestamp)
>  	if (!buffer) {
>  		if (!btsq->bts->sampling_mode)
>  			btsq->done = 1;
> -		return 1;
> +		err = 1;
> +		goto out_put;
>  	}
> 
>  	/* Currently there is no support for split buffers */
> -	if (buffer->consecutive)
> -		return -EINVAL;
> +	if (buffer->consecutive) {
> +		err = -EINVAL;
> +		goto out_put;
> +	}
> 
>  	if (!buffer->data) {
>  		int fd = perf_data_file__fd(btsq->bts->session->file);
> 
>  		buffer->data = auxtrace_buffer__get_data(buffer, fd);
> -		if (!buffer->data)
> -			return -ENOMEM;
> +		if (!buffer->data) {
> +			err = -ENOMEM;
> +			goto out_put;
> +		}
>  	}
> 
>  	if (btsq->bts->snapshot_mode && !buffer->consecutive &&
> -	    intel_bts_do_fix_overlap(queue, buffer))
> -		return -ENOMEM;
> +	    intel_bts_do_fix_overlap(queue, buffer)) {
> +		err = -ENOMEM;
> +		goto out_put;
> +	}
> 
>  	if (!btsq->bts->synth_opts.callchain && thread &&
>  	    (!old_buffer || btsq->bts->sampling_mode ||
> @@ -507,7 +518,8 @@ static int intel_bts_process_queue(struct
> intel_bts_queue *btsq, u64 *timestamp)
>  		if (!btsq->bts->sampling_mode)
>  			btsq->done = 1;
>  	}
> -
> +out_put:
> +	thread__put(thread);
>  	return err;
>  }
> 
> diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
> index 751c43a1fbcc..8c8559615666 100644
> --- a/tools/perf/util/intel-pt.c
> +++ b/tools/perf/util/intel-pt.c
> @@ -200,7 +200,7 @@ static void intel_pt_use_buffer_pid_tid(struct
> intel_pt_queue *ptq,
>  	intel_pt_log("queue %u cpu %d pid %d tid %d\n",
>  		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
> 
> -	ptq->thread = NULL;
> +	thread__zput(ptq->thread);
> 
>  	if (ptq->tid != -1) {
>  		if (ptq->pid != -1)
> @@ -713,6 +713,7 @@ static void intel_pt_free_queue(void *priv)
> 
>  	if (!ptq)
>  		return;
> +	thread__zput(ptq->thread);
>  	intel_pt_decoder_free(ptq->decoder);
>  	zfree(&ptq->event_buf);
>  	zfree(&ptq->chain);
> @@ -726,7 +727,7 @@ static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
> 
>  	if (queue->tid == -1 || pt->have_sched_switch) {
>  		ptq->tid = machine__get_current_tid(pt->machine, ptq->cpu);
> -		ptq->thread = NULL;
> +		thread__zput(ptq->thread);
>  	}
> 
>  	if (!ptq->thread && ptq->tid != -1)
> -- 
> 1.9.1
> 
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-19 19:41       ` Arnaldo Carvalho de Melo
@ 2015-06-22 18:24         ` Arnaldo Carvalho de Melo
  2015-06-22 20:26           ` Adrian Hunter
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-22 18:24 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Fri, Jun 19, 2015 at 04:41:56PM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Fri, Jun 19, 2015 at 10:33:43PM +0300, Adrian Hunter escreveu:
> > On 19/06/2015 7:04 p.m., Arnaldo Carvalho de Melo wrote:
> > >Em Fri, May 29, 2015 at 04:33:36PM +0300, Adrian Hunter escreveu:
> > >>Add support for Intel Processor Trace.
> 
> > >>Intel PT support fits within the new auxtrace infrastructure.
> > >>Recording is supporting by identifying the Intel PT PMU,
> > >>parsing options and setting up events.  Decoding is supported
> > >>by queuing up trace data by cpu or thread and then decoding
> > >>synchronously delivering synthesized event samples into the
> > >>session processing for tools to consume.
> 
> > >So, at this point what commands should I use to test this? I expected to
> > >be able to have some command here, in this changeset log, telling me
> > >that what has been applied so far + this "Add Intel PT support", can be
> > >used in such and such a fashion, obtaining this and that output.
> 
> > >Now I'll go back and look at the cover letter to see what I can do at
> > >this point and with access to a Broadwell class machine.
> 
> > Actually you need the next patch "perf tools: Take Intel PT into use" to do anything.
> 
> Yeah, saw that, the title of this patch fooled me into thinking that
> Intel PT support was added :-)
> 
> Anyway, stopping for a moment to push stuff ready to Ingo, will get back
> to this after that.

So, got back to it, added that "take it into use" patch and now trying
to follow that documentation:

[root@perf4 ~]# perf evlist
intel_pt//u
sched:sched_switch
dummy:u
[root@perf4 ~]# perf report
[root@perf4 ~]#  perf record -e intel_pt//u -a sleep 10
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.379 MB perf.data ]
[root@perf4 ~]# 
[root@perf4 ~]# 
[root@perf4 ~]# perf report
[root@perf4 ~]# perf evlist
intel_pt//u
sched:sched_switch
dummy:u
[root@perf4 ~]# uname -r
4.1.0-rc8
[root@perf4 ~]# 

I am not getting any "intel_pt//u" event, ideas?

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-22 18:24         ` Arnaldo Carvalho de Melo
@ 2015-06-22 20:26           ` Adrian Hunter
  2015-06-22 23:00             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-06-22 20:26 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 22/06/2015 9:24 p.m., Arnaldo Carvalho de Melo wrote:
> Em Fri, Jun 19, 2015 at 04:41:56PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Em Fri, Jun 19, 2015 at 10:33:43PM +0300, Adrian Hunter escreveu:
>>> On 19/06/2015 7:04 p.m., Arnaldo Carvalho de Melo wrote:
>>>> Em Fri, May 29, 2015 at 04:33:36PM +0300, Adrian Hunter escreveu:
>>>>> Add support for Intel Processor Trace.
>>
>>>>> Intel PT support fits within the new auxtrace infrastructure.
>>>>> Recording is supporting by identifying the Intel PT PMU,
>>>>> parsing options and setting up events.  Decoding is supported
>>>>> by queuing up trace data by cpu or thread and then decoding
>>>>> synchronously delivering synthesized event samples into the
>>>>> session processing for tools to consume.
>>
>>>> So, at this point what commands should I use to test this? I expected to
>>>> be able to have some command here, in this changeset log, telling me
>>>> that what has been applied so far + this "Add Intel PT support", can be
>>>> used in such and such a fashion, obtaining this and that output.
>>
>>>> Now I'll go back and look at the cover letter to see what I can do at
>>>> this point and with access to a Broadwell class machine.
>>
>>> Actually you need the next patch "perf tools: Take Intel PT into use" to do anything.
>>
>> Yeah, saw that, the title of this patch fooled me into thinking that
>> Intel PT support was added :-)
>>
>> Anyway, stopping for a moment to push stuff ready to Ingo, will get back
>> to this after that.
>
> So, got back to it, added that "take it into use" patch and now trying
> to follow that documentation:
>
> [root@perf4 ~]# perf evlist
> intel_pt//u
> sched:sched_switch
> dummy:u
> [root@perf4 ~]# perf report
> [root@perf4 ~]#  perf record -e intel_pt//u -a sleep 10
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.379 MB perf.data ]
> [root@perf4 ~]#
> [root@perf4 ~]#
> [root@perf4 ~]# perf report
> [root@perf4 ~]# perf evlist
> intel_pt//u
> sched:sched_switch
> dummy:u
> [root@perf4 ~]# uname -r
> 4.1.0-rc8
> [root@perf4 ~]#
>
> I am not getting any "intel_pt//u" event, ideas?

Events are synthesized by the decoder.  You should see 'instructions:u' events.

What does perf report --stdio give?
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-22 20:26           ` Adrian Hunter
@ 2015-06-22 23:00             ` Arnaldo Carvalho de Melo
  2015-06-23  6:29               ` Adrian Hunter
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-22 23:00 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Mon, Jun 22, 2015 at 11:26:34PM +0300, Adrian Hunter escreveu:
> On 22/06/2015 9:24 p.m., Arnaldo Carvalho de Melo wrote:
> >Em Fri, Jun 19, 2015 at 04:41:56PM -0300, Arnaldo Carvalho de Melo escreveu:
> >>Em Fri, Jun 19, 2015 at 10:33:43PM +0300, Adrian Hunter escreveu:
> >>>On 19/06/2015 7:04 p.m., Arnaldo Carvalho de Melo wrote:
> >>>>Em Fri, May 29, 2015 at 04:33:36PM +0300, Adrian Hunter escreveu:
> >>>>>Add support for Intel Processor Trace.
> >>
> >>>>>Intel PT support fits within the new auxtrace infrastructure.
> >>>>>Recording is supporting by identifying the Intel PT PMU,
> >>>>>parsing options and setting up events.  Decoding is supported
> >>>>>by queuing up trace data by cpu or thread and then decoding
> >>>>>synchronously delivering synthesized event samples into the
> >>>>>session processing for tools to consume.
> >>
> >>>>So, at this point what commands should I use to test this? I expected to
> >>>>be able to have some command here, in this changeset log, telling me
> >>>>that what has been applied so far + this "Add Intel PT support", can be
> >>>>used in such and such a fashion, obtaining this and that output.
> >>
> >>>>Now I'll go back and look at the cover letter to see what I can do at
> >>>>this point and with access to a Broadwell class machine.
> >>
> >>>Actually you need the next patch "perf tools: Take Intel PT into use" to do anything.
> >>
> >>Yeah, saw that, the title of this patch fooled me into thinking that
> >>Intel PT support was added :-)
> >>
> >>Anyway, stopping for a moment to push stuff ready to Ingo, will get back
> >>to this after that.
> >
> >So, got back to it, added that "take it into use" patch and now trying
> >to follow that documentation:
> >
> >[root@perf4 ~]# perf evlist
> >intel_pt//u
> >sched:sched_switch
> >dummy:u
> >[root@perf4 ~]# perf report
> >[root@perf4 ~]#  perf record -e intel_pt//u -a sleep 10
> >[ perf record: Woken up 1 times to write data ]
> >[ perf record: Captured and wrote 0.379 MB perf.data ]
> >[root@perf4 ~]#
> >[root@perf4 ~]#
> >[root@perf4 ~]# perf report
> >[root@perf4 ~]# perf evlist
> >intel_pt//u
> >sched:sched_switch
> >dummy:u
> >[root@perf4 ~]# uname -r
> >4.1.0-rc8
> >[root@perf4 ~]#
> >
> >I am not getting any "intel_pt//u" event, ideas?
> 
> Events are synthesized by the decoder.  You should see 'instructions:u' events.
> 
> What does perf report --stdio give?

Well, away from a Broadwell machine now, applied a few patches more and
I'm now trying BTS, on this Ivy Bridge notebook (MacBook Air):

[    0.000000] DMI: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, BIOS MBA51.88Z.00EF.B02.1211271028 11/27/2012

[    0.116644] perf_event_intel: PMU erratum BJ122, BV98, HSD29 worked around, HT is on

[    0.061626] TSC deadline timer enabled
[    0.061630] smpboot: CPU0: Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz (fam: 06, model: 3a, stepping: 09)
[    0.061661] Performance Events: PEBS fmt1+, 16-deep LBR, IvyBridge events, full-width counters, Intel PMU driver.
[    0.061685] ... version:                3
[    0.061686] ... bit width:              48
[    0.061687] ... generic registers:      4
[    0.061688] ... value mask:             0000ffffffffffff
[    0.061690] ... max period:             0000ffffffffffff
[    0.061691] ... fixed-purpose events:   3
[    0.061692] ... event mask:             000000070000000f
[    0.062587] x86: Booting SMP configuration:
[    0.062589] .... node  #0, CPUs:      #1
[    0.074078] microcode: CPU1 microcode updated early to revision 0x1b, date = 2014-05-29
[    0.076715] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
[    0.076825]  #2 #3
[    0.104559] x86: Booted up 1 node, 4 CPUs
[    0.104563] smpboot: Total of 4 processors activated (19953.49 BogoMIPS)

[root@zoo ~]# perf record -e intel_bts//u --per-thread  sleep 5
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.531 MB perf.data ]
[root@zoo ~]# perf evlist
intel_bts//u
dummy:u
[root@zoo ~]#

[root@zoo ~]# perf report --stdio | head -40
# To display the perf.data header info, please use --header/--header-only options.
#
#
# Total Lost Samples: 0
#
# Samples: 0  of event 'intel_bts//u'
# Event count (approx.): 0
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#


# Samples: 0  of event 'dummy:u'
# Event count (approx.): 0
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......
#


# Samples: 22K of event 'branches:u'
# Event count (approx.): 22548
#
# Overhead  Command  Shared Object     Symbol                
# ........  .......  ................  ......................
#
     9.82%  sleep    [unknown]         [.] 0x00007f8e86c09061
     8.28%  sleep    [unknown]         [.] 0x00007f8e86ea4e7e
     6.97%  sleep    [unknown]         [.] 0x00007f8e86c09086
     5.88%  sleep    [unknown]         [.] 0x00007f8e86e95726
     5.28%  sleep    [unknown]         [.] 0x00007f8e86e9730d
     4.69%  sleep    [unknown]         [.] 0x00007f8e86b06bc1
     4.48%  sleep    [unknown]         [.] 0x00007f8e86c090c7
     4.01%  sleep    [unknown]         [.] 0x00007f8e86c09027
     4.01%  sleep    [unknown]         [.] 0x00007f8e86c0904c
     2.84%  sleep    [unknown]         [.] 0x00007f8e86c0908c
     2.74%  sleep    [unknown]         [.] 0x00007f8e86c0909d
     2.74%  sleep    [unknown]         [.] 0x00007f8e86c09037
     1.20%  sleep    [unknown]         [.] 0x00007f8e86af9c68

-----------------------------------------------------------------

Ok, so it synthesized the branches:u, that don't appear on the 'perf evlist'
output, that is the command I use to see what kinds of events are contained
in a given perf.data file, probably we should add a note there that
branches:u will be synthesized at 'report' time, trying with 'perf script':

[root@zoo ~]# perf script | head -10
  :8676  8676  1 branches:u: ffffffff81799b47 [unknown] ([unknown]) => 7f8e86e8bcf0 [unknown] ([unknown])
  :8676  8676  1 branches:u: ffffffff81799b47 [unknown] ([unknown]) => 7f8e86e8bcf0 [unknown] ([unknown])
  :8676  8676  1 branches:u:     7f8e86e8bcf3 [unknown] ([unknown]) => 7f8e86e8f980 [unknown] ([unknown])
  :8676  8676  1 branches:u: ffffffff81799b47 [unknown] ([unknown]) => 7f8e86e8f9a6 [unknown] ([unknown])
  :8676  8676  1 branches:u:     7f8e86e8fa11 [unknown] ([unknown]) => 7f8e86e8fa2f [unknown] ([unknown])
  :8676  8676  1 branches:u:     7f8e86e8fa33 [unknown] ([unknown]) => 7f8e86e8fa18 [unknown] ([unknown])
  :8676  8676  1 branches:u:     7f8e86e8fa33 [unknown] ([unknown]) => 7f8e86e8fa18 [unknown] ([unknown])
  :8676  8676  1 branches:u:     7f8e86e8fa3f [unknown] ([unknown]) => 7f8e86e8fc18 [unknown] ([unknown])
  :8676  8676  1 branches:u:     7f8e86e8fc20 [unknown] ([unknown]) => 7f8e86e8fc40 [unknown] ([unknown])

The synthesized records looks sane:

[root@zoo ~]# perf report -D | grep PERF_RECORD_ | tail -15
0x36f8 [0x78]: PERF_RECORD_MMAP -1/0: [0xffffffffa0933000(0x5000) @ 0]: x /lib/modules/4.1.0-rc5+/kernel/net/netfilter/xt_CHECKSUM.ko
0x3770 [0x68]: PERF_RECORD_MMAP -1/0: [0xffffffffa0938000(0x18000) @ 0]: x /lib/modules/4.1.0-rc5+/kernel/fs/fuse/fuse.ko
0x37d8 [0x68]: PERF_RECORD_MMAP -1/0: [0xffffffffa0950000(0x5f6affff) @ 0]: x /lib/modules/4.1.0-rc5+/kernel/crypto/ccm.ko
0x3840 [0x20]: PERF_RECORD_ITRACE_START pid: 8676 tid: 8676
0x3860 [0x28]: PERF_RECORD_COMM exec: sleep:8676/8676
0x3888 [0x68]: PERF_RECORD_MMAP2 8676/8676: [0x400000(0x6000) @ 0 fd:01 525758 1481335351]: r-xp /usr/bin/sleep
0x38f0 [0x70]: PERF_RECORD_MMAP2 8676/8676: [0x7f8e86e8b000(0x224000) @ 0 fd:01 534148 868528687]: r-xp /usr/lib64/ld-2.20.so
0x3960 [0x60]: PERF_RECORD_MMAP2 8676/8676: [0x7ffdc21d4000(0x2000) @ 0x7ffdc21d4000 00:00 0 0]: ---p [vdso]
0x39c0 [0x70]: PERF_RECORD_MMAP2 8676/8676: [0x7f8e86ace000(0x3bd000) @ 0 fd:01 531160 868528694]: r-xp /usr/lib64/libc-2.20.so
0x3a30 [0x30]: PERF_RECORD_AUX offset: 0 size: 0x80688 flags: 0 []
0x3a60 [0x30]: PERF_RECORD_EXIT(8676:8676):(8676:8676)
0x3a90 [0x30]: PERF_RECORD_AUX offset: 0x80688 size: 0x3b70 flags: 0 []
0x3ac0 [0x30]: PERF_RECORD_EXIT(8676:8676):(8676:8676)
0x3af0 [0x30]: PERF_RECORD_AUXTRACE size: 0x841f8  offset: 0  ref: 0x531e5c6921c  idx: 0  tid: 8676  cpu: -1
0x87d18 [0x8]: PERF_RECORD_FINISHED_ROUNDAggregated stats: (excludes AUX area (e.g. instruction trace) decoded / synthesized events)
[root@zoo ~]# 

One of those samples

>>> (0x7f8e86e8fa3f - 0x7f8e86ace000) < 0x3bd000
False

After the libc-2.20.so map

>>> (0x7f8e86e8fa3f - 0x7f8e86e8b000) < 0x224000
True

Ok, so a /usr/lib64/ld-2.20.so sample, and:

[root@zoo ~]# rpm -q glibc-debuginfo
glibc-debuginfo-2.20-8.fc21.x86_64

But then, it didn't even resolve the DSO, which it should, as I did manually :-/

Will continue investigating... Perhaps this is fixed in another patch? What I
have test merged so far is at my tmp.perf/pt branch.

I will still update the cset comments and possibly do some other changes to
preserve bisectability or some other fix so that we get more test output
inserted in the changesets.

- Arnaldo
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-22 23:00             ` Arnaldo Carvalho de Melo
@ 2015-06-23  6:29               ` Adrian Hunter
  2015-06-23 15:15                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-06-23  6:29 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

On 23/06/15 02:00, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jun 22, 2015 at 11:26:34PM +0300, Adrian Hunter escreveu:
>> On 22/06/2015 9:24 p.m., Arnaldo Carvalho de Melo wrote:
>>> Em Fri, Jun 19, 2015 at 04:41:56PM -0300, Arnaldo Carvalho de Melo escreveu:
>>>> Em Fri, Jun 19, 2015 at 10:33:43PM +0300, Adrian Hunter escreveu:
>>>>> On 19/06/2015 7:04 p.m., Arnaldo Carvalho de Melo wrote:
>>>>>> Em Fri, May 29, 2015 at 04:33:36PM +0300, Adrian Hunter escreveu:
>>>>>>> Add support for Intel Processor Trace.
>>>>
>>>>>>> Intel PT support fits within the new auxtrace infrastructure.
>>>>>>> Recording is supporting by identifying the Intel PT PMU,
>>>>>>> parsing options and setting up events.  Decoding is supported
>>>>>>> by queuing up trace data by cpu or thread and then decoding
>>>>>>> synchronously delivering synthesized event samples into the
>>>>>>> session processing for tools to consume.
>>>>
>>>>>> So, at this point what commands should I use to test this? I expected to
>>>>>> be able to have some command here, in this changeset log, telling me
>>>>>> that what has been applied so far + this "Add Intel PT support", can be
>>>>>> used in such and such a fashion, obtaining this and that output.
>>>>
>>>>>> Now I'll go back and look at the cover letter to see what I can do at
>>>>>> this point and with access to a Broadwell class machine.
>>>>
>>>>> Actually you need the next patch "perf tools: Take Intel PT into use" to do anything.
>>>>
>>>> Yeah, saw that, the title of this patch fooled me into thinking that
>>>> Intel PT support was added :-)
>>>>
>>>> Anyway, stopping for a moment to push stuff ready to Ingo, will get back
>>>> to this after that.
>>>
>>> So, got back to it, added that "take it into use" patch and now trying
>>> to follow that documentation:
>>>
>>> [root@perf4 ~]# perf evlist
>>> intel_pt//u
>>> sched:sched_switch
>>> dummy:u
>>> [root@perf4 ~]# perf report
>>> [root@perf4 ~]#  perf record -e intel_pt//u -a sleep 10
>>> [ perf record: Woken up 1 times to write data ]
>>> [ perf record: Captured and wrote 0.379 MB perf.data ]
>>> [root@perf4 ~]#
>>> [root@perf4 ~]#
>>> [root@perf4 ~]# perf report
>>> [root@perf4 ~]# perf evlist
>>> intel_pt//u
>>> sched:sched_switch
>>> dummy:u
>>> [root@perf4 ~]# uname -r
>>> 4.1.0-rc8
>>> [root@perf4 ~]#
>>>
>>> I am not getting any "intel_pt//u" event, ideas?
>>
>> Events are synthesized by the decoder.  You should see 'instructions:u' events.
>>
>> What does perf report --stdio give?
> 
> Well, away from a Broadwell machine now, applied a few patches more and
> I'm now trying BTS, on this Ivy Bridge notebook (MacBook Air):
> 
> [    0.000000] DMI: Apple Inc. MacBookAir5,1/Mac-66F35F19FE2A0D05, BIOS MBA51.88Z.00EF.B02.1211271028 11/27/2012
> 
> [    0.116644] perf_event_intel: PMU erratum BJ122, BV98, HSD29 worked around, HT is on
> 
> [    0.061626] TSC deadline timer enabled
> [    0.061630] smpboot: CPU0: Intel(R) Core(TM) i7-3667U CPU @ 2.00GHz (fam: 06, model: 3a, stepping: 09)
> [    0.061661] Performance Events: PEBS fmt1+, 16-deep LBR, IvyBridge events, full-width counters, Intel PMU driver.
> [    0.061685] ... version:                3
> [    0.061686] ... bit width:              48
> [    0.061687] ... generic registers:      4
> [    0.061688] ... value mask:             0000ffffffffffff
> [    0.061690] ... max period:             0000ffffffffffff
> [    0.061691] ... fixed-purpose events:   3
> [    0.061692] ... event mask:             000000070000000f
> [    0.062587] x86: Booting SMP configuration:
> [    0.062589] .... node  #0, CPUs:      #1
> [    0.074078] microcode: CPU1 microcode updated early to revision 0x1b, date = 2014-05-29
> [    0.076715] NMI watchdog: enabled on all CPUs, permanently consumes one hw-PMU counter.
> [    0.076825]  #2 #3
> [    0.104559] x86: Booted up 1 node, 4 CPUs
> [    0.104563] smpboot: Total of 4 processors activated (19953.49 BogoMIPS)
> 
> [root@zoo ~]# perf record -e intel_bts//u --per-thread  sleep 5
> [ perf record: Woken up 1 times to write data ]
> [ perf record: Captured and wrote 0.531 MB perf.data ]
> [root@zoo ~]# perf evlist
> intel_bts//u
> dummy:u
> [root@zoo ~]#
> 
> [root@zoo ~]# perf report --stdio | head -40
> # To display the perf.data header info, please use --header/--header-only options.
> #
> #
> # Total Lost Samples: 0
> #
> # Samples: 0  of event 'intel_bts//u'
> # Event count (approx.): 0
> #
> # Overhead  Command  Shared Object  Symbol
> # ........  .......  .............  ......
> #
> 
> 
> # Samples: 0  of event 'dummy:u'
> # Event count (approx.): 0
> #
> # Overhead  Command  Shared Object  Symbol
> # ........  .......  .............  ......
> #
> 
> 
> # Samples: 22K of event 'branches:u'
> # Event count (approx.): 22548
> #
> # Overhead  Command  Shared Object     Symbol                
> # ........  .......  ................  ......................
> #
>      9.82%  sleep    [unknown]         [.] 0x00007f8e86c09061
>      8.28%  sleep    [unknown]         [.] 0x00007f8e86ea4e7e
>      6.97%  sleep    [unknown]         [.] 0x00007f8e86c09086
>      5.88%  sleep    [unknown]         [.] 0x00007f8e86e95726
>      5.28%  sleep    [unknown]         [.] 0x00007f8e86e9730d
>      4.69%  sleep    [unknown]         [.] 0x00007f8e86b06bc1
>      4.48%  sleep    [unknown]         [.] 0x00007f8e86c090c7
>      4.01%  sleep    [unknown]         [.] 0x00007f8e86c09027
>      4.01%  sleep    [unknown]         [.] 0x00007f8e86c0904c
>      2.84%  sleep    [unknown]         [.] 0x00007f8e86c0908c
>      2.74%  sleep    [unknown]         [.] 0x00007f8e86c0909d
>      2.74%  sleep    [unknown]         [.] 0x00007f8e86c09037
>      1.20%  sleep    [unknown]         [.] 0x00007f8e86af9c68
> 
> -----------------------------------------------------------------
> 
> Ok, so it synthesized the branches:u, that don't appear on the 'perf evlist'
> output, that is the command I use to see what kinds of events are contained
> in a given perf.data file, probably we should add a note there that
> branches:u will be synthesized at 'report' time, trying with 'perf script':
> 
> [root@zoo ~]# perf script | head -10
>   :8676  8676  1 branches:u: ffffffff81799b47 [unknown] ([unknown]) => 7f8e86e8bcf0 [unknown] ([unknown])
>   :8676  8676  1 branches:u: ffffffff81799b47 [unknown] ([unknown]) => 7f8e86e8bcf0 [unknown] ([unknown])
>   :8676  8676  1 branches:u:     7f8e86e8bcf3 [unknown] ([unknown]) => 7f8e86e8f980 [unknown] ([unknown])
>   :8676  8676  1 branches:u: ffffffff81799b47 [unknown] ([unknown]) => 7f8e86e8f9a6 [unknown] ([unknown])
>   :8676  8676  1 branches:u:     7f8e86e8fa11 [unknown] ([unknown]) => 7f8e86e8fa2f [unknown] ([unknown])
>   :8676  8676  1 branches:u:     7f8e86e8fa33 [unknown] ([unknown]) => 7f8e86e8fa18 [unknown] ([unknown])
>   :8676  8676  1 branches:u:     7f8e86e8fa33 [unknown] ([unknown]) => 7f8e86e8fa18 [unknown] ([unknown])
>   :8676  8676  1 branches:u:     7f8e86e8fa3f [unknown] ([unknown]) => 7f8e86e8fc18 [unknown] ([unknown])
>   :8676  8676  1 branches:u:     7f8e86e8fc20 [unknown] ([unknown]) => 7f8e86e8fc40 [unknown] ([unknown])
> 
> The synthesized records looks sane:
> 
> [root@zoo ~]# perf report -D | grep PERF_RECORD_ | tail -15
> 0x36f8 [0x78]: PERF_RECORD_MMAP -1/0: [0xffffffffa0933000(0x5000) @ 0]: x /lib/modules/4.1.0-rc5+/kernel/net/netfilter/xt_CHECKSUM.ko
> 0x3770 [0x68]: PERF_RECORD_MMAP -1/0: [0xffffffffa0938000(0x18000) @ 0]: x /lib/modules/4.1.0-rc5+/kernel/fs/fuse/fuse.ko
> 0x37d8 [0x68]: PERF_RECORD_MMAP -1/0: [0xffffffffa0950000(0x5f6affff) @ 0]: x /lib/modules/4.1.0-rc5+/kernel/crypto/ccm.ko
> 0x3840 [0x20]: PERF_RECORD_ITRACE_START pid: 8676 tid: 8676
> 0x3860 [0x28]: PERF_RECORD_COMM exec: sleep:8676/8676
> 0x3888 [0x68]: PERF_RECORD_MMAP2 8676/8676: [0x400000(0x6000) @ 0 fd:01 525758 1481335351]: r-xp /usr/bin/sleep
> 0x38f0 [0x70]: PERF_RECORD_MMAP2 8676/8676: [0x7f8e86e8b000(0x224000) @ 0 fd:01 534148 868528687]: r-xp /usr/lib64/ld-2.20.so
> 0x3960 [0x60]: PERF_RECORD_MMAP2 8676/8676: [0x7ffdc21d4000(0x2000) @ 0x7ffdc21d4000 00:00 0 0]: ---p [vdso]
> 0x39c0 [0x70]: PERF_RECORD_MMAP2 8676/8676: [0x7f8e86ace000(0x3bd000) @ 0 fd:01 531160 868528694]: r-xp /usr/lib64/libc-2.20.so
> 0x3a30 [0x30]: PERF_RECORD_AUX offset: 0 size: 0x80688 flags: 0 []
> 0x3a60 [0x30]: PERF_RECORD_EXIT(8676:8676):(8676:8676)
> 0x3a90 [0x30]: PERF_RECORD_AUX offset: 0x80688 size: 0x3b70 flags: 0 []
> 0x3ac0 [0x30]: PERF_RECORD_EXIT(8676:8676):(8676:8676)
> 0x3af0 [0x30]: PERF_RECORD_AUXTRACE size: 0x841f8  offset: 0  ref: 0x531e5c6921c  idx: 0  tid: 8676  cpu: -1
> 0x87d18 [0x8]: PERF_RECORD_FINISHED_ROUNDAggregated stats: (excludes AUX area (e.g. instruction trace) decoded / synthesized events)
> [root@zoo ~]# 
> 
> One of those samples
> 
>>>> (0x7f8e86e8fa3f - 0x7f8e86ace000) < 0x3bd000
> False
> 
> After the libc-2.20.so map
> 
>>>> (0x7f8e86e8fa3f - 0x7f8e86e8b000) < 0x224000
> True
> 
> Ok, so a /usr/lib64/ld-2.20.so sample, and:
> 
> [root@zoo ~]# rpm -q glibc-debuginfo
> glibc-debuginfo-2.20-8.fc21.x86_64
> 
> But then, it didn't even resolve the DSO, which it should, as I did manually :-/
> 
> Will continue investigating... Perhaps this is fixed in another patch? What I
> have test merged so far is at my tmp.perf/pt branch.

I tried the same commands with perf tools from that branch (tmp.perf/pt) and
it seemed to work fine.

One reason for not getting symbols is compiling perf tools without ELF support.

> 
> I will still update the cset comments and possibly do some other changes to
> preserve bisectability or some other fix so that we get more test output
> inserted in the changesets.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
Please read the FAQ at  http://www.tux.org/lkml/

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-23  6:29               ` Adrian Hunter
@ 2015-06-23 15:15                 ` Arnaldo Carvalho de Melo
  2015-06-25 13:37                   ` Adrian Hunter
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-23 15:15 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa

Em Tue, Jun 23, 2015 at 09:29:34AM +0300, Adrian Hunter escreveu:
> On 23/06/15 02:00, Arnaldo Carvalho de Melo wrote:
> > [root@zoo ~]# rpm -q glibc-debuginfo
> > glibc-debuginfo-2.20-8.fc21.x86_64

> > But then, it didn't even resolve the DSO, which it should, as I did manually :-/

> > Will continue investigating... Perhaps this is fixed in another patch? What I
> > have test merged so far is at my tmp.perf/pt branch.

> I tried the same commands with perf tools from that branch (tmp.perf/pt) and
> it seemed to work fine.

> One reason for not getting symbols is compiling perf tools without ELF support.

sure, but that is not the case here. But yeah, I'll try and triple check
everything, next time will add the list of features detected to the
problem report, so that you know which features were detected.

- Arnaldo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* [tip:perf/core] perf tools: Allow auxtrace data alignment
  2015-05-29 13:33 ` [PATCH V6 10/17] perf tools: Allow auxtrace data alignment Adrian Hunter
@ 2015-06-25  7:58   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 47+ messages in thread
From: tip-bot for Adrian Hunter @ 2015-06-25  7:58 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, hpa, linux-kernel, tglx, adrian.hunter, jolsa, acme

Commit-ID:  83b2ea257eb1d43e52f76d756722aeb899a2852c
Gitweb:     http://git.kernel.org/tip/83b2ea257eb1d43e52f76d756722aeb899a2852c
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Fri, 29 May 2015 16:33:38 +0300
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Tue, 23 Jun 2015 18:28:37 -0300

perf tools: Allow auxtrace data alignment

Allow auxtrace data to be a multiple of something other than page size.
That is needed for BTS where the buffer contains 24-byte records.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1432906425-9911-11-git-send-email-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/auxtrace.c | 7 +++++++
 tools/perf/util/auxtrace.h | 1 +
 2 files changed, 8 insertions(+)

diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index 3dab006..7e7405c 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -1182,6 +1182,13 @@ static int __auxtrace_mmap__read(struct auxtrace_mmap *mm,
 		data2 = NULL;
 	}
 
+	if (itr->alignment) {
+		unsigned int unwanted = len1 % itr->alignment;
+
+		len1 -= unwanted;
+		size -= unwanted;
+	}
+
 	/* padding must be written by fn() e.g. record__process_auxtrace() */
 	padding = size & 7;
 	if (padding)
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index a171abb..471aecb 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -303,6 +303,7 @@ struct auxtrace_record {
 				      const char *str);
 	u64 (*reference)(struct auxtrace_record *itr);
 	int (*read_finish)(struct auxtrace_record *itr, int idx);
+	unsigned int alignment;
 };
 
 #ifdef HAVE_AUXTRACE_SUPPORT

^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-23 15:15                 ` Arnaldo Carvalho de Melo
@ 2015-06-25 13:37                   ` Adrian Hunter
  2015-06-25 13:45                     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-06-25 13:37 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Stephane Eranian

On 23/06/15 18:15, Arnaldo Carvalho de Melo wrote:
> Em Tue, Jun 23, 2015 at 09:29:34AM +0300, Adrian Hunter escreveu:
>> On 23/06/15 02:00, Arnaldo Carvalho de Melo wrote:
>>> [root@zoo ~]# rpm -q glibc-debuginfo
>>> glibc-debuginfo-2.20-8.fc21.x86_64
> 
>>> But then, it didn't even resolve the DSO, which it should, as I did manually :-/
> 
>>> Will continue investigating... Perhaps this is fixed in another patch? What I
>>> have test merged so far is at my tmp.perf/pt branch.
> 
>> I tried the same commands with perf tools from that branch (tmp.perf/pt) and
>> it seemed to work fine.
> 
>> One reason for not getting symbols is compiling perf tools without ELF support.
> 
> sure, but that is not the case here. But yeah, I'll try and triple check
> everything, next time will add the list of features detected to the
> problem report, so that you know which features were detected.

How is it going?


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-25 13:37                   ` Adrian Hunter
@ 2015-06-25 13:45                     ` Arnaldo Carvalho de Melo
  2015-06-25 23:56                       ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-25 13:45 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Stephane Eranian

Em Thu, Jun 25, 2015 at 04:37:01PM +0300, Adrian Hunter escreveu:
> On 23/06/15 18:15, Arnaldo Carvalho de Melo wrote:
> > Em Tue, Jun 23, 2015 at 09:29:34AM +0300, Adrian Hunter escreveu:
> >> On 23/06/15 02:00, Arnaldo Carvalho de Melo wrote:
> >>> [root@zoo ~]# rpm -q glibc-debuginfo
> >>> glibc-debuginfo-2.20-8.fc21.x86_64
> > 
> >>> But then, it didn't even resolve the DSO, which it should, as I did manually :-/
> > 
> >>> Will continue investigating... Perhaps this is fixed in another patch? What I
> >>> have test merged so far is at my tmp.perf/pt branch.
> > 
> >> I tried the same commands with perf tools from that branch (tmp.perf/pt) and
> >> it seemed to work fine.
> > 
> >> One reason for not getting symbols is compiling perf tools without ELF support.
> > 
> > sure, but that is not the case here. But yeah, I'll try and triple check
> > everything, next time will add the list of features detected to the
> > problem report, so that you know which features were detected.
> 
> How is it going?

Yesterday was a holiday here, so no progress, will let you know when I
make some.

- Arnaldo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-25 13:45                     ` Arnaldo Carvalho de Melo
@ 2015-06-25 23:56                       ` Arnaldo Carvalho de Melo
  2015-06-26  0:09                         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-25 23:56 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Stephane Eranian

Em Thu, Jun 25, 2015 at 10:45:57AM -0300, Arnaldo Carvalho de Melo escreveu:
> Em Thu, Jun 25, 2015 at 04:37:01PM +0300, Adrian Hunter escreveu:
> > On 23/06/15 18:15, Arnaldo Carvalho de Melo wrote:
> > > Em Tue, Jun 23, 2015 at 09:29:34AM +0300, Adrian Hunter escreveu:
> > >> On 23/06/15 02:00, Arnaldo Carvalho de Melo wrote:
> > >>> [root@zoo ~]# rpm -q glibc-debuginfo
> > >>> glibc-debuginfo-2.20-8.fc21.x86_64
> > > 
> > >>> But then, it didn't even resolve the DSO, which it should, as I did manually :-/
> > > 
> > >>> Will continue investigating... Perhaps this is fixed in another patch? What I
> > >>> have test merged so far is at my tmp.perf/pt branch.
> > > 
> > >> I tried the same commands with perf tools from that branch (tmp.perf/pt) and
> > >> it seemed to work fine.
> > > 
> > >> One reason for not getting symbols is compiling perf tools without ELF support.
> > > 
> > > sure, but that is not the case here. But yeah, I'll try and triple check
> > > everything, next time will add the list of features detected to the
> > > problem report, so that you know which features were detected.
> > 
> > How is it going?
> 
> Yesterday was a holiday here, so no progress, will let you know when I
> make some.

So, now using the 4.1+ kernel, and the results seems to be the expected
ones for intel_bts//u:

[root@zoo ~]# perf record --per-thread -e intel_bts//u  ls
anaconda-ks.cfg  b  bin  lib64	libexec  new  old  perf.data
perf.data.old  stream_test  tg.run
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.274 MB perf.data ]
[root@zoo ~]# perf report
[root@zoo ~]# perf evlist
intel_bts//u
dummy:u
[root@zoo ~]# perf report --stdio
# To display the perf.data header info, please use --header/--header-only options.
#
# Total Lost Samples: 0
#
# Samples: 0  of event 'intel_bts//u'
# Event count (approx.): 0
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......

# Samples: 0  of event 'dummy:u'
# Event count (approx.): 0
#
# Overhead  Command  Shared Object  Symbol
# ........  .......  .............  ......

# Samples: 55K of event 'branches:u'
# Event count (approx.): 55012
#
# Overhead  Command  Shared Object       Symbol                                
# ........  .......  ..................  ......................................
#
    15.73%  ls       ld-2.20.so          [.] strcmp                            
    15.63%  ls       libc-2.20.so        [.] _dl_addr                          
    10.08%  ls       ld-2.20.so          [.] do_lookup_x                       
     7.00%  ls       ld-2.20.so          [.] _dl_name_match_p                  
     6.47%  ls       ld-2.20.so          [.] _dl_lookup_symbol_x               
     4.96%  ls       ld-2.20.so          [.] _dl_relocate_object               
     2.97%  ls       ls                  [.] quotearg_buffer_restyled          
     2.79%  ls       libc-2.20.so        [.] getenv                            
     1.95%  ls       ld-2.20.so          [.] _dl_cache_libcmp                  
     1.76%  ls       ld-2.20.so          [.] check_match.isra.0                
     1.64%  ls       libc-2.20.so        [.] __memmove_sse2                    
     1.47%  ls       ld-2.20.so          [.] _dl_map_object_deps               
     1.27%  ls       ld-2.20.so          [.] _dl_map_object_from_fd            
     1.17%  ls       ls                  [.] quote_name                        

---------------------------------------------------------------------

Will do the same tests with intel_pt as well, on a remote machine, add examples
to the changeset logs and everything going well, aim for pushing for Ingo soon,

Thanks,

- Arnaldo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-25 23:56                       ` Arnaldo Carvalho de Melo
@ 2015-06-26  0:09                         ` Arnaldo Carvalho de Melo
  2015-06-26  6:48                           ` Adrian Hunter
  0 siblings, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26  0:09 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Stephane Eranian

Em Thu, Jun 25, 2015 at 08:56:34PM -0300, Arnaldo Carvalho de Melo escreveu:
> Will do the same tests with intel_pt as well, on a remote machine, add examples
> to the changeset logs and everything going well, aim for pushing for Ingo soon,

So, I asked for callchains, with:

 perf record -g -e intel_bts// ls

And it got stuck somewhere, then I did a perf top to see where it was,
and got to:

  96.24%  perf    [.] intel_bts_process_queue

Annotating I get to:

  1.17 │1a0:┌─→mov    0x8(%r13),%rdx
       │    │  test   %rdx,%rdx
 98.83 │    └──je     1a0


Which is an endless loop! Source code for intel_bts_process_buffer(),
inlined there:

        while (sz > sizeof(struct branch)) {
                if (!branch->from && !branch->to)
                        continue;
                err = intel_bts_synth_branch_sample(btsq, branch);
                if (err)
                        break;
                branch += 1;
                sz -= sizeof(struct branch);
        }

Can you fix this, please, so that I can fold it into where it was
introduced, namely:

commit 439ad895a2aecea09416206f023336297cc72efe
Author: Adrian Hunter <adrian.hunter@intel.com>
Date:   Fri May 29 16:33:39 2015 +0300

    perf tools: Add Intel BTS support

- Arnaldo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-26  0:09                         ` Arnaldo Carvalho de Melo
@ 2015-06-26  6:48                           ` Adrian Hunter
  2015-06-26 13:41                             ` Arnaldo Carvalho de Melo
  2015-06-26 20:34                             ` Arnaldo Carvalho de Melo
  0 siblings, 2 replies; 47+ messages in thread
From: Adrian Hunter @ 2015-06-26  6:48 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Stephane Eranian

On 26/06/15 03:09, Arnaldo Carvalho de Melo wrote:
> Em Thu, Jun 25, 2015 at 08:56:34PM -0300, Arnaldo Carvalho de Melo escreveu:
>> Will do the same tests with intel_pt as well, on a remote machine, add examples
>> to the changeset logs and everything going well, aim for pushing for Ingo soon,
> 
> So, I asked for callchains, with:
> 
>  perf record -g -e intel_bts// ls
> 
> And it got stuck somewhere, then I did a perf top to see where it was,
> and got to:
> 
>   96.24%  perf    [.] intel_bts_process_queue
> 
> Annotating I get to:
> 
>   1.17 │1a0:┌─→mov    0x8(%r13),%rdx
>        │    │  test   %rdx,%rdx
>  98.83 │    └──je     1a0
> 
> 
> Which is an endless loop! Source code for intel_bts_process_buffer(),
> inlined there:
> 
>         while (sz > sizeof(struct branch)) {
>                 if (!branch->from && !branch->to)
>                         continue;
>                 err = intel_bts_synth_branch_sample(btsq, branch);
>                 if (err)
>                         break;
>                 branch += 1;
>                 sz -= sizeof(struct branch);
>         }
> 
> Can you fix this, please, so that I can fold it into where it was
> introduced, namely:
> 
> commit 439ad895a2aecea09416206f023336297cc72efe
> Author: Adrian Hunter <adrian.hunter@intel.com>
> Date:   Fri May 29 16:33:39 2015 +0300
> 
>     perf tools: Add Intel BTS support

It is fixed as an unexpected side-effect of a following patch (which is probably why I didn't notice it - or perhaps I rolled the fix into the wrong patch O_o). The fix is in:

    perf tools: Output sample flags and insn_len from intel_bts
    
    intel_bts synthesizes samples.  Fill in the new flags and insn_len
    members with instruction information.
    
    Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>


So what you want is:


diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
index 48bcbd607ef7..68bb6fede55b 100644
--- a/tools/perf/util/intel-bts.c
+++ b/tools/perf/util/intel-bts.c
@@ -304,7 +304,7 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
 				    struct auxtrace_buffer *buffer)
 {
 	struct branch *branch;
-	size_t sz;
+	size_t sz, bsz = sizeof(struct branch);
 	int err = 0;
 
 	if (buffer->use_data) {
@@ -318,14 +318,12 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
 	if (!btsq->bts->sample_branches)
 		return 0;
 
-	while (sz > sizeof(struct branch)) {
+	for (; sz > bsz; branch += 1, sz -= bsz) {
 		if (!branch->from && !branch->to)
 			continue;
 		err = intel_bts_synth_branch_sample(btsq, branch);
 		if (err)
 			break;
-		branch += 1;
-		sz -= sizeof(struct branch);
 	}
 	return err;
 }


But obviously that will conflict with "perf tools: Output sample flags and insn_len from intel_bts"

Another thing, the intel_bts implementation does not support
"instructions" samples because there is no timing information to
use to create periodic samples.  But callchains are added only
to "instructions" samples so there are no callchains in 'perf report'
for intel_bts.  The call information is still available for
db-export and the example call-graph, though.


^ permalink raw reply related	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-26  6:48                           ` Adrian Hunter
@ 2015-06-26 13:41                             ` Arnaldo Carvalho de Melo
  2015-06-26 13:47                               ` Adrian Hunter
  2015-06-26 20:34                             ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 13:41 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Stephane Eranian

Em Fri, Jun 26, 2015 at 09:48:20AM +0300, Adrian Hunter escreveu:
> On 26/06/15 03:09, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Jun 25, 2015 at 08:56:34PM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Will do the same tests with intel_pt as well, on a remote machine, add examples
> >> to the changeset logs and everything going well, aim for pushing for Ingo soon,
> > 
> > So, I asked for callchains, with:
> > 
> >  perf record -g -e intel_bts// ls
> > 
> > And it got stuck somewhere, then I did a perf top to see where it was,
> > and got to:
> > 
> >   96.24%  perf    [.] intel_bts_process_queue
> > 
> > Annotating I get to:
> > 
> >   1.17 │1a0:┌─→mov    0x8(%r13),%rdx
> >        │    │  test   %rdx,%rdx
> >  98.83 │    └──je     1a0
> > 
> > 
> > Which is an endless loop! Source code for intel_bts_process_buffer(),
> > inlined there:
> > 
> >         while (sz > sizeof(struct branch)) {
> >                 if (!branch->from && !branch->to)
> >                         continue;
> >                 err = intel_bts_synth_branch_sample(btsq, branch);
> >                 if (err)
> >                         break;
> >                 branch += 1;
> >                 sz -= sizeof(struct branch);
> >         }
> > 
> > Can you fix this, please, so that I can fold it into where it was
> > introduced, namely:
> > 
> > commit 439ad895a2aecea09416206f023336297cc72efe
> > Author: Adrian Hunter <adrian.hunter@intel.com>
> > Date:   Fri May 29 16:33:39 2015 +0300
> > 
> >     perf tools: Add Intel BTS support
> 
> It is fixed as an unexpected side-effect of a following patch (which is probably why I didn't notice it - or perhaps I rolled the fix into the wrong patch O_o). The fix is in:
> 
>     perf tools: Output sample flags and insn_len from intel_bts
>     
>     intel_bts synthesizes samples.  Fill in the new flags and insn_len
>     members with instruction information.
>     
>     Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> 
> So what you want is:
> 
> 
> diff --git a/tools/perf/util/intel-bts.c b/tools/perf/util/intel-bts.c
> index 48bcbd607ef7..68bb6fede55b 100644
> --- a/tools/perf/util/intel-bts.c
> +++ b/tools/perf/util/intel-bts.c
> @@ -304,7 +304,7 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
>  				    struct auxtrace_buffer *buffer)
>  {
>  	struct branch *branch;
> -	size_t sz;
> +	size_t sz, bsz = sizeof(struct branch);
>  	int err = 0;
>  
>  	if (buffer->use_data) {
> @@ -318,14 +318,12 @@ static int intel_bts_process_buffer(struct intel_bts_queue *btsq,
>  	if (!btsq->bts->sample_branches)
>  		return 0;
>  
> -	while (sz > sizeof(struct branch)) {
> +	for (; sz > bsz; branch += 1, sz -= bsz) {
>  		if (!branch->from && !branch->to)
>  			continue;
>  		err = intel_bts_synth_branch_sample(btsq, branch);
>  		if (err)
>  			break;
> -		branch += 1;
> -		sz -= sizeof(struct branch);
>  	}
>  	return err;
>  }
> 
> 
> But obviously that will conflict with "perf tools: Output sample flags and insn_len from intel_bts"

I can fix those things up, to keep it bisectable, next time please try
to do it this way :-)

> Another thing, the intel_bts implementation does not support
> "instructions" samples because there is no timing information to
> use to create periodic samples.  But callchains are added only
> to "instructions" samples so there are no callchains in 'perf report'
> for intel_bts.  The call information is still available for

Humm, so IOW, what you say is that we should refuse to run 'record' when
asking for callchains and intel_bts?

> db-export and the example call-graph, though.

- Arnaldo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-26 13:41                             ` Arnaldo Carvalho de Melo
@ 2015-06-26 13:47                               ` Adrian Hunter
  2015-06-26 15:08                                 ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 47+ messages in thread
From: Adrian Hunter @ 2015-06-26 13:47 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Stephane Eranian

On 26/06/15 16:41, Arnaldo Carvalho de Melo wrote:
>> Another thing, the intel_bts implementation does not support
>> "instructions" samples because there is no timing information to
>> use to create periodic samples.  But callchains are added only
>> to "instructions" samples so there are no callchains in 'perf report'
>> for intel_bts.  The call information is still available for
> 
> Humm, so IOW, what you say is that we should refuse to run 'record' when
> asking for callchains and intel_bts?

'record' can record other events at the same time which can have callchains.
e.g.

	perf record -g --per-thread -e intel_bts//u,branch-misses:u ls


^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-26 13:47                               ` Adrian Hunter
@ 2015-06-26 15:08                                 ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 15:08 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Stephane Eranian

Em Fri, Jun 26, 2015 at 04:47:39PM +0300, Adrian Hunter escreveu:
> On 26/06/15 16:41, Arnaldo Carvalho de Melo wrote:
> >> Another thing, the intel_bts implementation does not support
> >> "instructions" samples because there is no timing information to
> >> use to create periodic samples.  But callchains are added only
> >> to "instructions" samples so there are no callchains in 'perf report'
> >> for intel_bts.  The call information is still available for
> > 
> > Humm, so IOW, what you say is that we should refuse to run 'record' when
> > asking for callchains and intel_bts?
> 
> 'record' can record other events at the same time which can have callchains.
> e.g.
> 
> 	perf record -g --per-thread -e intel_bts//u,branch-misses:u ls

Right, what I was trying to say is that there are combinations where one
can ask for a callchain in record, it will act as if it did what was
asked for but then when report runs, no callchain will be available.

I guess we can check for that situation and warn the user.

- Arnaldo

^ permalink raw reply	[flat|nested] 47+ messages in thread

* Re: [PATCH V6 08/17] perf tools: Add Intel PT support
  2015-06-26  6:48                           ` Adrian Hunter
  2015-06-26 13:41                             ` Arnaldo Carvalho de Melo
@ 2015-06-26 20:34                             ` Arnaldo Carvalho de Melo
  1 sibling, 0 replies; 47+ messages in thread
From: Arnaldo Carvalho de Melo @ 2015-06-26 20:34 UTC (permalink / raw)
  To: Adrian Hunter; +Cc: Ingo Molnar, linux-kernel, Jiri Olsa, Stephane Eranian

Em Fri, Jun 26, 2015 at 09:48:20AM +0300, Adrian Hunter escreveu:
> On 26/06/15 03:09, Arnaldo Carvalho de Melo wrote:
> > Em Thu, Jun 25, 2015 at 08:56:34PM -0300, Arnaldo Carvalho de Melo escreveu:
> >> Will do the same tests with intel_pt as well, on a remote machine, add examples
> >> to the changeset logs and everything going well, aim for pushing for Ingo soon,
> > 
> > So, I asked for callchains, with:
> > 
> >  perf record -g -e intel_bts// ls
> > 
> > And it got stuck somewhere, then I did a perf top to see where it was,
> > and got to:
> > 
> >   96.24%  perf    [.] intel_bts_process_queue
> > 
> > Annotating I get to:
> > 
> >   1.17 │1a0:┌─→mov    0x8(%r13),%rdx
> >        │    │  test   %rdx,%rdx
> >  98.83 │    └──je     1a0
> > 
> > 
> > Which is an endless loop! Source code for intel_bts_process_buffer(),
> > inlined there:
> > 
> >         while (sz > sizeof(struct branch)) {
> >                 if (!branch->from && !branch->to)
> >                         continue;
> >                 err = intel_bts_synth_branch_sample(btsq, branch);
> >                 if (err)
> >                         break;
> >                 branch += 1;
> >                 sz -= sizeof(struct branch);
> >         }
> > 
> > Can you fix this, please, so that I can fold it into where it was
> > introduced, namely:
> > 
> > commit 439ad895a2aecea09416206f023336297cc72efe
> > Author: Adrian Hunter <adrian.hunter@intel.com>
> > Date:   Fri May 29 16:33:39 2015 +0300
> > 
> >     perf tools: Add Intel BTS support
> 
> It is fixed as an unexpected side-effect of a following patch (which is probably why I didn't notice it - or perhaps I rolled the fix into the wrong patch O_o). The fix is in:
> 
>     perf tools: Output sample flags and insn_len from intel_bts
>     
>     intel_bts synthesizes samples.  Fill in the new flags and insn_len
>     members with instruction information.
>     
>     Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> 
> 
> So what you want is:

Ok, folded that, fixed the problem, moving on...

- Arnaldo

^ permalink raw reply	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2015-06-26 20:34 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-05-29 13:33 [PATCH V6 00/17] perf tools: Introduce an abstraction for AUX Area and Instruction Tracing Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 01/17] perf db-export: Fix thread ref-counting Adrian Hunter
2015-05-29 18:35   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 02/17] perf tools: Ensure thread-stack is flushed Adrian Hunter
2015-06-18 21:56   ` Arnaldo Carvalho de Melo
2015-06-19  5:50     ` Adrian Hunter
2015-06-19 23:15   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 03/17] perf auxtrace: Add Intel PT as an AUX area tracing type Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 04/17] perf tools: Add Intel PT packet decoder Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 05/17] perf tools: Add Intel PT instruction decoder Adrian Hunter
2015-06-18 22:29   ` Arnaldo Carvalho de Melo
2015-06-19 15:44     ` Arnaldo Carvalho de Melo
2015-06-22 12:40       ` Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 06/17] perf tools: Add Intel PT log Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 07/17] perf tools: Add Intel PT decoder Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 08/17] perf tools: Add Intel PT support Adrian Hunter
2015-06-19 16:04   ` Arnaldo Carvalho de Melo
2015-06-19 16:22     ` Arnaldo Carvalho de Melo
2015-06-19 19:33     ` Adrian Hunter
2015-06-19 19:41       ` Arnaldo Carvalho de Melo
2015-06-22 18:24         ` Arnaldo Carvalho de Melo
2015-06-22 20:26           ` Adrian Hunter
2015-06-22 23:00             ` Arnaldo Carvalho de Melo
2015-06-23  6:29               ` Adrian Hunter
2015-06-23 15:15                 ` Arnaldo Carvalho de Melo
2015-06-25 13:37                   ` Adrian Hunter
2015-06-25 13:45                     ` Arnaldo Carvalho de Melo
2015-06-25 23:56                       ` Arnaldo Carvalho de Melo
2015-06-26  0:09                         ` Arnaldo Carvalho de Melo
2015-06-26  6:48                           ` Adrian Hunter
2015-06-26 13:41                             ` Arnaldo Carvalho de Melo
2015-06-26 13:47                               ` Adrian Hunter
2015-06-26 15:08                                 ` Arnaldo Carvalho de Melo
2015-06-26 20:34                             ` Arnaldo Carvalho de Melo
2015-05-29 13:33 ` [PATCH V6 09/17] perf tools: Take Intel PT into use Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 10/17] perf tools: Allow auxtrace data alignment Adrian Hunter
2015-06-25  7:58   ` [tip:perf/core] " tip-bot for Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 11/17] perf tools: Add Intel BTS support Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 12/17] perf tools: Output sample flags and insn_len from intel_pt Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 13/17] perf tools: Output sample flags and insn_len from intel_bts Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 14/17] perf tools: Intel PT to always update thread stack trace number Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 15/17] perf tools: Intel BTS " Adrian Hunter
2015-06-19 16:11   ` Arnaldo Carvalho de Melo
2015-06-22 12:38     ` Adrian Hunter
2015-06-22 14:33       ` Arnaldo Carvalho de Melo
2015-05-29 13:33 ` [PATCH V6 16/17] perf tools: Put itrace options into an asciidoc include Adrian Hunter
2015-05-29 13:33 ` [PATCH V6 17/17] perf tools: Add example call-graph script Adrian Hunter

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.