* [PATCH v0 00/71] perf: Add support for Intel Processor Trace
From: Alexander Shishkin @ 2013-12-11 12:36 UTC
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Alexander Shishkin

Hi,

This patchset adds support for the Intel Processor Trace (PT) extension [1]
of the Intel architecture, which captures information about software
execution flow, to the perf kernel and userspace infrastructure. We
provide an abstraction for it called "itrace", short for "instruction
trace" ([2]).

The single most notable thing is that while PT outputs trace data in a
compressed binary format, it still generates hundreds of megabytes
of trace data per second per core. Decoding this binary stream takes
2-3 orders of magnitude more CPU time than it takes to generate
it. These considerations make it impossible to carry out decoding in
kernel space. Therefore, the trace data is exported to userspace as a
zero-copy mapping that userspace can collect and store for later
decoding. To that end, perf is extended to support an additional ring
buffer per event, which exports the trace data. This ring buffer
is mapped from the event's file descriptor at a special "magic"
offset. It has its own user page with data_head and data_tail
pointers (the latter present when the buffer is mapped writable), used
as read and write pointers into the buffer.

This way we get a normal perf data stream that provides the sideband
information required to decode the trace data (MMAP and COMM events,
etc.), plus the actual trace in a separate buffer.

If the trace buffer is mapped writable, the driver stops tracing when
the buffer fills up (data_head approaches data_tail) until the data is
read, the data_tail pointer is moved forward, and an ioctl() is issued
to re-enable tracing. If the trace buffer is mapped read-only, tracing
continues, overwriting older data, so that the buffer always contains
the most recent data. Tracing can be stopped with an ioctl() and
restarted once the data is collected.
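
To make the writable-mapping flow concrete, here is a minimal userspace
sketch. It assumes the PERF_EVENT_ITRACE_OFFSET mmap offset from patch 04
and uses the standard PERF_EVENT_IOC_ENABLE ioctl for re-enabling; the
consume() callback and buffer size are illustrative, not part of this
series.

#include <stdint.h>
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <unistd.h>
#include <linux/perf_event.h>

#define PERF_EVENT_ITRACE_OFFSET 0x40000000	/* from patch 04 */

static struct perf_event_mmap_page *itrace_upage;	/* data_head/data_tail */
static void *itrace_data;				/* start of trace data */

/* Map one user page plus nr_pages of trace data at the "magic" offset. */
static int map_itrace(int perf_fd, size_t nr_pages)
{
	size_t page = sysconf(_SC_PAGESIZE);
	void *base = mmap(NULL, (nr_pages + 1) * page,
			  PROT_READ | PROT_WRITE, MAP_SHARED,
			  perf_fd, PERF_EVENT_ITRACE_OFFSET);

	if (base == MAP_FAILED)
		return -1;
	itrace_upage = base;
	itrace_data = (char *)base + page;
	return 0;
}

/* Drain [data_tail, data_head), then let the driver resume tracing. */
static void drain_itrace(int perf_fd,
			 void (*consume)(void *buf, uint64_t tail, uint64_t head))
{
	uint64_t head = itrace_upage->data_head;

	__sync_synchronize();		/* see the data before data_head */
	consume(itrace_data, itrace_upage->data_tail, head);
	__sync_synchronize();		/* finish reads before freeing space */
	itrace_upage->data_tail = head;	/* requires a writable mapping */
	ioctl(perf_fd, PERF_EVENT_IOC_ENABLE, 0);
}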

Another use case is annotating samples of other perf events: if you
set PERF_SAMPLE_ITRACE, attr.itrace_sample_size bytes of trace will be
included in each event's sample.
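
As a sketch, the attribute setup for such an annotated event could look
as follows; intel_pt_type stands for the itrace PMU's pmu->type number
(read from sysfs at runtime), and the sizes are illustrative:

struct perf_event_attr attr = {
	.size			= PERF_ATTR_SIZE_VER4,	/* 120: itrace_* fields */
	.type			= PERF_TYPE_HARDWARE,
	.config			= PERF_COUNT_HW_CPU_CYCLES,
	.sample_period		= 100000,
	.sample_type		= PERF_SAMPLE_IP | PERF_SAMPLE_ITRACE,
	.itrace_sample_type	= intel_pt_type,	/* pmu->type of the itrace PMU */
	.itrace_sample_size	= 4096,			/* bytes of trace per sample */
};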

Also, itrace data can be included in process core dumps, which can be
enabled with a new rlimit -- RLIMIT_ITRACE.
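
For example (a sketch; RLIMIT_ITRACE is 16 in this series and the limit
is a buffer size in bytes, per the "Max ITRACE buffer size" entry in
/proc/<pid>/limits):

#include <stdio.h>
#include <sys/resource.h>

#ifndef RLIMIT_ITRACE
#define RLIMIT_ITRACE 16	/* from include/uapi/asm-generic/resource.h */
#endif

struct rlimit rl = { .rlim_cur = 1 << 20, .rlim_max = RLIM_INFINITY };

if (setrlimit(RLIMIT_ITRACE, &rl))	/* creates a per-thread kernel counter */
	perror("setrlimit");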

This patchset consists of the necessary changes to the perf kernel
infrastructure and the PT pmu driver; the remaining 60+ patches
meticulously add itrace/PT support to perf userspace.

Patch Summary

  1 - 5  kernel support for Intel PT
  6      Allow set-output for task contexts of different types
  7 - 34 perf tools preparatory changes
 35 - 64 perf tools Instruction Tracing support
 65 - 71 perf tools Intel PT support

[1] http://software.intel.com/en-us/intel-isa-extensions
[2] http://events.linuxfoundation.org/sites/events/files/slides/lcna13_kleen.pdf

Adrian Hunter (66):
  perf: Allow set-output for task contexts of different types
  perf tools: Record whether a dso is 64-bit
  perf tools: Let a user specify a PMU event without any config terms
  perf tools: Let default config be defined for a PMU
  perf tools: Add perf_pmu__scan_file()
  perf tools: Add perf_event_paranoid()
  perf tools: Add dsos__hit_all()
  perf tools: Add machine__get_thread_pid()
  perf tools: Add cpu to struct thread
  perf tools: Add ability to record the current tid for each cpu
  perf tools: Allow header->data_offset to be predetermined
  perf tools: Add perf_evlist__can_select_event()
  perf session: Flag if the event stream is entirely in memory
  perf evlist: Pass mmap parameters in a struct
  perf tools: Move mem_bswap32/64 to util.c
  perf tools: Add feature test for __sync_val_compare_and_swap
  perf tools: Add option macro OPT_CALLBACK_OPTARG
  perf evlist: Add perf_evlist__to_front()
  perf evlist: Add perf_evlist__set_tracking_event()
  perf evsel: Add 'no_aux_samples' option
  perf evsel: Add 'immediate' option
  perf evlist: Add 'system_wide' option
  perf tools: Add id index
  perf pmu: Let pmu's with no events show up on perf list
  perf session: Add ability to skip 4GiB or more
  perf session: Add perf_session__deliver_synth_event()
  perf tools: Allow TSC conversion on any arch
  perf tools: Move rdtsc() function
  perf evlist: Add perf_evlist__enable_event_idx()
  perf tools: Add itrace members of struct perf_event_attr
  perf tools: Add support for parsing pmu itrace_config
  perf tools: Add support for PERF_RECORD_ITRACE_LOST
  perf tools: Add itrace sample parsing
  perf header: Add Instruction Tracing feature
  perf evlist: Add ability to mmap itrace buffers
  perf tools: Add user events for Instruction Tracing
  perf tools: Add support for Instruction Trace recording
  perf record: Add basic Instruction Tracing support
  perf record: Extend -m option for Instruction Tracing mmap pages
  perf tools: Add a user event for Instruction Tracing errors
  perf session: Add Instruction Tracing hooks
  perf session: Add Instruction Tracing options
  perf session: Make perf_event__itrace_swap() non-static
  perf itrace: Add helpers for Instruction Tracing errors
  perf itrace: Add helpers for queuing Instruction Tracing data
  perf itrace: Add a heap for sorting Instruction Tracing queues
  perf itrace: Add processing for Instruction Tracing events
  perf script: Add Instruction Tracing support
  perf script: Always allow fields 'addr' and 'cpu' for itrace
  perf report: Add Instruction Tracing support
  perf tools: Add Instruction Trace sampling support
  perf record: Add Instruction Trace sampling support
  perf tools: Add Instruction Tracing Snapshot Mode
  perf record: Add Instruction Tracing Snapshot Mode support
  perf inject: Re-pipe Instruction Tracing events
  perf inject: Add Instruction Tracing support
  perf inject: Cut Instruction Tracing samples
  perf tools: Add Instruction Tracing index
  perf tools: Hit all build ids when Instruction Tracing
  perf itrace: Add Intel PT as an Instruction Tracing type
  perf tools: Add Intel PT packet decoder
  perf tools: Add Intel PT instruction decoder
  perf tools: Add Intel PT log
  perf tools: Add Intel PT decoder
  perf tools: Add Intel PT support
  perf tools: Take Intel PT into use

Alexander Shishkin (5):
  perf: Disable all pmus on unthrottling and rescheduling
  x86: Add Intel Processor Trace (INTEL_PT) cpu feature detection
  perf: Abstract ring_buffer backing store operations
  itrace: Infrastructure for instruction flow tracing units
  x86: perf: Intel PT PMU driver

 arch/x86/include/asm/cpufeature.h                  |    1 +
 arch/x86/include/uapi/asm/msr-index.h              |   18 +
 arch/x86/kernel/cpu/Makefile                       |    1 +
 arch/x86/kernel/cpu/intel_pt.h                     |  129 ++
 arch/x86/kernel/cpu/perf_event.c                   |    4 +
 arch/x86/kernel/cpu/perf_event_intel.c             |   10 +
 arch/x86/kernel/cpu/perf_event_intel_pt.c          | 1167 +++++++++++
 arch/x86/kernel/cpu/scattered.c                    |    1 +
 fs/binfmt_elf.c                                    |    6 +
 fs/proc/base.c                                     |    1 +
 include/asm-generic/resource.h                     |    1 +
 include/linux/itrace.h                             |  147 ++
 include/linux/perf_event.h                         |   33 +-
 include/uapi/asm-generic/resource.h                |    3 +-
 include/uapi/linux/elf.h                           |    1 +
 include/uapi/linux/perf_event.h                    |   25 +-
 kernel/events/Makefile                             |    2 +-
 kernel/events/core.c                               |  329 ++-
 kernel/events/internal.h                           |   21 +-
 kernel/events/itrace.c                             |  589 ++++++
 kernel/events/ring_buffer.c                        |  176 +-
 kernel/exit.c                                      |    3 +
 kernel/sys.c                                       |    5 +
 tools/perf/Documentation/intel-pt.txt              |  581 ++++++
 tools/perf/Documentation/perf-inject.txt           |   20 +
 tools/perf/Documentation/perf-record.txt           |   14 +
 tools/perf/Documentation/perf-report.txt           |   21 +
 tools/perf/Documentation/perf-script.txt           |   21 +
 tools/perf/Makefile.perf                           |   30 +-
 tools/perf/arch/x86/Makefile                       |    2 +
 tools/perf/arch/x86/util/itrace.c                  |   41 +
 tools/perf/arch/x86/util/pmu.c                     |   13 +
 tools/perf/arch/x86/util/tsc.c                     |   31 +-
 tools/perf/arch/x86/util/tsc.h                     |    3 -
 tools/perf/builtin-buildid-list.c                  |    9 +
 tools/perf/builtin-inject.c                        |  193 +-
 tools/perf/builtin-record.c                        |  277 ++-
 tools/perf/builtin-report.c                        |   12 +
 tools/perf/builtin-script.c                        |   13 +
 tools/perf/config/Makefile                         |    5 +
 tools/perf/config/feature-checks/Makefile          |    4 +
 tools/perf/config/feature-checks/test-all.c        |    5 +
 .../feature-checks/test-sync-compare-and-swap.c    |   14 +
 tools/perf/perf.h                                  |   14 +
 tools/perf/tests/perf-time-to-tsc.c                |   12 +-
 tools/perf/tests/pmu.c                             |    2 +-
 tools/perf/tests/sample-parsing.c                  |    7 +-
 tools/perf/util/dso.c                              |    1 +
 tools/perf/util/dso.h                              |    1 +
 tools/perf/util/event.c                            |   21 +
 tools/perf/util/event.h                            |   70 +
 tools/perf/util/evlist.c                           |  289 ++-
 tools/perf/util/evlist.h                           |   19 +
 tools/perf/util/evsel.c                            |   86 +-
 tools/perf/util/evsel.h                            |   19 +-
 tools/perf/util/header.c                           |   73 +-
 tools/perf/util/header.h                           |    3 +
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1678 +++++++++++++++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |   83 +
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  |  224 ++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |   67 +
 tools/perf/util/intel-pt-decoder/intel-pt-log.c    |  119 ++
 tools/perf/util/intel-pt-decoder/intel-pt-log.h    |   52 +
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   |  404 ++++
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |   68 +
 tools/perf/util/intel-pt.c                         | 2193 ++++++++++++++++++++
 tools/perf/util/intel-pt.h                         |   40 +
 tools/perf/util/itrace.c                           | 1273 ++++++++++++
 tools/perf/util/itrace.h                           |  476 +++++
 tools/perf/util/machine.c                          |   85 +
 tools/perf/util/machine.h                          |   11 +
 tools/perf/util/parse-events.c                     |   17 +-
 tools/perf/util/parse-events.h                     |    1 +
 tools/perf/util/parse-events.l                     |    1 +
 tools/perf/util/parse-events.y                     |   10 +
 tools/perf/util/parse-options.h                    |    5 +
 tools/perf/util/pmu.c                              |   95 +-
 tools/perf/util/pmu.h                              |   14 +-
 tools/perf/util/pmu.l                              |    1 +
 tools/perf/util/pmu.y                              |    9 +-
 tools/perf/util/record.c                           |   43 +-
 tools/perf/util/session.c                          |  343 ++-
 tools/perf/util/session.h                          |   27 +-
 tools/perf/util/symbol-elf.c                       |    3 +
 tools/perf/util/symbol-minimal.c                   |   23 +
 tools/perf/util/symbol.c                           |    1 +
 tools/perf/util/symbol.h                           |    1 +
 tools/perf/util/thread.c                           |    1 +
 tools/perf/util/thread.h                           |    1 +
 tools/perf/util/tool.h                             |   12 +-
 tools/perf/util/tsc.c                              |   30 +
 tools/perf/util/tsc.h                              |   12 +
 tools/perf/util/util.c                             |   41 +
 tools/perf/util/util.h                             |    6 +
 94 files changed, 11708 insertions(+), 361 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/intel_pt.h
 create mode 100644 arch/x86/kernel/cpu/perf_event_intel_pt.c
 create mode 100644 include/linux/itrace.h
 create mode 100644 kernel/events/itrace.c
 create mode 100644 tools/perf/Documentation/intel-pt.txt
 create mode 100644 tools/perf/arch/x86/util/itrace.c
 create mode 100644 tools/perf/arch/x86/util/pmu.c
 create mode 100644 tools/perf/config/feature-checks/test-sync-compare-and-swap.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
 create mode 100644 tools/perf/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.h
 create mode 100644 tools/perf/util/itrace.c
 create mode 100644 tools/perf/util/itrace.h
 create mode 100644 tools/perf/util/tsc.c
 create mode 100644 tools/perf/util/tsc.h

-- 
1.8.5.1



* [PATCH v0 01/71] perf: Disable all pmus on unthrottling and rescheduling
From: Alexander Shishkin @ 2013-12-11 12:36 UTC
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Alexander Shishkin

Currently, only one pmu in a context gets disabled during unthrottling
and event_sched_{out,in}; however, events in one context may belong to
different pmus, which results in pmus being reprogrammed while they are
still enabled. This patch temporarily disables the pmu corresponding to
each event in the context while that event is being modified.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c | 27 ++++++++++++++++++++++++---
 1 file changed, 24 insertions(+), 3 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 403b781..d656cd6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1396,6 +1396,9 @@ event_sched_out(struct perf_event *event,
 	if (event->state != PERF_EVENT_STATE_ACTIVE)
 		return;
 
+	if (event->pmu != ctx->pmu)
+		perf_pmu_disable(event->pmu);
+
 	event->state = PERF_EVENT_STATE_INACTIVE;
 	if (event->pending_disable) {
 		event->pending_disable = 0;
@@ -1412,6 +1415,9 @@ event_sched_out(struct perf_event *event,
 		ctx->nr_freq--;
 	if (event->attr.exclusive || !cpuctx->active_oncpu)
 		cpuctx->exclusive = 0;
+
+	if (event->pmu != ctx->pmu)
+		perf_pmu_enable(event->pmu);
 }
 
 static void
@@ -1652,6 +1658,7 @@ event_sched_in(struct perf_event *event,
 		 struct perf_event_context *ctx)
 {
 	u64 tstamp = perf_event_time(event);
+	int ret = 0;
 
 	if (event->state <= PERF_EVENT_STATE_OFF)
 		return 0;
@@ -1674,10 +1681,14 @@ event_sched_in(struct perf_event *event,
 	 */
 	smp_wmb();
 
+	if (event->pmu != ctx->pmu)
+		perf_pmu_disable(event->pmu);
+
 	if (event->pmu->add(event, PERF_EF_START)) {
 		event->state = PERF_EVENT_STATE_INACTIVE;
 		event->oncpu = -1;
-		return -EAGAIN;
+		ret = -EAGAIN;
+		goto out;
 	}
 
 	event->tstamp_running += tstamp - event->tstamp_stopped;
@@ -1693,7 +1704,11 @@ event_sched_in(struct perf_event *event,
 	if (event->attr.exclusive)
 		cpuctx->exclusive = 1;
 
-	return 0;
+out:
+	if (event->pmu != ctx->pmu)
+		perf_pmu_enable(event->pmu);
+
+	return ret;
 }
 
 static int
@@ -2743,6 +2758,9 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
 		if (!event_filter_match(event))
 			continue;
 
+		if (ctx->pmu != event->pmu)
+			perf_pmu_disable(event->pmu);
+
 		hwc = &event->hw;
 
 		if (hwc->interrupts == MAX_INTERRUPTS) {
@@ -2752,7 +2770,7 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
 		}
 
 		if (!event->attr.freq || !event->attr.sample_freq)
-			continue;
+			goto next;
 
 		/*
 		 * stop the event and update event->count
@@ -2774,6 +2792,9 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
 			perf_adjust_period(event, period, delta, false);
 
 		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
+	next:
+		if (ctx->pmu != event->pmu)
+			perf_pmu_enable(event->pmu);
 	}
 
 	perf_pmu_enable(ctx->pmu);
-- 
1.8.5.1



* [PATCH v0 02/71] x86: Add Intel Processor Trace (INTEL_PT) cpu feature detection
From: Alexander Shishkin @ 2013-12-11 12:36 UTC
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Alexander Shishkin

Intel Processor Trace is an architecture extension that allows for program
flow tracing. It is enumerated via CPUID leaf 0x7 (EBX bit 25), which is
what the scattered.c entry below decodes.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/include/asm/cpufeature.h | 1 +
 arch/x86/kernel/cpu/scattered.c   | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/x86/include/asm/cpufeature.h b/arch/x86/include/asm/cpufeature.h
index 89270b4..cb9864f 100644
--- a/arch/x86/include/asm/cpufeature.h
+++ b/arch/x86/include/asm/cpufeature.h
@@ -186,6 +186,7 @@
 #define X86_FEATURE_DTHERM	(7*32+ 7) /* Digital Thermal Sensor */
 #define X86_FEATURE_HW_PSTATE	(7*32+ 8) /* AMD HW-PState */
 #define X86_FEATURE_PROC_FEEDBACK (7*32+ 9) /* AMD ProcFeedbackInterface */
+#define X86_FEATURE_INTEL_PT	(7*32+10) /* Intel Processor Trace */
 
 /* Virtualization flags: Linux defined, word 8 */
 #define X86_FEATURE_TPR_SHADOW  (8*32+ 0) /* Intel TPR Shadow */
diff --git a/arch/x86/kernel/cpu/scattered.c b/arch/x86/kernel/cpu/scattered.c
index b6f794a..726e6a3 100644
--- a/arch/x86/kernel/cpu/scattered.c
+++ b/arch/x86/kernel/cpu/scattered.c
@@ -36,6 +36,7 @@ void init_scattered_cpuid_features(struct cpuinfo_x86 *c)
 		{ X86_FEATURE_ARAT,		CR_EAX, 2, 0x00000006, 0 },
 		{ X86_FEATURE_PLN,		CR_EAX, 4, 0x00000006, 0 },
 		{ X86_FEATURE_PTS,		CR_EAX, 6, 0x00000006, 0 },
+		{ X86_FEATURE_INTEL_PT,		CR_EBX,25, 0x00000007, 0 },
 		{ X86_FEATURE_APERFMPERF,	CR_ECX, 0, 0x00000006, 0 },
 		{ X86_FEATURE_EPB,		CR_ECX, 3, 0x00000006, 0 },
 		{ X86_FEATURE_XSAVEOPT,		CR_EAX,	0, 0x0000000d, 1 },
-- 
1.8.5.1



* [PATCH v0 03/71] perf: Abstract ring_buffer backing store operations
From: Alexander Shishkin @ 2013-12-11 12:36 UTC
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Alexander Shishkin

This patch extends perf's ring_buffer code so that buffers with different
backing stores can be allocated through the same rb_alloc() interface. This
allows the ring_buffer code to be reused for exporting hardware-written
trace buffers (such as those of Intel PT) to userspace.
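
In other words, callers that want the regular page-backed behaviour pass
NULL and get the built-in perf_rb_ops, while a different backing store is
selected by passing its ops table. A call-site sketch (itrace_rb_ops is
introduced in the next patch):

	/* regular perf buffer, as in perf_mmap(): */
	rb = rb_alloc(nr_pages, watermark, event->cpu, flags, NULL);

	/* hardware-written trace buffer: */
	rb = rb_alloc(nr_pages, watermark, event->cpu, flags, &itrace_rb_ops);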

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c        |   2 +-
 kernel/events/internal.h    |  14 +++-
 kernel/events/ring_buffer.c | 174 +++++++++++++++++++++++++++-----------------
 3 files changed, 122 insertions(+), 68 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index d656cd6..7c3faf1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -4098,7 +4098,7 @@ again:
 
 	rb = rb_alloc(nr_pages, 
 		event->attr.watermark ? event->attr.wakeup_watermark : 0,
-		event->cpu, flags);
+		event->cpu, flags, NULL);
 
 	if (!rb) {
 		ret = -ENOMEM;
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 569b2187..8835f00 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -6,6 +6,16 @@
 
 /* Buffer handling */
 
+struct ring_buffer;
+
+struct ring_buffer_ops {
+	unsigned long	(*get_size)(int);
+	int		(*alloc_user_page)(struct ring_buffer *, int, int);
+	int		(*alloc_data_page)(struct ring_buffer *, int, int, int);
+	void		(*free_buffer)(struct ring_buffer *);
+	struct page	*(*mmap_to_page)(struct ring_buffer *, unsigned long);
+};
+
 #define RING_BUFFER_WRITABLE		0x01
 
 struct ring_buffer {
@@ -15,6 +25,7 @@ struct ring_buffer {
 	struct work_struct		work;
 	int				page_order;	/* allocation order  */
 #endif
+	struct ring_buffer_ops		*ops;
 	int				nr_pages;	/* nr of data pages  */
 	int				overwrite;	/* can overwrite itself */
 
@@ -41,7 +52,8 @@ struct ring_buffer {
 
 extern void rb_free(struct ring_buffer *rb);
 extern struct ring_buffer *
-rb_alloc(int nr_pages, long watermark, int cpu, int flags);
+rb_alloc(int nr_pages, long watermark, int cpu, int flags,
+	 struct ring_buffer_ops *rb_ops);
 extern void perf_event_wakeup(struct perf_event *event);
 
 extern void
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index e8b168a..d7ec426 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -238,18 +238,6 @@ ring_buffer_init(struct ring_buffer *rb, long watermark, int flags)
  * Back perf_mmap() with regular GFP_KERNEL-0 pages.
  */
 
-struct page *
-perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
-{
-	if (pgoff > rb->nr_pages)
-		return NULL;
-
-	if (pgoff == 0)
-		return virt_to_page(rb->user_page);
-
-	return virt_to_page(rb->data_pages[pgoff - 1]);
-}
-
 static void *perf_mmap_alloc_page(int cpu)
 {
 	struct page *page;
@@ -263,46 +251,31 @@ static void *perf_mmap_alloc_page(int cpu)
 	return page_address(page);
 }
 
-struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
+static int perf_mmap_alloc_user_page(struct ring_buffer *rb, int cpu,
+				     int flags)
 {
-	struct ring_buffer *rb;
-	unsigned long size;
-	int i;
-
-	size = sizeof(struct ring_buffer);
-	size += nr_pages * sizeof(void *);
-
-	rb = kzalloc(size, GFP_KERNEL);
-	if (!rb)
-		goto fail;
-
 	rb->user_page = perf_mmap_alloc_page(cpu);
 	if (!rb->user_page)
-		goto fail_user_page;
-
-	for (i = 0; i < nr_pages; i++) {
-		rb->data_pages[i] = perf_mmap_alloc_page(cpu);
-		if (!rb->data_pages[i])
-			goto fail_data_pages;
-	}
+		return -ENOMEM;
 
-	rb->nr_pages = nr_pages;
-
-	ring_buffer_init(rb, watermark, flags);
+	return 0;
+}
 
-	return rb;
+static int perf_mmap_alloc_data_page(struct ring_buffer *rb, int cpu,
+				     int nr_pages, int flags)
+{
+	void *data;
 
-fail_data_pages:
-	for (i--; i >= 0; i--)
-		free_page((unsigned long)rb->data_pages[i]);
+	if (nr_pages != 1)
+		return -EINVAL;
 
-	free_page((unsigned long)rb->user_page);
+	data = perf_mmap_alloc_page(cpu);
+	if (!data)
+		return -ENOMEM;
 
-fail_user_page:
-	kfree(rb);
+	rb->data_pages[rb->nr_pages] = data;
 
-fail:
-	return NULL;
+	return 0;
 }
 
 static void perf_mmap_free_page(unsigned long addr)
@@ -313,24 +286,51 @@ static void perf_mmap_free_page(unsigned long addr)
 	__free_page(page);
 }
 
-void rb_free(struct ring_buffer *rb)
+static void perf_mmap_buddy_free(struct ring_buffer *rb)
 {
 	int i;
 
-	perf_mmap_free_page((unsigned long)rb->user_page);
+	if (rb->user_page)
+		perf_mmap_free_page((unsigned long)rb->user_page);
 	for (i = 0; i < rb->nr_pages; i++)
 		perf_mmap_free_page((unsigned long)rb->data_pages[i]);
 	kfree(rb);
 }
 
+struct page *
+perf_mmap_buddy_to_page(struct ring_buffer *rb, unsigned long pgoff)
+{
+	if (pgoff > rb->nr_pages)
+		return NULL;
+
+	if (pgoff == 0)
+		return virt_to_page(rb->user_page);
+
+	return virt_to_page(rb->data_pages[pgoff - 1]);
+}
+
+static unsigned long perf_mmap_buddy_get_size(int nr_pages)
+{
+	return sizeof(struct ring_buffer) + sizeof(void *) * nr_pages;
+}
+
+struct ring_buffer_ops perf_rb_ops = {
+	.get_size		= perf_mmap_buddy_get_size,
+	.alloc_user_page	= perf_mmap_alloc_user_page,
+	.alloc_data_page	= perf_mmap_alloc_data_page,
+	.free_buffer		= perf_mmap_buddy_free,
+	.mmap_to_page		= perf_mmap_buddy_to_page,
+};
+
 #else
+
 static int data_page_nr(struct ring_buffer *rb)
 {
 	return rb->nr_pages << page_order(rb);
 }
 
 struct page *
-perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
+perf_mmap_vmalloc_to_page(struct ring_buffer *rb, unsigned long pgoff)
 {
 	/* The '>' counts in the user page. */
 	if (pgoff > data_page_nr(rb))
@@ -339,14 +339,14 @@ perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
 	return vmalloc_to_page((void *)rb->user_page + pgoff * PAGE_SIZE);
 }
 
-static void perf_mmap_unmark_page(void *addr)
+static void perf_mmap_vmalloc_unmark_page(void *addr)
 {
 	struct page *page = vmalloc_to_page(addr);
 
 	page->mapping = NULL;
 }
 
-static void rb_free_work(struct work_struct *work)
+static void perf_mmap_vmalloc_free_work(struct work_struct *work)
 {
 	struct ring_buffer *rb;
 	void *base;
@@ -358,50 +358,92 @@ static void rb_free_work(struct work_struct *work)
 	base = rb->user_page;
 	/* The '<=' counts in the user page. */
 	for (i = 0; i <= nr; i++)
-		perf_mmap_unmark_page(base + (i * PAGE_SIZE));
+		perf_mmap_vmalloc_unmark_page(base + (i * PAGE_SIZE));
 
 	vfree(base);
 	kfree(rb);
 }
 
-void rb_free(struct ring_buffer *rb)
+static void perf_mmap_vmalloc_free(struct ring_buffer *rb)
 {
 	schedule_work(&rb->work);
 }
 
-struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags)
+static int perf_mmap_vmalloc_data_pages(struct ring_buffer *rb, int cpu,
+					int nr_pages, int flags)
 {
-	struct ring_buffer *rb;
-	unsigned long size;
 	void *all_buf;
 
-	size = sizeof(struct ring_buffer);
-	size += sizeof(void *);
-
-	rb = kzalloc(size, GFP_KERNEL);
-	if (!rb)
-		goto fail;
-
-	INIT_WORK(&rb->work, rb_free_work);
+	INIT_WORK(&rb->work, perf_mmap_vmalloc_free_work);
 
 	all_buf = vmalloc_user((nr_pages + 1) * PAGE_SIZE);
 	if (!all_buf)
-		goto fail_all_buf;
+		return -ENOMEM;
 
 	rb->user_page = all_buf;
 	rb->data_pages[0] = all_buf + PAGE_SIZE;
 	rb->page_order = ilog2(nr_pages);
 	rb->nr_pages = !!nr_pages;
 
+	return 0;
+}
+
+static unsigned long perf_mmap_vmalloc_get_size(int nr_pages)
+{
+	return sizeof(struct ring_buffer) + sizeof(void *);
+}
+
+struct ring_buffer_ops perf_rb_ops = {
+	.get_size		= perf_mmap_vmalloc_get_size,
+	.alloc_data_page	= perf_mmap_vmalloc_data_pages,
+	.free_buffer		= perf_mmap_vmalloc_free,
+	.mmap_to_page		= perf_mmap_vmalloc_to_page,
+};
+
+#endif
+
+struct ring_buffer *rb_alloc(int nr_pages, long watermark, int cpu, int flags,
+			     struct ring_buffer_ops *rb_ops)
+{
+	struct ring_buffer *rb;
+	int i;
+
+	if (!rb_ops)
+		rb_ops = &perf_rb_ops;
+
+	rb = kzalloc(rb_ops->get_size(nr_pages), GFP_KERNEL);
+	if (!rb)
+		return NULL;
+
+	rb->ops = rb_ops;
+	if (rb->ops->alloc_user_page) {
+		if (rb->ops->alloc_user_page(rb, cpu, flags))
+			goto fail;
+
+		for (i = 0; i < nr_pages; i++, rb->nr_pages++)
+			if (rb->ops->alloc_data_page(rb, cpu, 1, flags))
+				goto fail;
+	} else {
+		if (rb->ops->alloc_data_page(rb, cpu, nr_pages, flags))
+			goto fail;
+	}
+
 	ring_buffer_init(rb, watermark, flags);
 
 	return rb;
 
-fail_all_buf:
-	kfree(rb);
-
 fail:
+	rb->ops->free_buffer(rb);
 	return NULL;
 }
 
-#endif
+void rb_free(struct ring_buffer *rb)
+{
+	rb->ops->free_buffer(rb);
+}
+
+struct page *
+perf_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
+{
+	return rb->ops->mmap_to_page(rb, pgoff);
+}
-- 
1.8.5.1



* [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
From: Alexander Shishkin @ 2013-12-11 12:36 UTC
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Alexander Shishkin

Instruction tracing PMUs are capable of recording a log of instruction
execution flow on a cpu core, which can be useful for profiling and crash
analysis. This patch adds itrace infrastructure for perf events and the
rest of the kernel to use.

Since such PMUs can produce copious amounts of trace data, it may be
impractical to process it inside the kernel in real time; instead, raw
trace streams are exported to userspace for subsequent analysis. Thus,
itrace PMUs may export their trace buffers, which can be mmap()ed to
userspace from a perf event fd at the PERF_EVENT_ITRACE_OFFSET offset. To
that end, perf is extended to work with multiple ring buffers per event,
reusing the ring_buffer code in an attempt to reduce complexity.

Also, trace data from such PMUs can be used to annotate other perf events
by including it in sample records when the PERF_SAMPLE_ITRACE flag is set.
In this case, a kernel counter is created for each such event and trace
data is retrieved from it and stored in the perf data stream.

Finally, such per-thread trace data can be included in process core dumps,
which is controlled via the new rlimit parameter RLIMIT_ITRACE. This again
is done by a per-thread kernel counter that is created when RLIMIT_ITRACE
is set.

This infrastructure should also be useful for ARM ETM/PTM and other program
flow tracing units that can potentially generate a lot of trace data very
fast.
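
To show how a driver plugs into this, here is a hypothetical skeleton
against the struct itrace_pmu interface added below; the callback
signatures come from include/linux/itrace.h in this patch, and everything
prefixed my_ is made up:

#include <linux/init.h>
#include <linux/itrace.h>

static void *my_alloc_buffer(int cpu, int nr_pages, bool overwrite,
			     void **pages,
			     struct perf_event_mmap_page **user_page)
{
	/* allocate hardware-reachable trace pages; fill *pages, *user_page */
	return NULL;				/* stub */
}

static void my_free_buffer(void *buffer)
{
}

static int my_event_init(struct perf_event *event)
{
	/* validate event->attr.itrace_config against hardware capabilities */
	return 0;
}

static struct itrace_pmu my_ipmu = {
	.alloc_buffer	= my_alloc_buffer,
	.free_buffer	= my_free_buffer,
	.event_init	= my_event_init,
	.name		= "my_itrace",
};

static int __init my_itrace_init(void)
{
	return itrace_pmu_register(&my_ipmu);
}
device_initcall(my_itrace_init);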

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 fs/binfmt_elf.c                     |   6 +
 fs/proc/base.c                      |   1 +
 include/asm-generic/resource.h      |   1 +
 include/linux/itrace.h              | 147 +++++++++
 include/linux/perf_event.h          |  33 +-
 include/uapi/asm-generic/resource.h |   3 +-
 include/uapi/linux/elf.h            |   1 +
 include/uapi/linux/perf_event.h     |  25 +-
 kernel/events/Makefile              |   2 +-
 kernel/events/core.c                | 299 ++++++++++++------
 kernel/events/internal.h            |   7 +
 kernel/events/itrace.c              | 589 ++++++++++++++++++++++++++++++++++++
 kernel/events/ring_buffer.c         |   2 +-
 kernel/exit.c                       |   3 +
 kernel/sys.c                        |   5 +
 15 files changed, 1020 insertions(+), 104 deletions(-)
 create mode 100644 include/linux/itrace.h
 create mode 100644 kernel/events/itrace.c

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 571a423..c7fcd49 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -34,6 +34,7 @@
 #include <linux/utsname.h>
 #include <linux/coredump.h>
 #include <linux/sched.h>
+#include <linux/itrace.h>
 #include <asm/uaccess.h>
 #include <asm/param.h>
 #include <asm/page.h>
@@ -1576,6 +1577,8 @@ static int fill_thread_core_info(struct elf_thread_core_info *t,
 		}
 	}
 
+	*total += itrace_elf_note_size(t->task);
+
 	return 1;
 }
 
@@ -1608,6 +1611,7 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
 	for (i = 0; i < view->n; ++i)
 		if (view->regsets[i].core_note_type != 0)
 			++info->thread_notes;
+	info->thread_notes++; /* ITRACE */
 
 	/*
 	 * Sanity check.  We rely on regset 0 being in NT_PRSTATUS,
@@ -1710,6 +1714,8 @@ static int write_note_info(struct elf_note_info *info,
 			    !writenote(&t->notes[i], cprm))
 				return 0;
 
+		itrace_elf_note_write(cprm, t->task);
+
 		first = 0;
 		t = t->next;
 	} while (t);
diff --git a/fs/proc/base.c b/fs/proc/base.c
index 1485e38..41785ec 100644
--- a/fs/proc/base.c
+++ b/fs/proc/base.c
@@ -471,6 +471,7 @@ static const struct limit_names lnames[RLIM_NLIMITS] = {
 	[RLIMIT_NICE] = {"Max nice priority", NULL},
 	[RLIMIT_RTPRIO] = {"Max realtime priority", NULL},
 	[RLIMIT_RTTIME] = {"Max realtime timeout", "us"},
+	[RLIMIT_ITRACE] = {"Max ITRACE buffer size", "bytes"},
 };
 
 /* Display limits for a process */
diff --git a/include/asm-generic/resource.h b/include/asm-generic/resource.h
index b4ea8f5..e6e5657 100644
--- a/include/asm-generic/resource.h
+++ b/include/asm-generic/resource.h
@@ -25,6 +25,7 @@
 	[RLIMIT_NICE]		= { 0, 0 },				\
 	[RLIMIT_RTPRIO]		= { 0, 0 },				\
 	[RLIMIT_RTTIME]		= {  RLIM_INFINITY,  RLIM_INFINITY },	\
+	[RLIMIT_ITRACE]		= {              0,  RLIM_INFINITY },	\
 }
 
 #endif
diff --git a/include/linux/itrace.h b/include/linux/itrace.h
new file mode 100644
index 0000000..c4175b3
--- /dev/null
+++ b/include/linux/itrace.h
@@ -0,0 +1,147 @@
+/*
+ * Instruction flow trace unit infrastructure
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef _LINUX_ITRACE_H
+#define _LINUX_ITRACE_H
+
+#include <linux/perf_event.h>
+#include <linux/coredump.h>
+
+extern struct ring_buffer_ops itrace_rb_ops;
+
+#define PERF_EVENT_ITRACE_PGOFF (PERF_EVENT_ITRACE_OFFSET >> PAGE_SHIFT)
+
+static inline bool is_itrace_vma(struct vm_area_struct *vma)
+{
+	return vma->vm_pgoff == PERF_EVENT_ITRACE_PGOFF;
+}
+
+void *itrace_priv(struct perf_event *event);
+
+void *itrace_event_get_priv(struct perf_event *event);
+void itrace_event_put(struct perf_event *event);
+
+struct itrace_pmu {
+	struct pmu		pmu;
+	/*
+	 * Allocate/free ring_buffer backing store
+	 */
+	void			*(*alloc_buffer)(int cpu, int nr_pages, bool overwrite,
+						 void **pages,
+						 struct perf_event_mmap_page **user_page);
+	void			(*free_buffer)(void *buffer);
+
+	int			(*event_init)(struct perf_event *event);
+
+	/*
+	 * Calculate the size of a sample to be written out
+	 */
+	unsigned long		(*sample_trace)(struct perf_event *event,
+						struct perf_sample_data *data);
+
+	/*
+	 * Write out a trace sample to the given output handle
+	 */
+	void			(*sample_output)(struct perf_event *event,
+						 struct perf_output_handle *handle,
+						 struct perf_sample_data *data);
+
+	/*
+	 * Get the PMU-specific part of a core dump note
+	 */
+	size_t			(*core_size)(struct perf_event *event);
+
+	/*
+	 * Write out the core dump note
+	 */
+	void			(*core_output)(struct coredump_params *cprm,
+					       struct perf_event *event,
+					       unsigned long len);
+	char			*name;
+};
+
+#define to_itrace_pmu(x) container_of((x), struct itrace_pmu, pmu)
+
+#ifdef CONFIG_PERF_EVENTS
+
+extern void itrace_lost_data(struct perf_event *event, u64 offset);
+extern int itrace_pmu_register(struct itrace_pmu *ipmu);
+
+extern int itrace_event_installable(struct perf_event *event,
+				    struct perf_event_context *ctx);
+
+extern void itrace_wake_up(struct perf_event *event);
+
+extern bool is_itrace_event(struct perf_event *event);
+
+extern int itrace_sampler_init(struct perf_event *event,
+			       struct task_struct *task);
+extern void itrace_sampler_fini(struct perf_event *event);
+extern unsigned long itrace_sampler_trace(struct perf_event *event,
+					  struct perf_sample_data *data);
+extern void itrace_sampler_output(struct perf_event *event,
+				  struct perf_output_handle *handle,
+				  struct perf_sample_data *data);
+
+extern int update_itrace_rlimit(struct task_struct *, unsigned long);
+extern void exit_itrace(struct task_struct *);
+
+struct itrace_note {
+	u64	itrace_config;
+};
+
+extern size_t itrace_elf_note_size(struct task_struct *tsk);
+extern void itrace_elf_note_write(struct coredump_params *cprm,
+				  struct task_struct *task);
+#else
+static inline void
+itrace_lost_data(struct perf_event *event, u64 offset)		{}
+static inline int itrace_pmu_register(struct itrace_pmu *ipmu)	{ return -EINVAL; }
+
+static inline int
+itrace_event_installable(struct perf_event *event,
+			 struct perf_event_context *ctx)	{ return -EINVAL; }
+static inline void itrace_wake_up(struct perf_event *event)	{}
+static inline bool is_itrace_event(struct perf_event *event)	{ return false; }
+
+static inline int itrace_sampler_init(struct perf_event *event,
+				      struct task_struct *task)	{ return -EINVAL; }
+static inline void
+itrace_sampler_fini(struct perf_event *event)			{}
+static inline unsigned long
+itrace_sampler_trace(struct perf_event *event,
+		     struct perf_sample_data *data)		{ return 0; }
+static inline void
+itrace_sampler_output(struct perf_event *event,
+		      struct perf_output_handle *handle,
+		      struct perf_sample_data *data)		{}
+
+static inline int
+update_itrace_rlimit(struct task_struct *task, unsigned long rlim)	{ return -EINVAL; }
+static inline void exit_itrace(struct task_struct *task)	{}
+
+static inline size_t
+itrace_elf_note_size(struct task_struct *tsk)			{ return 0; }
+static inline void
+itrace_elf_note_write(struct coredump_params *cprm,
+		      struct task_struct *task)			{}
+
+#endif
+
+#endif /* _LINUX_ITRACE_H */
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8f4a70f..b27cfc7 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -83,6 +83,12 @@ struct perf_regs_user {
 	struct pt_regs	*regs;
 };
 
+struct perf_trace_record {
+	u64		size;
+	unsigned long	from;
+	unsigned long	to;
+};
+
 struct task_struct;
 
 /*
@@ -97,6 +103,14 @@ struct hw_perf_event_extra {
 
 struct event_constraint;
 
+enum perf_itrace_counter_type {
+	PERF_ITRACE_USER	= BIT(1),
+	PERF_ITRACE_SAMPLING	= BIT(2),
+	PERF_ITRACE_COREDUMP	= BIT(3),
+	PERF_ITRACE_KERNEL	= (PERF_ITRACE_SAMPLING | PERF_ITRACE_COREDUMP),
+	PERF_ITRACE_ANY		= (PERF_ITRACE_KERNEL | PERF_ITRACE_USER),
+};
+
 /**
  * struct hw_perf_event - performance event hardware details:
  */
@@ -126,6 +140,10 @@ struct hw_perf_event {
 			/* for tp_event->class */
 			struct list_head	tp_list;
 		};
+		struct { /* itrace */
+			struct task_struct	*itrace_target;
+			unsigned int		counter_type;
+		};
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 		struct { /* breakpoint */
 			/*
@@ -289,6 +307,12 @@ struct swevent_hlist {
 struct perf_cgroup;
 struct ring_buffer;
 
+enum perf_event_rb {
+	PERF_RB_MAIN = 0,
+	PERF_RB_ITRACE,
+	PERF_NR_RB,
+};
+
 /**
  * struct perf_event - performance event kernel representation:
  */
@@ -400,10 +424,10 @@ struct perf_event {
 
 	/* mmap bits */
 	struct mutex			mmap_mutex;
-	atomic_t			mmap_count;
+	atomic_t			mmap_count[PERF_NR_RB];
 
-	struct ring_buffer		*rb;
-	struct list_head		rb_entry;
+	struct ring_buffer		*rb[PERF_NR_RB];
+	struct list_head		rb_entry[PERF_NR_RB];
 
 	/* poll related */
 	wait_queue_head_t		waitq;
@@ -426,6 +450,7 @@ struct perf_event {
 	perf_overflow_handler_t		overflow_handler;
 	void				*overflow_handler_context;
 
+	struct perf_event		*trace_event;
 #ifdef CONFIG_EVENT_TRACING
 	struct ftrace_event_call	*tp_event;
 	struct event_filter		*filter;
@@ -583,6 +608,7 @@ struct perf_sample_data {
 	union  perf_mem_data_src	data_src;
 	struct perf_callchain_entry	*callchain;
 	struct perf_raw_record		*raw;
+	struct perf_trace_record	trace;
 	struct perf_branch_stack	*br_stack;
 	struct perf_regs_user		regs_user;
 	u64				stack_user_size;
@@ -603,6 +629,7 @@ static inline void perf_sample_data_init(struct perf_sample_data *data,
 	data->period = period;
 	data->regs_user.abi = PERF_SAMPLE_REGS_ABI_NONE;
 	data->regs_user.regs = NULL;
+	data->trace.from = data->trace.to = data->trace.size = 0;
 	data->stack_user_size = 0;
 	data->weight = 0;
 	data->data_src.val = 0;
diff --git a/include/uapi/asm-generic/resource.h b/include/uapi/asm-generic/resource.h
index f863428..073f413 100644
--- a/include/uapi/asm-generic/resource.h
+++ b/include/uapi/asm-generic/resource.h
@@ -45,7 +45,8 @@
 					   0-39 for nice level 19 .. -20 */
 #define RLIMIT_RTPRIO		14	/* maximum realtime priority */
 #define RLIMIT_RTTIME		15	/* timeout for RT tasks in us */
-#define RLIM_NLIMITS		16
+#define RLIMIT_ITRACE		16	/* max itrace size */
+#define RLIM_NLIMITS		17
 
 /*
  * SuS says limits have to be unsigned.
diff --git a/include/uapi/linux/elf.h b/include/uapi/linux/elf.h
index ef6103b..4bfbf66 100644
--- a/include/uapi/linux/elf.h
+++ b/include/uapi/linux/elf.h
@@ -369,6 +369,7 @@ typedef struct elf64_shdr {
 #define NT_PRPSINFO	3
 #define NT_TASKSTRUCT	4
 #define NT_AUXV		6
+#define NT_ITRACE	7
 /*
  * Note to userspace developers: size of NT_SIGINFO note may increase
  * in the future to accomodate more fields, don't assume it is fixed!
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index e1802d6..9e3a890 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -137,8 +137,9 @@ enum perf_event_sample_format {
 	PERF_SAMPLE_DATA_SRC			= 1U << 15,
 	PERF_SAMPLE_IDENTIFIER			= 1U << 16,
 	PERF_SAMPLE_TRANSACTION			= 1U << 17,
+	PERF_SAMPLE_ITRACE			= 1U << 18,
 
-	PERF_SAMPLE_MAX = 1U << 18,		/* non-ABI */
+	PERF_SAMPLE_MAX = 1U << 19,		/* non-ABI */
 };
 
 /*
@@ -237,6 +238,10 @@ enum perf_event_read_format {
 #define PERF_ATTR_SIZE_VER2	80	/* add: branch_sample_type */
 #define PERF_ATTR_SIZE_VER3	96	/* add: sample_regs_user */
 					/* add: sample_stack_user */
+#define PERF_ATTR_SIZE_VER4	120	/* add: itrace_config */
+					/* add: itrace_watermark */
+					/* add: itrace_sample_type */
+					/* add: itrace_sample_size */
 
 /*
  * Hardware event_id to monitor via a performance monitoring event:
@@ -333,6 +338,11 @@ struct perf_event_attr {
 
 	/* Align to u64. */
 	__u32	__reserved_2;
+
+	__u64	itrace_config;
+	__u32	itrace_watermark;	/* wakeup every n pages */
+	__u32	itrace_sample_type;	/* pmu->type of the itrace PMU */
+	__u64	itrace_sample_size;
 };
 
 #define perf_flags(attr)	(*(&(attr)->read_format + 1))
@@ -679,6 +689,8 @@ enum perf_event_type {
 	 *
 	 *	{ u64			weight;   } && PERF_SAMPLE_WEIGHT
 	 *	{ u64			data_src; } && PERF_SAMPLE_DATA_SRC
+	 *	{ u64			size;
+	 *	  char			data[size]; } && PERF_SAMPLE_ITRACE
 	 * };
 	 */
 	PERF_RECORD_SAMPLE			= 9,
@@ -704,9 +716,20 @@ enum perf_event_type {
 	 */
 	PERF_RECORD_MMAP2			= 10,
 
+	/*
+	 * struct {
+	 *   u64 offset;
+	 * }
+	 */
+	PERF_RECORD_ITRACE_LOST			= 11,
+
 	PERF_RECORD_MAX,			/* non-ABI */
 };
 
+/* Architecture-specific data */
+
+#define PERF_EVENT_ITRACE_OFFSET	0x40000000
+
 #define PERF_MAX_STACK_DEPTH		127
 
 enum perf_callchain_context {
diff --git a/kernel/events/Makefile b/kernel/events/Makefile
index 103f5d1..46a3770 100644
--- a/kernel/events/Makefile
+++ b/kernel/events/Makefile
@@ -2,7 +2,7 @@ ifdef CONFIG_FUNCTION_TRACER
 CFLAGS_REMOVE_core.o = -pg
 endif
 
-obj-y := core.o ring_buffer.o callchain.o
+obj-y := core.o ring_buffer.o callchain.o itrace.o
 
 obj-$(CONFIG_HAVE_HW_BREAKPOINT) += hw_breakpoint.o
 obj-$(CONFIG_UPROBES) += uprobes.o
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 7c3faf1..ca8a130 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -39,6 +39,7 @@
 #include <linux/hw_breakpoint.h>
 #include <linux/mm_types.h>
 #include <linux/cgroup.h>
+#include <linux/itrace.h>
 
 #include "internal.h"
 
@@ -1575,6 +1576,9 @@ void perf_event_disable(struct perf_event *event)
 	struct perf_event_context *ctx = event->ctx;
 	struct task_struct *task = ctx->task;
 
+	if (event->trace_event)
+		perf_event_disable(event->trace_event);
+
 	if (!task) {
 		/*
 		 * Disable the event on the cpu that it's on
@@ -2071,6 +2075,8 @@ void perf_event_enable(struct perf_event *event)
 	struct perf_event_context *ctx = event->ctx;
 	struct task_struct *task = ctx->task;
 
+	if (event->trace_event)
+		perf_event_enable(event->trace_event);
 	if (!task) {
 		/*
 		 * Enable the event on the cpu that it's on
@@ -3180,9 +3186,6 @@ static void free_event_rcu(struct rcu_head *head)
 	kfree(event);
 }
 
-static void ring_buffer_put(struct ring_buffer *rb);
-static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb);
-
 static void unaccount_event_cpu(struct perf_event *event, int cpu)
 {
 	if (event->parent)
@@ -3215,6 +3218,8 @@ static void unaccount_event(struct perf_event *event)
 		static_key_slow_dec_deferred(&perf_sched_events);
 	if (has_branch_stack(event))
 		static_key_slow_dec_deferred(&perf_sched_events);
+	if ((event->attr.sample_type & PERF_SAMPLE_ITRACE) && event->trace_event)
+		itrace_sampler_fini(event);
 
 	unaccount_event_cpu(event, event->cpu);
 }
@@ -3236,28 +3241,31 @@ static void __free_event(struct perf_event *event)
 }
 static void free_event(struct perf_event *event)
 {
+	int rbx;
+
 	irq_work_sync(&event->pending);
 
 	unaccount_event(event);
 
-	if (event->rb) {
-		struct ring_buffer *rb;
+	for (rbx = PERF_RB_MAIN; rbx < PERF_NR_RB; rbx++)
+		if (event->rb[rbx]) {
+			struct ring_buffer *rb;
 
-		/*
-		 * Can happen when we close an event with re-directed output.
-		 *
-		 * Since we have a 0 refcount, perf_mmap_close() will skip
-		 * over us; possibly making our ring_buffer_put() the last.
-		 */
-		mutex_lock(&event->mmap_mutex);
-		rb = event->rb;
-		if (rb) {
-			rcu_assign_pointer(event->rb, NULL);
-			ring_buffer_detach(event, rb);
-			ring_buffer_put(rb); /* could be last */
+			/*
+			 * Can happen when we close an event with re-directed output.
+			 *
+			 * Since we have a 0 refcount, perf_mmap_close() will skip
+			 * over us; possibly making our ring_buffer_put() the last.
+			 */
+			mutex_lock(&event->mmap_mutex);
+			rb = event->rb[rbx];
+			if (rb) {
+				rcu_assign_pointer(event->rb[rbx], NULL);
+				ring_buffer_detach(event, rb);
+				ring_buffer_put(rb); /* could be last */
+			}
+			mutex_unlock(&event->mmap_mutex);
 		}
-		mutex_unlock(&event->mmap_mutex);
-	}
 
 	if (is_cgroup_event(event))
 		perf_detach_cgroup(event);
@@ -3486,21 +3494,24 @@ static unsigned int perf_poll(struct file *file, poll_table *wait)
 {
 	struct perf_event *event = file->private_data;
 	struct ring_buffer *rb;
-	unsigned int events = POLL_HUP;
+	unsigned int events = 0;
+	int i;
 
 	/*
 	 * Pin the event->rb by taking event->mmap_mutex; otherwise
 	 * perf_event_set_output() can swizzle our rb and make us miss wakeups.
 	 */
 	mutex_lock(&event->mmap_mutex);
-	rb = event->rb;
-	if (rb)
-		events = atomic_xchg(&rb->poll, 0);
+	for (i = PERF_RB_MAIN; i < PERF_NR_RB; i++) {
+		rb = event->rb[i];
+		if (rb)
+			events |= atomic_xchg(&rb->poll, 0);
+	}
 	mutex_unlock(&event->mmap_mutex);
 
 	poll_wait(file, &event->waitq, wait);
 
-	return events;
+	return events ? events : POLL_HUP;
 }
 
 static void perf_event_reset(struct perf_event *event)
@@ -3717,7 +3728,7 @@ static void perf_event_init_userpage(struct perf_event *event)
 	struct ring_buffer *rb;
 
 	rcu_read_lock();
-	rb = rcu_dereference(event->rb);
+	rb = rcu_dereference(event->rb[PERF_RB_MAIN]);
 	if (!rb)
 		goto unlock;
 
@@ -3747,7 +3758,7 @@ void perf_event_update_userpage(struct perf_event *event)
 	u64 enabled, running, now;
 
 	rcu_read_lock();
-	rb = rcu_dereference(event->rb);
+	rb = rcu_dereference(event->rb[PERF_RB_MAIN]);
 	if (!rb)
 		goto unlock;
 
@@ -3794,23 +3805,29 @@ static int perf_mmap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
 {
 	struct perf_event *event = vma->vm_file->private_data;
 	struct ring_buffer *rb;
-	int ret = VM_FAULT_SIGBUS;
+	unsigned long pgoff = vmf->pgoff;
+	int ret = VM_FAULT_SIGBUS, rbx = PERF_RB_MAIN;
+
+	if (is_itrace_event(event) && is_itrace_vma(vma)) {
+		rbx = PERF_RB_ITRACE;
+		pgoff -= PERF_EVENT_ITRACE_PGOFF;
+	}
 
 	if (vmf->flags & FAULT_FLAG_MKWRITE) {
-		if (vmf->pgoff == 0)
+		if (pgoff == 0)
 			ret = 0;
 		return ret;
 	}
 
 	rcu_read_lock();
-	rb = rcu_dereference(event->rb);
+	rb = rcu_dereference(event->rb[rbx]);
 	if (!rb)
 		goto unlock;
 
-	if (vmf->pgoff && (vmf->flags & FAULT_FLAG_WRITE))
+	if (pgoff && (vmf->flags & FAULT_FLAG_WRITE))
 		goto unlock;
 
-	vmf->page = perf_mmap_to_page(rb, vmf->pgoff);
+	vmf->page = perf_mmap_to_page(rb, pgoff);
 	if (!vmf->page)
 		goto unlock;
 
@@ -3825,29 +3842,33 @@ unlock:
 	return ret;
 }
 
-static void ring_buffer_attach(struct perf_event *event,
-			       struct ring_buffer *rb)
+void ring_buffer_attach(struct perf_event *event,
+			struct ring_buffer *rb)
 {
+	int rbx = rb->priv ? PERF_RB_ITRACE : PERF_RB_MAIN;
+	struct list_head *head = &event->rb_entry[rbx];
 	unsigned long flags;
 
-	if (!list_empty(&event->rb_entry))
+	if (!list_empty(head))
 		return;
 
 	spin_lock_irqsave(&rb->event_lock, flags);
-	if (list_empty(&event->rb_entry))
-		list_add(&event->rb_entry, &rb->event_list);
+	if (list_empty(head))
+		list_add(head, &rb->event_list);
 	spin_unlock_irqrestore(&rb->event_lock, flags);
 }
 
-static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb)
+void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb)
 {
+	int rbx = rb->priv ? PERF_RB_ITRACE : PERF_RB_MAIN;
+	struct list_head *head = &event->rb_entry[rbx];
 	unsigned long flags;
 
-	if (list_empty(&event->rb_entry))
+	if (list_empty(head))
 		return;
 
 	spin_lock_irqsave(&rb->event_lock, flags);
-	list_del_init(&event->rb_entry);
+	list_del_init(head);
 	wake_up_all(&event->waitq);
 	spin_unlock_irqrestore(&rb->event_lock, flags);
 }
@@ -3855,12 +3876,16 @@ static void ring_buffer_detach(struct perf_event *event, struct ring_buffer *rb)
 static void ring_buffer_wakeup(struct perf_event *event)
 {
 	struct ring_buffer *rb;
+	struct perf_event *iter;
+	int rbx;
 
 	rcu_read_lock();
-	rb = rcu_dereference(event->rb);
-	if (rb) {
-		list_for_each_entry_rcu(event, &rb->event_list, rb_entry)
-			wake_up_all(&event->waitq);
+	for (rbx = PERF_RB_MAIN; rbx < PERF_NR_RB; rbx++) {
+		rb = rcu_dereference(event->rb[rbx]);
+		if (rb) {
+			list_for_each_entry_rcu(iter, &rb->event_list, rb_entry[rbx])
+				wake_up_all(&iter->waitq);
+		}
 	}
 	rcu_read_unlock();
 }
@@ -3873,12 +3898,12 @@ static void rb_free_rcu(struct rcu_head *rcu_head)
 	rb_free(rb);
 }
 
-static struct ring_buffer *ring_buffer_get(struct perf_event *event)
+struct ring_buffer *ring_buffer_get(struct perf_event *event, int rbx)
 {
 	struct ring_buffer *rb;
 
 	rcu_read_lock();
-	rb = rcu_dereference(event->rb);
+	rb = rcu_dereference(event->rb[rbx]);
 	if (rb) {
 		if (!atomic_inc_not_zero(&rb->refcount))
 			rb = NULL;
@@ -3888,7 +3913,7 @@ static struct ring_buffer *ring_buffer_get(struct perf_event *event)
 	return rb;
 }
 
-static void ring_buffer_put(struct ring_buffer *rb)
+void ring_buffer_put(struct ring_buffer *rb)
 {
 	if (!atomic_dec_and_test(&rb->refcount))
 		return;
@@ -3901,9 +3926,10 @@ static void ring_buffer_put(struct ring_buffer *rb)
 static void perf_mmap_open(struct vm_area_struct *vma)
 {
 	struct perf_event *event = vma->vm_file->private_data;
+	int rbx = is_itrace_vma(vma) ? PERF_RB_ITRACE : PERF_RB_MAIN;
 
-	atomic_inc(&event->mmap_count);
-	atomic_inc(&event->rb->mmap_count);
+	atomic_inc(&event->mmap_count[rbx]);
+	atomic_inc(&event->rb[rbx]->mmap_count);
 }
 
 /*
@@ -3917,19 +3943,19 @@ static void perf_mmap_open(struct vm_area_struct *vma)
 static void perf_mmap_close(struct vm_area_struct *vma)
 {
 	struct perf_event *event = vma->vm_file->private_data;
-
-	struct ring_buffer *rb = event->rb;
+	int rbx = is_itrace_vma(vma) ? PERF_RB_ITRACE : PERF_RB_MAIN;
+	struct ring_buffer *rb = event->rb[rbx];
 	struct user_struct *mmap_user = rb->mmap_user;
 	int mmap_locked = rb->mmap_locked;
 	unsigned long size = perf_data_size(rb);
 
 	atomic_dec(&rb->mmap_count);
 
-	if (!atomic_dec_and_mutex_lock(&event->mmap_count, &event->mmap_mutex))
+	if (!atomic_dec_and_mutex_lock(&event->mmap_count[rbx], &event->mmap_mutex))
 		return;
 
 	/* Detach current event from the buffer. */
-	rcu_assign_pointer(event->rb, NULL);
+	rcu_assign_pointer(event->rb[rbx], NULL);
 	ring_buffer_detach(event, rb);
 	mutex_unlock(&event->mmap_mutex);
 
@@ -3946,7 +3972,7 @@ static void perf_mmap_close(struct vm_area_struct *vma)
 	 */
 again:
 	rcu_read_lock();
-	list_for_each_entry_rcu(event, &rb->event_list, rb_entry) {
+	list_for_each_entry_rcu(event, &rb->event_list, rb_entry[rbx]) {
 		if (!atomic_long_inc_not_zero(&event->refcount)) {
 			/*
 			 * This event is en-route to free_event() which will
@@ -3967,8 +3993,8 @@ again:
 		 * still restart the iteration to make sure we're not now
 		 * iterating the wrong list.
 		 */
-		if (event->rb == rb) {
-			rcu_assign_pointer(event->rb, NULL);
+		if (event->rb[rbx] == rb) {
+			rcu_assign_pointer(event->rb[rbx], NULL);
 			ring_buffer_detach(event, rb);
 			ring_buffer_put(rb); /* can't be last, we still have one */
 		}
@@ -4017,6 +4043,7 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 	unsigned long nr_pages;
 	long user_extra, extra;
 	int ret = 0, flags = 0;
+	int rbx = PERF_RB_MAIN;
 
 	/*
 	 * Don't allow mmap() of inherited per-task counters. This would
@@ -4030,31 +4057,39 @@ static int perf_mmap(struct file *file, struct vm_area_struct *vma)
 		return -EINVAL;
 
 	vma_size = vma->vm_end - vma->vm_start;
+
+	if (is_itrace_event(event) && is_itrace_vma(vma))
+		rbx = PERF_RB_ITRACE;
+
 	nr_pages = (vma_size / PAGE_SIZE) - 1;
 
 	/*
 	 * If we have rb pages ensure they're a power-of-two number, so we
 	 * can do bitmasks instead of modulo.
 	 */
-	if (nr_pages != 0 && !is_power_of_2(nr_pages))
-		return -EINVAL;
+	if (!rbx) {
+		if (nr_pages != 0 && !is_power_of_2(nr_pages))
+			return -EINVAL;
+
+		if (vma->vm_pgoff != 0)
+			return -EINVAL;
+	}
 
 	if (vma_size != PAGE_SIZE * (1 + nr_pages))
 		return -EINVAL;
 
-	if (vma->vm_pgoff != 0)
-		return -EINVAL;
 
 	WARN_ON_ONCE(event->ctx->parent_ctx);
 again:
 	mutex_lock(&event->mmap_mutex);
-	if (event->rb) {
-		if (event->rb->nr_pages != nr_pages) {
+	rb = event->rb[rbx];
+	if (rb) {
+		if (rb->nr_pages != nr_pages) {
 			ret = -EINVAL;
 			goto unlock;
 		}
 
-		if (!atomic_inc_not_zero(&event->rb->mmap_count)) {
+		if (!atomic_inc_not_zero(&rb->mmap_count)) {
 			/*
 			 * Raced against perf_mmap_close() through
 			 * perf_event_set_output(). Try again, hope for better
@@ -4091,14 +4126,14 @@ again:
 		goto unlock;
 	}
 
-	WARN_ON(event->rb);
+	WARN_ON(event->rb[rbx]);
 
 	if (vma->vm_flags & VM_WRITE)
 		flags |= RING_BUFFER_WRITABLE;
 
 	rb = rb_alloc(nr_pages, 
 		event->attr.watermark ? event->attr.wakeup_watermark : 0,
-		event->cpu, flags, NULL);
+		event->cpu, flags, rbx ? &itrace_rb_ops : NULL);
 
 	if (!rb) {
 		ret = -ENOMEM;
@@ -4113,14 +4148,14 @@ again:
 	vma->vm_mm->pinned_vm += extra;
 
 	ring_buffer_attach(event, rb);
-	rcu_assign_pointer(event->rb, rb);
+	rcu_assign_pointer(event->rb[rbx], rb);
 
 	perf_event_init_userpage(event);
 	perf_event_update_userpage(event);
 
 unlock:
 	if (!ret)
-		atomic_inc(&event->mmap_count);
+		atomic_inc(&event->mmap_count[rbx]);
 	mutex_unlock(&event->mmap_mutex);
 
 	/*
@@ -4626,6 +4661,13 @@ void perf_output_sample(struct perf_output_handle *handle,
 	if (sample_type & PERF_SAMPLE_TRANSACTION)
 		perf_output_put(handle, data->txn);
 
+	if (sample_type & PERF_SAMPLE_ITRACE) {
+		perf_output_put(handle, data->trace.size);
+
+		if (data->trace.size)
+			itrace_sampler_output(event, handle, data);
+	}
+
 	if (!event->attr.watermark) {
 		int wakeup_events = event->attr.wakeup_events;
 
@@ -4733,6 +4775,14 @@ void perf_prepare_sample(struct perf_event_header *header,
 		data->stack_user_size = stack_size;
 		header->size += size;
 	}
+
+	if (sample_type & PERF_SAMPLE_ITRACE) {
+		u64 size = sizeof(u64);
+
+		size += itrace_sampler_trace(event, data);
+
+		header->size += size;
+	}
 }
 
 static void perf_event_output(struct perf_event *event,
@@ -6652,6 +6702,7 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	struct perf_event *event;
 	struct hw_perf_event *hwc;
 	long err = -EINVAL;
+	int rbx;
 
 	if ((unsigned)cpu >= nr_cpu_ids) {
 		if (!task || cpu != -1)
@@ -6675,7 +6726,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	INIT_LIST_HEAD(&event->group_entry);
 	INIT_LIST_HEAD(&event->event_entry);
 	INIT_LIST_HEAD(&event->sibling_list);
-	INIT_LIST_HEAD(&event->rb_entry);
+	for (rbx = PERF_RB_MAIN; rbx < PERF_NR_RB; rbx++)
+		INIT_LIST_HEAD(&event->rb_entry[rbx]);
 	INIT_LIST_HEAD(&event->active_entry);
 
 	init_waitqueue_head(&event->waitq);
@@ -6702,6 +6754,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 
 		if (attr->type == PERF_TYPE_TRACEPOINT)
 			event->hw.tp_target = task;
+		else if (is_itrace_event(event))
+			event->hw.itrace_target = task;
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 		/*
 		 * hw_breakpoint is a bit difficult here..
@@ -6751,6 +6805,15 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 			if (err)
 				goto err_pmu;
 		}
+
+		if (event->attr.sample_type & PERF_SAMPLE_ITRACE) {
+			err = itrace_sampler_init(event, task);
+			if (err) {
+			/*
+			 * XXX: either clean up callchain buffers too or
+			 * forbid them to go together
+			 */
+				goto err_pmu;
+			}
+		}
 	}
 
 	return event;
@@ -6901,8 +6964,7 @@ err_size:
 static int
 perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 {
-	struct ring_buffer *rb = NULL, *old_rb = NULL;
-	int ret = -EINVAL;
+	int ret = -EINVAL, rbx;
 
 	if (!output_event)
 		goto set;
@@ -6922,42 +6984,60 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 	 */
 	if (output_event->cpu == -1 && output_event->ctx != event->ctx)
 		goto out;
+	/*
+	 * XXX^2: that's all bollocks
+	 *   + for sampling events, both get to keep their ->trace_event
+	 *   + for normal itrace events, the rules:
+	 *      * no cross-cpu buffers (as any other event);
+	 *      * both must be itrace events
+	 */
+	if (is_itrace_event(event)) {
+		if (!is_itrace_event(output_event))
+			goto out;
+
+		if (event->attr.type != output_event->attr.type)
+			goto out;
+	}
 
 set:
 	mutex_lock(&event->mmap_mutex);
-	/* Can't redirect output if we've got an active mmap() */
-	if (atomic_read(&event->mmap_count))
-		goto unlock;
 
-	old_rb = event->rb;
+	for (rbx = PERF_RB_MAIN; rbx < PERF_NR_RB; rbx++) {
+		struct ring_buffer *rb = NULL, *old_rb = NULL;
 
-	if (output_event) {
-		/* get the rb we want to redirect to */
-		rb = ring_buffer_get(output_event);
-		if (!rb)
-			goto unlock;
-	}
+		/* Can't redirect output if we've got an active mmap() */
+		if (atomic_read(&event->mmap_count[rbx]))
+			continue;
 
-	if (old_rb)
-		ring_buffer_detach(event, old_rb);
+		old_rb = event->rb[rbx];
 
-	if (rb)
-		ring_buffer_attach(event, rb);
+		if (output_event) {
+			/* get the rb we want to redirect to */
+			rb = ring_buffer_get(output_event, rbx);
+			if (!rb)
+				continue;
+		}
 
-	rcu_assign_pointer(event->rb, rb);
+		if (old_rb)
+			ring_buffer_detach(event, old_rb);
 
-	if (old_rb) {
-		ring_buffer_put(old_rb);
-		/*
-		 * Since we detached before setting the new rb, so that we
-		 * could attach the new rb, we could have missed a wakeup.
-		 * Provide it now.
-		 */
-		wake_up_all(&event->waitq);
+		if (rb)
+			ring_buffer_attach(event, rb);
+
+		rcu_assign_pointer(event->rb[rbx], rb);
+
+		if (old_rb) {
+			ring_buffer_put(old_rb);
+			/*
+			 * We detached before setting the new rb so that we
+			 * could attach the new rb; in doing so we could have
+			 * missed a wakeup. Provide it now.
+			 */
+			wake_up_all(&event->waitq);
+		}
 	}
 
 	ret = 0;
-unlock:
 	mutex_unlock(&event->mmap_mutex);
 
 out:
@@ -7095,6 +7175,10 @@ SYSCALL_DEFINE5(perf_event_open,
 		goto err_alloc;
 	}
 
+	err = itrace_event_installable(event, ctx);
+	if (err)
+		goto err_alloc;
+
 	if (task) {
 		put_task_struct(task);
 		task = NULL;
@@ -7223,6 +7307,9 @@ err_fd:
 	return err;
 }
 
+/* XXX */
+int itrace_kernel_event(struct perf_event *event, struct task_struct *task);
+
 /**
  * perf_event_create_kernel_counter
  *
@@ -7253,12 +7340,20 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 
 	account_event(event);
 
+	err = itrace_kernel_event(event, task);
+	if (err)
+		goto err_free;
+
 	ctx = find_get_context(event->pmu, task, cpu);
 	if (IS_ERR(ctx)) {
 		err = PTR_ERR(ctx);
 		goto err_free;
 	}
 
+	err = itrace_event_installable(event, ctx);
+	if (err)
+		goto err_free;
+
 	WARN_ON_ONCE(ctx->parent_ctx);
 	mutex_lock(&ctx->mutex);
 	perf_install_in_context(ctx, event, cpu);
@@ -7536,6 +7631,8 @@ void perf_event_delayed_put(struct task_struct *task)
 		WARN_ON_ONCE(task->perf_event_ctxp[ctxn]);
 }
 
+int itrace_inherit_event(struct perf_event *event, struct task_struct *task);
+
 /*
  * inherit a event from parent task to child task:
  */
@@ -7549,6 +7646,7 @@ inherit_event(struct perf_event *parent_event,
 {
 	struct perf_event *child_event;
 	unsigned long flags;
+	int err;
 
 	/*
 	 * Instead of creating recursive hierarchies of events,
@@ -7567,10 +7665,12 @@ inherit_event(struct perf_event *parent_event,
 	if (IS_ERR(child_event))
 		return child_event;
 
-	if (!atomic_long_inc_not_zero(&parent_event->refcount)) {
-		free_event(child_event);
-		return NULL;
-	}
+	err = itrace_inherit_event(child_event, child);
+	if (err)
+		goto err_alloc;
+
+	if (!atomic_long_inc_not_zero(&parent_event->refcount))
+		goto err_alloc;
 
 	get_ctx(child_ctx);
 
@@ -7621,6 +7721,11 @@ inherit_event(struct perf_event *parent_event,
 	mutex_unlock(&parent_event->child_mutex);
 
 	return child_event;
+
+err_alloc:
+	free_event(child_event);
+
+	return NULL;
 }
 
 static int inherit_group(struct perf_event *parent_event,
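For reference, the two-buffer indexing used throughout this hunk relies on
definitions added earlier in the series (not shown here); the following is
a sketch inferred from their use, with the enum name and field order being
assumptions:

/* sketch of the per-event ring-buffer index introduced by this series */
enum perf_rb {
	PERF_RB_MAIN = 0,	/* the usual perf data stream */
	PERF_RB_ITRACE,		/* instruction trace buffer */
	PERF_NR_RB,
};

struct perf_event {
	/* ... */
	struct ring_buffer	*rb[PERF_NR_RB];
	atomic_t		mmap_count[PERF_NR_RB];
	struct list_head	rb_entry[PERF_NR_RB];
	/* ... */
};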
diff --git a/kernel/events/internal.h b/kernel/events/internal.h
index 8835f00..f183efe 100644
--- a/kernel/events/internal.h
+++ b/kernel/events/internal.h
@@ -45,6 +45,7 @@ struct ring_buffer {
 	atomic_t			mmap_count;
 	unsigned long			mmap_locked;
 	struct user_struct		*mmap_user;
+	void				*priv;
 
 	struct perf_event_mmap_page	*user_page;
 	void				*data_pages[0];
@@ -55,6 +56,12 @@ extern struct ring_buffer *
 rb_alloc(int nr_pages, long watermark, int cpu, int flags,
 	 struct ring_buffer_ops *rb_ops);
 extern void perf_event_wakeup(struct perf_event *event);
+extern struct ring_buffer *ring_buffer_get(struct perf_event *event, int rbx);
+extern void ring_buffer_put(struct ring_buffer *rb);
+extern void ring_buffer_attach(struct perf_event *event,
+			       struct ring_buffer *rb);
+extern void ring_buffer_detach(struct perf_event *event,
+			       struct ring_buffer *rb);
 
 extern void
 perf_event_header__init_id(struct perf_event_header *header,
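The ring_buffer_ops argument taken by rb_alloc() is likewise introduced
earlier in the series; its shape can be inferred from the functions
assigned to itrace_rb_ops in the new itrace.c below. A sketch, not the
exact definition:

struct ring_buffer_ops {
	/* extra bytes to allocate along with struct ring_buffer */
	unsigned long	(*get_size)(int nr_pages);
	/* allocate the actual trace buffer pages */
	int		(*alloc_data_page)(struct ring_buffer *rb, int cpu,
					   int nr_pages, int flags);
	/* release the buffer */
	void		(*free_buffer)(struct ring_buffer *rb);
	/* translate an mmap page offset to a page */
	struct page	*(*mmap_to_page)(struct ring_buffer *rb,
					 unsigned long pgoff);
};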
diff --git a/kernel/events/itrace.c b/kernel/events/itrace.c
new file mode 100644
index 0000000..3adba62
--- /dev/null
+++ b/kernel/events/itrace.c
@@ -0,0 +1,589 @@
+/*
+ * Instruction flow trace unit infrastructure
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#undef DEBUG
+
+#include <linux/kernel.h>
+#include <linux/sched.h>
+#include <linux/perf_event.h>
+#include <linux/itrace.h>
+#include <linux/sizes.h>
+#include <linux/elf.h>
+#include <linux/coredump.h>
+#include <linux/slab.h>
+
+#include "internal.h"
+
+#define CORE_OWNER "ITRACE"
+
+/*
+ * for the sake of simplicity, we assume that for now there can
+ * only be one type of itrace PMU in a system
+ */
+static struct itrace_pmu *itrace_pmu;
+
+struct static_key_deferred itrace_core_events __read_mostly;
+
+struct itrace_lost_record {
+	struct perf_event_header	header;
+	u64				offset;
+};
+
+/*
+ * In the worst case, the perf buffer might be full and we won't be able to
+ * output this record, so the decoder won't know that the data was lost.
+ * However, it will still see an inconsistency in the trace IP.
+ */
+void itrace_lost_data(struct perf_event *event, u64 offset)
+{
+	struct perf_output_handle handle;
+	struct perf_sample_data sample;
+	struct itrace_lost_record rec = {
+		.header = {
+			.type = PERF_RECORD_ITRACE_LOST,
+			.misc = 0,
+			.size = sizeof(rec),
+		},
+		.offset = offset
+	};
+	int ret;
+
+	perf_event_header__init_id(&rec.header, &sample, event);
+	ret = perf_output_begin(&handle, event, rec.header.size);
+
+	if (ret)
+		return;
+
+	perf_output_put(&handle, rec);
+	perf_event__output_id_sample(event, &handle, &sample);
+	perf_output_end(&handle);
+}
+
+static struct itrace_pmu *itrace_pmu_find(int type)
+{
+	if (itrace_pmu && itrace_pmu->pmu.type == type)
+		return itrace_pmu;
+
+	return NULL;
+}
+
+bool is_itrace_event(struct perf_event *event)
+{
+	return !!itrace_pmu_find(event->attr.type);
+}
+
+static void itrace_event_destroy(struct perf_event *event)
+{
+	struct task_struct *task = event->hw.itrace_target;
+	struct ring_buffer *rb = event->rb[PERF_RB_ITRACE];
+
+	if (task && event->hw.counter_type == PERF_ITRACE_COREDUMP)
+		static_key_slow_dec_deferred(&itrace_core_events);
+
+	if (!rb)
+		return;
+
+	if (event->hw.counter_type != PERF_ITRACE_USER) {
+		atomic_dec(&rb->mmap_count);
+		atomic_dec(&event->mmap_count[PERF_RB_ITRACE]);
+		ring_buffer_detach(event, rb);
+		rcu_assign_pointer(event->rb[PERF_RB_ITRACE], NULL);
+		ring_buffer_put(rb); /* should be last */
+	}
+}
+
+int itrace_event_installable(struct perf_event *event,
+			     struct perf_event_context *ctx)
+{
+	struct perf_event *iter_event;
+
+	if (!is_itrace_event(event))
+		return 0;
+
+	/*
+	 * the context is locked and pinned and won't change under us;
+	 * besides, we don't care if it's a cpu or task context at this point
+	 */
+	list_for_each_entry(iter_event, &ctx->event_list, event_entry) {
+		if (is_itrace_event(iter_event) &&
+		    (iter_event->cpu == event->cpu ||
+		     iter_event->cpu == -1 ||
+		     event->cpu == -1))
+			return -EEXIST;
+	}
+
+	return 0;
+}
+
+static int itrace_event_init(struct perf_event *event)
+{
+	struct itrace_pmu *ipmu = to_itrace_pmu(event->pmu);
+	int ret;
+
+	ret = ipmu->event_init(event);
+	if (ret)
+		return ret;
+
+	event->destroy = itrace_event_destroy;
+	event->hw.counter_type = PERF_ITRACE_USER;
+
+	return 0;
+}
+
+static unsigned long itrace_rb_get_size(int nr_pages)
+{
+	return sizeof(struct ring_buffer) + sizeof(void *) * nr_pages;
+}
+
+static int itrace_alloc_data_pages(struct ring_buffer *rb, int cpu,
+				   int nr_pages, int flags)
+{
+	struct itrace_pmu *ipmu = itrace_pmu;
+	bool overwrite = !(flags & RING_BUFFER_WRITABLE);
+
+	rb->priv = ipmu->alloc_buffer(cpu, nr_pages, overwrite,
+				      rb->data_pages, &rb->user_page);
+	if (!rb->priv)
+		return -ENOMEM;
+	rb->nr_pages = nr_pages;
+
+	return 0;
+}
+
+static void itrace_free(struct ring_buffer *rb)
+{
+	struct itrace_pmu *ipmu = itrace_pmu;
+
+	if (rb->priv)
+		ipmu->free_buffer(rb->priv);
+}
+
+struct page *
+itrace_mmap_to_page(struct ring_buffer *rb, unsigned long pgoff)
+{
+	if (pgoff > rb->nr_pages)
+		return NULL;
+
+	if (pgoff == 0)
+		return virt_to_page(rb->user_page);
+
+	return virt_to_page(rb->data_pages[pgoff - 1]);
+}
+
+struct ring_buffer_ops itrace_rb_ops = {
+	.get_size		= itrace_rb_get_size,
+	.alloc_data_page	= itrace_alloc_data_pages,
+	.free_buffer		= itrace_free,
+	.mmap_to_page		= itrace_mmap_to_page,
+};
+
+void *itrace_priv(struct perf_event *event)
+{
+	if (!event->rb[PERF_RB_ITRACE])
+		return NULL;
+
+	return event->rb[PERF_RB_ITRACE]->priv;
+}
+
+void *itrace_event_get_priv(struct perf_event *event)
+{
+	struct ring_buffer *rb = ring_buffer_get(event, PERF_RB_ITRACE);
+
+	return rb ? rb->priv : NULL;
+}
+
+void itrace_event_put(struct perf_event *event)
+{
+	struct ring_buffer *rb;
+
+	rcu_read_lock();
+	rb = rcu_dereference(event->rb[PERF_RB_ITRACE]);
+	if (rb)
+		ring_buffer_put(rb);
+	rcu_read_unlock();
+}
+
+static void itrace_set_output(struct perf_event *event,
+			      struct perf_event *output_event)
+{
+	struct ring_buffer *rb;
+
+	mutex_lock(&event->mmap_mutex);
+
+	if (atomic_read(&event->mmap_count[PERF_RB_ITRACE]) ||
+	    event->rb[PERF_RB_ITRACE])
+		goto out;
+
+	rb = ring_buffer_get(output_event, PERF_RB_ITRACE);
+	if (!rb)
+		goto out;
+
+	ring_buffer_attach(event, rb);
+	rcu_assign_pointer(event->rb[PERF_RB_ITRACE], rb);
+
+out:
+	mutex_unlock(&event->mmap_mutex);
+}
+
+static size_t roundup_buffer_size(u64 size)
+{
+	return 1ul << (__get_order(size) + PAGE_SHIFT);
+}
+
+int itrace_inherit_event(struct perf_event *event, struct task_struct *task)
+{
+	size_t size = event->attr.itrace_sample_size;
+	struct perf_event *parent = event->parent;
+	struct ring_buffer *rb;
+	struct itrace_pmu *ipmu;
+
+	if (!is_itrace_event(event))
+		return 0;
+
+	ipmu = to_itrace_pmu(event->pmu);
+
+	if (parent->hw.counter_type == PERF_ITRACE_USER) {
+		/*
+		 * inherited user counters should inherit the parent's
+		 * buffer, provided the parent isn't a cpu==-1 counter
+		 */
+		if (parent->cpu == -1)
+			return -EINVAL;
+
+		itrace_set_output(event, parent);
+		return 0;
+	}
+
+	event->hw.counter_type = parent->hw.counter_type;
+	if (event->hw.counter_type == PERF_ITRACE_COREDUMP) {
+		static_key_slow_inc(&itrace_core_events.key);
+		size = task_rlimit(task, RLIMIT_ITRACE);
+	}
+
+	size = roundup_buffer_size(size);
+	rb = rb_alloc(size >> PAGE_SHIFT, 0, event->cpu, 0, &itrace_rb_ops);
+	if (!rb)
+		return -ENOMEM;
+
+	ring_buffer_attach(event, rb);
+	rcu_assign_pointer(event->rb[PERF_RB_ITRACE], rb);
+	atomic_set(&rb->mmap_count, 1);
+	atomic_set(&event->mmap_count[PERF_RB_ITRACE], 1);
+
+	return 0;
+}
+
+int itrace_kernel_event(struct perf_event *event, struct task_struct *task)
+{
+	struct itrace_pmu *ipmu;
+	struct ring_buffer *rb;
+	size_t size;
+
+	if (!is_itrace_event(event))
+		return 0;
+
+	ipmu = to_itrace_pmu(event->pmu);
+
+	if (event->attr.itrace_sample_size)
+		size = roundup_buffer_size(event->attr.itrace_sample_size);
+	else
+		size = task_rlimit(task, RLIMIT_ITRACE);
+
+	rb = rb_alloc(size >> PAGE_SHIFT, 0, event->cpu, 0, &itrace_rb_ops);
+	if (!rb)
+		return -ENOMEM;
+
+	ring_buffer_attach(event, rb);
+	rcu_assign_pointer(event->rb[PERF_RB_ITRACE], rb);
+	atomic_set(&rb->mmap_count, 1);
+	atomic_set(&event->mmap_count[PERF_RB_ITRACE], 1);
+
+	return 0;
+}
+
+void itrace_wake_up(struct perf_event *event)
+{
+	struct ring_buffer *rb;
+
+	rcu_read_lock();
+	rb = rcu_dereference(event->rb[PERF_RB_ITRACE]);
+	if (rb) {
+		atomic_set(&rb->poll, POLL_IN);
+		irq_work_queue(&event->pending);
+	}
+	rcu_read_unlock();
+}
+
+int itrace_pmu_register(struct itrace_pmu *ipmu)
+{
+	int ret;
+
+	if (itrace_pmu)
+		return -EBUSY;
+
+	if (!ipmu->sample_trace    ||
+	    !ipmu->sample_output   ||
+	    !ipmu->core_size       ||
+	    !ipmu->core_output)
+		return -EINVAL;
+
+	ipmu->event_init = ipmu->pmu.event_init;
+	ipmu->pmu.event_init = itrace_event_init;
+
+	ret = perf_pmu_register(&ipmu->pmu, ipmu->name, -1);
+	if (!ret)
+		itrace_pmu = ipmu;
+
+	return ret;
+}
+
+/*
+ * Trace sample annotation
+ * For events that have attr.sample_type & PERF_SAMPLE_ITRACE, perf calls here
+ * to configure and obtain itrace samples.
+ */
+
+int itrace_sampler_init(struct perf_event *event, struct task_struct *task)
+{
+	struct perf_event_attr attr;
+	struct perf_event *tevt;
+	struct itrace_pmu *ipmu;
+
+	ipmu = itrace_pmu_find(event->attr.itrace_sample_type);
+	if (!ipmu)
+		return -ENOTSUPP;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.type = ipmu->pmu.type;
+	attr.config = 0;
+	attr.sample_type = 0;
+	attr.exclude_user = event->attr.exclude_user;
+	attr.exclude_kernel = event->attr.exclude_kernel;
+	attr.itrace_sample_size = event->attr.itrace_sample_size;
+	attr.itrace_config = event->attr.itrace_config;
+
+	tevt = perf_event_create_kernel_counter(&attr, event->cpu, task, NULL, NULL);
+	if (IS_ERR(tevt))
+		return PTR_ERR(tevt);
+
+	if (!itrace_priv(tevt)) {
+		perf_event_release_kernel(tevt);
+		return -EINVAL;
+	}
+
+	event->trace_event = tevt;
+	tevt->hw.counter_type = PERF_ITRACE_SAMPLING;
+	if (event->state != PERF_EVENT_STATE_OFF)
+		perf_event_enable(event->trace_event);
+
+	return 0;
+}
+
+void itrace_sampler_fini(struct perf_event *event)
+{
+	struct perf_event *tevt = event->trace_event;
+
+	perf_event_release_kernel(tevt);
+	event->trace_event = NULL;
+}
+
+unsigned long itrace_sampler_trace(struct perf_event *event,
+				   struct perf_sample_data *data)
+{
+	struct perf_event *tevt = event->trace_event;
+	struct itrace_pmu *ipmu;
+
+	if (!tevt)
+		return 0;
+
+	ipmu = to_itrace_pmu(tevt->pmu);
+	return ipmu->sample_trace(tevt, data);
+}
+
+void itrace_sampler_output(struct perf_event *event,
+			   struct perf_output_handle *handle,
+			   struct perf_sample_data *data)
+{
+	struct perf_event *tevt = event->trace_event;
+	struct itrace_pmu *ipmu;
+
+	if (!tevt || !data->trace.size)
+		return;
+
+	ipmu = to_itrace_pmu(tevt->pmu);
+	ipmu->sample_output(tevt, handle, data);
+}
+
+/*
+ * Core dump bits
+ *
+ * Various parts of the kernel will call here:
+ *   + do_prlimit(): to tell us that the user is trying to set RLIMIT_ITRACE
+ *   + various places in bitfmt_elf.c: to write out itrace notes
+ *   + do_exit(): to destroy the first core dump counter
+ *   + the rest (copy_process()/do_exit()) is taken care of by perf for us
+ */
+
+static struct perf_event *
+itrace_find_task_event(struct task_struct *task, unsigned type)
+{
+	struct perf_event_context *ctx;
+	struct perf_event *event = NULL;
+
+	rcu_read_lock();
+	ctx = rcu_dereference(task->perf_event_ctxp[perf_hw_context]);
+	if (!ctx)
+		goto out;
+
+	list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
+		if (is_itrace_event(event) &&
+		    event->cpu == -1 &&
+		    !!(event->hw.counter_type & type))
+			goto out;
+	}
+
+	event = NULL;
+out:
+	rcu_read_unlock();
+
+	return event;
+}
+
+int update_itrace_rlimit(struct task_struct *task, unsigned long rlim)
+{
+	struct itrace_pmu *ipmu = itrace_pmu;
+	struct perf_event_attr attr;
+	struct perf_event *event;
+
+	event = itrace_find_task_event(task, PERF_ITRACE_ANY);
+	if (event) {
+		if (event->hw.counter_type != PERF_ITRACE_COREDUMP)
+			return -EINVAL;
+
+		perf_event_release_kernel(event);
+		static_key_slow_dec_deferred(&itrace_core_events);
+	}
+
+	if (!rlim)
+		return 0;
+
+	memset(&attr, 0, sizeof(attr));
+	attr.type = ipmu->pmu.type;
+	attr.config = 0;
+	attr.sample_type = 0;
+	attr.exclude_kernel = 1;
+	attr.inherit = 1;
+
+	event = perf_event_create_kernel_counter(&attr, -1, task, NULL, NULL);
+	if (IS_ERR(event))
+		return PTR_ERR(event);
+
+	static_key_slow_inc(&itrace_core_events.key);
+
+	event->hw.counter_type = PERF_ITRACE_COREDUMP;
+	perf_event_enable(event);
+
+	return 0;
+}
+
+static void itrace_pmu_exit_task(struct task_struct *task)
+{
+	struct perf_event *event;
+
+	event = itrace_find_task_event(task, PERF_ITRACE_COREDUMP);
+
+	/*
+	 * here we are only interested in kernel counters created by
+	 * update_itrace_rlimit(), inherited ones should be taken care of by
+	 * perf_event_exit_task(), sampling ones are taken care of by
+	 * itrace_sampler_fini().
+	 */
+	if (!event)
+		return;
+
+	if (!event->parent)
+		perf_event_release_kernel(event);
+}
+
+void exit_itrace(struct task_struct *task)
+{
+	if (static_key_false(&itrace_core_events.key))
+		itrace_pmu_exit_task(task);
+}
+
+size_t itrace_elf_note_size(struct task_struct *task)
+{
+	struct itrace_pmu *ipmu;
+	struct perf_event *event = NULL;
+	size_t size = 0;
+
+	event = itrace_find_task_event(task, PERF_ITRACE_COREDUMP);
+	if (event) {
+		perf_event_disable(event);
+
+		ipmu = to_itrace_pmu(event->pmu);
+		size = ipmu->core_size(event);
+		size += task_rlimit(task, RLIMIT_ITRACE);
+		size = roundup(size + strlen(ipmu->name) + 1, 4);
+		size += sizeof(struct itrace_note) + sizeof(struct elf_note);
+		size += roundup(sizeof(CORE_OWNER), 4);
+	}
+
+	return size;
+}
+
+void itrace_elf_note_write(struct coredump_params *cprm,
+			   struct task_struct *task)
+{
+	struct perf_event *event;
+	struct itrace_note note;
+	struct itrace_pmu *ipmu;
+	struct elf_note en;
+	unsigned long rlim;
+	size_t pmu_len;
+
+	event = itrace_find_task_event(task, PERF_ITRACE_COREDUMP);
+	if (!event)
+		return;
+
+	ipmu = to_itrace_pmu(event->pmu);
+	pmu_len = strlen(ipmu->name) + 1;
+
+	rlim = task_rlimit(task, RLIMIT_ITRACE);
+
+	/* Elf note with name */
+	en.n_namesz = strlen(CORE_OWNER);
+	en.n_descsz = roundup(ipmu->core_size(event) + rlim + sizeof(note) +
+			      pmu_len, 4);
+	en.n_type = NT_ITRACE;
+	dump_emit(cprm, &en, sizeof(en));
+	dump_align(cprm, 4);
+	dump_emit(cprm, CORE_OWNER, sizeof(CORE_OWNER));
+	dump_align(cprm, 4);
+
+	/* ITRACE header */
+	note.itrace_config = event->attr.itrace_config;
+	dump_emit(cprm, &note, sizeof(note));
+	dump_emit(cprm, ipmu->name, pmu_len);
+
+	/* ITRACE PMU header + payload */
+	ipmu->core_output(cprm, event, rlim);
+	dump_align(cprm, 4);
+}
+
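As a rough illustration of the layout written out above, here is a sketch
of how a post-mortem tool might locate the ITRACE note in a core file's
PT_NOTE contents; since NT_ITRACE and struct itrace_note are new in this
series, the note is matched by its "ITRACE" owner name (64-bit cores only,
for brevity):

#include <elf.h>
#include <stddef.h>
#include <string.h>

/* returns a pointer to the note descriptor: struct itrace_note,
 * followed by the NUL-terminated pmu name, followed by the PMU
 * header and trace payload emitted by ipmu->core_output() */
static const void *find_itrace_note(const void *notes, size_t len)
{
	const unsigned char *p = notes, *end = p + len;

	while (p + sizeof(Elf64_Nhdr) <= end) {
		const Elf64_Nhdr *nh = (const Elf64_Nhdr *)p;
		const char *name = (const char *)(nh + 1);
		size_t namesz = (nh->n_namesz + 3) & ~(size_t)3;
		size_t descsz = (nh->n_descsz + 3) & ~(size_t)3;

		if (nh->n_namesz == strlen("ITRACE") &&
		    !memcmp(name, "ITRACE", nh->n_namesz))
			return name + namesz;

		p = (const unsigned char *)name + namesz + descsz;
	}

	return NULL;
}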
diff --git a/kernel/events/ring_buffer.c b/kernel/events/ring_buffer.c
index d7ec426..0bee352 100644
--- a/kernel/events/ring_buffer.c
+++ b/kernel/events/ring_buffer.c
@@ -119,7 +119,7 @@ int perf_output_begin(struct perf_output_handle *handle,
 	if (event->parent)
 		event = event->parent;
 
-	rb = rcu_dereference(event->rb);
+	rb = rcu_dereference(event->rb[PERF_RB_MAIN]);
 	if (unlikely(!rb))
 		goto out;
 
diff --git a/kernel/exit.c b/kernel/exit.c
index a949819..28138ef 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -48,6 +48,7 @@
 #include <linux/fs_struct.h>
 #include <linux/init_task.h>
 #include <linux/perf_event.h>
+#include <linux/itrace.h>
 #include <trace/events/sched.h>
 #include <linux/hw_breakpoint.h>
 #include <linux/oom.h>
@@ -788,6 +789,8 @@ void do_exit(long code)
 	check_stack_usage();
 	exit_thread();
 
+	exit_itrace(tsk);
+
 	/*
 	 * Flush inherited counters to the parent - before the parent
 	 * gets woken up by child-exit notifications.
diff --git a/kernel/sys.c b/kernel/sys.c
index c723113..7651d6f 100644
--- a/kernel/sys.c
+++ b/kernel/sys.c
@@ -14,6 +14,7 @@
 #include <linux/fs.h>
 #include <linux/kmod.h>
 #include <linux/perf_event.h>
+#include <linux/itrace.h>
 #include <linux/resource.h>
 #include <linux/kernel.h>
 #include <linux/workqueue.h>
@@ -1402,6 +1403,10 @@ int do_prlimit(struct task_struct *tsk, unsigned int resource,
 		update_rlimit_cpu(tsk, new_rlim->rlim_cur);
 out:
 	read_unlock(&tasklist_lock);
+
+	if (!retval && new_rlim && resource == RLIMIT_ITRACE)
+		retval = update_itrace_rlimit(tsk, new_rlim->rlim_cur);
+
 	return retval;
 }
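With the hook above, enabling itrace core-dump capture from userspace is
an ordinary resource-limit update; a minimal sketch, assuming the
RLIMIT_ITRACE value defined elsewhere in the series:

#include <sys/resource.h>

#ifndef RLIMIT_ITRACE
#define RLIMIT_ITRACE 16	/* per this series; the value is an assumption */
#endif

/* request that up to @bytes of trace be kept for core dumps */
static int enable_itrace_coredump(unsigned long bytes)
{
	struct rlimit rl = { .rlim_cur = bytes, .rlim_max = bytes };

	/* on success, do_prlimit() will call update_itrace_rlimit() */
	return setrlimit(RLIMIT_ITRACE, &rl);
}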
 
-- 
1.8.5.1


* [PATCH v0 05/71] x86: perf: Intel PT PMU driver
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (3 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 06/71] perf: Allow set-output for task contexts of different types Alexander Shishkin
                   ` (67 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Alexander Shishkin

Add support for Intel Processor Trace (PT) to the kernel's perf/itrace
events. PT is an extension of Intel Architecture that collects information
about software execution, such as control flow, execution modes and
timings, and formats it into highly compressed binary packets. Even
compressed, these packets are generated at hundreds of megabytes per
second per core, which makes it impractical to decode them on the fly in
the kernel. Thus, buffers containing this binary stream are zero-copy
mapped to the debug tools in userspace for subsequent decoding and
analysis.

PT trace data can also be used to annotate other perf events by setting
a corresponding bit in sample_type; it can also be included in process
core dumps. This relies on the itrace infrastructure extension to the
perf core.

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
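A minimal userspace sketch of opening a PT event, for illustration only:
the itrace_config attr field is added by this series (the "tsc" and
"noretcomp" format bits are defined below), so this assumes the series'
uapi headers; the sysfs path follows the "intel_pt" PMU name used here.

#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

static int open_pt_event(int cpu)
{
	struct perf_event_attr attr;
	FILE *f;
	int type = -1;

	/* the PMU type is assigned dynamically; read it from sysfs */
	f = fopen("/sys/bus/event_source/devices/intel_pt/type", "r");
	if (!f)
		return -1;
	if (fscanf(f, "%d", &type) != 1)
		type = -1;
	fclose(f);
	if (type < 0)
		return -1;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = type;
	attr.itrace_config = 1 << 10;	/* "tsc": enable TSC packets */
	attr.exclude_kernel = 1;	/* trace userspace only */

	/* one PT event per cpu: pid == -1, cpu >= 0 */
	return syscall(__NR_perf_event_open, &attr, -1, cpu, -1, 0);
}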
 arch/x86/include/uapi/asm/msr-index.h     |   18 +
 arch/x86/kernel/cpu/Makefile              |    1 +
 arch/x86/kernel/cpu/intel_pt.h            |  129 ++++
 arch/x86/kernel/cpu/perf_event.c          |    4 +
 arch/x86/kernel/cpu/perf_event_intel.c    |   10 +
 arch/x86/kernel/cpu/perf_event_intel_pt.c | 1167 +++++++++++++++++++++++++++++
 6 files changed, 1329 insertions(+)
 create mode 100644 arch/x86/kernel/cpu/intel_pt.h
 create mode 100644 arch/x86/kernel/cpu/perf_event_intel_pt.c

diff --git a/arch/x86/include/uapi/asm/msr-index.h b/arch/x86/include/uapi/asm/msr-index.h
index b93e09a..6dfa422 100644
--- a/arch/x86/include/uapi/asm/msr-index.h
+++ b/arch/x86/include/uapi/asm/msr-index.h
@@ -74,6 +74,24 @@
 #define MSR_IA32_PERF_CAPABILITIES	0x00000345
 #define MSR_PEBS_LD_LAT_THRESHOLD	0x000003f6
 
+#define MSR_IA32_RTIT_CTL		0x00000570
+#define RTIT_CTL_TRACEEN		BIT(0)
+#define RTIT_CTL_OS			BIT(2)
+#define RTIT_CTL_USR			BIT(3)
+#define RTIT_CTL_CR3EN			BIT(7)
+#define RTIT_CTL_TOPA			BIT(8)
+#define RTIT_CTL_TSC_EN			BIT(10)
+#define RTIT_CTL_DISRETC		BIT(11)
+#define RTIT_CTL_BRANCH_EN		BIT(13)
+#define MSR_IA32_RTIT_STATUS		0x00000571
+#define RTIT_STATUS_CONTEXTEN		BIT(1)
+#define RTIT_STATUS_TRIGGEREN		BIT(2)
+#define RTIT_STATUS_ERROR		BIT(4)
+#define RTIT_STATUS_STOPPED		BIT(5)
+#define MSR_IA32_RTIT_CR3_MATCH		0x00000572
+#define MSR_IA32_RTIT_OUTPUT_BASE	0x00000560
+#define MSR_IA32_RTIT_OUTPUT_MASK	0x00000561
+
 #define MSR_MTRRfix64K_00000		0x00000250
 #define MSR_MTRRfix16K_80000		0x00000258
 #define MSR_MTRRfix16K_A0000		0x00000259
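For reference, the driver packs MSR_IA32_RTIT_OUTPUT_MASK as: bits 6:0 all
set, bits 31:7 the current ToPA entry index, and bits 63:32 the byte offset
within that output region (see pt_config_buffer() and pt_read_offset()
below). A sketch of the unpacking; the helper names are made up here:

/* split MSR_IA32_RTIT_OUTPUT_MASK into its fields, mirroring
 * pt_config_buffer()/pt_read_offset() in the driver below */
static inline unsigned int rtit_output_mask_idx(u64 mask)
{
	return (mask & 0xffffff80) >> 7;	/* ToPA entry index */
}

static inline unsigned int rtit_output_mask_off(u64 mask)
{
	return mask >> 32;			/* offset within region */
}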
diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index 6359506..cb69de3 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -37,6 +37,7 @@ endif
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_knc.o perf_event_p4.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_uncore.o perf_event_intel_rapl.o
+obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_pt.o
 endif
 
 
diff --git a/arch/x86/kernel/cpu/intel_pt.h b/arch/x86/kernel/cpu/intel_pt.h
new file mode 100644
index 0000000..7fb10db
--- /dev/null
+++ b/arch/x86/kernel/cpu/intel_pt.h
@@ -0,0 +1,129 @@
+/*
+ * Intel(R) Processor Trace PMU driver for perf
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef __INTEL_PT_H__
+#define __INTEL_PT_H__
+
+#include <linux/radix-tree.h>
+#include <linux/itrace.h>
+
+/*
+ * Single-entry ToPA: when within this many bytes of the region's
+ * boundary, switch output regions to avoid losing data.
+ */
+#define TOPA_PMI_MARGIN 512
+
+/*
+ * Table of Physical Addresses bits
+ */
+enum topa_sz {
+	TOPA_4K	= 0,
+	TOPA_8K,
+	TOPA_16K,
+	TOPA_32K,
+	TOPA_64K,
+	TOPA_128K,
+	TOPA_256K,
+	TOPA_512K,
+	TOPA_1MB,
+	TOPA_2MB,
+	TOPA_4MB,
+	TOPA_8MB,
+	TOPA_16MB,
+	TOPA_32MB,
+	TOPA_64MB,
+	TOPA_128MB,
+	TOPA_SZ_END,
+};
+
+static inline unsigned int sizes(enum topa_sz tsz)
+{
+	return 1 << (tsz + 12);
+}
+
+struct topa_entry {
+	u64	end	: 1;
+	u64	rsvd0	: 1;
+	u64	intr	: 1;
+	u64	rsvd1	: 1;
+	u64	stop	: 1;
+	u64	rsvd2	: 1;
+	u64	size	: 4;
+	u64	rsvd3	: 2;
+	u64	base	: 36;
+	u64	rsvd4	: 16;
+};
+
+#define TOPA_SHIFT 12
+#define PT_CPUID_LEAVES 2
+
+enum pt_capabilities {
+	PT_CAP_max_subleaf = 0,
+	PT_CAP_cr3_filtering,
+	PT_CAP_topa_output,
+	PT_CAP_topa_multiple_entries,
+	PT_CAP_payloads_lip,
+};
+
+struct pt_pmu {
+	struct itrace_pmu	itrace;
+	u32			caps[4 * PT_CPUID_LEAVES];
+	char			*capstr;
+	unsigned int		caplen;
+};
+
+/**
+ * struct pt_buffer - buffer configuration; one buffer per task_struct or
+ * cpu, depending on perf event configuration
+ * @cpu: cpu to use as an allocation hint
+ * @tables: list of ToPA tables in this buffer
+ * @first: first topa table
+ * @last: last topa table
+ * @cur: current topa table
+ * @round: number of times the buffer pointer has wrapped
+ * @cur_idx: current output region's index within @cur table
+ * @output_off: offset within the current output region
+ * @size: total size of all output regions within this buffer
+ * @head: logical write offset (drives data_head)
+ * @watermark: place interrupt flags every @watermark pages
+ * @snapshot: if this is a snapshot (overwrite mode) counter
+ * @user_page: user page with data_head/data_tail, if any
+ * @data_pages: array of pointers to the buffer's data pages
+ */
+struct pt_buffer {
+	/* hint for allocation */
+	int			cpu;
+	/* list of ToPA tables */
+	struct list_head	tables;
+	/* top-level table */
+	struct topa		*first, *last, *cur;
+	unsigned long		round;
+	unsigned int		cur_idx;
+	size_t			output_off;
+	unsigned long		size;
+	local64_t		head;
+	unsigned long		watermark;
+	bool			snapshot;
+	struct perf_event_mmap_page *user_page;
+	void			**data_pages;
+};
+
+/**
+ * struct pt - per-cpu pt
+ */
+struct pt {
+	raw_spinlock_t		lock;
+	struct perf_event	*event;
+};
+
+void intel_pt_interrupt(void);
+
+#endif /* __INTEL_PT_H__ */
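The region sizes encoded by enum topa_sz above grow in powers of two from
4KB, i.e. sizes(tsz) == 4KB << tsz. A quick sanity check, assuming the
definitions above are in scope:

#include <assert.h>

static void topa_sz_selftest(void)
{
	assert(sizes(TOPA_4K) == 4096);
	assert(sizes(TOPA_64K) == 64 * 1024);
	assert(sizes(TOPA_128MB) == 128 * 1024 * 1024);
}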
diff --git a/arch/x86/kernel/cpu/perf_event.c b/arch/x86/kernel/cpu/perf_event.c
index 8e13293..9125797 100644
--- a/arch/x86/kernel/cpu/perf_event.c
+++ b/arch/x86/kernel/cpu/perf_event.c
@@ -385,6 +385,10 @@ static inline int precise_br_compat(struct perf_event *event)
 
 int x86_pmu_hw_config(struct perf_event *event)
 {
+	if (event->attr.sample_type & PERF_SAMPLE_ITRACE &&
+	    event->attr.sample_type & PERF_SAMPLE_BRANCH_STACK)
+		return -EINVAL;
+
 	if (event->attr.precise_ip) {
 		int precise = 0;
 
diff --git a/arch/x86/kernel/cpu/perf_event_intel.c b/arch/x86/kernel/cpu/perf_event_intel.c
index 0fa4f24..28b5023 100644
--- a/arch/x86/kernel/cpu/perf_event_intel.c
+++ b/arch/x86/kernel/cpu/perf_event_intel.c
@@ -1312,6 +1312,8 @@ int intel_pmu_save_and_restart(struct perf_event *event)
 	return x86_perf_event_set_period(event);
 }
 
+void intel_pt_interrupt(void);
+
 static void intel_pmu_reset(void)
 {
 	struct debug_store *ds = __this_cpu_read(cpu_hw_events.ds);
@@ -1393,6 +1395,14 @@ again:
 	}
 
 	/*
+	 * Intel PT
+	 */
+	if (__test_and_clear_bit(55, (unsigned long *)&status)) {
+		handled++;
+		intel_pt_interrupt();
+	}
+
+	/*
 	 * Checkpointed counters can lead to 'spurious' PMIs because the
 	 * rollback caused by the PMI will have cleared the overflow status
 	 * bit. Therefore always force probe these counters.
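Bit 55 of IA32_PERF_GLOBAL_STATUS tested above indicates a pending PT ToPA
PMI; a named constant for it could look like this (the name is an
assumption, not part of this patch):

#define GLOBAL_STATUS_TRACE_TOPAPMI	BIT_ULL(55)	/* Intel PT PMI pending */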
diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
new file mode 100644
index 0000000..37b4db2
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -0,0 +1,1167 @@
+/*
+ * Intel(R) Processor Trace PMU driver for perf
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#undef DEBUG
+
+#include <linux/bitops.h>
+#include <linux/types.h>
+#include <linux/slab.h>
+#include <linux/debugfs.h>
+#include <linux/device.h>
+#include <linux/coredump.h>
+
+#include <asm-generic/sizes.h>
+#include <asm/perf_event.h>
+#include <asm/insn.h>
+
+#include "perf_event.h"
+#include "intel_pt.h"
+
+static DEFINE_PER_CPU(struct pt, pt_ctx);
+
+static struct pt_pmu pt_pmu;
+
+enum cpuid_regs {
+	CR_EAX = 0,
+	CR_ECX,
+	CR_EDX,
+	CR_EBX
+};
+
+/*
+ * Capabilities of Intel PT hardware, such as number of address bits or
+ * supported output schemes, are cached and exported to userspace as "caps"
+ * attribute group of pt pmu device
+ * (/sys/bus/event_source/devices/intel_pt/caps/) so that userspace can store
+ * relevant bits together with intel_pt traces.
+ *
+ * Currently, for debugging purposes, these attributes are also writable; this
+ * should be removed in the final version.
+ */
+#define PT_CAP(_n, _l, _r, _m)						\
+	[PT_CAP_ ## _n] = { .name = __stringify(_n), .leaf = _l,	\
+			    .reg = _r, .mask = _m }
+
+static struct pt_cap_desc {
+	const char	*name;
+	u32		leaf;
+	u8		reg;
+	u32		mask;
+} pt_caps[] = {
+	PT_CAP(max_subleaf,		0, CR_EAX, 0xffffffff),
+	PT_CAP(cr3_filtering,		0, CR_EBX, BIT(0)),
+	PT_CAP(topa_output,		0, CR_ECX, BIT(0)),
+	PT_CAP(topa_multiple_entries,	0, CR_ECX, BIT(1)),
+	PT_CAP(payloads_lip,		0, CR_ECX, BIT(31)),
+};
+
+static u32 pt_cap_get(enum pt_capabilities cap)
+{
+	struct pt_cap_desc *cd = &pt_caps[cap];
+	u32 c = pt_pmu.caps[cd->leaf * 4 + cd->reg];
+	unsigned int shift = __ffs(cd->mask);
+
+	return (c & cd->mask) >> shift;
+}
+
+static void pt_cap_set(enum pt_capabilities cap, u32 val)
+{
+	struct pt_cap_desc *cd = &pt_caps[cap];
+	unsigned int idx = cd->leaf * 4 + cd->reg;
+	unsigned int shift = __ffs(cd->mask);
+
+	pt_pmu.caps[idx] = (val << shift) & cd->mask;
+}
+
+static void pt_cap_string(void)
+{
+	char *capstr;
+	int pos, i;
+
+	capstr = kzalloc(PAGE_SIZE, GFP_KERNEL);
+	if (!capstr)
+		return;
+
+	for (i = 0, pos = 0; i < ARRAY_SIZE(pt_caps) && pos < PAGE_SIZE; i++) {
+		pos += snprintf(&capstr[pos], PAGE_SIZE - pos, "%s:%x%c",
+				pt_caps[i].name, pt_cap_get(i),
+				i == ARRAY_SIZE(pt_caps) - 1 ? 0 : ',');
+	}
+
+	if (pt_pmu.capstr)
+		kfree(pt_pmu.capstr);
+
+	pt_pmu.capstr = capstr;
+	pt_pmu.caplen = pos;
+}
+
+static ssize_t pt_cap_show(struct device *cdev,
+			   struct device_attribute *attr,
+			   char *buf)
+{
+	struct dev_ext_attribute *ea =
+		container_of(attr, struct dev_ext_attribute, attr);
+	enum pt_capabilities cap = (long)ea->var;
+
+	return snprintf(buf, PAGE_SIZE, "%x\n", pt_cap_get(cap));
+}
+
+static ssize_t pt_cap_store(struct device *cdev,
+			    struct device_attribute *attr,
+			    const char *buf, size_t size)
+{
+	struct dev_ext_attribute *ea =
+		container_of(attr, struct dev_ext_attribute, attr);
+	enum pt_capabilities cap = (long)ea->var;
+	unsigned long new;
+	char *end;
+
+	new = simple_strtoul(buf, &end, 0);
+	if (end == buf)
+		return -EINVAL;
+
+	pt_cap_set(cap, new);
+	pt_cap_string();
+	return size;
+}
+
+static struct attribute_group pt_cap_group = {
+	.name	= "caps",
+};
+
+PMU_FORMAT_ATTR(tsc,		"itrace_config:10"	);
+PMU_FORMAT_ATTR(noretcomp,	"itrace_config:11"	);
+
+static struct attribute *pt_formats_attr[] = {
+	&format_attr_tsc.attr,
+	&format_attr_noretcomp.attr,
+	NULL,
+};
+
+static struct attribute_group pt_format_group = {
+	.name	= "format",
+	.attrs	= pt_formats_attr,
+};
+
+static const struct attribute_group *pt_attr_groups[] = {
+	&pt_cap_group,
+	&pt_format_group,
+	NULL,
+};
+
+static void __init pt_pmu_hw_init(void)
+{
+	struct dev_ext_attribute *de_attrs;
+	struct attribute **attrs;
+	size_t size;
+	long i;
+
+	if (test_cpu_cap(&boot_cpu_data, X86_FEATURE_INTEL_PT)) {
+		for (i = 0; i < PT_CPUID_LEAVES; i++)
+			cpuid_count(20, i,
+				    &pt_pmu.caps[CR_EAX + i * 4],
+				    &pt_pmu.caps[CR_EBX + i * 4],
+				    &pt_pmu.caps[CR_ECX + i * 4],
+				    &pt_pmu.caps[CR_EDX + i * 4]);
+	}
+
+	size = sizeof(struct attribute *) * (ARRAY_SIZE(pt_caps) + 1);
+	attrs = kzalloc(size, GFP_KERNEL);
+	if (!attrs)
+		goto err_attrs;
+
+	size = sizeof(struct dev_ext_attribute) * (ARRAY_SIZE(pt_caps) + 1);
+	de_attrs = kzalloc(size, GFP_KERNEL);
+	if (!de_attrs)
+		goto err_de_attrs;
+
+	for (i = 0; i < ARRAY_SIZE(pt_caps); i++) {
+		de_attrs[i].attr.attr.name = pt_caps[i].name;
+
+		sysfs_attr_init(&de_attrs[i].attr.attr);
+		de_attrs[i].attr.attr.mode = S_IRUGO | S_IWUSR;
+		de_attrs[i].attr.show = pt_cap_show;
+		de_attrs[i].attr.store = pt_cap_store;
+		de_attrs[i].var = (void *)i;
+		attrs[i] = &de_attrs[i].attr.attr;
+	}
+
+	pt_cap_string();
+	pt_cap_group.attrs = attrs;
+	return;
+
+err_de_attrs:
+	kfree(de_attrs);
+err_attrs:
+	kfree(attrs);
+}
+
+#define PT_CONFIG_MASK (RTIT_CTL_TSC_EN | RTIT_CTL_DISRETC)
+
+static bool pt_event_valid(struct perf_event *event)
+{
+	u64 itrace_config = event->attr.itrace_config;
+
+	if ((itrace_config & PT_CONFIG_MASK) != itrace_config)
+		return false;
+
+	return true;
+}
+
+/*
+ * PT configuration helpers
+ * These are all cpu-affine and operate on the local per-cpu PT
+ */
+
+static int pt_config(struct perf_event *event)
+{
+	u64 reg;
+
+	reg = RTIT_CTL_TOPA | RTIT_CTL_BRANCH_EN;
+
+	if (!event->attr.exclude_kernel)
+		reg |= RTIT_CTL_OS;
+	if (!event->attr.exclude_user)
+		reg |= RTIT_CTL_USR;
+
+	reg |= (event->attr.itrace_config & PT_CONFIG_MASK);
+
+	if (wrmsr_safe(MSR_IA32_RTIT_CTL, reg, 0) < 0) {
+		pr_warn("Failed to enable PT on cpu %d\n", event->cpu);
+		return -EINVAL;
+	}
+
+	return 0;
+}
+
+static void pt_config_start(bool start)
+{
+	u64 ctl;
+
+	rdmsrl(MSR_IA32_RTIT_CTL, ctl);
+	if (start)
+		ctl |= RTIT_CTL_TRACEEN;
+	else
+		ctl &= ~RTIT_CTL_TRACEEN;
+	wrmsrl(MSR_IA32_RTIT_CTL, ctl);
+}
+
+static void pt_config_buffer(void *buf, unsigned int topa_idx,
+			     unsigned int output_off)
+{
+	u64 reg;
+
+	wrmsrl(MSR_IA32_RTIT_OUTPUT_BASE, virt_to_phys(buf));
+
+	reg = 0x7f | ((u64)topa_idx << 7) | ((u64)output_off << 32);
+
+	wrmsrl(MSR_IA32_RTIT_OUTPUT_MASK, reg);
+}
+
+#define TENTS_PER_PAGE (((PAGE_SIZE - 40) / sizeof(struct topa_entry)) - 1)
+
+struct topa {
+	struct topa_entry	table[TENTS_PER_PAGE];
+	struct list_head	list;
+	u64			phys;
+	u64			offset;
+	size_t			size;
+	int			last;
+};
+
+/* make negative table index stand for the last table entry */
+#define TOPA_ENTRY(t, i) ((i) == -1 ? &(t)->table[(t)->last] : &(t)->table[(i)])
+
+/*
+ * allocate page-sized ToPA table
+ */
+static struct topa *topa_alloc(int cpu, gfp_t gfp)
+{
+	int node = cpu_to_node(cpu);
+	struct topa *topa;
+	struct page *p;
+
+	p = alloc_pages_node(node, gfp | __GFP_ZERO, 0);
+	if (!p)
+		return NULL;
+
+	topa = page_address(p);
+	topa->last = 0;
+	topa->phys = page_to_phys(p);
+
+	/*
+	 * In case of single-entry ToPA, always put the self-referencing END
+	 * link as the 2nd entry in the table
+	 */
+	if (!pt_cap_get(PT_CAP_topa_multiple_entries)) {
+		TOPA_ENTRY(topa, 1)->base = topa->phys >> TOPA_SHIFT;
+		TOPA_ENTRY(topa, 1)->end = 1;
+	}
+
+	return topa;
+}
+
+static void topa_free(struct topa *topa)
+{
+	free_page((unsigned long)topa);
+}
+
+static void topa_free_pages(struct pt_buffer *buf, struct topa *topa, int idx)
+{
+	size_t size = sizes(TOPA_ENTRY(topa, idx)->size);
+	void *base = phys_to_virt(TOPA_ENTRY(topa, idx)->base << TOPA_SHIFT);
+	unsigned long pn;
+
+	for (pn = 0; pn < size; pn += PAGE_SIZE) {
+		struct page *page = virt_to_page(base + pn);
+
+		page->mapping = NULL;
+		__free_page(page);
+	}
+}
+
+/**
+ * topa_insert_table - insert a ToPA table into a buffer
+ * @buf: pt buffer that's being extended
+ * @topa: new topa table to be inserted
+ *
+ * If it's the first table in this buffer, set up the buffer's pointers
+ * accordingly; otherwise, add an END=1 link entry pointing to @topa at the
+ * end of the current "last" table and adjust the last table pointer to @topa.
+ */
+static void topa_insert_table(struct pt_buffer *buf, struct topa *topa)
+{
+	struct topa *last = buf->last;
+
+	list_add_tail(&topa->list, &buf->tables);
+
+	if (!buf->first) {
+		buf->first = buf->last = buf->cur = topa;
+		return;
+	}
+
+	topa->offset = last->offset + last->size;
+	buf->last = topa;
+
+	if (!pt_cap_get(PT_CAP_topa_multiple_entries))
+		return;
+
+	BUG_ON(last->last != TENTS_PER_PAGE - 1);
+
+	TOPA_ENTRY(last, -1)->base = topa->phys >> TOPA_SHIFT;
+	TOPA_ENTRY(last, -1)->end = 1;
+}
+
+static bool topa_table_full(struct topa *topa)
+{
+	/* single-entry ToPA is a special case */
+	if (!pt_cap_get(PT_CAP_topa_multiple_entries))
+		return !!topa->last;
+
+	return topa->last == TENTS_PER_PAGE - 1;
+}
+
+static bool pt_buffer_needs_watermark(struct pt_buffer *buf, unsigned long offset)
+{
+	if (buf->snapshot)
+		return false;
+
+	return !(offset % (buf->watermark << PAGE_SHIFT));
+}
+
+static int topa_insert_pages(struct pt_buffer *buf, gfp_t gfp,
+			     enum topa_sz sz)
+{
+	struct topa *topa = buf->last;
+	int node = cpu_to_node(buf->cpu);
+	int order = get_order(sizes(sz));
+	struct page *p;
+	unsigned long pn;
+
+	p = alloc_pages_node(node, gfp | GFP_USER | __GFP_ZERO | __GFP_NOWARN | __GFP_NORETRY, order);
+	if (!p)
+		return -ENOMEM;
+
+	split_page(p, order);
+
+	if (topa_table_full(topa)) {
+		topa = topa_alloc(buf->cpu, gfp);
+
+		if (!topa) {
+			free_pages((unsigned long)page_address(p), order);
+			return -ENOMEM;
+		}
+
+		topa_insert_table(buf, topa);
+	}
+
+	TOPA_ENTRY(topa, -1)->base = page_to_phys(p) >> TOPA_SHIFT;
+	TOPA_ENTRY(topa, -1)->size = sz;
+	if (!buf->snapshot && !pt_cap_get(PT_CAP_topa_multiple_entries)) {
+		TOPA_ENTRY(topa, -1)->intr = 1;
+		TOPA_ENTRY(topa, -1)->stop = 1;
+	}
+	if (pt_buffer_needs_watermark(buf, buf->size))
+		TOPA_ENTRY(topa, -1)->intr = 1;
+
+	topa->last++;
+	topa->size += sizes(sz);
+	for (pn = 0; pn < sizes(sz); pn += PAGE_SIZE, buf->size += PAGE_SIZE)
+		buf->data_pages[buf->size >> PAGE_SHIFT] = page_address(p) + pn;
+
+	return 0;
+}
+
+static void pt_topa_dump(struct pt_buffer *buf)
+{
+	struct topa *topa;
+
+	list_for_each_entry(topa, &buf->tables, list) {
+		int i;
+
+		pr_debug("# table @%p (%p), off %llx size %lx\n", topa->table,
+			 (void *)topa->phys, topa->offset, topa->size);
+		for (i = 0; i < TENTS_PER_PAGE; i++) {
+			pr_debug("# entry @%p (%lx sz %u %c%c%c) raw=%16llx\n",
+				 &topa->table[i],
+				 (unsigned long)topa->table[i].base << TOPA_SHIFT,
+				 sizes(topa->table[i].size),
+				 topa->table[i].end ?  'E' : ' ',
+				 topa->table[i].intr ? 'I' : ' ',
+				 topa->table[i].stop ? 'S' : ' ',
+				 *(u64 *)&topa->table[i]);
+			if ((pt_cap_get(PT_CAP_topa_multiple_entries) && topa->table[i].stop)
+			    || topa->table[i].end)
+				break;
+		}
+	}
+}
+
+/* advance to the next output region */
+static void pt_buffer_advance(struct pt_buffer *buf)
+{
+	buf->output_off = 0;
+	buf->cur_idx++;
+
+	if (buf->cur_idx == buf->cur->last) {
+		if (buf->cur == buf->last)
+			buf->cur = buf->first;
+		else
+			buf->cur = list_entry(buf->cur->list.next, struct topa, list);
+		buf->cur_idx = 0;
+	}
+}
+
+static void pt_update_head(struct pt_buffer *buf)
+{
+	u64 topa_idx, base;
+
+	/*
+	 * this table's offset within the buffer, plus the offset
+	 * within the current output region
+	 */
+	base = buf->cur->offset + buf->output_off;
+
+	/* plus the sizes of all preceding output regions in this table */
+	for (topa_idx = 0; topa_idx < buf->cur_idx; topa_idx++)
+		base += sizes(buf->cur->table[topa_idx].size);
+
+	/* data_head increases monotonically, even as the buffer pointer wraps */
+	base += buf->size * buf->round;
+
+	local64_set(&buf->head, base);
+	if (!buf->user_page)
+		return;
+
+	buf->user_page->data_head = base;
+	smp_wmb();
+}
+
+static void *pt_buffer_region(struct pt_buffer *buf)
+{
+	return phys_to_virt(buf->cur->table[buf->cur_idx].base << TOPA_SHIFT);
+}
+
+static size_t pt_buffer_region_size(struct pt_buffer *buf)
+{
+	return sizes(buf->cur->table[buf->cur_idx].size);
+}
+
+/**
+ * pt_handle_status - take care of possible status conditions
+ * @event: currently active PT event
+ */
+static void pt_handle_status(struct perf_event *event)
+{
+	struct pt_buffer *buf = itrace_priv(event);
+	int advance = 0;
+	u64 status;
+
+	rdmsrl(MSR_IA32_RTIT_STATUS, status);
+
+	if (status & RTIT_STATUS_ERROR) {
+		pr_err("ToPA ERROR encountered, trying to recover\n");
+		pt_topa_dump(buf);
+		status &= ~RTIT_STATUS_ERROR;
+		wrmsrl(MSR_IA32_RTIT_STATUS, status);
+	}
+
+	if (status & RTIT_STATUS_STOPPED) {
+		status &= ~RTIT_STATUS_STOPPED;
+		wrmsrl(MSR_IA32_RTIT_STATUS, status);
+
+		/*
+		 * On systems that only do single-entry ToPA, hitting STOP
+		 * means we are already losing data; need to let the decoder
+		 * know.
+		 */
+		if (!pt_cap_get(PT_CAP_topa_multiple_entries) ||
+		    buf->output_off == sizes(TOPA_ENTRY(buf->cur, buf->cur_idx)->size)) {
+			pt_update_head(buf);
+			itrace_lost_data(event, local64_read(&buf->head));
+			advance++;
+		}
+	}
+
+	/*
+	 * Also on single-entry ToPA implementations, interrupt will come
+	 * before the output reaches its output region's boundary.
+	 */
+	if (!pt_cap_get(PT_CAP_topa_multiple_entries) && !buf->snapshot &&
+	    pt_buffer_region_size(buf) - buf->output_off <= TOPA_PMI_MARGIN) {
+		void *head = pt_buffer_region(buf);
+
+		/* everything within this margin needs to be zeroed out */
+		memset(head + buf->output_off, 0,
+		       pt_buffer_region_size(buf) -
+		       buf->output_off);
+		advance++;
+	}
+
+	if (advance) {
+		/* check if the pointer has wrapped */
+		if (!buf->snapshot &&
+		    buf->cur == buf->last &&
+		    buf->cur_idx == buf->cur->last - 1)
+			buf->round++;
+		pt_buffer_advance(buf);
+	}
+}
+
+static void pt_read_offset(struct pt_buffer *buf)
+{
+	u64 offset, base_topa;
+
+	rdmsrl(MSR_IA32_RTIT_OUTPUT_BASE, base_topa);
+	buf->cur = phys_to_virt(base_topa);
+
+	rdmsrl(MSR_IA32_RTIT_OUTPUT_MASK, offset);
+	/* offset within current output region */
+	buf->output_off = offset >> 32;
+	/* index of current output region within this table */
+	buf->cur_idx = (offset & 0xffffff80) >> 7;
+}
+
+/**
+ * pt_buffer_fini_topa() - deallocate ToPA structure of a buffer
+ * @buf: pt buffer
+ */
+static void pt_buffer_fini_topa(struct pt_buffer *buf)
+{
+	struct topa *topa, *iter;
+
+	list_for_each_entry_safe(topa, iter, &buf->tables, list) {
+		int i;
+
+		for (i = 0; i < topa->last; i++)
+			topa_free_pages(buf, topa, i);
+
+		list_del(&topa->list);
+		topa_free(topa);
+	}
+}
+
+/**
+ * pt_get_topa_region_size - calculate one output region's size
+ * @snapshot: if the counter is a snapshot counter
+ * @size: overall requested allocation size
+ * returns topa region size or error
+ */
+static int pt_get_topa_region_size(bool snapshot, size_t size)
+{
+	unsigned int factor = snapshot ? 1 : 2;
+
+	if (pt_cap_get(PT_CAP_topa_multiple_entries))
+		return TOPA_4K;
+
+	if (size < SZ_4K * factor)
+		return -EINVAL;
+
+	if (!is_power_of_2(size))
+		return -EINVAL;
+
+	if (size >= SZ_128M)
+		return TOPA_128MB;
+
+	return get_order(size / factor);
+}
+
+/**
+ * pt_buffer_init_topa() - initialize ToPA table for pt buffer
+ * @buf: pt buffer
+ * @size: total size of all regions within this ToPA
+ * @gfp: allocation flags
+ */
+static int pt_buffer_init_topa(struct pt_buffer *buf, size_t size, gfp_t gfp)
+{
+	struct topa *topa;
+	int err, region_size;
+
+	topa = topa_alloc(buf->cpu, gfp);
+	if (!topa)
+		return -ENOMEM;
+
+	topa_insert_table(buf, topa);
+
+	region_size = pt_get_topa_region_size(buf->snapshot, size);
+	if (region_size < 0) {
+		pt_buffer_fini_topa(buf);
+		return region_size;
+	}
+
+	while (region_size && get_order(sizes(region_size)) > MAX_ORDER)
+		region_size--;
+
+	/* fixup watermark in case of higher order allocations */
+	if (buf->watermark < (sizes(region_size) >> PAGE_SHIFT))
+		buf->watermark = sizes(region_size) >> PAGE_SHIFT;
+
+	while (buf->size < size) {
+		err = topa_insert_pages(buf, gfp, region_size);
+		if (err) {
+			if (region_size) {
+				region_size--;
+				continue;
+			}
+			pt_buffer_fini_topa(buf);
+			return -ENOMEM;
+		}
+	}
+
+	/* link last table to the first one, unless we're double buffering */
+	if (pt_cap_get(PT_CAP_topa_multiple_entries)) {
+		TOPA_ENTRY(buf->last, -1)->base = buf->first->phys >> TOPA_SHIFT;
+		TOPA_ENTRY(buf->last, -1)->end = 1;
+	}
+
+	pt_topa_dump(buf);
+	return 0;
+}
+
+/**
+ * pt_buffer_alloc() - make a buffer for pt data
+ * @cpu: cpu on which to allocate, -1 means current
+ * @size: desired buffer size, should be a multiple of the page size
+ * @watermark: place interrupt flags every @watermark pages; 0 means use
+ *	the default of half the buffer size
+ * @snapshot: if this is a snapshot counter
+ * @gfp: allocation flags
+ * @pages: array to be filled with pointers to the data pages
+ */
+static struct pt_buffer *pt_buffer_alloc(int cpu, size_t size,
+					 unsigned long watermark,
+					 bool snapshot, gfp_t gfp,
+					 void **pages)
+{
+	struct pt_buffer *buf;
+	int node;
+	int ret;
+
+	if (!size || watermark << PAGE_SHIFT > size)
+		return NULL;
+
+	if (cpu == -1)
+		cpu = raw_smp_processor_id();
+	node = cpu_to_node(cpu);
+
+	buf = kzalloc(sizeof(struct pt_buffer), gfp);
+	if (!buf)
+		return NULL;
+
+	buf->cpu = cpu;
+	buf->data_pages = pages;
+	buf->snapshot = snapshot;
+	buf->watermark = watermark;
+	if (!buf->watermark)
+		buf->watermark = (size / 2) >> PAGE_SHIFT;
+
+	INIT_LIST_HEAD(&buf->tables);
+
+	ret = pt_buffer_init_topa(buf, size, gfp);
+	if (ret) {
+		kfree(buf);
+		return NULL;
+	}
+
+	return buf;
+}
+
+/**
+ * pt_buffer_itrace_free() - dispose of pt buffer
+ * @data: pt buffer
+ */
+static void pt_buffer_itrace_free(void *data)
+{
+	struct pt_buffer *buf = data;
+
+	pt_buffer_fini_topa(buf);
+	if (buf->user_page) {
+		struct page *up = virt_to_page(buf->user_page);
+
+		up->mapping = NULL;
+		__free_page(up);
+	}
+
+	kfree(buf);
+}
+
+static void *
+pt_buffer_itrace_alloc(int cpu, int nr_pages, bool overwrite, void **pages,
+		       struct perf_event_mmap_page **user_page)
+{
+	struct pt_buffer *buf;
+	struct page *up = NULL;
+	int node;
+
+	if (user_page) {
+		*user_page = NULL;
+		node = (cpu == -1) ? cpu : cpu_to_node(cpu);
+		up = alloc_pages_node(node, GFP_KERNEL | __GFP_ZERO, 0);
+		if (!up)
+			return NULL;
+	}
+
+	buf = pt_buffer_alloc(cpu, nr_pages << PAGE_SHIFT, 0, overwrite,
+			      GFP_KERNEL, pages);
+	if (user_page && buf) {
+		buf->user_page = page_address(up);
+		*user_page = page_address(up);
+	} else if (up)
+		__free_page(up);
+
+	return buf;
+}
+
+/**
+ * pt_buffer_get_page() - find n'th page in pt buffer
+ * @buf: pt buffer
+ * @idx: page index in the buffer
+ */
+static void *pt_buffer_get_page(struct pt_buffer *buf, unsigned long idx)
+{
+	return buf->data_pages[idx];
+}
+
+typedef unsigned int (*pt_copyfn)(void *data, const void *src,
+				  unsigned int len);
+
+/**
+ * pt_buffer_output - copy part of pt buffer to perf stream
+ * @buf: buffer to copy from
+ * @from: initial offset
+ * @to: final offset
+ * @copyfn: function that copies data out (like perf_output_copy())
+ * @data: data to be passed on to the copy function (like perf_output_handle)
+ */
+static int pt_buffer_output(struct pt_buffer *buf, unsigned long from,
+			    unsigned long to, pt_copyfn copyfn, void *data)
+{
+	unsigned long tocopy;
+	unsigned int len = 0, remainder;
+	void *page;
+
+	do {
+		tocopy = PAGE_SIZE - offset_in_page(from);
+		if (to > from)
+			tocopy = min(tocopy, to - from);
+		if (!tocopy)
+			break;
+
+		page = pt_buffer_get_page(buf, from >> PAGE_SHIFT);
+		if (WARN_ONCE(!page, "no data page for %lx offset\n", from))
+			break;
+
+		page += offset_in_page(from);
+
+		remainder = copyfn(data, page, tocopy);
+		if (remainder)
+			return -EFAULT;
+
+		len += tocopy;
+		from += tocopy;
+		if (from == buf->size)
+			from = 0;
+	} while (to != from);
+	return len;
+}
+
+/**
+ * pt_buffer_is_full - check if the buffer is full
+ * @buf: pt buffer
+ *
+ * If the user hasn't read data from the output region that data_head
+ * points to, the buffer is considered full: the user needs to read at
+ * least this region and update data_tail to point past it.
+ */
+static bool pt_buffer_is_full(struct pt_buffer *buf)
+{
+	void *tail, *head;
+	unsigned long tailoff, headoff = local64_read(&buf->head);
+
+	if (buf->snapshot)
+		return false;
+
+	tailoff = ACCESS_ONCE(buf->user_page->data_tail);
+	smp_mb();
+
+	if (headoff < tailoff || headoff - tailoff < buf->size / 2)
+		return false;
+
+	tailoff %= buf->size;
+	headoff %= buf->size;
+
+	if (headoff > tailoff)
+		return false;
+
+	/* check if head and tail are in the same output region */
+	tail = pt_buffer_get_page(buf, tailoff >> PAGE_SHIFT);
+	head = pt_buffer_region(buf);
+
+	if (tail >= head && tail < head + pt_buffer_region_size(buf))
+		return true;
+
+	return false;
+}
+
+static void pt_wake_up(struct perf_event *event)
+{
+	struct pt_buffer *buf = itrace_priv(event);
+
+	if (!buf || buf->snapshot)
+		return;
+	if (pt_buffer_is_full(buf)) {
+		event->pending_disable = 1;
+		event->pending_kill = POLL_IN;
+		event->pending_wakeup = 1;
+		event->hw.state = PERF_HES_STOPPED;
+	}
+
+	if (pt_buffer_needs_watermark(buf, local64_read(&buf->head))) {
+		event->pending_wakeup = 1;
+		event->pending_kill = POLL_IN;
+	}
+
+	if (event->pending_disable || event->pending_kill)
+		itrace_wake_up(event);
+}
+
+void intel_pt_interrupt(void)
+{
+	struct pt *pt = this_cpu_ptr(&pt_ctx);
+	struct perf_event *event = pt->event;
+	struct pt_buffer *buf;
+
+	pt_config_start(false);
+
+	if (!event)
+		return;
+
+	buf = itrace_event_get_priv(event);
+	if (!buf)
+		return;
+
+	pt_read_offset(buf);
+
+	pt_handle_status(event);
+
+	pt_update_head(buf);
+
+	pt_wake_up(event);
+
+	if (!event->hw.state) {
+		pt_config(event);
+		pt_config_buffer(buf->cur->table, buf->cur_idx,
+				 buf->output_off);
+		wrmsrl(MSR_IA32_RTIT_STATUS, 0);
+		pt_config_start(true);
+	}
+
+	itrace_event_put(event);
+}
+
+static void pt_event_start(struct perf_event *event, int flags)
+{
+	struct pt_buffer *buf = itrace_priv(event);
+
+	if (!buf || pt_buffer_is_full(buf) || pt_config(event)) {
+		event->hw.state = PERF_HES_STOPPED;
+		return;
+	}
+
+	event->hw.state = 0;
+
+	pt_config_buffer(buf->cur->table, buf->cur_idx,
+			 buf->output_off);
+	wrmsrl(MSR_IA32_RTIT_STATUS, 0);
+	pt_config_start(true);
+}
+
+static void pt_event_stop(struct perf_event *event, int flags)
+{
+	if (event->hw.state == PERF_HES_STOPPED)
+		return;
+
+	event->hw.state = PERF_HES_STOPPED;
+
+	pt_config_start(false);
+
+	if (flags & PERF_EF_UPDATE) {
+		struct pt_buffer *buf = itrace_priv(event);
+
+		if (WARN_ONCE(!buf, "no buffer\n"))
+			return;
+
+		pt_read_offset(buf);
+
+		pt_handle_status(event);
+
+		pt_update_head(buf);
+
+		pt_wake_up(event);
+	}
+}
+
+static void pt_event_del(struct perf_event *event, int flags)
+{
+	struct pt *pt = this_cpu_ptr(&pt_ctx);
+
+	pt_event_stop(event, PERF_EF_UPDATE);
+
+	raw_spin_lock(&pt->lock);
+	pt->event = NULL;
+	raw_spin_unlock(&pt->lock);
+
+	itrace_event_put(event);
+}
+
+static int pt_event_add(struct perf_event *event, int flags)
+{
+	struct pt_buffer *buf;
+	struct pt *pt = this_cpu_ptr(&pt_ctx);
+	struct hw_perf_event *hwc = &event->hw;
+	int ret = 0;
+
+	ret = pt_config(event);
+	if (ret)
+		return ret;
+
+	buf = itrace_event_get_priv(event);
+	if (!buf) {
+		hwc->state = PERF_HES_STOPPED;
+		return -EINVAL;
+	}
+
+	raw_spin_lock(&pt->lock);
+	if (pt->event) {
+		raw_spin_unlock(&pt->lock);
+		itrace_event_put(event);
+		ret = -EBUSY;
+		event->hw.state = PERF_HES_STOPPED;
+		goto out;
+	}
+
+	pt->event = event;
+	raw_spin_unlock(&pt->lock);
+
+	hwc->state = !(flags & PERF_EF_START);
+	if (!hwc->state) {
+		pt_event_start(event, 0);
+		if (hwc->state == PERF_HES_STOPPED) {
+			pt_event_del(event, 0);
+			pt_wake_up(event);
+			ret = -EBUSY;
+		}
+	}
+
+out:
+	return ret;
+}
+
+static void pt_event_read(struct perf_event *event)
+{
+}
+
+static int pt_event_init(struct perf_event *event)
+{
+	if (event->attr.type != pt_pmu.itrace.pmu.type)
+		return -ENOENT;
+
+	/* can't be both */
+	if (event->attr.sample_type & PERF_SAMPLE_ITRACE)
+		return -ENOENT;
+
+	if (!pt_event_valid(event))
+		return -EINVAL;
+
+	return 0;
+}
+
+static unsigned long pt_trace_sampler_trace(struct perf_event *event,
+					    struct perf_sample_data *data)
+{
+	struct pt_buffer *buf;
+
+	pt_event_stop(event, 0);
+
+	buf = itrace_event_get_priv(event);
+	if (!buf) {
+		data->trace.size = 0;
+		goto out;
+	}
+
+	pt_read_offset(buf);
+	pt_update_head(buf);
+
+	data->trace.to = local64_read(&buf->head);
+
+	if (data->trace.to < event->attr.itrace_sample_size)
+		data->trace.from = buf->size + data->trace.to -
+			event->attr.itrace_sample_size;
+	else
+		data->trace.from = data->trace.to -
+			event->attr.itrace_sample_size;
+	data->trace.size = ALIGN(event->attr.itrace_sample_size, sizeof(u64));
+
+	itrace_event_put(event);
+
+out:
+	if (!data->trace.size)
+		pt_event_start(event, 0);
+
+	return data->trace.size;
+}
+
+static void pt_trace_sampler_output(struct perf_event *event,
+				    struct perf_output_handle *handle,
+				    struct perf_sample_data *data)
+{
+	unsigned long padding;
+	struct pt_buffer *buf;
+	int ret;
+
+	buf = itrace_event_get_priv(event);
+	if (!buf)
+		return;
+
+	ret = pt_buffer_output(buf, data->trace.from, data->trace.to,
+			       (pt_copyfn)perf_output_copy, handle);
+	itrace_event_put(event);
+	if (ret < 0) {
+		pr_warn("%s: failed to copy trace data\n", __func__);
+		goto out;
+	}
+
+	padding = data->trace.size - ret;
+	if (padding) {
+		u64 u = 0;
+
+		perf_output_copy(handle, &u, padding);
+	}
+
+out:
+	pt_event_start(event, 0);
+}
+
+static size_t pt_trace_core_size(struct perf_event *event)
+{
+	return pt_pmu.caplen;
+}
+
+static unsigned int pt_core_copy(void *data, const void *src,
+				 unsigned int len)
+{
+	struct coredump_params *cprm = data;
+
+	if (dump_emit(cprm, src, len))
+		return 0;
+
+	return len;
+}
+
+static void pt_trace_core_output(struct coredump_params *cprm,
+				 struct perf_event *event,
+				 unsigned long len)
+{
+	struct pt_buffer *buf;
+	u64 from, to;
+	int ret;
+
+	buf = itrace_priv(event);
+
+	if (!dump_emit(cprm, pt_pmu.capstr, pt_pmu.caplen))
+		return;
+
+	to = local64_read(&buf->head);
+	if (to < len)
+		from = buf->size + to - len;
+	else
+		from = to - len;
+
+	ret = pt_buffer_output(buf, from, to, pt_core_copy, cprm);
+	if (ret < 0)
+		pr_warn("%s: failed to copy trace data\n", __func__);
+}
+
+static __init int pt_init(void)
+{
+	int ret, cpu;
+
+	BUILD_BUG_ON(sizeof(struct topa) > PAGE_SIZE);
+	get_online_cpus();
+	for_each_possible_cpu(cpu) {
+		raw_spin_lock_init(&per_cpu(pt_ctx, cpu).lock);
+	}
+	put_online_cpus();
+
+	pt_pmu_hw_init();
+	pt_pmu.itrace.pmu.attr_groups	= pt_attr_groups;
+	pt_pmu.itrace.pmu.task_ctx_nr	= perf_hw_context;
+	pt_pmu.itrace.pmu.event_init	= pt_event_init;
+	pt_pmu.itrace.pmu.add		= pt_event_add;
+	pt_pmu.itrace.pmu.del		= pt_event_del;
+	pt_pmu.itrace.pmu.start		= pt_event_start;
+	pt_pmu.itrace.pmu.stop		= pt_event_stop;
+	pt_pmu.itrace.pmu.read		= pt_event_read;
+	pt_pmu.itrace.alloc_buffer	= pt_buffer_itrace_alloc;
+	pt_pmu.itrace.free_buffer	= pt_buffer_itrace_free;
+	pt_pmu.itrace.sample_trace	= pt_trace_sampler_trace;
+	pt_pmu.itrace.sample_output	= pt_trace_sampler_output;
+	pt_pmu.itrace.core_size		= pt_trace_core_size;
+	pt_pmu.itrace.core_output	= pt_trace_core_output;
+	pt_pmu.itrace.name		= "intel_pt";
+	ret = itrace_pmu_register(&pt_pmu.itrace);
+
+	return ret;
+}
+
+module_init(pt_init);
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 06/71] perf: Allow set-output for task contexts of different types
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (4 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 05/71] x86: perf: Intel PT PMU driver Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit Alexander Shishkin
                   ` (66 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter, Alexander Shishkin

From: Adrian Hunter <adrian.hunter@intel.com>

Set-output must be limited to events that cannot be active on different
cpus at the same time.  Thus either the event cpu must be the same, or
the event task must be the same.  The current logic does not check the
task directly but checks whether the perf_event_context is the same.
However, there are separate contexts for hardware and software events,
so in that case the perf_event_context differs even though the task is
the same.  This patch changes the logic to check the task directly.
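
For illustration, this is the case the new check permits: a software
event redirecting its output to a hardware event's buffer for the same
task (a minimal sketch; setup and error handling omitted):

	/* hw_fd, sw_fd opened for the same task (pid), cpu == -1 */
	hw_fd = syscall(__NR_perf_event_open, &hw_attr, pid, -1, -1, 0);
	sw_fd = syscall(__NR_perf_event_open, &sw_attr, pid, -1, -1, 0);

	/* previously rejected because the contexts differ */
	ioctl(sw_fd, PERF_EVENT_IOC_SET_OUTPUT, hw_fd);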

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 kernel/events/core.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index ca8a130..93d712d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6982,7 +6982,8 @@ perf_event_set_output(struct perf_event *event, struct perf_event *output_event)
 	/*
 	 * If its not a per-cpu rb, it must be the same task.
 	 */
-	if (output_event->cpu == -1 && output_event->ctx != event->ctx)
+	if (output_event->cpu == -1 &&
+	    output_event->ctx->task != event->ctx->task)
 		goto out;
 	/*
 	 * XXX^2: that's all bollocks
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (5 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 06/71] perf: Allow set-output for task contexts of different types Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 19:26   ` David Ahern
  2013-12-16  3:16   ` David Ahern
  2013-12-11 12:36 ` [PATCH v0 08/71] perf tools: Let a user specify a PMU event without any config terms Alexander Shishkin
                   ` (65 subsequent siblings)
  72 siblings, 2 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a flag to 'struct dso' to record whether the dso is 64-bit or not.
Update the flag when reading the ELF.

This is needed for instruction decoding.  For example, x86 instruction
decoding depends on whether or not the 64-bit instruction set is used.
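
A decoder can then pick the instruction set from the dso, e.g. (a
sketch; the INSN_MODE_* names are made up for illustration):

	mode = dso->is_64_bit ? INSN_MODE_64 : INSN_MODE_32;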

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/dso.c            |  1 +
 tools/perf/util/dso.h            |  1 +
 tools/perf/util/symbol-elf.c     |  3 +++
 tools/perf/util/symbol-minimal.c | 23 +++++++++++++++++++++++
 tools/perf/util/symbol.c         |  1 +
 tools/perf/util/symbol.h         |  1 +
 6 files changed, 30 insertions(+)

diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
index a0c7c59..80817ec 100644
--- a/tools/perf/util/dso.c
+++ b/tools/perf/util/dso.c
@@ -446,6 +446,7 @@ struct dso *dso__new(const char *name)
 		dso->cache = RB_ROOT;
 		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
 		dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
+		dso->is_64_bit = (sizeof(void *) == 8);
 		dso->loaded = 0;
 		dso->rel = 0;
 		dso->sorted_by_name = 0;
diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
index 384f2d9..62680e1 100644
--- a/tools/perf/util/dso.h
+++ b/tools/perf/util/dso.h
@@ -91,6 +91,7 @@ struct dso {
 	u8		 annotate_warned:1;
 	u8		 sname_alloc:1;
 	u8		 lname_alloc:1;
+	u8		 is_64_bit:1;
 	u8		 sorted_by_name;
 	u8		 loaded;
 	u8		 rel;
diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
index eed0b96..a0fc81b 100644
--- a/tools/perf/util/symbol-elf.c
+++ b/tools/perf/util/symbol-elf.c
@@ -595,6 +595,8 @@ int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
 			goto out_elf_end;
 	}
 
+	ss->is_64_bit = (gelf_getclass(elf) == ELFCLASS64);
+
 	ss->symtab = elf_section_by_name(elf, &ehdr, &ss->symshdr, ".symtab",
 			NULL);
 	if (ss->symshdr.sh_type != SHT_SYMTAB)
@@ -694,6 +696,7 @@ int dso__load_sym(struct dso *dso, struct map *map,
 	bool remap_kernel = false, adjust_kernel_syms = false;
 
 	dso->symtab_type = syms_ss->type;
+	dso->is_64_bit = syms_ss->is_64_bit;
 	dso->rel = syms_ss->ehdr.e_type == ET_REL;
 
 	/*
diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
index ac7070a..b9d1119 100644
--- a/tools/perf/util/symbol-minimal.c
+++ b/tools/perf/util/symbol-minimal.c
@@ -1,3 +1,4 @@
+#include "util.h"
 #include "symbol.h"
 
 #include <stdio.h>
@@ -287,6 +288,23 @@ int dso__synthesize_plt_symbols(struct dso *dso __maybe_unused,
 	return 0;
 }
 
+static int fd__is_64_bit(int fd)
+{
+	u8 e_ident[EI_NIDENT];
+
+	if (lseek(fd, 0, SEEK_SET))
+		return -1;
+
+	if (readn(fd, e_ident, sizeof(e_ident)) != sizeof(e_ident))
+		return -1;
+
+	if (memcmp(e_ident, ELFMAG, SELFMAG) ||
+	    e_ident[EI_VERSION] != EV_CURRENT)
+		return -1;
+
+	return e_ident[EI_CLASS] == ELFCLASS64;
+}
+
 int dso__load_sym(struct dso *dso, struct map *map __maybe_unused,
 		  struct symsrc *ss,
 		  struct symsrc *runtime_ss __maybe_unused,
@@ -294,6 +312,11 @@ int dso__load_sym(struct dso *dso, struct map *map __maybe_unused,
 		  int kmodule __maybe_unused)
 {
 	unsigned char *build_id[BUILD_ID_SIZE];
+	int ret;
+
+	ret = fd__is_64_bit(ss->fd);
+	if (ret >= 0)
+		dso->is_64_bit = ret;
 
 	if (filename__read_build_id(ss->name, build_id, BUILD_ID_SIZE) > 0) {
 		dso__set_build_id(dso, build_id);
diff --git a/tools/perf/util/symbol.c b/tools/perf/util/symbol.c
index de87dba..511de06 100644
--- a/tools/perf/util/symbol.c
+++ b/tools/perf/util/symbol.c
@@ -1103,6 +1103,7 @@ static int dso__load_kcore(struct dso *dso, struct map *map,
 			      &is_64_bit);
 	if (err)
 		goto out_err;
+	dso->is_64_bit = is_64_bit;
 
 	if (list_empty(&md.maps)) {
 		err = -EINVAL;
diff --git a/tools/perf/util/symbol.h b/tools/perf/util/symbol.h
index f1031a1..7f23a05 100644
--- a/tools/perf/util/symbol.h
+++ b/tools/perf/util/symbol.h
@@ -195,6 +195,7 @@ struct symsrc {
 	GElf_Shdr dynshdr;
 
 	bool adjust_symbols;
+	bool is_64_bit;
 #endif
 };
 
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 08/71] perf tools: Let a user specify a PMU event without any config terms
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (6 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 09/71] perf tools: Let default config be defined for a PMU Alexander Shishkin
                   ` (64 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

This enables a PMU event to be specified in the form:

	pmu//

which is effectively the same as:

	pmu/config=0/

This patch is a precursor to defining a default config for a PMU.
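
For example, once a PMU named "intel_pt" exists (added later in this
series), the following would be accepted:

	perf record -e intel_pt// uname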

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/parse-events.c |  6 ++++++
 tools/perf/util/parse-events.y | 10 ++++++++++
 2 files changed, 16 insertions(+)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 969cb8f..98547e0 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -644,6 +644,12 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
 
 	memset(&attr, 0, sizeof(attr));
 
+	if (!head_config) {
+		attr.type = pmu->type;
+		evsel = __add_event(list, idx, &attr, NULL, pmu->cpus);
+		return evsel ? 0 : -ENOMEM;
+	}
+
 	if (perf_pmu__check_alias(pmu, head_config, &unit, &scale))
 		return -EINVAL;
 
diff --git a/tools/perf/util/parse-events.y b/tools/perf/util/parse-events.y
index 4eb67ec..8fad267 100644
--- a/tools/perf/util/parse-events.y
+++ b/tools/perf/util/parse-events.y
@@ -210,6 +210,16 @@ PE_NAME '/' event_config '/'
 	parse_events__free_terms($3);
 	$$ = list;
 }
+|
+PE_NAME '/' '/'
+{
+	struct parse_events_evlist *data = _data;
+	struct list_head *list;
+
+	ALLOC_LIST(list);
+	ABORT_ON(parse_events_add_pmu(list, &data->idx, $1, NULL));
+	$$ = list;
+}
 
 value_sym:
 PE_VALUE_SYM_HW
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 09/71] perf tools: Let default config be defined for a PMU
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (7 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 08/71] perf tools: Let a user specify a PMU event without any config terms Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 10/71] perf tools: Add perf_pmu__scan_file() Alexander Shishkin
                   ` (63 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

This allows default config terms to be provided for a PMU.  So, for
example, when the Intel PT PMU is added, it will be possible to specify:

	intel_pt//

which will be the same as:

	intel_pt/tsc=1,noretcomp=1/

meaning that the trace should contain TSC timestamps but not perform
'return compression'.

An important consideration of this patch is that it must be possible to
overwrite the default values.  That has meant changing the logic so that
a zero value can replace a non-zero value.
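
Continuing the example above, a zero-valued term then overrides a
non-zero default (hypothetical until the intel_pt defaults are added):

	intel_pt/noretcomp=0/

which keeps the default tsc=1 but re-enables return compression.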

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/tests/pmu.c         |  2 +-
 tools/perf/util/parse-events.c |  7 ++++++-
 tools/perf/util/pmu.c          | 42 ++++++++++++++++++++++++++----------------
 tools/perf/util/pmu.h          |  9 ++++++++-
 4 files changed, 41 insertions(+), 19 deletions(-)

diff --git a/tools/perf/tests/pmu.c b/tools/perf/tests/pmu.c
index 12b322f..eeb68bb1 100644
--- a/tools/perf/tests/pmu.c
+++ b/tools/perf/tests/pmu.c
@@ -152,7 +152,7 @@ int test__pmu(void)
 		if (ret)
 			break;
 
-		ret = perf_pmu__config_terms(&formats, &attr, terms);
+		ret = perf_pmu__config_terms(&formats, &attr, terms, false);
 		if (ret)
 			break;
 
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 98547e0..464dafd 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -642,7 +642,12 @@ int parse_events_add_pmu(struct list_head *list, int *idx,
 	if (!pmu)
 		return -EINVAL;
 
-	memset(&attr, 0, sizeof(attr));
+	if (pmu->default_config) {
+		memcpy(&attr, pmu->default_config,
+		       sizeof(struct perf_event_attr));
+	} else {
+		memset(&attr, 0, sizeof(attr));
+	}
 
 	if (!head_config) {
 		attr.type = pmu->type;
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 56fc10a..a53b8ac 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -2,6 +2,7 @@
 #include <sys/types.h>
 #include <unistd.h>
 #include <stdio.h>
+#include <stdbool.h>
 #include <dirent.h>
 #include "fs.h"
 #include <locale.h>
@@ -387,6 +388,12 @@ static struct cpu_map *pmu_cpumask(const char *name)
 	return cpus;
 }
 
+struct perf_event_attr *__attribute__((weak))
+perf_pmu__get_default_config(struct perf_pmu *pmu __maybe_unused)
+{
+	return NULL;
+}
+
 static struct perf_pmu *pmu_lookup(const char *name)
 {
 	struct perf_pmu *pmu;
@@ -421,6 +428,9 @@ static struct perf_pmu *pmu_lookup(const char *name)
 	pmu->name = strdup(name);
 	pmu->type = type;
 	list_add_tail(&pmu->list, &pmus);
+
+	pmu->default_config = perf_pmu__get_default_config(pmu);
+
 	return pmu;
 }
 
@@ -479,28 +489,24 @@ pmu_find_format(struct list_head *formats, char *name)
 }
 
 /*
- * Returns value based on the format definition (format parameter)
+ * Sets value based on the format definition (format parameter)
  * and unformated value (value parameter).
- *
- * TODO maybe optimize a little ;)
  */
-static __u64 pmu_format_value(unsigned long *format, __u64 value)
+static void pmu_format_value(unsigned long *format, __u64 value, __u64 *v,
+			     bool zero)
 {
 	unsigned long fbit, vbit;
-	__u64 v = 0;
 
 	for (fbit = 0, vbit = 0; fbit < PERF_PMU_FORMAT_BITS; fbit++) {
 
 		if (!test_bit(fbit, format))
 			continue;
 
-		if (!(value & (1llu << vbit++)))
-			continue;
-
-		v |= (1llu << fbit);
+		if (value & (1llu << vbit++))
+			*v |= (1llu << fbit);
+		else if (zero)
+			*v &= ~(1llu << fbit);
 	}
-
-	return v;
 }
 
 /*
@@ -509,7 +515,8 @@ static __u64 pmu_format_value(unsigned long *format, __u64 value)
  */
 static int pmu_config_term(struct list_head *formats,
 			   struct perf_event_attr *attr,
-			   struct parse_events_term *term)
+			   struct parse_events_term *term,
+			   bool zero)
 {
 	struct perf_pmu_format *format;
 	__u64 *vp;
@@ -548,18 +555,19 @@ static int pmu_config_term(struct list_head *formats,
 	 * non-hardcoded terms, here's the place to translate
 	 * them into value.
 	 */
-	*vp |= pmu_format_value(format->bits, term->val.num);
+	pmu_format_value(format->bits, term->val.num, vp, zero);
 	return 0;
 }
 
 int perf_pmu__config_terms(struct list_head *formats,
 			   struct perf_event_attr *attr,
-			   struct list_head *head_terms)
+			   struct list_head *head_terms,
+			   bool zero)
 {
 	struct parse_events_term *term;
 
 	list_for_each_entry(term, head_terms, list)
-		if (pmu_config_term(formats, attr, term))
+		if (pmu_config_term(formats, attr, term, zero))
 			return -EINVAL;
 
 	return 0;
@@ -573,8 +581,10 @@ int perf_pmu__config_terms(struct list_head *formats,
 int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
 		     struct list_head *head_terms)
 {
+	bool zero = !!pmu->default_config;
+
 	attr->type = pmu->type;
-	return perf_pmu__config_terms(&pmu->format, attr, head_terms);
+	return perf_pmu__config_terms(&pmu->format, attr, head_terms, zero);
 }
 
 static struct perf_pmu_alias *pmu_find_alias(struct perf_pmu *pmu,
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 9183380..df762fd 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -13,9 +13,12 @@ enum {
 
 #define PERF_PMU_FORMAT_BITS 64
 
+struct perf_event_attr;
+
 struct perf_pmu {
 	char *name;
 	__u32 type;
+	struct perf_event_attr *default_config;
 	struct cpu_map *cpus;
 	struct list_head format;
 	struct list_head aliases;
@@ -27,7 +30,8 @@ int perf_pmu__config(struct perf_pmu *pmu, struct perf_event_attr *attr,
 		     struct list_head *head_terms);
 int perf_pmu__config_terms(struct list_head *formats,
 			   struct perf_event_attr *attr,
-			   struct list_head *head_terms);
+			   struct list_head *head_terms,
+			   bool zero);
 int perf_pmu__check_alias(struct perf_pmu *pmu, struct list_head *head_terms,
 			  char **unit, double *scale);
 struct list_head *perf_pmu__alias(struct perf_pmu *pmu,
@@ -46,4 +50,7 @@ void print_pmu_events(const char *event_glob, bool name_only);
 bool pmu_have_event(const char *pname, const char *name);
 
 int perf_pmu__test(void);
+
+struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu);
+
 #endif /* __PMU_H */
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 10/71] perf tools: Add perf_pmu__scan_file()
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (8 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 09/71] perf tools: Let default config be defined for a PMU Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 11/71] perf tools: Add perf_event_paranoid() Alexander Shishkin
                   ` (62 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a function to scan a sysfs file within the pmu device
directory.

This will be used to read capability values from the PMU
'caps' subdirectory.
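
For example, a capability value could be read like this (the
"caps/topa_output" file name is illustrative):

	int cap = 0;

	perf_pmu__scan_file(pmu, "caps/topa_output", "%d", &cap);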

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/pmu.c | 37 +++++++++++++++++++++++++++++++++++++
 tools/perf/util/pmu.h |  3 +++
 2 files changed, 40 insertions(+)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index a53b8ac..a742eeb 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -3,6 +3,7 @@
 #include <unistd.h>
 #include <stdio.h>
 #include <stdbool.h>
+#include <stdarg.h>
 #include <dirent.h>
 #include "fs.h"
 #include <locale.h>
@@ -788,3 +789,39 @@ bool pmu_have_event(const char *pname, const char *name)
 	}
 	return false;
 }
+
+static FILE *perf_pmu__open_file(struct perf_pmu *pmu, const char *name)
+{
+	struct stat st;
+	char path[PATH_MAX];
+	const char *sysfs;
+
+	sysfs = sysfs__mountpoint();
+	if (!sysfs)
+		return NULL;
+
+	snprintf(path, PATH_MAX,
+		 "%s" EVENT_SOURCE_DEVICE_PATH "%s/%s", sysfs, pmu->name, name);
+
+	if (stat(path, &st) < 0)
+		return NULL;
+
+	return fopen(path, "r");
+}
+
+int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name, const char *fmt,
+			...)
+{
+	va_list args;
+	FILE *file;
+	int ret = EOF;
+
+	va_start(args, fmt);
+	file = perf_pmu__open_file(pmu, name);
+	if (file) {
+		ret = vfscanf(file, fmt, args);
+		fclose(file);
+	}
+	va_end(args);
+	return ret;
+}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index df762fd..437fdb2 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -49,6 +49,9 @@ struct perf_pmu *perf_pmu__scan(struct perf_pmu *pmu);
 void print_pmu_events(const char *event_glob, bool name_only);
 bool pmu_have_event(const char *pname, const char *name);
 
+int perf_pmu__scan_file(struct perf_pmu *pmu, const char *name,
+		const char *fmt, ...) __attribute__((format(scanf, 3, 4)));
+
 int perf_pmu__test(void);
 
 struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu);
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 11/71] perf tools: Add perf_event_paranoid()
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (9 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 10/71] perf tools: Add perf_pmu__scan_file() Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-16 15:26   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2013-12-11 12:36 ` [PATCH v0 12/71] perf tools: Add dsos__hit_all() Alexander Shishkin
                   ` (61 subsequent siblings)
  72 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a function to return the value of
/proc/sys/kernel/perf_event_paranoid.

This will be used to determine default values for mmap size because
perf is not subject to mmap limits when perf_event_paranoid is less
than zero.
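
A caller can then choose defaults along these lines (a sketch; the
variable names are illustrative):

	if (perf_event_paranoid() < 0)
		mmap_pages = larger_default;	/* no mmap limit applies */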

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c |  3 +--
 tools/perf/util/util.c   | 19 +++++++++++++++++++
 tools/perf/util/util.h   |  3 +++
 3 files changed, 23 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7bb6ee1..50fadde 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1189,8 +1189,7 @@ int perf_evlist__strerror_open(struct perf_evlist *evlist __maybe_unused,
 				    "Error:\t%s.\n"
 				    "Hint:\tCheck /proc/sys/kernel/perf_event_paranoid setting.", emsg);
 
-		if (filename__read_int("/proc/sys/kernel/perf_event_paranoid", &value))
-			break;
+		value = perf_event_paranoid();
 
 		printed += scnprintf(buf + printed, size - printed, "\nHint:\t");
 
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index bae8756..3aed4af 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -1,5 +1,6 @@
 #include "../perf.h"
 #include "util.h"
+#include "fs.h"
 #include <sys/mman.h>
 #ifdef HAVE_BACKTRACE_SUPPORT
 #include <execinfo.h>
@@ -8,6 +9,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <errno.h>
+#include <limits.h>
 #include <linux/kernel.h>
 
 /*
@@ -482,3 +484,20 @@ int filename__read_str(const char *filename, char **buf, size_t *sizep)
 	close(fd);
 	return err;
 }
+
+int perf_event_paranoid(void)
+{
+	char path[PATH_MAX];
+	const char *procfs = procfs__mountpoint();
+	int value;
+
+	if (!procfs)
+		return INT_MAX;
+
+	scnprintf(path, PATH_MAX, "%s/sys/kernel/perf_event_paranoid", procfs);
+
+	if (filename__read_int(path, &value))
+		return INT_MAX;
+
+	return value;
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index adb39f2..4b6b260 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -309,4 +309,7 @@ void free_srcline(char *srcline);
 
 int filename__read_int(const char *filename, int *value);
 int filename__read_str(const char *filename, char **buf, size_t *sizep);
+
+int perf_event_paranoid(void);
+
 #endif /* GIT_COMPAT_UTIL_H */
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 12/71] perf tools: Add dsos__hit_all()
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (10 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 11/71] perf tools: Add perf_event_paranoid() Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 13/71] perf tools: Add machine__get_thread_pid() Alexander Shishkin
                   ` (60 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add the ability to mark all dsos as hit.

This is needed in the case of Instruction Tracing.  It takes so long
to decode an Instruction Trace that it is not worth doing just to
determine which dsos are hit.  A later patch puts this to use.
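
Usage is a single call on the session (a sketch; the condition is
illustrative):

	if (have_instruction_trace_data)
		err = dsos__hit_all(session);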

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/header.c | 41 +++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/header.h |  2 ++
 2 files changed, 43 insertions(+)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 125cdc9..49c4896 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -200,6 +200,47 @@ static int write_buildid(char *name, size_t name_len, u8 *build_id,
 	return write_padded(fd, name, name_len + 1, len);
 }
 
+static int __dsos__hit_all(struct list_head *head)
+{
+	struct dso *pos;
+
+	list_for_each_entry(pos, head, node)
+		pos->hit = true;
+
+	return 0;
+}
+
+static int machine__hit_all_dsos(struct machine *machine)
+{
+	int err;
+
+	err = __dsos__hit_all(&machine->kernel_dsos);
+	if (err)
+		return err;
+
+	return __dsos__hit_all(&machine->user_dsos);
+}
+
+int dsos__hit_all(struct perf_session *session)
+{
+	struct rb_node *nd;
+	int err;
+
+	err = machine__hit_all_dsos(&session->machines.host);
+	if (err)
+		return err;
+
+	for (nd = rb_first(&session->machines.guests); nd; nd = rb_next(nd)) {
+		struct machine *pos = rb_entry(nd, struct machine, rb_node);
+
+		err = machine__hit_all_dsos(pos);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
 static int __dsos__write_buildid_table(struct list_head *head,
 				       struct machine *machine,
 				       pid_t pid, u16 misc, int fd)
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index 307c9ae..e8c45fa 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -151,6 +151,8 @@ int perf_event__process_build_id(struct perf_tool *tool,
 				 struct perf_session *session);
 bool is_perf_magic(u64 magic);
 
+int dsos__hit_all(struct perf_session *session);
+
 /*
  * arch specific callback
  */
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 13/71] perf tools: Add machine__get_thread_pid()
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (11 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 12/71] perf tools: Add dsos__hit_all() Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 19:28   ` David Ahern
  2013-12-11 12:36 ` [PATCH v0 14/71] perf tools: Add cpu to struct thread Alexander Shishkin
                   ` (59 subsequent siblings)
  72 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a function to get the pid from the tid.

This is needed when using the sched_switch tracepoint to follow object
code execution.  sched_switch identifies the thread but, to find the
process mmaps, we need the process pid.
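
A sched_switch handler might then resolve the process like this
(sketch):

	pid_t pid = machine__get_thread_pid(machine, next_tid);

	if (pid == -1)
		return 0;	/* thread not (yet) known */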

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/machine.c | 10 ++++++++++
 tools/perf/util/machine.h |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index bac817a..55f3608 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1402,3 +1402,13 @@ int __machine__synthesize_threads(struct machine *machine, struct perf_tool *too
 	/* command specified */
 	return 0;
 }
+
+pid_t machine__get_thread_pid(struct machine *machine, pid_t tid)
+{
+	struct thread *thread = machine__find_thread(machine, tid);
+
+	if (!thread)
+		return -1;
+
+	return thread->pid_;
+}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 4771330..b800a5a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -190,4 +190,6 @@ int machine__synthesize_threads(struct machine *machine, struct target *target,
 					     perf_event__process, data_mmap);
 }
 
+pid_t machine__get_thread_pid(struct machine *machine, pid_t tid);
+
 #endif /* __PERF_MACHINE_H */
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 14/71] perf tools: Add cpu to struct thread
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (12 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 13/71] perf tools: Add machine__get_thread_pid() Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 14:19   ` Arnaldo Carvalho de Melo
  2013-12-11 19:30   ` David Ahern
  2013-12-11 12:36 ` [PATCH v0 15/71] perf tools: Add ability to record the current tid for each cpu Alexander Shishkin
                   ` (58 subsequent siblings)
  72 siblings, 2 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Tools may wish to track on which cpu a thread is running.  Add 'cpu'
to struct thread for that purpose.  Also add machine functions to get
/ set the cpu for a tid.

This will be used to determine the cpu when decoding a per-thread
Instruction Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/machine.c | 26 ++++++++++++++++++++++++++
 tools/perf/util/machine.h |  3 +++
 tools/perf/util/thread.c  |  1 +
 tools/perf/util/thread.h  |  1 +
 4 files changed, 31 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 55f3608..52fbfb6 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -1412,3 +1412,29 @@ pid_t machine__get_thread_pid(struct machine *machine, pid_t tid)
 
 	return thread->pid_;
 }
+
+int machine__get_thread_cpu(struct machine *machine, pid_t tid, pid_t *pid)
+{
+	struct thread *thread = machine__find_thread(machine, tid);
+
+	if (!thread)
+		return -1;
+
+	if (pid)
+		*pid = thread->pid_;
+
+	return thread->cpu;
+}
+
+int machine__set_thread_cpu(struct machine *machine, pid_t pid, pid_t tid,
+			    int cpu)
+{
+	struct thread *thread = machine__findnew_thread(machine, pid, tid);
+
+	if (!thread)
+		return -ENOMEM;
+
+	thread->cpu = cpu;
+
+	return 0;
+}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index b800a5a..27486af 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -191,5 +191,8 @@ int machine__synthesize_threads(struct machine *machine, struct target *target,
 }
 
 pid_t machine__get_thread_pid(struct machine *machine, pid_t tid);
+int machine__get_thread_cpu(struct machine *machine, pid_t tid, pid_t *pid);
+int machine__set_thread_cpu(struct machine *machine, pid_t pid, pid_t tid,
+			    int cpu);
 
 #endif /* __PERF_MACHINE_H */
diff --git a/tools/perf/util/thread.c b/tools/perf/util/thread.c
index 49eaf1d..a120af3 100644
--- a/tools/perf/util/thread.c
+++ b/tools/perf/util/thread.c
@@ -19,6 +19,7 @@ struct thread *thread__new(pid_t pid, pid_t tid)
 		thread->pid_ = pid;
 		thread->tid = tid;
 		thread->ppid = -1;
+		thread->cpu = -1;
 		INIT_LIST_HEAD(&thread->comm_list);
 
 		comm_str = malloc(32);
diff --git a/tools/perf/util/thread.h b/tools/perf/util/thread.h
index 5b856bf..7914050 100644
--- a/tools/perf/util/thread.h
+++ b/tools/perf/util/thread.h
@@ -17,6 +17,7 @@ struct thread {
 	pid_t			pid_; /* Not all tools update this */
 	pid_t			tid;
 	pid_t			ppid;
+	int			cpu;
 	char			shortname[3];
 	bool			comm_set;
 	bool			dead; /* if set thread has exited */
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 15/71] perf tools: Add ability to record the current tid for each cpu
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (13 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 14/71] perf tools: Add cpu to struct thread Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 16/71] perf tools: Allow header->data_offset to be predetermined Alexander Shishkin
                   ` (57 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add an array to struct machine to store the current tid running on
each cpu.  Add machine functions to get / set the tid for a cpu.

This will be used to determine the tid when decoding a per-cpu
Instruction Trace.
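
The intended usage is symmetrical (sketch):

	/* on a context-switch event */
	machine__set_current_tid(machine, sample->cpu, pid, tid);

	/* later, when decoding the trace for a cpu */
	tid = machine__get_current_tid(machine, cpu);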

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/machine.c | 39 +++++++++++++++++++++++++++++++++++++++
 tools/perf/util/machine.h |  4 ++++
 2 files changed, 43 insertions(+)

diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index 52fbfb6..a04210d 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -43,6 +43,8 @@ int machine__init(struct machine *machine, const char *root_dir, pid_t pid)
 		thread__set_comm(thread, comm, 0);
 	}
 
+	machine->current_tid = NULL;
+
 	return 0;
 }
 
@@ -103,6 +105,8 @@ void machine__exit(struct machine *machine)
 	dsos__delete(&machine->kernel_dsos);
 	free(machine->root_dir);
 	machine->root_dir = NULL;
+	free(machine->current_tid);
+	machine->current_tid = NULL;
 }
 
 void machine__delete(struct machine *machine)
@@ -1438,3 +1442,38 @@ int machine__set_thread_cpu(struct machine *machine, pid_t pid, pid_t tid,
 
 	return 0;
 }
+
+pid_t machine__get_current_tid(struct machine *machine, int cpu)
+{
+	if (cpu < 0 || cpu >= MAX_NR_CPUS || !machine->current_tid)
+		return -1;
+
+	return machine->current_tid[cpu];
+}
+
+int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
+			     pid_t tid)
+{
+	if (cpu < 0)
+		return -EINVAL;
+
+	if (!machine->current_tid) {
+		int i;
+
+		machine->current_tid = calloc(MAX_NR_CPUS, sizeof(pid_t));
+		if (!machine->current_tid)
+			return -ENOMEM;
+		for (i = 0; i < MAX_NR_CPUS; i++)
+			machine->current_tid[i] = -1;
+	}
+
+	if (cpu >= MAX_NR_CPUS) {
+		pr_err("Requested CPU %d too large. ", cpu);
+		pr_err("Consider raising MAX_NR_CPUS\n");
+		return -EINVAL;
+	}
+
+	machine->current_tid[cpu] = tid;
+
+	return machine__set_thread_cpu(machine, pid, tid, cpu);
+}
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index 27486af..aaad99a 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -31,6 +31,7 @@ struct machine {
 	struct map_groups kmaps;
 	struct map	  *vmlinux_maps[MAP__NR_TYPES];
 	symbol_filter_t	  symbol_filter;
+	pid_t		  *current_tid;
 };
 
 static inline
@@ -194,5 +195,8 @@ pid_t machine__get_thread_pid(struct machine *machine, pid_t tid);
 int machine__get_thread_cpu(struct machine *machine, pid_t tid, pid_t *pid);
 int machine__set_thread_cpu(struct machine *machine, pid_t pid, pid_t tid,
 			    int cpu);
+pid_t machine__get_current_tid(struct machine *machine, int cpu);
+int machine__set_current_tid(struct machine *machine, int cpu, pid_t pid,
+			     pid_t tid);
 
 #endif /* __PERF_MACHINE_H */
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 16/71] perf tools: Allow header->data_offset to be predetermined
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (14 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 15/71] perf tools: Add ability to record the current tid for each cpu Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-16 15:26   ` [tip:perf/core] perf header: Allow header-> data_offset " tip-bot for Adrian Hunter
  2013-12-11 12:36 ` [PATCH v0 17/71] perf tools: Add perf_evlist__can_select_event() Alexander Shishkin
                   ` (56 subsequent siblings)
  72 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

It will be necessary to predetermine header->data_offset
to allow space for attributes that are added later.
Consequently, do not change header->data_offset if it
is non-zero.
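
A tool that will add attributes later can thus reserve room up front
(a sketch; the size calculation is illustrative):

	session->header.data_offset = predetermined_size;
	perf_session__write_header(session, evlist, fd, false);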

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/header.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 49c4896..0114f0a 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2368,7 +2368,8 @@ int perf_session__write_header(struct perf_session *session,
 		}
 	}
 
-	header->data_offset = lseek(fd, 0, SEEK_CUR);
+	if (!header->data_offset)
+		header->data_offset = lseek(fd, 0, SEEK_CUR);
 	header->feat_offset = header->data_offset + header->data_size;
 
 	if (at_exit) {
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 17/71] perf tools: Add perf_evlist__can_select_event()
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (15 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 16/71] perf tools: Allow header->data_offset to be predetermined Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-16 15:27   ` [tip:perf/core] perf evlist: Add can_select_event() method tip-bot for Adrian Hunter
  2013-12-11 12:36 ` [PATCH v0 18/71] perf session: Flag if the event stream is entirely in memory Alexander Shishkin
                   ` (55 subsequent siblings)
  72 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a function to determine whether an event can be
selected.

This function is needed to allow a tool to automatically
select additional events, but only if they are available.
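
For example (a sketch):

	if (perf_evlist__can_select_event(evlist, "sched:sched_switch"))
		parse_events(evlist, "sched:sched_switch");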

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.h |  2 ++
 tools/perf/util/record.c | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 649d6ea..8a04aae 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -193,4 +193,6 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md,
 	pc->data_tail = tail;
 }
 
+bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
+
 #endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index c8845b1..e510453 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -177,3 +177,40 @@ int perf_record_opts__config(struct perf_record_opts *opts)
 {
 	return perf_record_opts__config_freq(opts);
 }
+
+bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str)
+{
+	struct perf_evlist *temp_evlist;
+	struct perf_evsel *evsel;
+	int err, fd, cpu;
+	bool ret = false;
+
+	temp_evlist = perf_evlist__new();
+	if (!temp_evlist)
+		return false;
+
+	err = parse_events(temp_evlist, str);
+	if (err)
+		goto out_delete;
+
+	evsel = perf_evlist__last(temp_evlist);
+
+	if (!evlist || cpu_map__empty(evlist->cpus)) {
+		struct cpu_map *cpus = cpu_map__new(NULL);
+
+		cpu =  cpus ? cpus->map[0] : 0;
+		cpu_map__delete(cpus);
+	} else {
+		cpu = evlist->cpus->map[0];
+	}
+
+	fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
+	if (fd >= 0) {
+		close(fd);
+		ret = true;
+	}
+
+out_delete:
+	perf_evlist__delete(temp_evlist);
+	return ret;
+}
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 18/71] perf session: Flag if the event stream is entirely in memory
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (16 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 17/71] perf tools: Add perf_evlist__can_select_event() Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 19/71] perf evlist: Pass mmap parameters in a struct Alexander Shishkin
                   ` (54 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Flag if the event stream is a file that has been mmapped
in one go.

This is useful, for example, if a tool needs to keep an event for
later reference.  If the new flag is set, a pointer to the event can
be retained; otherwise the event must be copied.
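
That is (sketch):

	if (session->one_mmap) {
		keep = event;		/* the file stays mapped */
	} else {
		keep = malloc(event->header.size);
		if (keep)
			memcpy(keep, event, event->header.size);
	}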

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/session.c | 9 ++++++++-
 tools/perf/util/session.h | 3 +++
 2 files changed, 11 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 8a7da6f..10ac07a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1305,8 +1305,10 @@ int __perf_session__process_events(struct perf_session *session,
 	ui_progress__init(&prog, file_size, "Processing events...");
 
 	mmap_size = MMAP_SIZE;
-	if (mmap_size > file_size)
+	if (mmap_size > file_size) {
 		mmap_size = file_size;
+		session->one_mmap = true;
+	}
 
 	memset(mmaps, 0, sizeof(mmaps));
 
@@ -1328,6 +1330,10 @@ remap:
 	mmaps[map_idx] = buf;
 	map_idx = (map_idx + 1) & (ARRAY_SIZE(mmaps) - 1);
 	file_pos = file_offset + head;
+	if (session->one_mmap) {
+		session->one_mmap_addr = buf;
+		session->one_mmap_offset = file_offset;
+	}
 
 more:
 	event = fetch_mmaped_event(session, head, mmap_size, buf);
@@ -1373,6 +1379,7 @@ out_err:
 	ui_progress__finish();
 	perf_session__warn_about_errors(session, tool);
 	perf_session_free_sample_buffers(session);
+	session->one_mmap = false;
 	return err;
 }
 
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 004d3e8..ca1d734 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -36,6 +36,9 @@ struct perf_session {
 	struct trace_event	tevent;
 	struct events_stats	stats;
 	bool			repipe;
+	bool			one_mmap;
+	void			*one_mmap_addr;
+	u64			one_mmap_offset;
 	struct ordered_samples	ordered_samples;
 	struct perf_data_file	*file;
 };
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 19/71] perf evlist: Pass mmap parameters in a struct
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (17 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 18/71] perf session: Flag if the event stream is entirely in memory Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 20/71] perf tools: Move mem_bswap32/64 to util.c Alexander Shishkin
                   ` (53 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

In preparation for adding more mmap parameters, pass
existing parameters in a struct.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 46 ++++++++++++++++++++++++++--------------------
 1 file changed, 26 insertions(+), 20 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 50fadde..f9dbf5f 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -600,12 +600,17 @@ static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
 	return evlist->mmap != NULL ? 0 : -ENOMEM;
 }
 
-static int __perf_evlist__mmap(struct perf_evlist *evlist,
-			       int idx, int prot, int mask, int fd)
+struct mmap_params {
+	int prot;
+	int mask;
+};
+
+static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
+			       struct mmap_params *mp, int fd)
 {
 	evlist->mmap[idx].prev = 0;
-	evlist->mmap[idx].mask = mask;
-	evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, prot,
+	evlist->mmap[idx].mask = mp->mask;
+	evlist->mmap[idx].base = mmap(NULL, evlist->mmap_len, mp->prot,
 				      MAP_SHARED, fd, 0);
 	if (evlist->mmap[idx].base == MAP_FAILED) {
 		pr_debug2("failed to mmap perf event ring buffer, error %d\n",
@@ -619,8 +624,8 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist,
 }
 
 static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
-				       int prot, int mask, int cpu, int thread,
-				       int *output)
+				       struct mmap_params *mp, int cpu,
+				       int thread, int *output)
 {
 	struct perf_evsel *evsel;
 
@@ -629,8 +634,7 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 
 		if (*output == -1) {
 			*output = fd;
-			if (__perf_evlist__mmap(evlist, idx, prot, mask,
-						*output) < 0)
+			if (__perf_evlist__mmap(evlist, idx, mp, *output) < 0)
 				return -1;
 		} else {
 			if (ioctl(fd, PERF_EVENT_IOC_SET_OUTPUT, *output) != 0)
@@ -645,8 +649,8 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 	return 0;
 }
 
-static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist, int prot,
-				     int mask)
+static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist,
+				     struct mmap_params *mp)
 {
 	int cpu, thread;
 	int nr_cpus = cpu_map__nr(evlist->cpus);
@@ -657,8 +661,8 @@ static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist, int prot,
 		int output = -1;
 
 		for (thread = 0; thread < nr_threads; thread++) {
-			if (perf_evlist__mmap_per_evsel(evlist, cpu, prot, mask,
-							cpu, thread, &output))
+			if (perf_evlist__mmap_per_evsel(evlist, cpu, mp, cpu,
+							thread, &output))
 				goto out_unmap;
 		}
 	}
@@ -671,8 +675,8 @@ out_unmap:
 	return -1;
 }
 
-static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist, int prot,
-					int mask)
+static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
+					struct mmap_params *mp)
 {
 	int thread;
 	int nr_threads = thread_map__nr(evlist->threads);
@@ -681,8 +685,8 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist, int prot,
 	for (thread = 0; thread < nr_threads; thread++) {
 		int output = -1;
 
-		if (perf_evlist__mmap_per_evsel(evlist, thread, prot, mask, 0,
-						thread, &output))
+		if (perf_evlist__mmap_per_evsel(evlist, thread, mp, 0, thread,
+						&output))
 			goto out_unmap;
 	}
 
@@ -785,7 +789,9 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
 	const struct thread_map *threads = evlist->threads;
-	int prot = PROT_READ | (overwrite ? 0 : PROT_WRITE), mask;
+	struct mmap_params mp = {
+		.prot = PROT_READ | (overwrite ? 0 : PROT_WRITE),
+	};
 
 	if (evlist->mmap == NULL && perf_evlist__alloc_mmap(evlist) < 0)
 		return -ENOMEM;
@@ -796,7 +802,7 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 	evlist->overwrite = overwrite;
 	evlist->mmap_len = perf_evlist__mmap_size(pages);
 	pr_debug("mmap size %zuB\n", evlist->mmap_len);
-	mask = evlist->mmap_len - page_size - 1;
+	mp.mask = evlist->mmap_len - page_size - 1;
 
 	list_for_each_entry(evsel, &evlist->entries, node) {
 		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
@@ -806,9 +812,9 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 	}
 
 	if (cpu_map__empty(cpus))
-		return perf_evlist__mmap_per_thread(evlist, prot, mask);
+		return perf_evlist__mmap_per_thread(evlist, &mp);
 
-	return perf_evlist__mmap_per_cpu(evlist, prot, mask);
+	return perf_evlist__mmap_per_cpu(evlist, &mp);
 }
 
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 20/71] perf tools: Move mem_bswap32/64 to util.c
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (18 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 19/71] perf evlist: Pass mmap parameters in a struct Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-16 15:27   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2013-12-11 12:36 ` [PATCH v0 21/71] perf tools: Add feature test for __sync_val_compare_and_swap Alexander Shishkin
                   ` (52 subsequent siblings)
  72 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Move functions mem_bswap_32() and mem_bswap_64()
so they can be reused.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/session.c | 21 ---------------------
 tools/perf/util/session.h |  2 --
 tools/perf/util/util.c    | 22 ++++++++++++++++++++++
 tools/perf/util/util.h    |  3 +++
 4 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 10ac07a..6bfb36b 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -247,27 +247,6 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 	}
 }
  
-void mem_bswap_32(void *src, int byte_size)
-{
-	u32 *m = src;
-	while (byte_size > 0) {
-		*m = bswap_32(*m);
-		byte_size -= sizeof(u32);
-		++m;
-	}
-}
-
-void mem_bswap_64(void *src, int byte_size)
-{
-	u64 *m = src;
-
-	while (byte_size > 0) {
-		*m = bswap_64(*m);
-		byte_size -= sizeof(u64);
-		++m;
-	}
-}
-
 static void swap_sample_id_all(union perf_event *event, void *data)
 {
 	void *end = (void *) event + event->header.size;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index ca1d734..69e3bad 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -76,8 +76,6 @@ int perf_session__resolve_callchain(struct perf_session *session,
 
 bool perf_session__has_traces(struct perf_session *session, const char *msg);
 
-void mem_bswap_64(void *src, int byte_size);
-void mem_bswap_32(void *src, int byte_size);
 void perf_event__attr_swap(struct perf_event_attr *attr);
 
 int perf_session__create_kernel_maps(struct perf_session *session);
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 3aed4af..6df02a9 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -10,6 +10,7 @@
 #include <string.h>
 #include <errno.h>
 #include <limits.h>
+#include <byteswap.h>
 #include <linux/kernel.h>
 
 /*
@@ -501,3 +502,24 @@ int perf_event_paranoid(void)
 
 	return value;
 }
+
+void mem_bswap_32(void *src, int byte_size)
+{
+	u32 *m = src;
+	while (byte_size > 0) {
+		*m = bswap_32(*m);
+		byte_size -= sizeof(u32);
+		++m;
+	}
+}
+
+void mem_bswap_64(void *src, int byte_size)
+{
+	u64 *m = src;
+
+	while (byte_size > 0) {
+		*m = bswap_64(*m);
+		byte_size -= sizeof(u64);
+		++m;
+	}
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 4b6b260..24da45e 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -312,4 +312,7 @@ int filename__read_str(const char *filename, char **buf, size_t *sizep);
 
 int perf_event_paranoid(void);
 
+void mem_bswap_64(void *src, int byte_size);
+void mem_bswap_32(void *src, int byte_size);
+
 #endif /* GIT_COMPAT_UTIL_H */
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 21/71] perf tools: Add feature test for __sync_val_compare_and_swap
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (19 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 20/71] perf tools: Move mem_bswap32/64 to util.c Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 19:24   ` Arnaldo Carvalho de Melo
  2013-12-11 12:36 ` [PATCH v0 22/71] perf tools: Add option macro OPT_CALLBACK_OPTARG Alexander Shishkin
                   ` (51 subsequent siblings)
  72 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a feature test for __sync_val_compare_and_swap()
and __sync_bool_compare_and_swap().
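
As a sketch of what code guarded by the new
HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT define might look like (the
function name is illustrative; u64 is perf's typedef), mirroring the
feature test's use of both builtins:

    #ifdef HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
    static void atomic_store_u64(volatile u64 *p, u64 val)
    {
            u64 old;

            do {
                    /* CAS with expected value 0 is an atomic read */
                    old = __sync_val_compare_and_swap(p, 0, 0);
            } while (!__sync_bool_compare_and_swap(p, old, val));
    }
    #endif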

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/config/Makefile                                 |  5 +++++
 tools/perf/config/feature-checks/Makefile                  |  4 ++++
 tools/perf/config/feature-checks/test-all.c                |  5 +++++
 .../config/feature-checks/test-sync-compare-and-swap.c     | 14 ++++++++++++++
 4 files changed, 28 insertions(+)
 create mode 100644 tools/perf/config/feature-checks/test-sync-compare-and-swap.c

diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
index bae1072..43a2879 100644
--- a/tools/perf/config/Makefile
+++ b/tools/perf/config/Makefile
@@ -126,6 +126,7 @@ CORE_FEATURE_TESTS =			\
 	backtrace			\
 	dwarf				\
 	fortify-source			\
+	sync-compare-and-swap		\
 	glibc				\
 	gtk2				\
 	gtk2-infobar			\
@@ -234,6 +235,10 @@ CFLAGS += -I$(LIB_INCLUDE)
 
 CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
 
+ifeq ($(feature-sync-compare-and-swap), 1)
+  CFLAGS += -DHAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
+endif
+
 ifndef NO_BIONIC
   $(call feature_check,bionic)
   ifeq ($(feature-bionic), 1)
diff --git a/tools/perf/config/feature-checks/Makefile b/tools/perf/config/feature-checks/Makefile
index b8bb749..b4b7bb2 100644
--- a/tools/perf/config/feature-checks/Makefile
+++ b/tools/perf/config/feature-checks/Makefile
@@ -5,6 +5,7 @@ FILES=					\
 	test-bionic			\
 	test-dwarf			\
 	test-fortify-source		\
+	test-sync-compare-and-swap	\
 	test-glibc			\
 	test-gtk2			\
 	test-gtk2-infobar		\
@@ -140,6 +141,9 @@ test-backtrace:
 test-timerfd:
 	$(BUILD)
 
+test-sync-compare-and-swap:
+	$(BUILD)
+
 -include *.d
 
 ###############################
diff --git a/tools/perf/config/feature-checks/test-all.c b/tools/perf/config/feature-checks/test-all.c
index 9b8a544..5cfec18 100644
--- a/tools/perf/config/feature-checks/test-all.c
+++ b/tools/perf/config/feature-checks/test-all.c
@@ -89,6 +89,10 @@
 # include "test-stackprotector-all.c"
 #undef main
 
+#define main main_test_sync_compare_and_swap
+# include "test-sync-compare-and-swap.c"
+#undef main
+
 int main(int argc, char *argv[])
 {
 	main_test_libpython();
@@ -111,6 +115,7 @@ int main(int argc, char *argv[])
 	main_test_libnuma();
 	main_test_timerfd();
 	main_test_stackprotector_all();
+	main_test_sync_compare_and_swap();
 
 	return 0;
 }
diff --git a/tools/perf/config/feature-checks/test-sync-compare-and-swap.c b/tools/perf/config/feature-checks/test-sync-compare-and-swap.c
new file mode 100644
index 0000000..c34d4ca
--- /dev/null
+++ b/tools/perf/config/feature-checks/test-sync-compare-and-swap.c
@@ -0,0 +1,14 @@
+#include <stdint.h>
+
+volatile uint64_t x;
+
+int main(int argc, char *argv[])
+{
+	uint64_t old, new = argc;
+
+	argv = argv;
+	do {
+		old = __sync_val_compare_and_swap(&x, 0, 0);
+	} while (!__sync_bool_compare_and_swap(&x, old, new));
+	return old == new;
+}
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 22/71] perf tools: Add option macro OPT_CALLBACK_OPTARG
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (20 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 21/71] perf tools: Add feature test for __sync_val_compare_and_swap Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 23/71] perf evlist: Add perf_evlist__to_front() Alexander Shishkin
                   ` (50 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add an option macro that is the same as
OPT_CALLBACK except that the argument is
optional and it is possible to associate
additional data with it.
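
A hypothetical use (none of these names exist in the tree) could be:

    static int parse_foo(const struct option *opt,
                         const char *str, int unset)
    {
            struct foo_opts *foo = opt->data; /* the 'd' argument */

            if (unset || !str) /* PARSE_OPT_OPTARG: arg may be absent */
                    return foo_set_defaults(foo);
            return foo_parse(foo, str);
    }

    ...
    OPT_CALLBACK_OPTARG('f', "foo", &opts, &foo_data, "opts",
                        "foo options", parse_foo),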

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/parse-options.h | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tools/perf/util/parse-options.h b/tools/perf/util/parse-options.h
index cbf0149..aec0afb 100644
--- a/tools/perf/util/parse-options.h
+++ b/tools/perf/util/parse-options.h
@@ -98,6 +98,7 @@ struct option {
 	parse_opt_cb *callback;
 	intptr_t defval;
 	bool *set;
+	void *data;
 };
 
 #define check_vtype(v, type) ( BUILD_BUG_ON_ZERO(!__builtin_types_compatible_p(typeof(v), type)) + v )
@@ -131,6 +132,10 @@ struct option {
 	{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l),\
 	.value = (v), (a), .help = (h), .callback = (f), .defval = (intptr_t)d,\
 	.flags = PARSE_OPT_LASTARG_DEFAULT | PARSE_OPT_NOARG}
+#define OPT_CALLBACK_OPTARG(s, l, v, d, a, h, f) \
+	{ .type = OPTION_CALLBACK, .short_name = (s), .long_name = (l), \
+	  .value = (v), (a), .help = (h), .callback = (f), \
+	  .flags = PARSE_OPT_OPTARG, .data = (d) }
 
 /* parse_options() will filter out the processed options and leave the
  * non-option argments in argv[].
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 23/71] perf evlist: Add perf_evlist__to_front()
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (21 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 22/71] perf tools: Add option macro OPT_CALLBACK_OPTARG Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 19:38   ` Arnaldo Carvalho de Melo
  2013-12-16 15:27   ` [tip:perf/core] " tip-bot for Adrian Hunter
  2013-12-11 12:36 ` [PATCH v0 24/71] perf evlist: Add perf_evlist__set_tracking_event() Alexander Shishkin
                   ` (49 subsequent siblings)
  72 siblings, 2 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a function to move a selected event to the
front of the list.

This is needed because it is not possible
to use the PERF_EVENT_IOC_SET_OUTPUT IOCTL
from an Instruction Tracing event to a
non-Instruction Tracing event.  Thus the
Instruction Tracing event must come first.
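
Usage is then a single call before the events are mmapped, e.g.
(sketch; 'itrace_evsel' is assumed to be the Instruction Tracing
event):

    /* moves itrace_evsel, and all evsels that share its group
     * leader, to the head of evlist->entries */
    perf_evlist__to_front(evlist, itrace_evsel);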

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 17 +++++++++++++++++
 tools/perf/util/evlist.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index f9dbf5f..93683bc 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1216,3 +1216,20 @@ int perf_evlist__strerror_open(struct perf_evlist *evlist __maybe_unused,
 
 	return 0;
 }
+
+void perf_evlist__to_front(struct perf_evlist *evlist,
+			   struct perf_evsel *move_evsel)
+{
+	struct perf_evsel *evsel, *n;
+	LIST_HEAD(move);
+
+	if (move_evsel == perf_evlist__first(evlist))
+		return;
+
+	list_for_each_entry_safe(evsel, n, &evlist->entries, node) {
+		if (evsel->leader == move_evsel->leader)
+			list_move_tail(&evsel->node, &move);
+	}
+
+	list_splice(&move, &evlist->entries);
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 8a04aae..9f64ede 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -194,5 +194,8 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md,
 }
 
 bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
+void perf_evlist__to_front(struct perf_evlist *evlist,
+			   struct perf_evsel *move_evsel);
+
 
 #endif /* __PERF_EVLIST_H */
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 24/71] perf evlist: Add perf_evlist__set_tracking_event()
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (22 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 23/71] perf evlist: Add perf_evlist__to_front() Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 25/71] perf evsel: Add 'no_aux_samples' option Alexander Shishkin
                   ` (48 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a function to change which event is used
to track mmap, comm and task events.

This is needed with Instruction Tracing
because the Instruction Tracing event
must come first, but it cannot be used
for tracking because it will be disabled
under some circumstances.
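
A caller would use it like this (sketch; 'tracking_evsel' is
illustrative):

    /* let a different event carry the mmap/comm/task sideband;
     * returns -EINVAL if the chosen event is grouped with others */
    err = perf_evlist__set_tracking_event(evlist, tracking_evsel);
    if (err)
            return err;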

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 20 ++++++++++++++++++++
 tools/perf/util/evlist.h |  3 ++-
 2 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 93683bc..ae9cbe6 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1233,3 +1233,23 @@ void perf_evlist__to_front(struct perf_evlist *evlist,
 
 	list_splice(&move, &evlist->entries);
 }
+
+int perf_evlist__set_tracking_event(struct perf_evlist *evlist,
+				    struct perf_evsel *tracking_evsel)
+{
+	struct perf_evsel *evsel;
+
+	if (tracking_evsel->idx == 0)
+		return 0;
+
+	if (tracking_evsel->leader->nr_members > 1)
+		return -EINVAL;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if (evsel->idx < tracking_evsel->idx)
+			evsel->idx += 1;
+	}
+	tracking_evsel->idx = 0;
+
+	return 0;
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 9f64ede..2c8d068 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -196,6 +196,7 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md,
 bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
 void perf_evlist__to_front(struct perf_evlist *evlist,
 			   struct perf_evsel *move_evsel);
-
+int perf_evlist__set_tracking_event(struct perf_evlist *evlist,
+				    struct perf_evsel *tracking_evsel);
 
 #endif /* __PERF_EVLIST_H */
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 25/71] perf evsel: Add 'no_aux_samples' option
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (23 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 24/71] perf evlist: Add perf_evlist__set_tracking_event() Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 26/71] perf evsel: Add 'immediate' option Alexander Shishkin
                   ` (47 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add an option to prevent additional
samples being added to a selected
event by perf_evsel__config().

This is needed when using the sched_switch
tracepoint to follow object code execution.
Since sched_switch will be used only for
switch information, additional sampling is
wasteful.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evsel.c | 6 +++---
 tools/perf/util/evsel.h | 1 +
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 7b510fd..dbf737c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -596,7 +596,7 @@ void perf_evsel__config(struct perf_evsel *evsel,
 		attr->mmap_data = track;
 	}
 
-	if (opts->call_graph) {
+	if (opts->call_graph && !evsel->no_aux_samples) {
 		perf_evsel__set_sample_bit(evsel, CALLCHAIN);
 
 		if (opts->call_graph == CALLCHAIN_DWARF) {
@@ -619,7 +619,7 @@ void perf_evsel__config(struct perf_evsel *evsel,
 	     target__has_cpu(&opts->target) || per_cpu))
 		perf_evsel__set_sample_bit(evsel, TIME);
 
-	if (opts->raw_samples) {
+	if (opts->raw_samples && !evsel->no_aux_samples) {
 		perf_evsel__set_sample_bit(evsel, TIME);
 		perf_evsel__set_sample_bit(evsel, RAW);
 		perf_evsel__set_sample_bit(evsel, CPU);
@@ -632,7 +632,7 @@ void perf_evsel__config(struct perf_evsel *evsel,
 		attr->watermark = 0;
 		attr->wakeup_events = 1;
 	}
-	if (opts->branch_stack) {
+	if (opts->branch_stack && !evsel->no_aux_samples) {
 		perf_evsel__set_sample_bit(evsel, BRANCH_STACK);
 		attr->branch_sample_type = opts->branch_stack;
 	}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 8120eeb..af38e2c 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -83,6 +83,7 @@ struct perf_evsel {
 	int			is_pos;
 	bool 			supported;
 	bool 			needs_swap;
+	bool			no_aux_samples;
 	/* parse modifier helper */
 	int			exclude_GH;
 	int			nr_members;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 26/71] perf evsel: Add 'immediate' option
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (24 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 25/71] perf evsel: Add 'no_aux_samples' option Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 27/71] perf evlist: Add 'system_wide' option Alexander Shishkin
                   ` (46 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add an option to cause a selected event
to be enabled immediately when configured
by perf_evsel__config().

This is needed when using the sched_switch
tracepoint to follow object code execution.
By having sched_switch enabled
immediately, the first sched_switch event
always precedes the start of other tracing.
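
Selecting the behaviour is then a one-liner on the evsel (sketch;
'switch_evsel' is illustrative):

    /* enabled as soon as it is opened, not on exec of the workload */
    switch_evsel->immediate = true;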

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evsel.c | 5 +++++
 tools/perf/util/evsel.h | 1 +
 2 files changed, 6 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index dbf737c..5fcd7cb 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -661,6 +661,11 @@ void perf_evsel__config(struct perf_evsel *evsel,
 	 */
 	if (target__none(&opts->target) && perf_evsel__is_group_leader(evsel))
 		attr->enable_on_exec = 1;
+
+	if (evsel->immediate) {
+		attr->disabled = 0;
+		attr->enable_on_exec = 0;
+	}
 }
 
 int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index af38e2c..de1b36e 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -84,6 +84,7 @@ struct perf_evsel {
 	bool 			supported;
 	bool 			needs_swap;
 	bool			no_aux_samples;
+	bool			immediate;
 	/* parse modifier helper */
 	int			exclude_GH;
 	int			nr_members;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 27/71] perf evlist: Add 'system_wide' option
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (25 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 26/71] perf evsel: Add 'immediate' option Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 19:37   ` David Ahern
  2013-12-11 12:36 ` [PATCH v0 28/71] perf tools: Add id index Alexander Shishkin
                   ` (45 subsequent siblings)
  72 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add an option to cause a selected event
to be opened always without a pid when
configured by perf_evsel__config().

This is needed when using the sched_switch
tracepoint to follow object code execution.
sched_switch occurs before the task
switch, so the switch cannot be recorded
in a context limited to that task.  Note
that this also means that sched_switch is
useless when capturing data per-thread,
as is the 'context-switches' software
event, for the same reason.
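
Together with the two preceding patches, the sched_switch event can
be set up like this (sketch; 'switch_evsel' is illustrative):

    switch_evsel->system_wide = true;    /* open without a pid */
    switch_evsel->no_aux_samples = true; /* switch info only */
    switch_evsel->immediate = true;      /* enable at open time */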

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 45 +++++++++++++++++++++++++++++++++++++--------
 tools/perf/util/evsel.c  | 31 ++++++++++++++++++++++++++-----
 tools/perf/util/evsel.h  |  1 +
 3 files changed, 64 insertions(+), 13 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index ae9cbe6..3959978 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -261,17 +261,27 @@ int perf_evlist__add_newtp(struct perf_evlist *evlist,
 	return 0;
 }
 
+static int perf_evlist__nr_threads(struct perf_evlist *evlist,
+				   struct perf_evsel *evsel)
+{
+	if (evsel->system_wide)
+		return 1;
+	else
+		return thread_map__nr(evlist->threads);
+}
+
 void perf_evlist__disable(struct perf_evlist *evlist)
 {
 	int cpu, thread;
 	struct perf_evsel *pos;
 	int nr_cpus = cpu_map__nr(evlist->cpus);
-	int nr_threads = thread_map__nr(evlist->threads);
+	int nr_threads;
 
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
 		list_for_each_entry(pos, &evlist->entries, node) {
 			if (!perf_evsel__is_group_leader(pos) || !pos->fd)
 				continue;
+			nr_threads = perf_evlist__nr_threads(evlist, pos);
 			for (thread = 0; thread < nr_threads; thread++)
 				ioctl(FD(pos, cpu, thread),
 				      PERF_EVENT_IOC_DISABLE, 0);
@@ -284,12 +294,13 @@ void perf_evlist__enable(struct perf_evlist *evlist)
 	int cpu, thread;
 	struct perf_evsel *pos;
 	int nr_cpus = cpu_map__nr(evlist->cpus);
-	int nr_threads = thread_map__nr(evlist->threads);
+	int nr_threads;
 
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
 		list_for_each_entry(pos, &evlist->entries, node) {
 			if (!perf_evsel__is_group_leader(pos) || !pos->fd)
 				continue;
+			nr_threads = perf_evlist__nr_threads(evlist, pos);
 			for (thread = 0; thread < nr_threads; thread++)
 				ioctl(FD(pos, cpu, thread),
 				      PERF_EVENT_IOC_ENABLE, 0);
@@ -301,12 +312,14 @@ int perf_evlist__disable_event(struct perf_evlist *evlist,
 			       struct perf_evsel *evsel)
 {
 	int cpu, thread, err;
+	int nr_cpus = cpu_map__nr(evlist->cpus);
+	int nr_threads = perf_evlist__nr_threads(evlist, evsel);
 
 	if (!evsel->fd)
 		return 0;
 
-	for (cpu = 0; cpu < evlist->cpus->nr; cpu++) {
-		for (thread = 0; thread < evlist->threads->nr; thread++) {
+	for (cpu = 0; cpu < nr_cpus; cpu++) {
+		for (thread = 0; thread < nr_threads; thread++) {
 			err = ioctl(FD(evsel, cpu, thread),
 				    PERF_EVENT_IOC_DISABLE, 0);
 			if (err)
@@ -320,12 +333,14 @@ int perf_evlist__enable_event(struct perf_evlist *evlist,
 			      struct perf_evsel *evsel)
 {
 	int cpu, thread, err;
+	int nr_cpus = cpu_map__nr(evlist->cpus);
+	int nr_threads = perf_evlist__nr_threads(evlist, evsel);
 
 	if (!evsel->fd)
 		return -EINVAL;
 
-	for (cpu = 0; cpu < evlist->cpus->nr; cpu++) {
-		for (thread = 0; thread < evlist->threads->nr; thread++) {
+	for (cpu = 0; cpu < nr_cpus; cpu++) {
+		for (thread = 0; thread < nr_threads; thread++) {
 			err = ioctl(FD(evsel, cpu, thread),
 				    PERF_EVENT_IOC_ENABLE, 0);
 			if (err)
@@ -339,7 +354,16 @@ static int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 {
 	int nr_cpus = cpu_map__nr(evlist->cpus);
 	int nr_threads = thread_map__nr(evlist->threads);
-	int nfds = nr_cpus * nr_threads * evlist->nr_entries;
+	int nfds = 0;
+	struct perf_evsel *evsel;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if (evsel->system_wide)
+			nfds += nr_cpus;
+		else
+			nfds += nr_cpus * nr_threads;
+	}
+
 	evlist->pollfd = malloc(sizeof(struct pollfd) * nfds);
 	return evlist->pollfd != NULL ? 0 : -ENOMEM;
 }
@@ -630,7 +654,12 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 	struct perf_evsel *evsel;
 
 	list_for_each_entry(evsel, &evlist->entries, node) {
-		int fd = FD(evsel, cpu, thread);
+		int fd;
+
+		if (evsel->system_wide && thread)
+			continue;
+
+		fd = FD(evsel, cpu, thread);
 
 		if (*output == -1) {
 			*output = fd;
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 5fcd7cb..4e92a22 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -671,6 +671,10 @@ void perf_evsel__config(struct perf_evsel *evsel,
 int perf_evsel__alloc_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
 {
 	int cpu, thread;
+
+	if (evsel->system_wide)
+		nthreads = 1;
+
 	evsel->fd = xyarray__new(ncpus, nthreads, sizeof(int));
 
 	if (evsel->fd) {
@@ -689,6 +693,9 @@ static int perf_evsel__run_ioctl(struct perf_evsel *evsel, int ncpus, int nthrea
 {
 	int cpu, thread;
 
+	if (evsel->system_wide)
+		nthreads = 1;
+
 	for (cpu = 0; cpu < ncpus; cpu++) {
 		for (thread = 0; thread < nthreads; thread++) {
 			int fd = FD(evsel, cpu, thread),
@@ -719,6 +726,9 @@ int perf_evsel__enable(struct perf_evsel *evsel, int ncpus, int nthreads)
 
 int perf_evsel__alloc_id(struct perf_evsel *evsel, int ncpus, int nthreads)
 {
+	if (evsel->system_wide)
+		nthreads = 1;
+
 	evsel->sample_id = xyarray__new(ncpus, nthreads, sizeof(struct perf_sample_id));
 	if (evsel->sample_id == NULL)
 		return -ENOMEM;
@@ -764,6 +774,9 @@ void perf_evsel__close_fd(struct perf_evsel *evsel, int ncpus, int nthreads)
 {
 	int cpu, thread;
 
+	if (evsel->system_wide)
+		nthreads = 1;
+
 	for (cpu = 0; cpu < ncpus; cpu++)
 		for (thread = 0; thread < nthreads; ++thread) {
 			close(FD(evsel, cpu, thread));
@@ -852,6 +865,9 @@ int __perf_evsel__read(struct perf_evsel *evsel,
 	int cpu, thread;
 	struct perf_counts_values *aggr = &evsel->counts->aggr, count;
 
+	if (evsel->system_wide)
+		nthreads = 1;
+
 	aggr->val = aggr->ena = aggr->run = 0;
 
 	for (cpu = 0; cpu < ncpus; cpu++) {
@@ -974,13 +990,18 @@ static size_t perf_event_attr__fprintf(struct perf_event_attr *attr, FILE *fp)
 static int __perf_evsel__open(struct perf_evsel *evsel, struct cpu_map *cpus,
 			      struct thread_map *threads)
 {
-	int cpu, thread;
+	int cpu, thread, nthreads;
 	unsigned long flags = 0;
 	int pid = -1, err;
 	enum { NO_CHANGE, SET_TO_MAX, INCREASED_MAX } set_rlimit = NO_CHANGE;
 
+	if (evsel->system_wide)
+		nthreads = 1;
+	else
+		nthreads = threads->nr;
+
 	if (evsel->fd == NULL &&
-	    perf_evsel__alloc_fd(evsel, cpus->nr, threads->nr) < 0)
+	    perf_evsel__alloc_fd(evsel, cpus->nr, nthreads) < 0)
 		return -ENOMEM;
 
 	if (evsel->cgrp) {
@@ -1002,10 +1023,10 @@ retry_sample_id:
 
 	for (cpu = 0; cpu < cpus->nr; cpu++) {
 
-		for (thread = 0; thread < threads->nr; thread++) {
+		for (thread = 0; thread < nthreads; thread++) {
 			int group_fd;
 
-			if (!evsel->cgrp)
+			if (!evsel->cgrp && !evsel->system_wide)
 				pid = threads->map[thread];
 
 			group_fd = get_group_fd(evsel, cpu, thread);
@@ -1075,7 +1096,7 @@ out_close:
 			close(FD(evsel, cpu, thread));
 			FD(evsel, cpu, thread) = -1;
 		}
-		thread = threads->nr;
+		thread = nthreads;
 	} while (--cpu >= 0);
 	return err;
 }
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index de1b36e..7b8795d 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -85,6 +85,7 @@ struct perf_evsel {
 	bool 			needs_swap;
 	bool			no_aux_samples;
 	bool			immediate;
+	bool			system_wide;
 	/* parse modifier helper */
 	int			exclude_GH;
 	int			nr_members;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 28/71] perf tools: Add id index
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (26 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 27/71] perf evlist: Add 'system_wide' option Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 29/71] perf pmu: Let pmu's with no events show up on perf list Alexander Shishkin
                   ` (44 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add an index of the event identifiers.

This is needed to queue Instruction
Tracing samples according to the mmap
buffer from which they were recorded.
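
A consumer of the new PERF_RECORD_ID_INDEX record walks it like this
(sketch, following the processing code below):

    struct id_index_event *ie = &event->id_index;
    u64 i;

    for (i = 0; i < ie->nr; i++) {
            struct id_index_entry *e = &ie->entries[i];
            /* e->id identifies the sample id; e->idx, e->cpu and
             * e->tid say where it was recorded from */
    }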

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-inject.c |   1 +
 tools/perf/util/event.c     |   1 +
 tools/perf/util/event.h     |  15 ++++++
 tools/perf/util/evlist.c    |  26 ++++++++--
 tools/perf/util/evsel.h     |   3 ++
 tools/perf/util/session.c   | 122 ++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/session.h   |  10 ++++
 tools/perf/util/tool.h      |   3 +-
 8 files changed, 177 insertions(+), 4 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 6a25085..8400d29 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -424,6 +424,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 			.tracing_data	= perf_event__repipe_op2_synth,
 			.finished_round	= perf_event__repipe_op2_synth,
 			.build_id	= perf_event__repipe_op2_synth,
+			.id_index	= perf_event__repipe_op2_synth,
 		},
 		.input_name  = "-",
 		.samples = LIST_HEAD_INIT(inject.samples),
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index c77814b..30f91d7 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -25,6 +25,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
 	[PERF_RECORD_HEADER_BUILD_ID]		= "BUILD_ID",
 	[PERF_RECORD_FINISHED_ROUND]		= "FINISHED_ROUND",
+	[PERF_RECORD_ID_INDEX]			= "ID_INDEX",
 };
 
 const char *perf_event__name(unsigned int id)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 30fec99..88e27cb 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -153,6 +153,7 @@ enum perf_user_event_type { /* above any possible kernel type */
 	PERF_RECORD_HEADER_TRACING_DATA		= 66,
 	PERF_RECORD_HEADER_BUILD_ID		= 67,
 	PERF_RECORD_FINISHED_ROUND		= 68,
+	PERF_RECORD_ID_INDEX			= 69,
 	PERF_RECORD_HEADER_MAX
 };
 
@@ -179,6 +180,19 @@ struct tracing_data_event {
 	u32 size;
 };
 
+struct id_index_entry {
+	u64 id;
+	u64 idx;
+	u64 cpu;
+	u64 tid;
+};
+
+struct id_index_event {
+	struct perf_event_header header;
+	u64 nr;
+	struct id_index_entry entries[0];
+};
+
 union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
@@ -193,6 +207,7 @@ union perf_event {
 	struct event_type_event		event_type;
 	struct tracing_data_event	tracing_data;
 	struct build_id_event		build_id;
+	struct id_index_event		id_index;
 };
 
 void perf_event__print_totals(void);
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 3959978..7ae3139 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -437,6 +437,22 @@ static int perf_evlist__id_add_fd(struct perf_evlist *evlist,
 	return 0;
 }
 
+static void perf_evlist__set_sid_idx(struct perf_evlist *evlist,
+				     struct perf_evsel *evsel, int idx, int cpu,
+				     int thread)
+{
+	struct perf_sample_id *sid = SID(evsel, cpu, thread);
+	sid->idx = idx;
+	if (evlist->cpus && cpu >= 0)
+		sid->cpu = evlist->cpus->map[cpu];
+	else
+		sid->cpu = -1;
+	if (!evsel->system_wide && evlist->threads && thread >= 0)
+		sid->tid = evlist->threads->map[thread];
+	else
+		sid->tid = -1;
+}
+
 struct perf_sample_id *perf_evlist__id2sid(struct perf_evlist *evlist, u64 id)
 {
 	struct hlist_head *head;
@@ -670,9 +686,13 @@ static int perf_evlist__mmap_per_evsel(struct perf_evlist *evlist, int idx,
 				return -1;
 		}
 
-		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
-		    perf_evlist__id_add_fd(evlist, evsel, cpu, thread, fd) < 0)
-			return -1;
+		if (evsel->attr.read_format & PERF_FORMAT_ID) {
+			if (perf_evlist__id_add_fd(evlist, evsel, cpu, thread,
+						   fd) < 0)
+				return -1;
+			perf_evlist__set_sid_idx(evlist, evsel, idx, cpu,
+						 thread);
+		}
 	}
 
 	return 0;
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 7b8795d..3e25e23 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -38,6 +38,9 @@ struct perf_sample_id {
 	struct hlist_node 	node;
 	u64		 	id;
 	struct perf_evsel	*evsel;
+	int			idx;
+	int			cpu;
+	pid_t			tid;
 
 	/* Holds total ID period value for PERF_SAMPLE_READ processing. */
 	u64			period;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 6bfb36b..81fb4ad 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -211,6 +211,15 @@ static int process_finished_round(struct perf_tool *tool,
 				  union perf_event *event,
 				  struct perf_session *session);
 
+static int process_id_index_stub(struct perf_tool *tool __maybe_unused,
+				 union perf_event *event __maybe_unused,
+				 struct perf_session *perf_session
+				 __maybe_unused)
+{
+	dump_printf(": unhandled!\n");
+	return 0;
+}
+
 void perf_tool__fill_defaults(struct perf_tool *tool)
 {
 	if (tool->sample == NULL)
@@ -245,6 +254,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		else
 			tool->finished_round = process_finished_round_stub;
 	}
+	if (tool->id_index == NULL)
+		tool->id_index = process_id_index_stub;
 }
  
 static void swap_sample_id_all(union perf_event *event, void *data)
@@ -443,6 +454,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_HEADER_EVENT_TYPE]	  = perf_event__event_type_swap,
 	[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
 	[PERF_RECORD_HEADER_BUILD_ID]	  = NULL,
+	[PERF_RECORD_ID_INDEX]		  = perf_event__all64_swap,
 	[PERF_RECORD_HEADER_MAX]	  = NULL,
 };
 
@@ -1011,6 +1023,8 @@ static int perf_session__process_user_event(struct perf_session *session, union
 		return tool->build_id(tool, event, session);
 	case PERF_RECORD_FINISHED_ROUND:
 		return tool->finished_round(tool, event, session);
+	case PERF_RECORD_ID_INDEX:
+		return tool->id_index(tool, event, session);
 	default:
 		return -EINVAL;
 	}
@@ -1648,3 +1662,111 @@ int __perf_session__set_tracepoints_handlers(struct perf_session *session,
 out:
 	return err;
 }
+
+int perf_event__process_id_index(struct perf_tool *tool __maybe_unused,
+				 union perf_event *event,
+				 struct perf_session *session)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct id_index_event *ie = &event->id_index;
+	size_t i, nr, max_nr;
+
+	max_nr = (ie->header.size - sizeof(struct id_index_event)) /
+		 sizeof(struct id_index_entry);
+	nr = ie->nr;
+	if (nr > max_nr)
+		return -EINVAL;
+
+	if (dump_trace)
+		fprintf(stdout, " nr: %zu\n", nr);
+
+	for (i = 0; i < nr; i++) {
+		struct id_index_entry *e = &ie->entries[i];
+		struct perf_sample_id *sid;
+
+		if (dump_trace) {
+			fprintf(stdout,	" ... id: %"PRIu64, e->id);
+			fprintf(stdout,	"  idx: %"PRIu64, e->idx);
+			fprintf(stdout,	"  cpu: %"PRId64, e->cpu);
+			fprintf(stdout,	"  tid: %"PRId64"\n", e->tid);
+		}
+
+		sid = perf_evlist__id2sid(evlist, e->id);
+		if (!sid)
+			return -ENOENT;
+		sid->idx = e->idx;
+		sid->cpu = e->cpu;
+		sid->tid = e->tid;
+	}
+	return 0;
+}
+
+int perf_event__synthesize_id_index(struct perf_tool *tool,
+				    perf_event__handler_t process,
+				    struct perf_evlist *evlist,
+				    struct machine *machine)
+{
+	union perf_event *ev;
+	struct perf_evsel *evsel;
+	size_t nr = 0, i = 0, sz, max_nr, n;
+	int err;
+
+	pr_debug2("Synthesizing id index\n");
+
+	max_nr = (UINT16_MAX - sizeof(struct id_index_event)) /
+		 sizeof(struct id_index_entry);
+
+	list_for_each_entry(evsel, &evlist->entries, node)
+		nr += evsel->ids;
+
+	n = nr > max_nr ? max_nr : nr;
+	sz = sizeof(struct id_index_event) + n * sizeof(struct id_index_entry);
+	ev = zalloc(sz);
+	if (!ev)
+		return -ENOMEM;
+
+	ev->id_index.header.type = PERF_RECORD_ID_INDEX;
+	ev->id_index.header.size = sz;
+	ev->id_index.nr = n;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		u32 j;
+
+		for (j = 0; j < evsel->ids; j++) {
+			struct id_index_entry *e;
+			struct perf_sample_id *sid;
+
+			if (i >= n) {
+				err = process(tool, ev, NULL, machine);
+				if (err)
+					goto out_err;
+				nr -= n;
+				i = 0;
+			}
+
+			e = &ev->id_index.entries[i++];
+
+			e->id = evsel->id[j];
+
+			sid = perf_evlist__id2sid(evlist, e->id);
+			if (!sid) {
+				free(ev);
+				return -ENOENT;
+			}
+
+			e->idx = sid->idx;
+			e->cpu = sid->cpu;
+			e->tid = sid->tid;
+		}
+	}
+
+	sz = sizeof(struct id_index_event) + nr * sizeof(struct id_index_entry);
+	ev->id_index.header.size = sz;
+	ev->id_index.nr = nr;
+
+	err = process(tool, ev, NULL, machine);
+out_err:
+	free(ev);
+
+	return err;
+}
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 69e3bad..60a31db 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -128,4 +128,14 @@ int __perf_session__set_tracepoints_handlers(struct perf_session *session,
 extern volatile int session_done;
 
 #define session_done()	(*(volatile int *)(&session_done))
+
+int perf_event__process_id_index(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_session *session);
+
+int perf_event__synthesize_id_index(struct perf_tool *tool,
+				    perf_event__handler_t process,
+				    struct perf_evlist *evlist,
+				    struct machine *machine);
+
 #endif /* __PERF_SESSION_H */
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 4385816..f07d6fe 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -39,7 +39,8 @@ struct perf_tool {
 	event_attr_op	attr;
 	event_op2	tracing_data;
 	event_op2	finished_round,
-			build_id;
+			build_id,
+			id_index;
 	bool		ordered_samples;
 	bool		ordering_requires_timestamps;
 };
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 29/71] perf pmu: Let pmu's with no events show up on perf list
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (27 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 28/71] perf tools: Add id index Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 30/71] perf session: Add ability to skip 4GiB or more Alexander Shishkin
                   ` (43 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

perf list only lists PMUs with events.  Add a
flag to cause a PMU to also be listed separately.
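
For example (illustrative name), a selectable PMU called "foo" that
exports no event aliases would now appear in the listing as "foo//",
matching the syntax with which it can be specified on the command
line.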

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/pmu.c | 13 +++++++++++--
 tools/perf/util/pmu.h |  1 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index a742eeb..c6c240f 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -733,15 +733,18 @@ void print_pmu_events(const char *event_glob, bool name_only)
 
 	pmu = NULL;
 	len = 0;
-	while ((pmu = perf_pmu__scan(pmu)) != NULL)
+	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
 		list_for_each_entry(alias, &pmu->aliases, list)
 			len++;
+		if (pmu->selectable)
+			len++;
+	}
 	aliases = malloc(sizeof(char *) * len);
 	if (!aliases)
 		return;
 	pmu = NULL;
 	j = 0;
-	while ((pmu = perf_pmu__scan(pmu)) != NULL)
+	while ((pmu = perf_pmu__scan(pmu)) != NULL) {
 		list_for_each_entry(alias, &pmu->aliases, list) {
 			char *name = format_alias(buf, sizeof(buf), pmu, alias);
 			bool is_cpu = !strcmp(pmu->name, "cpu");
@@ -758,6 +761,12 @@ void print_pmu_events(const char *event_glob, bool name_only)
 			aliases[j] = strdup(aliases[j]);
 			j++;
 		}
+		if (pmu->selectable) {
+			scnprintf(buf, sizeof(buf), "%s//", pmu->name);
+			aliases[j] = strdup(buf);
+			j++;
+		}
+	}
 	len = j;
 	qsort(aliases, len, sizeof(char *), cmp_string);
 	for (j = 0; j < len; j++) {
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index 437fdb2..d5266d1 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -18,6 +18,7 @@ struct perf_event_attr;
 struct perf_pmu {
 	char *name;
 	__u32 type;
+	bool selectable;
 	struct perf_event_attr *default_config;
 	struct cpu_map *cpus;
 	struct list_head format;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 30/71] perf session: Add ability to skip 4GiB or more
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (28 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 29/71] perf pmu: Let pmu's with no events show up on perf list Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 31/71] perf session: Add perf_session__deliver_synth_event() Alexander Shishkin
                   ` (42 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

A session can be made to skip portions of the input
file.  Do not limit the skipped size to 32 bits.
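
The shape of the processing loop then becomes (sketch, mirroring the
patched code below):

    u64 size = event->header.size;
    s64 skip;

    skip = perf_session__process_event(session, event, tool, file_pos);
    if (skip < 0)
            return -EINVAL;    /* processing failed */
    size += skip;              /* skip may be 4GiB or more */
    head += size;
    file_pos += size;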

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/session.c | 20 +++++++++++++-------
 1 file changed, 13 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 81fb4ad..3de0831 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1000,8 +1000,10 @@ static int perf_session_deliver_event(struct perf_session *session,
 	}
 }
 
-static int perf_session__process_user_event(struct perf_session *session, union perf_event *event,
-					    struct perf_tool *tool, u64 file_offset)
+static s64 perf_session__process_user_event(struct perf_session *session,
+					    union perf_event *event,
+					    struct perf_tool *tool,
+					    u64 file_offset)
 {
 	int fd = perf_data_file__fd(session->file);
 	int err;
@@ -1039,7 +1041,7 @@ static void event_swap(union perf_event *event, bool sample_id_all)
 		swap(event, sample_id_all);
 }
 
-static int perf_session__process_event(struct perf_session *session,
+static s64 perf_session__process_event(struct perf_session *session,
 				       union perf_event *event,
 				       struct perf_tool *tool,
 				       u64 file_offset)
@@ -1149,7 +1151,7 @@ static int __perf_session__process_pipe_events(struct perf_session *session,
 	union perf_event *event;
 	uint32_t size, cur_size = 0;
 	void *buf = NULL;
-	int skip = 0;
+	s64 skip = 0;
 	u64 head;
 	ssize_t err;
 	void *p;
@@ -1278,13 +1280,13 @@ int __perf_session__process_events(struct perf_session *session,
 				   u64 file_size, struct perf_tool *tool)
 {
 	int fd = perf_data_file__fd(session->file);
-	u64 head, page_offset, file_offset, file_pos;
+	u64 head, page_offset, file_offset, file_pos, size;
 	int err, mmap_prot, mmap_flags, map_idx = 0;
 	size_t	mmap_size;
 	char *buf, *mmaps[NUM_MMAPS];
 	union perf_event *event;
-	uint32_t size;
 	struct ui_progress prog;
+	s64 skip;
 
 	perf_tool__fill_defaults(tool);
 
@@ -1345,7 +1347,8 @@ more:
 	size = event->header.size;
 
 	if (size < sizeof(struct perf_event_header) ||
-	    perf_session__process_event(session, event, tool, file_pos) < 0) {
+	    (skip = perf_session__process_event(session, event, tool, file_pos))
+									< 0) {
 		pr_err("%#" PRIx64 " [%#x]: failed to process type: %d\n",
 		       file_offset + head, event->header.size,
 		       event->header.type);
@@ -1353,6 +1356,9 @@ more:
 		goto out_err;
 	}
 
+	if (skip)
+		size += skip;
+
 	head += size;
 	file_pos += size;
 
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 31/71] perf session: Add perf_session__deliver_synth_event()
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (29 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 30/71] perf session: Add ability to skip 4GiB or more Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 32/71] perf tools: Allow TSC conversion on any arch Alexander Shishkin
                   ` (41 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a function to deliver synthesized events from
within a session.
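
Callers use it to push events that were never in the perf.data file
through the normal delivery path (sketch):

    err = perf_session__deliver_synth_event(session, event,
                                            sample, tool);
    if (err)
            return err;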

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/session.c | 14 ++++++++++++++
 tools/perf/util/session.h |  5 +++++
 2 files changed, 19 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 3de0831..f2ac351 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1032,6 +1032,20 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 	}
 }
 
+int perf_session__deliver_synth_event(struct perf_session *session,
+				      union perf_event *event,
+				      struct perf_sample *sample,
+				      struct perf_tool *tool)
+{
+	events_stats__inc(&session->stats, event->header.type);
+
+	if (event->header.type >= PERF_RECORD_USER_TYPE_START)
+		return perf_session__process_user_event(session, event, tool,
+							0);
+
+	return perf_session_deliver_event(session, event, sample, tool, 0);
+}
+
 static void event_swap(union perf_event *event, bool sample_id_all)
 {
 	perf_event__swap_op swap;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 60a31db..64d8145 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -129,6 +129,11 @@ extern volatile int session_done;
 
 #define session_done()	(*(volatile int *)(&session_done))
 
+int perf_session__deliver_synth_event(struct perf_session *session,
+				      union perf_event *event,
+				      struct perf_sample *sample,
+				      struct perf_tool *tool);
+
 int perf_event__process_id_index(struct perf_tool *tool,
 				 union perf_event *event,
 				 struct perf_session *session);
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 32/71] perf tools: Allow TSC conversion on any arch
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (30 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 31/71] perf session: Add perf_session__deliver_synth_event() Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 33/71] perf tools: Move rdtsc() function Alexander Shishkin
                   ` (40 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

It is possible to record a perf.data file on
one architecture and process it on another.
Consequently, TSC conversion functions need
to be moved out of the arch directory.
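
A round trip then looks the same on every arch (sketch; 'pc' is
assumed to point at a mmapped struct perf_event_mmap_page, and
reading the conversion still only succeeds on x86):

    struct perf_tsc_conversion tc;
    u64 tsc, ns;

    if (!perf_read_tsc_conversion(pc, &tc)) {
            tsc = perf_time_to_tsc(sample_time, &tc);
            ns  = tsc_to_perf_time(tsc, &tc);  /* ~= sample_time */
    }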

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile.perf            |  2 ++
 tools/perf/arch/x86/util/tsc.c      | 22 +---------------------
 tools/perf/arch/x86/util/tsc.h      |  3 ---
 tools/perf/tests/perf-time-to-tsc.c |  3 +--
 tools/perf/util/tsc.c               | 25 +++++++++++++++++++++++++
 tools/perf/util/tsc.h               | 11 +++++++++++
 6 files changed, 40 insertions(+), 26 deletions(-)
 create mode 100644 tools/perf/util/tsc.c
 create mode 100644 tools/perf/util/tsc.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 9a8cf37..c1f9a54 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -292,6 +292,7 @@ LIB_H += util/intlist.h
 LIB_H += util/perf_regs.h
 LIB_H += util/unwind.h
 LIB_H += util/vdso.h
+LIB_H += util/tsc.h
 LIB_H += ui/helpline.h
 LIB_H += ui/progress.h
 LIB_H += ui/util.h
@@ -370,6 +371,7 @@ LIB_OBJS += $(OUTPUT)util/stat.o
 LIB_OBJS += $(OUTPUT)util/record.o
 LIB_OBJS += $(OUTPUT)util/srcline.o
 LIB_OBJS += $(OUTPUT)util/data.o
+LIB_OBJS += $(OUTPUT)util/tsc.o
 
 LIB_OBJS += $(OUTPUT)ui/setup.o
 LIB_OBJS += $(OUTPUT)ui/helpline.o
diff --git a/tools/perf/arch/x86/util/tsc.c b/tools/perf/arch/x86/util/tsc.c
index b2519e4..831b6ed 100644
--- a/tools/perf/arch/x86/util/tsc.c
+++ b/tools/perf/arch/x86/util/tsc.c
@@ -6,29 +6,9 @@
 #include "../../perf.h"
 #include "../../util/types.h"
 #include "../../util/debug.h"
+#include "../../util/tsc.h"
 #include "tsc.h"
 
-u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc)
-{
-	u64 t, quot, rem;
-
-	t = ns - tc->time_zero;
-	quot = t / tc->time_mult;
-	rem  = t % tc->time_mult;
-	return (quot << tc->time_shift) +
-	       (rem << tc->time_shift) / tc->time_mult;
-}
-
-u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc)
-{
-	u64 quot, rem;
-
-	quot = cyc >> tc->time_shift;
-	rem  = cyc & ((1 << tc->time_shift) - 1);
-	return tc->time_zero + quot * tc->time_mult +
-	       ((rem * tc->time_mult) >> tc->time_shift);
-}
-
 int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
 			     struct perf_tsc_conversion *tc)
 {
diff --git a/tools/perf/arch/x86/util/tsc.h b/tools/perf/arch/x86/util/tsc.h
index a24dec8..18fb762 100644
--- a/tools/perf/arch/x86/util/tsc.h
+++ b/tools/perf/arch/x86/util/tsc.h
@@ -14,7 +14,4 @@ struct perf_event_mmap_page;
 int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
 			     struct perf_tsc_conversion *tc);
 
-u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc);
-u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc);
-
 #endif /* TOOLS_PERF_ARCH_X86_UTIL_TSC_H__ */
diff --git a/tools/perf/tests/perf-time-to-tsc.c b/tools/perf/tests/perf-time-to-tsc.c
index 4ca1b93..9ba7d38 100644
--- a/tools/perf/tests/perf-time-to-tsc.c
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -9,10 +9,9 @@
 #include "evsel.h"
 #include "thread_map.h"
 #include "cpumap.h"
+#include "tsc.h"
 #include "tests.h"
 
-#include "../arch/x86/util/tsc.h"
-
 #define CHECK__(x) {				\
 	while ((x) < 0) {			\
 		pr_debug(#x " failed!\n");	\
diff --git a/tools/perf/util/tsc.c b/tools/perf/util/tsc.c
new file mode 100644
index 0000000..bc69d86
--- /dev/null
+++ b/tools/perf/util/tsc.c
@@ -0,0 +1,25 @@
+#include <linux/compiler.h>
+
+#include "types.h"
+#include "tsc.h"
+
+u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc)
+{
+	u64 t, quot, rem;
+
+	t = ns - tc->time_zero;
+	quot = t / tc->time_mult;
+	rem  = t % tc->time_mult;
+	return (quot << tc->time_shift) +
+	       (rem << tc->time_shift) / tc->time_mult;
+}
+
+u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc)
+{
+	u64 quot, rem;
+
+	quot = cyc >> tc->time_shift;
+	rem  = cyc & ((1 << tc->time_shift) - 1);
+	return tc->time_zero + quot * tc->time_mult +
+	       ((rem * tc->time_mult) >> tc->time_shift);
+}
diff --git a/tools/perf/util/tsc.h b/tools/perf/util/tsc.h
new file mode 100644
index 0000000..5083766
--- /dev/null
+++ b/tools/perf/util/tsc.h
@@ -0,0 +1,11 @@
+#ifndef __PERF_TSC_H
+#define __PERF_TSC_H
+
+#include "types.h"
+
+#include "../arch/x86/util/tsc.h"
+
+u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc);
+u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc);
+
+#endif
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 33/71] perf tools: Move rdtsc() function
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (31 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 32/71] perf tools: Allow TSC conversion on any arch Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 34/71] perf evlist: Add perf_evlist__enable_event_idx() Alexander Shishkin
                   ` (39 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Move the rdtsc() function so it can
be reused.
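
With the __weak default below returning 0 on arches that have no
implementation, a caller can still write (sketch; do_something() is
illustrative):

    u64 t0, cycles;

    t0 = rdtsc();
    do_something();
    cycles = rdtsc() - t0;   /* 0 on unsupported arches */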

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/arch/x86/util/tsc.c      | 9 +++++++++
 tools/perf/tests/perf-time-to-tsc.c | 9 ---------
 tools/perf/util/tsc.c               | 5 +++++
 tools/perf/util/tsc.h               | 1 +
 4 files changed, 15 insertions(+), 9 deletions(-)

diff --git a/tools/perf/arch/x86/util/tsc.c b/tools/perf/arch/x86/util/tsc.c
index 831b6ed..e478f28 100644
--- a/tools/perf/arch/x86/util/tsc.c
+++ b/tools/perf/arch/x86/util/tsc.c
@@ -37,3 +37,12 @@ int perf_read_tsc_conversion(const struct perf_event_mmap_page *pc,
 
 	return 0;
 }
+
+u64 rdtsc(void)
+{
+	unsigned int low, high;
+
+	asm volatile("rdtsc" : "=a" (low), "=d" (high));
+
+	return low | ((u64)high) << 32;
+}
diff --git a/tools/perf/tests/perf-time-to-tsc.c b/tools/perf/tests/perf-time-to-tsc.c
index 9ba7d38..a5aebf9 100644
--- a/tools/perf/tests/perf-time-to-tsc.c
+++ b/tools/perf/tests/perf-time-to-tsc.c
@@ -26,15 +26,6 @@
 	}					\
 }
 
-static u64 rdtsc(void)
-{
-	unsigned int low, high;
-
-	asm volatile("rdtsc" : "=a" (low), "=d" (high));
-
-	return low | ((u64)high) << 32;
-}
-
 /**
  * test__perf_time_to_tsc - test converting perf time to TSC.
  *
diff --git a/tools/perf/util/tsc.c b/tools/perf/util/tsc.c
index bc69d86..617debb 100644
--- a/tools/perf/util/tsc.c
+++ b/tools/perf/util/tsc.c
@@ -23,3 +23,8 @@ u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc)
 	return tc->time_zero + quot * tc->time_mult +
 	       ((rem * tc->time_mult) >> tc->time_shift);
 }
+
+u64 __weak rdtsc(void)
+{
+	return 0;
+}
diff --git a/tools/perf/util/tsc.h b/tools/perf/util/tsc.h
index 5083766..181f778 100644
--- a/tools/perf/util/tsc.h
+++ b/tools/perf/util/tsc.h
@@ -7,5 +7,6 @@
 
 u64 perf_time_to_tsc(u64 ns, struct perf_tsc_conversion *tc);
 u64 tsc_to_perf_time(u64 cyc, struct perf_tsc_conversion *tc);
+u64 rdtsc(void);
 
 #endif
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 34/71] perf evlist: Add perf_evlist__enable_event_idx()
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (32 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 33/71] perf tools: Move rdtsc() function Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 35/71] perf tools: Add itrace members of struct perf_event_attr Alexander Shishkin
                   ` (38 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a function to enable a specific event
within a specific perf event buffer.
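
The 'idx' is interpreted according to how the evlist was mmapped
(sketch):

    /* idx is a cpu index for per-cpu mmaps, a thread index
     * otherwise */
    err = perf_evlist__enable_event_idx(evlist, evsel, idx);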

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evlist.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/evlist.h |  2 ++
 2 files changed, 49 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 7ae3139..e750a21 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -350,6 +350,53 @@ int perf_evlist__enable_event(struct perf_evlist *evlist,
 	return 0;
 }
 
+static int perf_evlist__enable_event_cpu(struct perf_evlist *evlist,
+					 struct perf_evsel *evsel, int cpu)
+{
+	int thread, err;
+	int nr_threads = perf_evlist__nr_threads(evlist, evsel);
+
+	if (!evsel->fd)
+		return -EINVAL;
+
+	for (thread = 0; thread < nr_threads; thread++) {
+		err = ioctl(FD(evsel, cpu, thread),
+				PERF_EVENT_IOC_ENABLE, 0);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+static int perf_evlist__enable_event_thread(struct perf_evlist *evlist,
+					    struct perf_evsel *evsel,
+					    int thread)
+{
+	int cpu, err;
+	int nr_cpus = cpu_map__nr(evlist->cpus);
+
+	if (!evsel->fd)
+		return -EINVAL;
+
+	for (cpu = 0; cpu < nr_cpus; cpu++) {
+		err = ioctl(FD(evsel, cpu, thread), PERF_EVENT_IOC_ENABLE, 0);
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+int perf_evlist__enable_event_idx(struct perf_evlist *evlist,
+				  struct perf_evsel *evsel, int idx)
+{
+	bool per_cpu_mmaps = !cpu_map__empty(evlist->cpus);
+
+	if (per_cpu_mmaps)
+		return perf_evlist__enable_event_cpu(evlist, evsel, idx);
+	else
+		return perf_evlist__enable_event_thread(evlist, evsel, idx);
+}
+
 static int perf_evlist__alloc_pollfd(struct perf_evlist *evlist)
 {
 	int nr_cpus = cpu_map__nr(evlist->cpus);
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 2c8d068..f0ce3bf 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -122,6 +122,8 @@ int perf_evlist__disable_event(struct perf_evlist *evlist,
 			       struct perf_evsel *evsel);
 int perf_evlist__enable_event(struct perf_evlist *evlist,
 			      struct perf_evsel *evsel);
+int perf_evlist__enable_event_idx(struct perf_evlist *evlist,
+				  struct perf_evsel *evsel, int idx);
 
 void perf_evlist__set_selected(struct perf_evlist *evlist,
 			       struct perf_evsel *evsel);
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 35/71] perf tools: Add itrace members of struct perf_event_attr
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (33 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 34/71] perf evlist: Add perf_evlist__enable_event_idx() Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 36/71] perf tools: Add support for parsing pmu itrace_config Alexander Shishkin
                   ` (37 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add new Instruction Tracing members of struct perf_event_attr
to debug prints and byte swapping.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/evsel.c   | 4 ++++
 tools/perf/util/session.c | 4 ++++
 2 files changed, 8 insertions(+)

diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 4e92a22..da2116c 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -981,6 +981,10 @@ static size_t perf_event_attr__fprintf(struct perf_event_attr *attr, FILE *fp)
 	ret += PRINT_ATTR_X64(branch_sample_type);
 	ret += PRINT_ATTR_X64(sample_regs_user);
 	ret += PRINT_ATTR_U32(sample_stack_user);
+	ret += PRINT_ATTR_X64(itrace_config);
+	ret += PRINT_ATTR_U32(itrace_watermark);
+	ret += PRINT_ATTR_U32(itrace_sample_type);
+	ret += PRINT_ATTR_U64(itrace_sample_size);
 
 	ret += fprintf(fp, "%.60s\n", graph_dotted_line);
 
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index f2ac351..7847096 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -407,6 +407,10 @@ void perf_event__attr_swap(struct perf_event_attr *attr)
 	attr->branch_sample_type = bswap_64(attr->branch_sample_type);
 	attr->sample_regs_user	 = bswap_64(attr->sample_regs_user);
 	attr->sample_stack_user  = bswap_32(attr->sample_stack_user);
+	attr->itrace_config	 = bswap_64(attr->itrace_config);
+	attr->itrace_watermark	 = bswap_32(attr->itrace_watermark);
+	attr->itrace_sample_type = bswap_32(attr->itrace_sample_type);
+	attr->itrace_sample_size = bswap_64(attr->itrace_sample_size);
 
 	swap_bitfield((u8 *) (&attr->read_format + 1), sizeof(u64));
 }
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 36/71] perf tools: Add support for parsing pmu itrace_config
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (34 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 35/71] perf tools: Add itrace members of struct perf_event_attr Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 37/71] perf tools: Add support for PERF_RECORD_ITRACE_LOST Alexander Shishkin
                   ` (36 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Instruction Tracing uses a new perf_event_attr member named
itrace_config.  Add support for parsing PMU sysfs format values and
event term types that target itrace_config.
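
As an illustration (the PMU and format names here are hypothetical), a
driver exporting a sysfs format file

	/sys/bus/event_source/devices/mypmu/format/ctl

with the content "itrace_config:0-15" would let the parser accept the
event string "mypmu/ctl=0xf/" and route the value into
attr->itrace_config instead of attr->config.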

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/parse-events.c | 4 ++++
 tools/perf/util/parse-events.h | 1 +
 tools/perf/util/parse-events.l | 1 +
 tools/perf/util/pmu.c          | 3 +++
 tools/perf/util/pmu.h          | 1 +
 tools/perf/util/pmu.l          | 1 +
 tools/perf/util/pmu.y          | 9 ++++++++-
 7 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index 464dafd..d494a5a 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -576,6 +576,10 @@ do {								\
 	case PARSE_EVENTS__TERM_TYPE_NAME:
 		CHECK_TYPE_VAL(STR);
 		break;
+	case PARSE_EVENTS__TERM_TYPE_ITRACE_CONFIG:
+		CHECK_TYPE_VAL(NUM);
+		attr->itrace_config = term->val.num;
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index f1cb4c4..86a2721 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -49,6 +49,7 @@ enum {
 	PARSE_EVENTS__TERM_TYPE_NAME,
 	PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD,
 	PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE,
+	PARSE_EVENTS__TERM_TYPE_ITRACE_CONFIG,
 };
 
 struct parse_events_term {
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index 3432995..85106c4 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -133,6 +133,7 @@ config2			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_CONFIG2); }
 name			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NAME); }
 period			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_SAMPLE_PERIOD); }
 branch_type		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_BRANCH_SAMPLE_TYPE); }
+itrace_config		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_ITRACE_CONFIG); }
 ,			{ return ','; }
 "/"			{ BEGIN(INITIAL); return '/'; }
 {name_minus}		{ return str(yyscanner, PE_NAME); }
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index c6c240f..e308dfe 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -547,6 +547,9 @@ static int pmu_config_term(struct list_head *formats,
 	case PERF_PMU_FORMAT_VALUE_CONFIG2:
 		vp = &attr->config2;
 		break;
+	case PERF_PMU_FORMAT_VALUE_ITRACE_CONFIG:
+		vp = &attr->itrace_config;
+		break;
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/perf/util/pmu.h b/tools/perf/util/pmu.h
index d5266d1..52e4d30 100644
--- a/tools/perf/util/pmu.h
+++ b/tools/perf/util/pmu.h
@@ -9,6 +9,7 @@ enum {
 	PERF_PMU_FORMAT_VALUE_CONFIG,
 	PERF_PMU_FORMAT_VALUE_CONFIG1,
 	PERF_PMU_FORMAT_VALUE_CONFIG2,
+	PERF_PMU_FORMAT_VALUE_ITRACE_CONFIG,
 };
 
 #define PERF_PMU_FORMAT_BITS 64
diff --git a/tools/perf/util/pmu.l b/tools/perf/util/pmu.l
index a15d9fb..c94ee8cd 100644
--- a/tools/perf/util/pmu.l
+++ b/tools/perf/util/pmu.l
@@ -29,6 +29,7 @@ num_dec         [0-9]+
 config		{ return PP_CONFIG; }
 config1		{ return PP_CONFIG1; }
 config2		{ return PP_CONFIG2; }
+itrace_config	{ return PP_ITRACE_CONFIG; }
 -		{ return '-'; }
 :		{ return ':'; }
 ,		{ return ','; }
diff --git a/tools/perf/util/pmu.y b/tools/perf/util/pmu.y
index bfd7e85..caa190e 100644
--- a/tools/perf/util/pmu.y
+++ b/tools/perf/util/pmu.y
@@ -20,7 +20,7 @@ do { \
 
 %}
 
-%token PP_CONFIG PP_CONFIG1 PP_CONFIG2
+%token PP_CONFIG PP_CONFIG1 PP_CONFIG2 PP_ITRACE_CONFIG
 %token PP_VALUE PP_ERROR
 %type <num> PP_VALUE
 %type <bits> bit_term
@@ -60,6 +60,13 @@ PP_CONFIG2 ':' bits
 				      PERF_PMU_FORMAT_VALUE_CONFIG2,
 				      $3));
 }
+|
+PP_ITRACE_CONFIG ':' bits
+{
+	ABORT_ON(perf_pmu__new_format(format, name,
+				      PERF_PMU_FORMAT_VALUE_ITRACE_CONFIG,
+				      $3));
+}
 
 bits:
 bits ',' bit_term
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 37/71] perf tools: Add support for PERF_RECORD_ITRACE_LOST
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (35 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 36/71] perf tools: Add support for parsing pmu itrace_config Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 38/71] perf tools: Add itrace sample parsing Alexander Shishkin
                   ` (35 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Instruction Tracing may lose data, for example when a buffer fills
up.  In that case, an event of type PERF_RECORD_ITRACE_LOST is created
to report the loss.  Add support for that event type.
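
A tool that wants its own handling instead of the default can install
a callback with the same signature; a minimal sketch, assuming a
warning is all that is wanted:

	static int warn_itrace_lost(struct perf_tool *tool __maybe_unused,
				    union perf_event *event,
				    struct perf_sample *sample __maybe_unused,
				    struct machine *machine __maybe_unused)
	{
		pr_warning("itrace data lost at offset %#" PRIx64 "\n",
			   event->itrace_lost.offset);
		return 0;
	}

	/* in tool setup, before perf_tool__fill_defaults(): */
	tool.itrace_lost = warn_itrace_lost;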

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-inject.c |  1 +
 tools/perf/util/event.c     | 17 +++++++++++++++++
 tools/perf/util/event.h     | 11 +++++++++++
 tools/perf/util/machine.c   | 10 ++++++++++
 tools/perf/util/machine.h   |  2 ++
 tools/perf/util/session.c   |  5 +++++
 tools/perf/util/tool.h      |  1 +
 7 files changed, 47 insertions(+)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 8400d29..78911a3 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -417,6 +417,7 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 			.fork		= perf_event__repipe,
 			.exit		= perf_event__repipe,
 			.lost		= perf_event__repipe,
+			.itrace_lost	= perf_event__repipe,
 			.read		= perf_event__repipe_sample,
 			.throttle	= perf_event__repipe,
 			.unthrottle	= perf_event__repipe,
diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 30f91d7..4d001d9 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -20,6 +20,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_FORK]			= "FORK",
 	[PERF_RECORD_READ]			= "READ",
 	[PERF_RECORD_SAMPLE]			= "SAMPLE",
+	[PERF_RECORD_ITRACE_LOST]		= "ITRACE_LOST",
 	[PERF_RECORD_HEADER_ATTR]		= "ATTR",
 	[PERF_RECORD_HEADER_EVENT_TYPE]		= "EVENT_TYPE",
 	[PERF_RECORD_HEADER_TRACING_DATA]	= "TRACING_DATA",
@@ -536,6 +537,14 @@ int perf_event__process_lost(struct perf_tool *tool __maybe_unused,
 	return machine__process_lost_event(machine, event, sample);
 }
 
+int perf_event__process_itrace_lost(struct perf_tool *tool __maybe_unused,
+				    union perf_event *event,
+				    struct perf_sample *sample __maybe_unused,
+				    struct machine *machine)
+{
+	return machine__process_itrace_lost_event(machine, event);
+}
+
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp)
 {
 	return fprintf(fp, " %d/%d: [%#" PRIx64 "(%#" PRIx64 ") @ %#" PRIx64 "]: %c %s\n",
@@ -596,6 +605,11 @@ int perf_event__process_exit(struct perf_tool *tool __maybe_unused,
 	return machine__process_exit_event(machine, event, sample);
 }
 
+size_t perf_event__fprintf_itrace_lost(union perf_event *event, FILE *fp)
+{
+	return fprintf(fp, " offset: %#"PRIx64"\n", event->itrace_lost.offset);
+}
+
 size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 {
 	size_t ret = fprintf(fp, "PERF_RECORD_%s",
@@ -615,6 +629,9 @@ size_t perf_event__fprintf(union perf_event *event, FILE *fp)
 	case PERF_RECORD_MMAP2:
 		ret += perf_event__fprintf_mmap2(event, fp);
 		break;
+	case PERF_RECORD_ITRACE_LOST:
+		ret += perf_event__fprintf_itrace_lost(event, fp);
+		break;
 	default:
 		ret += fprintf(fp, "\n");
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index 88e27cb..b684398 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -193,6 +193,11 @@ struct id_index_event {
 	struct id_index_entry entries[0];
 };
 
+struct itrace_lost_event {
+	struct perf_event_header header;
+	u64 offset;
+};
+
 union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
@@ -208,6 +213,7 @@ union perf_event {
 	struct tracing_data_event	tracing_data;
 	struct build_id_event		build_id;
 	struct id_index_event		id_index;
+	struct itrace_lost_event	itrace_lost;
 };
 
 void perf_event__print_totals(void);
@@ -244,6 +250,10 @@ int perf_event__process_lost(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
 			     struct machine *machine);
+int perf_event__process_itrace_lost(struct perf_tool *tool,
+				    union perf_event *event,
+				    struct perf_sample *sample,
+				    struct machine *machine);
 int perf_event__process_mmap(struct perf_tool *tool,
 			     union perf_event *event,
 			     struct perf_sample *sample,
@@ -285,6 +295,7 @@ size_t perf_event__fprintf_comm(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_mmap(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_mmap2(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf_task(union perf_event *event, FILE *fp);
+size_t perf_event__fprintf_itrace_lost(union perf_event *event, FILE *fp);
 size_t perf_event__fprintf(union perf_event *event, FILE *fp);
 
 #endif /* __PERF_RECORD_H */
diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
index a04210d..a224cf7 100644
--- a/tools/perf/util/machine.c
+++ b/tools/perf/util/machine.c
@@ -361,6 +361,14 @@ int machine__process_lost_event(struct machine *machine __maybe_unused,
 	return 0;
 }
 
+int machine__process_itrace_lost_event(struct machine *machine __maybe_unused,
+				       union perf_event *event)
+{
+	if (dump_trace)
+		perf_event__fprintf_itrace_lost(event, stdout);
+	return 0;
+}
+
 struct map *machine__new_module(struct machine *machine, u64 start,
 				const char *filename)
 {
@@ -1149,6 +1157,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
 		ret = machine__process_exit_event(machine, event, sample); break;
 	case PERF_RECORD_LOST:
 		ret = machine__process_lost_event(machine, event, sample); break;
+	case PERF_RECORD_ITRACE_LOST:
+		ret = machine__process_itrace_lost_event(machine, event); break;
 	default:
 		ret = -1;
 		break;
diff --git a/tools/perf/util/machine.h b/tools/perf/util/machine.h
index aaad99a..fee1f89 100644
--- a/tools/perf/util/machine.h
+++ b/tools/perf/util/machine.h
@@ -50,6 +50,8 @@ int machine__process_fork_event(struct machine *machine, union perf_event *event
 				struct perf_sample *sample);
 int machine__process_lost_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
+int machine__process_itrace_lost_event(struct machine *machine,
+				       union perf_event *event);
 int machine__process_mmap_event(struct machine *machine, union perf_event *event,
 				struct perf_sample *sample);
 int machine__process_mmap2_event(struct machine *machine, union perf_event *event,
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 7847096..de09c2e 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -236,6 +236,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->exit = process_event_stub;
 	if (tool->lost == NULL)
 		tool->lost = perf_event__process_lost;
+	if (tool->itrace_lost == NULL)
+		tool->itrace_lost = perf_event__process_itrace_lost;
 	if (tool->read == NULL)
 		tool->read = process_event_sample_stub;
 	if (tool->throttle == NULL)
@@ -454,6 +456,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_THROTTLE]		  = perf_event__throttle_swap,
 	[PERF_RECORD_UNTHROTTLE]	  = perf_event__throttle_swap,
 	[PERF_RECORD_SAMPLE]		  = perf_event__all64_swap,
+	[PERF_RECORD_ITRACE_LOST]	  = perf_event__all64_swap,
 	[PERF_RECORD_HEADER_ATTR]	  = perf_event__hdr_attr_swap,
 	[PERF_RECORD_HEADER_EVENT_TYPE]	  = perf_event__event_type_swap,
 	[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
@@ -998,6 +1001,8 @@ static int perf_session_deliver_event(struct perf_session *session,
 		return tool->throttle(tool, event, sample, machine);
 	case PERF_RECORD_UNTHROTTLE:
 		return tool->unthrottle(tool, event, sample, machine);
+	case PERF_RECORD_ITRACE_LOST:
+		return tool->itrace_lost(tool, event, sample, machine);
 	default:
 		++session->stats.nr_unknown_events;
 		return -1;
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index f07d6fe..18afd13 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -34,6 +34,7 @@ struct perf_tool {
 			fork,
 			exit,
 			lost,
+			itrace_lost,
 			throttle,
 			unthrottle;
 	event_attr_op	attr;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 38/71] perf tools: Add itrace sample parsing
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (36 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 37/71] perf tools: Add support for PERF_RECORD_ITRACE_LOST Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 39/71] perf header: Add Instruction Tracing feature Alexander Shishkin
                   ` (34 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for parsing samples that contain
Instruction Tracing data, i.e. PERF_SAMPLE_ITRACE.
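
Note that the parsed payload is not copied: itrace_sample.data points
into the event record itself.  A minimal consumer sketch (decode() is
a hypothetical stand-in):

	struct perf_sample sample;

	if (perf_evsel__parse_sample(evsel, event, &sample))
		return -1;

	if (sample.itrace_sample.size)
		decode(sample.itrace_sample.data, sample.itrace_sample.size);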

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/tests/sample-parsing.c |  7 ++++++-
 tools/perf/util/event.h           |  6 ++++++
 tools/perf/util/evlist.c          | 12 ++++++++++++
 tools/perf/util/evlist.h          |  4 ++++
 tools/perf/util/evsel.c           | 33 +++++++++++++++++++++++++++++++--
 tools/perf/util/evsel.h           | 13 +++++++++++--
 tools/perf/util/session.c         |  3 ++-
 7 files changed, 72 insertions(+), 6 deletions(-)

diff --git a/tools/perf/tests/sample-parsing.c b/tools/perf/tests/sample-parsing.c
index 1b67720..acc8132 100644
--- a/tools/perf/tests/sample-parsing.c
+++ b/tools/perf/tests/sample-parsing.c
@@ -155,6 +155,7 @@ static int do_test(u64 sample_type, u64 sample_regs_user, u64 read_format)
 	u64 user_regs[64];
 	const u64 raw_data[] = {0x123456780a0b0c0dULL, 0x1102030405060708ULL};
 	const u64 data[] = {0x2211443366558877ULL, 0, 0xaabbccddeeff4321ULL};
+	const u64 itrace_data[] = {0xa55a, 0, 0xeeddee, 0x0282028202820282};
 	struct perf_sample sample = {
 		.ip		= 101,
 		.pid		= 102,
@@ -184,6 +185,10 @@ static int do_test(u64 sample_type, u64 sample_regs_user, u64 read_format)
 			.time_enabled = 0x030a59d664fca7deULL,
 			.time_running = 0x011b6ae553eb98edULL,
 		},
+		.itrace_sample	= {
+			.size	= sizeof(itrace_data),
+			.data	= (void *)itrace_data,
+		},
 	};
 	struct sample_read_value values[] = {{1, 5}, {9, 3}, {2, 7}, {6, 4},};
 	struct perf_sample sample_out;
@@ -280,7 +285,7 @@ int test__sample_parsing(void)
 	 * were added.  Please actually update the test rather than just change
 	 * the condition below.
 	 */
-	if (PERF_SAMPLE_MAX > PERF_SAMPLE_TRANSACTION << 1) {
+	if (PERF_SAMPLE_MAX > PERF_SAMPLE_ITRACE << 1) {
 		pr_debug("sample format has changed, some new PERF_SAMPLE_ bit was introduced - test needs updating\n");
 		return -1;
 	}
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index b684398..cc49148 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -111,6 +111,11 @@ struct sample_read {
 	};
 };
 
+struct itrace_sample {
+	u64 size;
+	void *data;
+};
+
 struct perf_sample {
 	u64 ip;
 	u32 pid, tid;
@@ -130,6 +135,7 @@ struct perf_sample {
 	struct regs_dump  user_regs;
 	struct stack_dump user_stack;
 	struct sample_read read;
+	struct itrace_sample itrace_sample;
 };
 
 #define PERF_MEM_DATA_SRC_NONE \
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e750a21..490af37 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1238,6 +1238,18 @@ int perf_evlist__parse_sample(struct perf_evlist *evlist, union perf_event *even
 	return perf_evsel__parse_sample(evsel, event, sample);
 }
 
+int __perf_evlist__parse_sample(struct perf_evlist *evlist,
+				union perf_event *event,
+				struct perf_sample *sample,
+				bool fix_swap)
+{
+	struct perf_evsel *evsel = perf_evlist__event2evsel(evlist, event);
+
+	if (!evsel)
+		return -EFAULT;
+	return __perf_evsel__parse_sample(evsel, event, sample, fix_swap);
+}
+
 size_t perf_evlist__fprintf(struct perf_evlist *evlist, FILE *fp)
 {
 	struct perf_evsel *evsel;
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index f0ce3bf..6f3166e 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -151,6 +151,10 @@ u16 perf_evlist__id_hdr_size(struct perf_evlist *evlist);
 
 int perf_evlist__parse_sample(struct perf_evlist *evlist, union perf_event *event,
 			      struct perf_sample *sample);
+int __perf_evlist__parse_sample(struct perf_evlist *evlist,
+				union perf_event *event,
+				struct perf_sample *sample,
+				bool fix_swap);
 
 bool perf_evlist__valid_sample_type(struct perf_evlist *evlist);
 bool perf_evlist__valid_sample_id_all(struct perf_evlist *evlist);
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index da2116c..88b7edd 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1233,8 +1233,10 @@ static inline bool overflow(const void *endp, u16 max_size, const void *offset,
 #define OVERFLOW_CHECK_u64(offset) \
 	OVERFLOW_CHECK(offset, sizeof(u64), sizeof(u64))
 
-int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
-			     struct perf_sample *data)
+int __perf_evsel__parse_sample(struct perf_evsel *evsel,
+			       union perf_event *event,
+			       struct perf_sample *data,
+			       bool fix_swap)
 {
 	u64 type = evsel->attr.sample_type;
 	bool swapped = evsel->needs_swap;
@@ -1479,6 +1481,19 @@ int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_ITRACE) {
+		OVERFLOW_CHECK_u64(array);
+		sz = *array++;
+
+		OVERFLOW_CHECK(array, sz, max_size);
+		/* Undo swap of data */
+		if (fix_swap && swapped)
+			mem_bswap_64((char *)array, sz);
+		data->itrace_sample.size = sz;
+		data->itrace_sample.data = (char *)array;
+		array = (void *)array + sz;
+	}
+
 	return 0;
 }
 
@@ -1574,6 +1589,11 @@ size_t perf_event__sample_event_size(const struct perf_sample *sample, u64 type,
 	if (type & PERF_SAMPLE_TRANSACTION)
 		result += sizeof(u64);
 
+	if (type & PERF_SAMPLE_ITRACE) {
+		result += sizeof(u64);
+		result += sample->itrace_sample.size;
+	}
+
 	return result;
 }
 
@@ -1752,6 +1772,15 @@ int perf_event__synthesize_sample(union perf_event *event, u64 type,
 		array++;
 	}
 
+	if (type & PERF_SAMPLE_ITRACE) {
+		sz = sample->itrace_sample.size;
+		*array++ = sz;
+		memcpy(array, sample->itrace_sample.data, sz);
+		if (swapped)
+			mem_bswap_64((char *)array, sz);
+		array = (void *)array + sz;
+	}
+
 	return 0;
 }
 
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 3e25e23..b0433fd 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -280,8 +280,17 @@ static inline int perf_evsel__read_scaled(struct perf_evsel *evsel,
 
 void hists__init(struct hists *hists);
 
-int perf_evsel__parse_sample(struct perf_evsel *evsel, union perf_event *event,
-			     struct perf_sample *sample);
+int __perf_evsel__parse_sample(struct perf_evsel *evsel,
+			       union perf_event *event,
+			       struct perf_sample *data,
+			       bool fix_swap);
+
+static inline int perf_evsel__parse_sample(struct perf_evsel *evsel,
+					   union perf_event *event,
+					   struct perf_sample *data)
+{
+	return __perf_evsel__parse_sample(evsel, event, data, false);
+}
 
 static inline struct perf_evsel *perf_evsel__next(struct perf_evsel *evsel)
 {
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index de09c2e..fbf9024 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1086,7 +1086,8 @@ static s64 perf_session__process_event(struct perf_session *session,
 	/*
 	 * For all kernel events we get the sample data
 	 */
-	ret = perf_evlist__parse_sample(session->evlist, event, &sample);
+	ret = __perf_evlist__parse_sample(session->evlist, event, &sample,
+					  true);
 	if (ret)
 		return ret;
 
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 39/71] perf header: Add Instruction Tracing feature
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (37 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 38/71] perf tools: Add itrace sample parsing Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 40/71] perf evlist: Add ability to mmap itrace buffers Alexander Shishkin
                   ` (33 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a feature to indicate that a perf.data file
contains Instruction Tracing data.
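
A recording tool would then mark the file accordingly; a one-line
sketch using perf_header__set_feat() as for the other features:

	perf_header__set_feat(&session->header, HEADER_ITRACE);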

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/header.c | 14 ++++++++++++++
 tools/perf/util/header.h |  1 +
 2 files changed, 15 insertions(+)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 0114f0a..72bcca9 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1188,6 +1188,13 @@ static int write_branch_stack(int fd __maybe_unused,
 	return 0;
 }
 
+static int write_itrace(int fd __maybe_unused,
+			struct perf_header *h __maybe_unused,
+			struct perf_evlist *evlist __maybe_unused)
+{
+	return 0;
+}
+
 static void print_hostname(struct perf_header *ph, int fd __maybe_unused,
 			   FILE *fp)
 {
@@ -1485,6 +1492,12 @@ static void print_branch_stack(struct perf_header *ph __maybe_unused,
 	fprintf(fp, "# contains samples with branch stack\n");
 }
 
+static void print_itrace(struct perf_header *ph __maybe_unused,
+			 int fd __maybe_unused, FILE *fp)
+{
+	fprintf(fp, "# contains Instruction Traces\n");
+}
+
 static void print_pmu_mappings(struct perf_header *ph, int fd __maybe_unused,
 			       FILE *fp)
 {
@@ -2195,6 +2208,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPA(HEADER_BRANCH_STACK,	branch_stack),
 	FEAT_OPP(HEADER_PMU_MAPPINGS,	pmu_mappings),
 	FEAT_OPP(HEADER_GROUP_DESC,	group_desc),
+	FEAT_OPA(HEADER_ITRACE,		itrace),
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/header.h b/tools/perf/util/header.h
index e8c45fa..be60483 100644
--- a/tools/perf/util/header.h
+++ b/tools/perf/util/header.h
@@ -30,6 +30,7 @@ enum {
 	HEADER_BRANCH_STACK,
 	HEADER_PMU_MAPPINGS,
 	HEADER_GROUP_DESC,
+	HEADER_ITRACE,
 	HEADER_LAST_FEATURE,
 	HEADER_FEAT_BITS	= 256,
 };
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 40/71] perf evlist: Add ability to mmap itrace buffers
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (38 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 39/71] perf header: Add Instruction Tracing feature Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 41/71] perf tools: Add user events for Instruction Tracing Alexander Shishkin
                   ` (32 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Instruction Tracing data is provided in a separate
mmap made at the special offset PERF_EVENT_ITRACE_OFFSET.
Add the ability to mmap that offset.
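
Stripped of the helpers added below, the mapping itself reduces to an
mmap() at the special offset; a hedged sketch, assuming 'fd' is the
event file descriptor and 'pages' a power of two:

	/* +1 page for the perf_event_mmap_page header */
	size_t len = (pages + 1) * page_size;
	void *base = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
			  fd, PERF_EVENT_ITRACE_OFFSET);

	if (base == MAP_FAILED)
		return -1;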

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile.perf |   2 +
 tools/perf/util/evlist.c |  57 +++++++++++++++++++++--
 tools/perf/util/evlist.h |   5 ++
 tools/perf/util/itrace.c | 104 +++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/itrace.h | 118 +++++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 283 insertions(+), 3 deletions(-)
 create mode 100644 tools/perf/util/itrace.c
 create mode 100644 tools/perf/util/itrace.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index c1f9a54..6ef50f9 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -293,6 +293,7 @@ LIB_H += util/perf_regs.h
 LIB_H += util/unwind.h
 LIB_H += util/vdso.h
 LIB_H += util/tsc.h
+LIB_H += util/itrace.h
 LIB_H += ui/helpline.h
 LIB_H += ui/progress.h
 LIB_H += ui/util.h
@@ -372,6 +373,7 @@ LIB_OBJS += $(OUTPUT)util/record.o
 LIB_OBJS += $(OUTPUT)util/srcline.o
 LIB_OBJS += $(OUTPUT)util/data.o
 LIB_OBJS += $(OUTPUT)util/tsc.o
+LIB_OBJS += $(OUTPUT)util/itrace.o
 
 LIB_OBJS += $(OUTPUT)ui/setup.o
 LIB_OBJS += $(OUTPUT)ui/helpline.o
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 490af37..c720c6c 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -659,12 +659,39 @@ void perf_evlist__mmap_consume(struct perf_evlist *evlist, int idx)
 	}
 }
 
+int __weak itrace_mmap__mmap(struct itrace_mmap *mm __maybe_unused,
+			     struct itrace_mmap_params *mp __maybe_unused,
+			     int fd __maybe_unused)
+{
+	return 0;
+}
+
+void __weak itrace_mmap__munmap(struct itrace_mmap *mm __maybe_unused)
+{
+}
+
+void __weak itrace_mmap_params__init(
+			struct itrace_mmap_params *mp __maybe_unused,
+			unsigned int itrace_pages __maybe_unused,
+			bool itrace_overwrite __maybe_unused)
+{
+}
+
+void __weak itrace_mmap_params__set_idx(
+			struct itrace_mmap_params *mp __maybe_unused,
+			struct perf_evlist *evlist __maybe_unused,
+			int idx __maybe_unused,
+			bool per_cpu __maybe_unused)
+{
+}
+
 static void __perf_evlist__munmap(struct perf_evlist *evlist, int idx)
 {
 	if (evlist->mmap[idx].base != NULL) {
 		munmap(evlist->mmap[idx].base, evlist->mmap_len);
 		evlist->mmap[idx].base = NULL;
 	}
+	itrace_mmap__munmap(&evlist->mmap[idx].itrace_mmap);
 }
 
 void perf_evlist__munmap(struct perf_evlist *evlist)
@@ -690,6 +717,7 @@ static int perf_evlist__alloc_mmap(struct perf_evlist *evlist)
 struct mmap_params {
 	int prot;
 	int mask;
+	struct itrace_mmap_params itrace_mp;
 };
 
 static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
@@ -706,6 +734,10 @@ static int __perf_evlist__mmap(struct perf_evlist *evlist, int idx,
 		return -1;
 	}
 
+	if (itrace_mmap__mmap(&evlist->mmap[idx].itrace_mmap,
+			      &mp->itrace_mp, fd))
+		return -1;
+
 	perf_evlist__add_pollfd(evlist, fd);
 	return 0;
 }
@@ -756,6 +788,8 @@ static int perf_evlist__mmap_per_cpu(struct perf_evlist *evlist,
 	for (cpu = 0; cpu < nr_cpus; cpu++) {
 		int output = -1;
 
+		itrace_mmap_params__set_idx(&mp->itrace_mp, evlist, cpu, true);
+
 		for (thread = 0; thread < nr_threads; thread++) {
 			if (perf_evlist__mmap_per_evsel(evlist, cpu, mp, cpu,
 							thread, &output))
@@ -781,6 +815,9 @@ static int perf_evlist__mmap_per_thread(struct perf_evlist *evlist,
 	for (thread = 0; thread < nr_threads; thread++) {
 		int output = -1;
 
+		itrace_mmap_params__set_idx(&mp->itrace_mp, evlist, thread,
+					    false);
+
 		if (perf_evlist__mmap_per_evsel(evlist, thread, mp, 0, thread,
 						&output))
 			goto out_unmap;
@@ -868,19 +905,25 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 }
 
 /**
- * perf_evlist__mmap - Create mmaps to receive events.
+ * perf_evlist__mmap_ex - Create mmaps to receive events.
  * @evlist: list of events
  * @pages: map length in pages
  * @overwrite: overwrite older events?
+ * @itrace_pages: itrace map length in pages
+ * @itrace_overwrite: overwrite older itrace data?
  *
  * If @overwrite is %false the user needs to signal event consumption using
  * perf_mmap__write_tail().  Using perf_evlist__mmap_read() does this
  * automatically.
  *
+ * Similarly, if @itrace_overwrite is %false the user needs to signal data
+ * consumption using itrace_mmap__write_tail().
+ *
  * Return: %0 on success, negative error code otherwise.
  */
-int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
-		      bool overwrite)
+int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
+			 bool overwrite, unsigned int itrace_pages,
+			 bool itrace_overwrite)
 {
 	struct perf_evsel *evsel;
 	const struct cpu_map *cpus = evlist->cpus;
@@ -900,6 +943,8 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 	pr_debug("mmap size %zuB\n", evlist->mmap_len);
 	mp.mask = evlist->mmap_len - page_size - 1;
 
+	itrace_mmap_params__init(&mp.itrace_mp, itrace_pages, itrace_overwrite);
+
 	list_for_each_entry(evsel, &evlist->entries, node) {
 		if ((evsel->attr.read_format & PERF_FORMAT_ID) &&
 		    evsel->sample_id == NULL &&
@@ -913,6 +958,12 @@ int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 	return perf_evlist__mmap_per_cpu(evlist, &mp);
 }
 
+int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
+		      bool overwrite)
+{
+	return perf_evlist__mmap_ex(evlist, pages, overwrite, 0, false);
+}
+
 int perf_evlist__create_maps(struct perf_evlist *evlist, struct target *target)
 {
 	evlist->threads = thread_map__new_str(target->pid, target->tid,
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 6f3166e..7f56fdc 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -7,6 +7,7 @@
 #include "event.h"
 #include "evsel.h"
 #include "util.h"
+#include "itrace.h"
 #include <unistd.h>
 
 struct pollfd;
@@ -21,6 +22,7 @@ struct perf_mmap {
 	void		 *base;
 	int		 mask;
 	unsigned int	 prev;
+	struct itrace_mmap itrace_mmap;
 	char		 event_copy[PERF_SAMPLE_MAX_SIZE];
 };
 
@@ -111,6 +113,9 @@ int perf_evlist__parse_mmap_pages(const struct option *opt,
 				  const char *str,
 				  int unset);
 
+int perf_evlist__mmap_ex(struct perf_evlist *evlist, unsigned int pages,
+			 bool overwrite, unsigned int itrace_pages,
+			 bool itrace_overwrite);
 int perf_evlist__mmap(struct perf_evlist *evlist, unsigned int pages,
 		      bool overwrite);
 void perf_evlist__munmap(struct perf_evlist *evlist);
diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
new file mode 100644
index 0000000..a889e63
--- /dev/null
+++ b/tools/perf/util/itrace.c
@@ -0,0 +1,104 @@
+/*
+ * itrace.c: Instruction Tracing support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#include <sys/types.h>
+#include <sys/mman.h>
+#include <stdbool.h>
+
+#include <linux/kernel.h>
+#include <linux/perf_event.h>
+
+#include "../perf.h"
+#include "types.h"
+#include "util.h"
+#include "evlist.h"
+#include "cpumap.h"
+#include "thread_map.h"
+#include "itrace.h"
+
+int itrace_mmap__mmap(struct itrace_mmap *mm, struct itrace_mmap_params *mp,
+		      int fd)
+{
+	off_t offs = PERF_EVENT_ITRACE_OFFSET;
+
+#if BITS_PER_LONG != 64 && !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
+	pr_err("Cannot use Instruction Tracing mmaps\n");
+	return -1;
+#endif
+
+	mm->mask = mp->mask;
+	mm->len = mp->len;
+	mm->mmap_len = mp->mmap_len;
+	mm->prev = 0;
+	mm->idx = mp->idx;
+	mm->tid = mp->tid;
+	mm->cpu = mp->cpu;
+
+	if (!mp->len) {
+		mm->base = NULL;
+		return 0;
+	}
+
+	mm->base = mmap(NULL, mp->mmap_len, mp->prot, MAP_SHARED, fd, offs);
+	if (mm->base == MAP_FAILED) {
+		pr_debug2("failed to mmap itrace ring buffer\n");
+		mm->base = NULL;
+		return -1;
+	}
+
+	return 0;
+}
+
+void itrace_mmap__munmap(struct itrace_mmap *mm)
+{
+	if (mm->base)
+		munmap(mm->base, mm->mmap_len);
+}
+
+void itrace_mmap_params__init(struct itrace_mmap_params *mp,
+			      unsigned int itrace_pages, bool itrace_overwrite)
+{
+	if (itrace_pages) {
+		mp->len = itrace_pages * page_size;
+		mp->mmap_len = mp->len + page_size;
+		mp->mask = is_power_of_2(mp->len) ? mp->len - 1 : 0;
+		mp->prot = PROT_READ | (itrace_overwrite ? 0 : PROT_WRITE);
+		pr_debug2("itrace mmap length %zu\n", mp->mmap_len);
+	} else {
+		mp->len = 0;
+	}
+}
+
+void itrace_mmap_params__set_idx(struct itrace_mmap_params *mp,
+				 struct perf_evlist *evlist, int idx,
+				 bool per_cpu)
+{
+	mp->idx = idx;
+
+	if (per_cpu) {
+		mp->cpu = evlist->cpus->map[idx];
+		if (evlist->threads)
+			mp->tid = evlist->threads->map[0];
+		else
+			mp->tid = -1;
+	} else {
+		mp->cpu = -1;
+		mp->tid = evlist->threads->map[idx];
+	}
+}
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
new file mode 100644
index 0000000..4b17aca
--- /dev/null
+++ b/tools/perf/util/itrace.h
@@ -0,0 +1,118 @@
+/*
+ * itrace.h: Instruction Tracing support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef __PERF_ITRACE_H
+#define __PERF_ITRACE_H
+
+#include <sys/types.h>
+#include <stdbool.h>
+
+#include <linux/perf_event.h>
+
+#include "../perf.h"
+#include "types.h"
+
+struct perf_evlist;
+
+/**
+ * struct itrace_mmap - records an mmap at PERF_EVENT_ITRACE_OFFSET.
+ * @base: address of mapped area
+ * @mask: %0 if @len is not a power of two, otherwise (@len - %1)
+ * @len: size of mapped area excluding perf_event_mmap_page
+ * @mmap_len: size of mapped area including perf_event_mmap_page
+ * @prev: previous data_head
+ * @idx: index of this mmap
+ * @tid: tid for a per-thread mmap (also set if there is only 1 tid on a per-cpu
+ *       mmap) otherwise %0
+ * @cpu: cpu number for a per-cpu mmap otherwise %-1
+ */
+struct itrace_mmap {
+	void		*base;
+	size_t		mask;
+	size_t		len;
+	size_t		mmap_len;
+	u64		prev;
+	int		idx;
+	pid_t		tid;
+	int		cpu;
+};
+
+/**
+ * struct itrace_mmap_params - parameters to set up struct itrace_mmap.
+ * @mask: %0 if @len is not a power of two, otherwise (@len - %1)
+ * @len: size of mapped area excluding perf_event_mmap_page
+ * @mmap_len: size of mapped area including perf_event_mmap_page
+ * @prot: mmap memory protection
+ * @idx: index of this mmap
+ * @tid: tid for a per-thread mmap (also set if there is only 1 tid on a per-cpu
+ *       mmap) otherwise %0
+ * @cpu: cpu number for a per-cpu mmap otherwise %-1
+ */
+struct itrace_mmap_params {
+	size_t		mask;
+	size_t		len;
+	size_t		mmap_len;
+	int		prot;
+	int		idx;
+	pid_t		tid;
+	int		cpu;
+};
+
+static inline u64 itrace_mmap__read_head(struct itrace_mmap *mm)
+{
+	struct perf_event_mmap_page *pc = mm->base;
+#if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
+	u64 head = ACCESS_ONCE(pc->data_head);
+#else
+	u64 head = __sync_val_compare_and_swap(&pc->data_head, 0, 0);
+#endif
+
+	/* Ensure all reads are done after we read the head */
+	rmb();
+	return head;
+}
+
+static inline void itrace_mmap__write_tail(struct itrace_mmap *mm, u64 tail)
+{
+	struct perf_event_mmap_page *pc = mm->base;
+#if BITS_PER_LONG != 64 && defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
+	u64 old_tail;
+#endif
+
+	/* Ensure all reads are done before we write the tail out */
+	mb();
+#if BITS_PER_LONG == 64 || !defined(HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT)
+	pc->data_tail = tail;
+#else
+	do {
+		old_tail = __sync_val_compare_and_swap(&pc->data_tail, 0, 0);
+	} while (!__sync_bool_compare_and_swap(&pc->data_tail, old_tail, tail));
+#endif
+}
+
+int itrace_mmap__mmap(struct itrace_mmap *mm,
+		      struct itrace_mmap_params *mp, int fd);
+void itrace_mmap__munmap(struct itrace_mmap *mm);
+void itrace_mmap_params__init(struct itrace_mmap_params *mp,
+			      unsigned int itrace_pages, bool itrace_overwrite);
+void itrace_mmap_params__set_idx(struct itrace_mmap_params *mp,
+				 struct perf_evlist *evlist, int idx,
+				 bool per_cpu);
+
+#endif
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 41/71] perf tools: Add user events for Instruction Tracing
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (39 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 40/71] perf evlist: Add ability to mmap itrace buffers Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 42/71] perf tools: Add support for Instruction Trace recording Alexander Shishkin
                   ` (31 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add two user events for Instruction Tracing.

PERF_RECORD_ITRACE_INFO contains metadata,
consisting primarily of the type of the
Instruction Tracing data plus some amount
of architecture-specific information.
There should be only one
PERF_RECORD_ITRACE_INFO event.

PERF_RECORD_ITRACE identifies Instruction
Tracing data copied from the mmapped
Instruction Tracing region.  The actual
data is not part of the event but
immediately follows it.
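
In a perf.data file the layout is thus a fixed-size record followed by
raw trace bytes; a hedged reader sketch (buf and its sizing are
assumed to be handled by the caller):

	struct itrace_event *ie = &event->itrace;

	/* 'event' has been read as usual; ie->size bytes of raw
	 * trace data follow it in the file */
	if (readn(fd, buf, ie->size) != (ssize_t)ie->size)
		return -1;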

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/event.c   |  2 ++
 tools/perf/util/event.h   | 22 +++++++++++++++
 tools/perf/util/session.c | 69 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/tool.h    |  9 ++++++-
 4 files changed, 101 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 4d001d9..990d10b 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -27,6 +27,8 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_HEADER_BUILD_ID]		= "BUILD_ID",
 	[PERF_RECORD_FINISHED_ROUND]		= "FINISHED_ROUND",
 	[PERF_RECORD_ID_INDEX]			= "ID_INDEX",
+	[PERF_RECORD_ITRACE_INFO]		= "ITRACE_INFO",
+	[PERF_RECORD_ITRACE]			= "ITRACE",
 };
 
 const char *perf_event__name(unsigned int id)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index cc49148..ad3625c 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -160,6 +160,8 @@ enum perf_user_event_type { /* above any possible kernel type */
 	PERF_RECORD_HEADER_BUILD_ID		= 67,
 	PERF_RECORD_FINISHED_ROUND		= 68,
 	PERF_RECORD_ID_INDEX			= 69,
+	PERF_RECORD_ITRACE_INFO			= 70,
+	PERF_RECORD_ITRACE			= 71,
 	PERF_RECORD_HEADER_MAX
 };
 
@@ -199,6 +201,24 @@ struct id_index_event {
 	struct id_index_entry entries[0];
 };
 
+struct itrace_info_event {
+	struct perf_event_header header;
+	u32 type;
+	u32 reserved__; /* For alignment */
+	u64 priv[];
+};
+
+struct itrace_event {
+	struct perf_event_header header;
+	u64 size;
+	u64 offset;
+	u64 reference;
+	u32 idx;
+	u32 tid;
+	u32 cpu;
+	u32 reserved__; /* For alignment */
+};
+
 struct itrace_lost_event {
 	struct perf_event_header header;
 	u64 offset;
@@ -220,6 +240,8 @@ union perf_event {
 	struct build_id_event		build_id;
 	struct id_index_event		id_index;
 	struct itrace_lost_event	itrace_lost;
+	struct itrace_info_event	itrace_info;
+	struct itrace_event		itrace;
 };
 
 void perf_event__print_totals(void);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index fbf9024..ac71006 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -220,6 +220,40 @@ static int process_id_index_stub(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
+static int process_event_itrace_info_stub(struct perf_tool *tool __maybe_unused,
+				  union perf_event *event __maybe_unused,
+				  struct perf_session *session __maybe_unused)
+{
+	dump_printf(": unhandled!\n");
+	return 0;
+}
+
+static int skipn(int fd, size_t n)
+{
+	char buf[4096];
+	ssize_t ret;
+
+	while (n) {
+		ret = read(fd, buf, min(n, sizeof(buf)));
+		if (ret <= 0)
+			return ret;
+		n -= ret;
+	}
+
+	return 0;
+}
+
+static s64 process_event_itrace_stub(struct perf_tool *tool __maybe_unused,
+				     union perf_event *event,
+				     struct perf_session *session
+				     __maybe_unused)
+{
+	dump_printf(": unhandled!\n");
+	if (perf_data_file__is_pipe(session->file))
+		skipn(perf_data_file__fd(session->file), event->itrace.size);
+	return event->itrace.size;
+}
+
 void perf_tool__fill_defaults(struct perf_tool *tool)
 {
 	if (tool->sample == NULL)
@@ -258,6 +292,10 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 	}
 	if (tool->id_index == NULL)
 		tool->id_index = process_id_index_stub;
+	if (tool->itrace_info == NULL)
+		tool->itrace_info = process_event_itrace_info_stub;
+	if (tool->itrace == NULL)
+		tool->itrace = process_event_itrace_stub;
 }
  
 static void swap_sample_id_all(union perf_event *event, void *data)
@@ -442,6 +480,29 @@ static void perf_event__tracing_data_swap(union perf_event *event,
 	event->tracing_data.size = bswap_32(event->tracing_data.size);
 }
 
+static void perf_event__itrace_info_swap(union perf_event *event,
+					 bool sample_id_all __maybe_unused)
+{
+	size_t size;
+
+	event->itrace_info.type = bswap_32(event->itrace_info.type);
+
+	size = event->header.size;
+	size -= (void *)&event->itrace_info.priv - (void *)event;
+	mem_bswap_64(event->itrace_info.priv, size);
+}
+
+static void perf_event__itrace_swap(union perf_event *event,
+				    bool sample_id_all __maybe_unused)
+{
+	event->itrace.size      = bswap_64(event->itrace.size);
+	event->itrace.offset    = bswap_64(event->itrace.offset);
+	event->itrace.reference = bswap_64(event->itrace.reference);
+	event->itrace.idx       = bswap_32(event->itrace.idx);
+	event->itrace.tid       = bswap_32(event->itrace.tid);
+	event->itrace.cpu       = bswap_32(event->itrace.cpu);
+}
+
 typedef void (*perf_event__swap_op)(union perf_event *event,
 				    bool sample_id_all);
 
@@ -462,6 +523,8 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_HEADER_TRACING_DATA] = perf_event__tracing_data_swap,
 	[PERF_RECORD_HEADER_BUILD_ID]	  = NULL,
 	[PERF_RECORD_ID_INDEX]		  = perf_event__all64_swap,
+	[PERF_RECORD_ITRACE_INFO]	  = perf_event__itrace_info_swap,
+	[PERF_RECORD_ITRACE]		  = perf_event__itrace_swap,
 	[PERF_RECORD_HEADER_MAX]	  = NULL,
 };
 
@@ -1036,6 +1099,12 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 		return tool->finished_round(tool, event, session);
 	case PERF_RECORD_ID_INDEX:
 		return tool->id_index(tool, event, session);
+	case PERF_RECORD_ITRACE_INFO:
+		return tool->itrace_info(tool, event, session);
+	case PERF_RECORD_ITRACE:
+		/* setup for reading amidst mmap */
+		lseek(fd, file_offset + event->header.size, SEEK_SET);
+		return tool->itrace(tool, event, session);
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index 18afd13..c1e8744 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -3,6 +3,8 @@
 
 #include <stdbool.h>
 
+#include "types.h"
+
 struct perf_session;
 union perf_event;
 struct perf_evlist;
@@ -25,6 +27,9 @@ typedef int (*event_attr_op)(struct perf_tool *tool,
 typedef int (*event_op2)(struct perf_tool *tool, union perf_event *event,
 			 struct perf_session *session);
 
+typedef s64 (*event_op3)(struct perf_tool *tool, union perf_event *event,
+			 struct perf_session *session);
+
 struct perf_tool {
 	event_sample	sample,
 			read;
@@ -41,7 +46,9 @@ struct perf_tool {
 	event_op2	tracing_data;
 	event_op2	finished_round,
 			build_id,
-			id_index;
+			id_index,
+			itrace_info;
+	event_op3	itrace;
 	bool		ordered_samples;
 	bool		ordering_requires_timestamps;
 };
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 42/71] perf tools: Add support for Instruction Trace recording
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (40 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 41/71] perf tools: Add user events for Instruction Tracing Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 43/71] perf record: Add basic Instruction Tracing support Alexander Shishkin
                   ` (30 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for reading from the Instruction
Tracing mmap and synthesizing Instruction
Tracing events.
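
The data is handed to a callback as two chunks because the ring buffer
may wrap; a hedged sketch of such a callback (record__write() is a
hypothetical output helper):

	static int record__process_itrace(struct perf_tool *tool,
					  union perf_event *event,
					  void *data1, size_t len1,
					  void *data2, size_t len2)
	{
		/* event header first, then the two wrapped halves */
		if (record__write(tool, event, event->header.size) ||
		    (len1 && record__write(tool, data1, len1)) ||
		    (len2 && record__write(tool, data2, len2)))
			return -1;
		return 0;
	}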

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/perf.h        |   2 +
 tools/perf/util/itrace.c | 190 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/itrace.h |  51 ++++++++++++-
 tools/perf/util/record.c |   6 +-
 4 files changed, 247 insertions(+), 2 deletions(-)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index b23fed5..b68b469 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -261,8 +261,10 @@ struct perf_record_opts {
 	bool	     sample_weight;
 	bool	     sample_time;
 	bool	     period;
+	bool	     full_itrace;
 	unsigned int freq;
 	unsigned int mmap_pages;
+	unsigned int itrace_mmap_pages;
 	unsigned int user_freq;
 	u64          branch_stack;
 	u64	     default_interval;
diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index a889e63..9596cc2 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -24,6 +24,10 @@
 #include <linux/kernel.h>
 #include <linux/perf_event.h>
 
+#include <stdlib.h>
+#include <string.h>
+#include <errno.h>
+
 #include "../perf.h"
 #include "types.h"
 #include "util.h"
@@ -32,6 +36,9 @@
 #include "thread_map.h"
 #include "itrace.h"
 
+#include "event.h"
+#include "debug.h"
+
 int itrace_mmap__mmap(struct itrace_mmap *mm, struct itrace_mmap_params *mp,
 		      int fd)
 {
@@ -102,3 +109,186 @@ void itrace_mmap_params__set_idx(struct itrace_mmap_params *mp,
 		mp->tid = evlist->threads->map[idx];
 	}
 }
+
+size_t itrace_record__info_priv_size(struct itrace_record *itr)
+{
+	if (itr)
+		return itr->info_priv_size(itr);
+	return 0;
+}
+
+static int itrace_not_supported(void)
+{
+	pr_err("Instruction tracing is not supported on this architecture\n");
+	return -EINVAL;
+}
+
+int itrace_record__info_fill(struct itrace_record *itr,
+			     struct perf_session *session,
+			     struct itrace_info_event *itrace_info,
+			     size_t priv_size)
+{
+	if (itr)
+		return itr->info_fill(itr, session, itrace_info, priv_size);
+	return itrace_not_supported();
+}
+
+void itrace_record__free(struct itrace_record *itr)
+{
+	if (itr)
+		itr->free(itr);
+}
+
+int itrace_record__options(struct itrace_record *itr,
+			   struct perf_evlist *evlist,
+			   struct perf_record_opts *opts)
+{
+	if (itr)
+		return itr->recording_options(itr, evlist, opts);
+	return 0;
+}
+
+u64 itrace_record__reference(struct itrace_record *itr)
+{
+	if (itr)
+		return itr->reference(itr);
+	return 0;
+}
+
+struct itrace_record *__attribute__ ((weak)) itrace_record__init(int *err)
+{
+	*err = 0;
+	return NULL;
+}
+
+int perf_event__synthesize_itrace_info(struct itrace_record *itr,
+				       struct perf_tool *tool,
+				       struct perf_session *session,
+				       perf_event__handler_t process)
+{
+	union perf_event *ev;
+	size_t priv_size;
+	int err;
+
+	pr_debug2("Synthesizing itrace information\n");
+	priv_size = itrace_record__info_priv_size(itr);
+	ev = zalloc(sizeof(struct itrace_info_event) + priv_size);
+	if (!ev)
+		return -ENOMEM;
+
+	ev->itrace_info.header.type = PERF_RECORD_ITRACE_INFO;
+	ev->itrace_info.header.size = sizeof(struct itrace_info_event) +
+				      priv_size;
+	err = itrace_record__info_fill(itr, session, &ev->itrace_info,
+				       priv_size);
+	if (err)
+		goto out_free;
+
+	err = process(tool, ev, NULL, NULL);
+out_free:
+	free(ev);
+	return err;
+}
+
+int perf_event__synthesize_itrace(struct perf_tool *tool,
+				  perf_event__handler_t process,
+				  size_t size, u64 offset, u64 ref, int idx,
+				  u32 tid, u32 cpu)
+{
+	union perf_event ev;
+
+	memset(&ev, 0, sizeof(ev));
+	ev.itrace.header.type = PERF_RECORD_ITRACE;
+	ev.itrace.header.size = sizeof(ev.itrace);
+	ev.itrace.size = size;
+	ev.itrace.offset = offset;
+	ev.itrace.reference = ref;
+	ev.itrace.idx = idx;
+	ev.itrace.tid = tid;
+	ev.itrace.cpu = cpu;
+
+	return process(tool, &ev, NULL, NULL);
+}
+
+int itrace_mmap__read(struct itrace_mmap *mm, struct itrace_record *itr,
+		      struct perf_tool *tool, process_itrace_t fn)
+{
+	u64 head = itrace_mmap__read_head(mm);
+	u64 old = mm->prev, offset, ref;
+	unsigned char *data = mm->base + page_size;
+	size_t size, head_off, old_off, len1, len2;
+	union perf_event ev;
+	void *data1, *data2;
+
+	if (old == head)
+		return 0;
+
+	pr_debug3("itrace idx %d old %"PRIu64" head %"PRIu64" diff %"PRIu64"\n",
+		  mm->idx, old, head, head - old);
+
+	if (mm->mask) {
+		head_off = head & mm->mask;
+		old_off = old & mm->mask;
+	} else {
+		head_off = head % mm->len;
+		old_off = old % mm->len;
+	}
+
+	if (head_off > old_off)
+		size = head_off - old_off;
+	else
+		size = mm->len - (old_off - head_off);
+
+	ref = itrace_record__reference(itr);
+
+	if (head > old || size <= head || mm->mask) {
+		offset = head - size;
+	} else {
+		/*
+		 * When the buffer size is not a power of 2, 'head' wraps at the
+		 * highest multiple of the buffer size, so we have to subtract
+		 * the remainder here.
+		 */
+		u64 rem = (0ULL - mm->len) % mm->len;
+
+		offset = head - size - rem;
+	}
+
+	if (size > head_off) {
+		len1 = size - head_off;
+		data1 = &data[mm->len - len1];
+		len2 = head_off;
+		data2 = &data[0];
+	} else {
+		len1 = size;
+		data1 = &data[head_off - len1];
+		len2 = 0;
+		data2 = NULL;
+	}
+
+	memset(&ev, 0, sizeof(ev));
+	ev.itrace.header.type = PERF_RECORD_ITRACE;
+	ev.itrace.header.size = sizeof(ev.itrace);
+	ev.itrace.size = size;
+	ev.itrace.offset = offset;
+	ev.itrace.reference = ref;
+	ev.itrace.idx = mm->idx;
+	ev.itrace.tid = mm->tid;
+	ev.itrace.cpu = mm->cpu;
+
+	if (fn(tool, &ev, data1, len1, data2, len2))
+		return -1;
+
+	mm->prev = head;
+
+	itrace_mmap__write_tail(mm, head);
+	if (itr->read_finish) {
+		int err;
+
+		err = itr->read_finish(itr, mm->idx);
+		if (err < 0)
+			return err;
+	}
+
+	return 1;
+}
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index 4b17aca..da52a29 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -22,13 +22,18 @@
 
 #include <sys/types.h>
 #include <stdbool.h>
-
+#include <stddef.h>
 #include <linux/perf_event.h>
 
 #include "../perf.h"
 #include "types.h"
 
+union perf_event;
+struct perf_session;
 struct perf_evlist;
+struct perf_tool;
+struct perf_record_opts;
+struct itrace_info_event;
 
 /**
  * struct itrace_mmap - records an mmap at PERF_EVENT_ITRACE_OFFSET.
@@ -74,6 +79,20 @@ struct itrace_mmap_params {
 	int		cpu;
 };
 
+struct itrace_record {
+	int (*recording_options)(struct itrace_record *itr,
+				 struct perf_evlist *evlist,
+				 struct perf_record_opts *opts);
+	size_t (*info_priv_size)(struct itrace_record *itr);
+	int (*info_fill)(struct itrace_record *itr,
+			 struct perf_session *session,
+			 struct itrace_info_event *itrace_info,
+			 size_t priv_size);
+	void (*free)(struct itrace_record *itr);
+	u64 (*reference)(struct itrace_record *itr);
+	int (*read_finish)(struct itrace_record *itr, int idx);
+};
+
 static inline u64 itrace_mmap__read_head(struct itrace_mmap *mm)
 {
 	struct perf_event_mmap_page *pc = mm->base;
@@ -115,4 +134,34 @@ void itrace_mmap_params__set_idx(struct itrace_mmap_params *mp,
 				 struct perf_evlist *evlist, int idx,
 				 bool per_cpu);
 
+typedef int (*process_itrace_t)(struct perf_tool *tool, union perf_event *event,
+				void *data1, size_t len1, void *data2,
+				size_t len2);
+
+int itrace_mmap__read(struct itrace_mmap *mm,
+			    struct itrace_record *itr, struct perf_tool *tool,
+			    process_itrace_t fn);
+
+struct itrace_record *itrace_record__init(int *err);
+
+int itrace_record__options(struct itrace_record *itr,
+			     struct perf_evlist *evlist,
+			     struct perf_record_opts *opts);
+size_t itrace_record__info_priv_size(struct itrace_record *itr);
+int itrace_record__info_fill(struct itrace_record *itr,
+			     struct perf_session *session,
+			     struct itrace_info_event *itrace_info,
+			     size_t priv_size);
+void itrace_record__free(struct itrace_record *itr);
+u64 itrace_record__reference(struct itrace_record *itr);
+
+int perf_event__synthesize_itrace_info(struct itrace_record *itr,
+				       struct perf_tool *tool,
+				       struct perf_session *session,
+				       perf_event__handler_t process);
+int perf_event__synthesize_itrace(struct perf_tool *tool,
+				  perf_event__handler_t process,
+				  size_t size, u64 offset, u64 ref, int idx,
+				  u32 tid, u32 cpu);
+
 #endif
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index e510453..86f980e 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -93,7 +93,11 @@ void perf_evlist__config(struct perf_evlist *evlist,
 	list_for_each_entry(evsel, &evlist->entries, node)
 		perf_evsel__config(evsel, opts);
 
-	if (evlist->nr_entries > 1) {
+	if (opts->full_itrace) {
+		use_sample_identifier = true;
+		list_for_each_entry(evsel, &evlist->entries, node)
+			perf_evsel__set_sample_id(evsel, use_sample_identifier);
+	} else if (evlist->nr_entries > 1) {
 		struct perf_evsel *first = perf_evlist__first(evlist);
 
 		list_for_each_entry(evsel, &evlist->entries, node) {
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread
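
For illustration, a consumer might drain the trace buffer with
itrace_mmap__read() along these lines (the callback and byte counter
below are hypothetical; data1/data2 are the two segments of a wrapped
circular buffer):

	static u64 itrace_bytes;

	static int count_itrace(struct perf_tool *tool __maybe_unused,
				union perf_event *event __maybe_unused,
				void *data1 __maybe_unused, size_t len1,
				void *data2 __maybe_unused, size_t len2)
	{
		itrace_bytes += len1 + len2;
		return 0;
	}

	/* returns 1 if data was processed, 0 if there was none,
	 * negative on error */
	err = itrace_mmap__read(mm, itr, tool, count_itrace);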

* [PATCH v0 43/71] perf record: Add basic Instruction Tracing support
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (41 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 42/71] perf tools: Add support for Instruction Trace recording Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 44/71] perf record: Extend -m option for Instruction Tracing mmap pages Alexander Shishkin
                   ` (29 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Amend the perf record tool to read the Instruction Tracing mmap and
synthesize Instruction Tracing events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-record.c | 103 +++++++++++++++++++++++++++++++++++++-------
 1 file changed, 87 insertions(+), 16 deletions(-)

diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index d93e2ee..4613f55 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -25,6 +25,7 @@
 #include "util/cpumap.h"
 #include "util/thread_map.h"
 #include "util/data.h"
+#include "util/itrace.h"
 
 #include <unistd.h>
 #include <sched.h>
@@ -67,6 +68,7 @@ struct perf_record {
 	struct perf_record_opts	opts;
 	u64			bytes_written;
 	struct perf_data_file	file;
+	struct itrace_record	*itr;
 	struct perf_evlist	*evlist;
 	struct perf_session	*session;
 	const char		*progname;
@@ -150,6 +152,42 @@ out:
 	return rc;
 }
 
+static int perf_record__process_itrace(struct perf_tool *tool,
+				       union perf_event *event, void *data1,
+				       size_t len1, void *data2, size_t len2)
+{
+	struct perf_record *rec = container_of(tool, struct perf_record, tool);
+	size_t padding;
+	u8 pad[8] = {0};
+
+	padding = (len1 + len2) & 7;
+	if (padding)
+		padding = 8 - padding;
+
+	perf_record__write(rec, event, event->header.size);
+	perf_record__write(rec, data1, len1);
+	perf_record__write(rec, data2, len2);
+	perf_record__write(rec, &pad, padding);
+
+	return 0;
+}
+
+static int perf_record__itrace_mmap_read(struct perf_record *rec,
+					 struct itrace_mmap *mm)
+{
+	int ret;
+
+	ret = itrace_mmap__read(mm, rec->itr, &rec->tool,
+				perf_record__process_itrace);
+	if (ret < 0)
+		return ret;
+
+	if (ret)
+		rec->samples++;
+
+	return 0;
+}
+
 static volatile int done = 0;
 static volatile int signr = -1;
 static volatile int child_finished = 0;
@@ -218,13 +256,16 @@ try_again:
 		goto out;
 	}
 
-	if (perf_evlist__mmap(evlist, opts->mmap_pages, false) < 0) {
+	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
+				 opts->itrace_mmap_pages,
+				 false) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
 			       "/proc/sys/kernel/perf_event_mlock_kb,\n"
 			       "or try again with a smaller value of -m/--mmap_pages.\n"
-			       "(current value: %d)\n", opts->mmap_pages);
+			       "(current value: %u,%u)\n",
+			       opts->mmap_pages, opts->itrace_mmap_pages);
 			rc = -errno;
 		} else {
 			pr_err("failed to mmap with %d (%s)\n", errno, strerror(errno));
@@ -318,12 +359,20 @@ static int perf_record__mmap_read_all(struct perf_record *rec)
 	int rc = 0;
 
 	for (i = 0; i < rec->evlist->nr_mmaps; i++) {
+		struct itrace_mmap *mm = &rec->evlist->mmap[i].itrace_mmap;
+
 		if (rec->evlist->mmap[i].base) {
 			if (perf_record__mmap_read(rec, &rec->evlist->mmap[i]) != 0) {
 				rc = -1;
 				goto out;
 			}
 		}
+
+		if (mm->base &&
+		    perf_record__itrace_mmap_read(rec, mm) != 0) {
+			rc = -1;
+			goto out;
+		}
 	}
 
 	if (perf_header__has_feat(&rec->session->header, HEADER_TRACING_DATA))
@@ -351,6 +400,9 @@ static void perf_record__init_features(struct perf_record *rec)
 
 	if (!rec->opts.branch_stack)
 		perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
+
+	if (!rec->opts.full_itrace)
+		perf_header__clear_feat(&session->header, HEADER_ITRACE);
 }
 
 static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
@@ -455,6 +507,13 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 		}
 	}
 
+	if (rec->opts.full_itrace) {
+		err = perf_event__synthesize_itrace_info(rec->itr, tool,
+					session, process_synthesized_event);
+		if (err)
+			goto out_delete_session;
+	}
+
 	err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
 						 machine, "_text");
 	if (err < 0)
@@ -536,16 +595,17 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 	if (quiet || signr == SIGUSR1)
 		return 0;
 
-	fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n", waking);
-
-	/*
-	 * Approximate RIP event size: 24 bytes.
-	 */
-	fprintf(stderr,
-		"[ perf record: Captured and wrote %.3f MB %s (~%" PRIu64 " samples) ]\n",
-		(double)rec->bytes_written / 1024.0 / 1024.0,
-		file->path,
-		rec->bytes_written / 24);
+	fprintf(stderr, "[ perf record: Woken up %ld times to write data ]\n",
+		waking);
+	fprintf(stderr, "[ perf record: Captured and wrote %.3f MB %s",
+		(double)rec->bytes_written / 1024.0 / 1024.0, file->path);
+	if (rec->opts.full_itrace) {
+		fprintf(stderr, " ]\n");
+	} else {
+		/* Approximate RIP event size: 24 bytes */
+		fprintf(stderr, " (~%" PRIu64 " samples) ]\n",
+			rec->bytes_written / 24);
+	}
 
 	return 0;
 
@@ -889,14 +949,19 @@ const struct option record_options[] = {
 
 int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 {
-	int err = -ENOMEM;
+	int err;
 	struct perf_evlist *evsel_list;
 	struct perf_record *rec = &record;
 	char errbuf[BUFSIZ];
 
+	rec->itr = itrace_record__init(&err);
+	if (err)
+		return err;
+
+	err = -ENOMEM;
 	evsel_list = perf_evlist__new();
 	if (evsel_list == NULL)
-		return -ENOMEM;
+		goto out_itrace_free;
 
 	rec->evlist = evsel_list;
 
@@ -956,18 +1021,24 @@ int cmd_record(int argc, const char **argv, const char *prefix __maybe_unused)
 	if (perf_evlist__create_maps(evsel_list, &rec->opts.target) < 0)
 		usage_with_options(record_usage, record_options);
 
+	err = itrace_record__options(rec->itr, evsel_list, &rec->opts);
+	if (err)
+		goto out_symbol_exit;
+
 	if (perf_record_opts__config(&rec->opts)) {
 		err = -EINVAL;
-		goto out_free_fd;
+		goto out_delete_maps;
 	}
 
 	err = __cmd_record(&record, argc, argv);
 
 	perf_evlist__munmap(evsel_list);
 	perf_evlist__close(evsel_list);
-out_free_fd:
+out_delete_maps:
 	perf_evlist__delete_maps(evsel_list);
 out_symbol_exit:
 	symbol__exit();
+out_itrace_free:
+	itrace_record__free(rec->itr);
 	return err;
 }
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread
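
As a worked example of the padding arithmetic in
perf_record__process_itrace(): the trace data is zero-padded so that
the next record in the perf.data file stays 8-byte aligned, e.g.

	len1 + len2 = 21  ->  (21 & 7) = 5  ->  padding = 8 - 5 = 3

so 21 data bytes plus 3 zero bytes follow the PERF_RECORD_ITRACE event.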

* [PATCH v0 44/71] perf record: Extend -m option for Instruction Tracing mmap pages
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (42 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 43/71] perf record: Add basic Instruction Tracing support Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 45/71] perf tools: Add a user event for Instruction Tracing errors Alexander Shishkin
                   ` (28 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Extend the -m option so that the number of mmap pages for Instruction
Tracing can be specified.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-record.txt |  2 ++
 tools/perf/builtin-record.c              | 49 ++++++++++++++++++++++++++++++--
 tools/perf/util/evlist.c                 | 16 +++++++----
 tools/perf/util/evlist.h                 |  2 ++
 4 files changed, 60 insertions(+), 9 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index c407897..bb01df7 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -92,6 +92,8 @@ OPTIONS
 	Number of mmap data pages (must be a power of two) or size
 	specification with appended unit character - B/K/M/G. The
 	size is rounded up to have nearest pages power of two value.
+	Also, a second value may be appended after a comma to specify the
+	number of mmap pages used for Instruction Tracing.
 
 -g::
 	Enables call-graph (stack chain/backtrace) recording.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 4613f55..344603f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -828,6 +828,49 @@ int record_callchain_opt(const struct option *opt,
 	return 0;
 }
 
+static int perf_record__parse_mmap_pages(const struct option *opt,
+					 const char *str,
+					 int unset __maybe_unused)
+{
+	struct perf_record_opts	*opts = opt->value;
+	char *s, *p;
+	unsigned int mmap_pages;
+	int ret;
+
+	if (!str)
+		return -EINVAL;
+
+	s = strdup(str);
+	if (!s)
+		return -ENOMEM;
+
+	p = strchr(s, ',');
+	if (p)
+		*p = '\0';
+
+	if (*s) {
+		ret = __perf_evlist__parse_mmap_pages(&mmap_pages, s, true);
+		if (ret)
+			goto out_free;
+		opts->mmap_pages = mmap_pages;
+	}
+
+	if (!p) {
+		ret = 0;
+		goto out_free;
+	}
+
+	ret = __perf_evlist__parse_mmap_pages(&mmap_pages, p + 1, false);
+	if (ret)
+		goto out_free;
+
+	opts->itrace_mmap_pages = mmap_pages;
+
+out_free:
+	free(s);
+	return ret;
+}
+
 static const char * const record_usage[] = {
 	"perf record [<options>] [<command>]",
 	"perf record [<options>] -- <command> [<options>]",
@@ -899,9 +942,9 @@ const struct option record_options[] = {
 			&record.opts.no_inherit_set,
 			"child tasks do not inherit counters"),
 	OPT_UINTEGER('F', "freq", &record.opts.user_freq, "profile at this frequency"),
-	OPT_CALLBACK('m', "mmap-pages", &record.opts.mmap_pages, "pages",
-		     "number of mmap data pages",
-		     perf_evlist__parse_mmap_pages),
+	OPT_CALLBACK('m', "mmap-pages", &record.opts, "pages[,pages]",
+		     "number of mmap data pages and instruction tracing mmap pages",
+		     perf_record__parse_mmap_pages),
 	OPT_BOOLEAN(0, "group", &record.opts.group,
 		    "put the counters into a counter group"),
 	OPT_CALLBACK_NOOPT('g', NULL, &record.opts,
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index c720c6c..dbb1898 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -843,7 +843,7 @@ static size_t perf_evlist__mmap_size(unsigned long pages)
 }
 
 static long parse_pages_arg(const char *str, unsigned long min,
-			    unsigned long max)
+			    unsigned long max, bool po2)
 {
 	unsigned long pages, val;
 	static struct parse_tag tags[] = {
@@ -871,7 +871,7 @@ static long parse_pages_arg(const char *str, unsigned long min,
 
 	if ((pages == 0) && (min == 0)) {
 		/* leave number of pages at 0 */
-	} else if (pages < (1UL << 31) && !is_power_of_2(pages)) {
+	} else if (po2 && pages < (1UL << 31) && !is_power_of_2(pages)) {
 		/* round pages up to next power of 2 */
 		pages = next_pow2(pages);
 		pr_info("rounding mmap pages size to %lu bytes (%lu pages)\n",
@@ -884,17 +884,15 @@ static long parse_pages_arg(const char *str, unsigned long min,
 	return pages;
 }
 
-int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
-				  int unset __maybe_unused)
+int __perf_evlist__parse_mmap_pages(unsigned int *mmap_pages, const char *str, bool po2)
 {
-	unsigned int *mmap_pages = opt->value;
 	unsigned long max = UINT_MAX;
 	long pages;
 
 	if (max < SIZE_MAX / page_size)
 		max = SIZE_MAX / page_size;
 
-	pages = parse_pages_arg(str, 1, max);
+	pages = parse_pages_arg(str, 1, max, po2);
 	if (pages < 0) {
 		pr_err("Invalid argument for --mmap_pages/-m\n");
 		return -1;
@@ -904,6 +902,12 @@ int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
 	return 0;
 }
 
+int perf_evlist__parse_mmap_pages(const struct option *opt, const char *str,
+				  int unset __maybe_unused)
+{
+	return __perf_evlist__parse_mmap_pages(opt->value, str, true);
+}
+
 /**
  * perf_evlist__mmap_ex - Create mmaps to receive events.
  * @evlist: list of events
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 7f56fdc..c5ef575 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -109,6 +109,8 @@ int perf_evlist__prepare_workload(struct perf_evlist *evlist,
 				  bool want_signal);
 int perf_evlist__start_workload(struct perf_evlist *evlist);
 
+int __perf_evlist__parse_mmap_pages(unsigned int *mmap_pages, const char *str,
+				    bool po2);
 int perf_evlist__parse_mmap_pages(const struct option *opt,
 				  const char *str,
 				  int unset);
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread
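
With this change, '-m 129,1000' would, for example, request 129 data
pages (rounded up to 256) plus 1000 Instruction Tracing pages. A sketch
of the two parse modes (variable names hypothetical):

	unsigned int data_pages, itrace_pages;

	/* data pages must be a power of two: "129" is rounded up to 256 */
	__perf_evlist__parse_mmap_pages(&data_pages, "129", true);

	/* Instruction Tracing pages have no power-of-two requirement */
	__perf_evlist__parse_mmap_pages(&itrace_pages, "1000", false);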

* [PATCH v0 45/71] perf tools: Add a user event for Instruction Tracing errors
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (43 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 44/71] perf record: Extend -m option for Instruction Tracing mmap pages Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 46/71] perf session: Add Instruction Tracing hooks Alexander Shishkin
                   ` (27 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Errors encountered when decoding an Instruction Trace need to be
reported to the user. However, the "user" might be a script or another
tool, so provide a new user event to capture those errors.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/event.c   |  1 +
 tools/perf/util/event.h   | 16 ++++++++++++++++
 tools/perf/util/session.c | 25 +++++++++++++++++++++++++
 tools/perf/util/tool.h    |  3 ++-
 4 files changed, 44 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/event.c b/tools/perf/util/event.c
index 990d10b..9ae8a26 100644
--- a/tools/perf/util/event.c
+++ b/tools/perf/util/event.c
@@ -29,6 +29,7 @@ static const char *perf_event__names[] = {
 	[PERF_RECORD_ID_INDEX]			= "ID_INDEX",
 	[PERF_RECORD_ITRACE_INFO]		= "ITRACE_INFO",
 	[PERF_RECORD_ITRACE]			= "ITRACE",
+	[PERF_RECORD_ITRACE_ERROR]		= "ITRACE_ERROR",
 };
 
 const char *perf_event__name(unsigned int id)
diff --git a/tools/perf/util/event.h b/tools/perf/util/event.h
index ad3625c..9d9b9d2 100644
--- a/tools/perf/util/event.h
+++ b/tools/perf/util/event.h
@@ -162,6 +162,7 @@ enum perf_user_event_type { /* above any possible kernel type */
 	PERF_RECORD_ID_INDEX			= 69,
 	PERF_RECORD_ITRACE_INFO			= 70,
 	PERF_RECORD_ITRACE			= 71,
+	PERF_RECORD_ITRACE_ERROR		= 72,
 	PERF_RECORD_HEADER_MAX
 };
 
@@ -224,6 +225,20 @@ struct itrace_lost_event {
 	u64 offset;
 };
 
+#define MAX_ITRACE_ERROR_MSG 64
+
+struct itrace_error_event {
+	struct perf_event_header header;
+	u32 type;
+	u32 code;
+	u32 cpu;
+	u32 pid;
+	u32 tid;
+	u32 reserved__; /* For alignment */
+	u64 ip;
+	char msg[MAX_ITRACE_ERROR_MSG];
+};
+
 union perf_event {
 	struct perf_event_header	header;
 	struct mmap_event		mmap;
@@ -242,6 +257,7 @@ union perf_event {
 	struct itrace_lost_event	itrace_lost;
 	struct itrace_info_event	itrace_info;
 	struct itrace_event		itrace;
+	struct itrace_error_event	itrace_error;
 };
 
 void perf_event__print_totals(void);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index ac71006..95067b6 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -254,6 +254,15 @@ static s64 process_event_itrace_stub(struct perf_tool *tool __maybe_unused,
 	return event->itrace.size;
 }
 
+static
+int process_event_itrace_error_stub(struct perf_tool *tool __maybe_unused,
+				    union perf_event *event __maybe_unused,
+				    struct perf_session *session __maybe_unused)
+{
+	dump_printf(": unhandled!\n");
+	return 0;
+}
+
 void perf_tool__fill_defaults(struct perf_tool *tool)
 {
 	if (tool->sample == NULL)
@@ -296,6 +305,8 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 		tool->itrace_info = process_event_itrace_info_stub;
 	if (tool->itrace == NULL)
 		tool->itrace = process_event_itrace_stub;
+	if (tool->itrace_error == NULL)
+		tool->itrace_error = process_event_itrace_error_stub;
 }
  
 static void swap_sample_id_all(union perf_event *event, void *data)
@@ -503,6 +514,17 @@ static void perf_event__itrace_swap(union perf_event *event,
 	event->itrace.cpu       = bswap_32(event->itrace.cpu);
 }
 
+static void perf_event__itrace_error_swap(union perf_event *event,
+					  bool sample_id_all __maybe_unused)
+{
+	event->itrace_error.type = bswap_32(event->itrace_error.type);
+	event->itrace_error.code = bswap_32(event->itrace_error.code);
+	event->itrace_error.cpu  = bswap_32(event->itrace_error.cpu);
+	event->itrace_error.pid  = bswap_32(event->itrace_error.pid);
+	event->itrace_error.tid  = bswap_32(event->itrace_error.tid);
+	event->itrace_error.ip   = bswap_64(event->itrace_error.ip);
+}
+
 typedef void (*perf_event__swap_op)(union perf_event *event,
 				    bool sample_id_all);
 
@@ -525,6 +547,7 @@ static perf_event__swap_op perf_event__swap_ops[] = {
 	[PERF_RECORD_ID_INDEX]		  = perf_event__all64_swap,
 	[PERF_RECORD_ITRACE_INFO]	  = perf_event__itrace_info_swap,
 	[PERF_RECORD_ITRACE]		  = perf_event__itrace_swap,
+	[PERF_RECORD_ITRACE_ERROR]	  = perf_event__itrace_error_swap,
 	[PERF_RECORD_HEADER_MAX]	  = NULL,
 };
 
@@ -1105,6 +1128,8 @@ static s64 perf_session__process_user_event(struct perf_session *session,
 		/* setup for reading amidst mmap */
 		lseek(fd, file_offset + event->header.size, SEEK_SET);
 		return tool->itrace(tool, event, session);
+	case PERF_RECORD_ITRACE_ERROR:
+		return tool->itrace_error(tool, event, session);
 	default:
 		return -EINVAL;
 	}
diff --git a/tools/perf/util/tool.h b/tools/perf/util/tool.h
index c1e8744..0700257 100644
--- a/tools/perf/util/tool.h
+++ b/tools/perf/util/tool.h
@@ -47,7 +47,8 @@ struct perf_tool {
 	event_op2	finished_round,
 			build_id,
 			id_index,
-			itrace_info;
+			itrace_info,
+			itrace_error;
 	event_op3	itrace;
 	bool		ordered_samples;
 	bool		ordering_requires_timestamps;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread
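
A tool that wants to handle the new event itself could hook it roughly
as follows (the handler below is a hypothetical sketch):

	static int my_itrace_error(struct perf_tool *tool __maybe_unused,
				   union perf_event *event,
				   struct perf_session *session __maybe_unused)
	{
		struct itrace_error_event *e = &event->itrace_error;

		fprintf(stderr, "itrace error %u on cpu %d at %#" PRIx64 ": %s\n",
			e->type, e->cpu, e->ip, e->msg);
		return 0;
	}

	...
	tool.itrace_error = my_itrace_error;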

* [PATCH v0 46/71] perf session: Add Instruction Tracing hooks
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (44 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 45/71] perf tools: Add a user event for Instruction Tracing errors Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:36 ` [PATCH v0 47/71] perf session: Add Instruction Tracing options Alexander Shishkin
                   ` (26 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Hook into session processing so that Instruction Trace decoding can
synthesize events transparently to the tools.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/itrace.h  | 49 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/session.c | 45 +++++++++++++++++++++++++++++++++++++------
 tools/perf/util/session.h |  3 +++
 3 files changed, 91 insertions(+), 6 deletions(-)

diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index da52a29..e6b0cc0 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -26,6 +26,7 @@
 #include <linux/perf_event.h>
 
 #include "../perf.h"
+#include "session.h"
 #include "types.h"
 
 union perf_event;
@@ -35,6 +36,18 @@ struct perf_tool;
 struct perf_record_opts;
 struct itrace_info_event;
 
+struct itrace {
+	int (*process_event)(struct perf_session *session,
+			     union perf_event *event,
+			     struct perf_sample *sample,
+			     struct perf_tool *tool);
+	int (*flush_events)(struct perf_session *session,
+			    struct perf_tool *tool);
+	void (*free_events)(struct perf_session *session);
+	void (*free)(struct perf_session *session);
+	unsigned long long error_count;
+};
+
 /**
  * struct itrace_mmap - records an mmap at PERF_EVENT_ITRACE_OFFSET.
  * @base: address of mapped area
@@ -164,4 +177,40 @@ int perf_event__synthesize_itrace(struct perf_tool *tool,
 				  size_t size, u64 offset, u64 ref, int idx,
 				  u32 tid, u32 cpu);
 
+static inline int itrace__process_event(struct perf_session *session,
+					union perf_event *event,
+					struct perf_sample *sample,
+					struct perf_tool *tool)
+{
+	if (!session->itrace)
+		return 0;
+
+	return session->itrace->process_event(session, event, sample, tool);
+}
+
+static inline int itrace__flush_events(struct perf_session *session,
+				       struct perf_tool *tool)
+{
+	if (!session->itrace)
+		return 0;
+
+	return session->itrace->flush_events(session, tool);
+}
+
+static inline void itrace__free_events(struct perf_session *session)
+{
+	if (!session->itrace)
+		return;
+
+	return session->itrace->free_events(session);
+}
+
+static inline void itrace__free(struct perf_session *session)
+{
+	if (!session->itrace)
+		return;
+
+	return session->itrace->free(session);
+}
+
 #endif
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 95067b6..55aead5 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -15,6 +15,7 @@
 #include "cpumap.h"
 #include "perf_regs.h"
 #include "vdso.h"
+#include "itrace.h"
 
 static int perf_session__open(struct perf_session *session)
 {
@@ -148,6 +149,7 @@ static void perf_session_env__delete(struct perf_session_env *env)
 
 void perf_session__delete(struct perf_session *session)
 {
+	itrace__free(session);
 	perf_session__destroy_kernel_maps(session);
 	perf_session__delete_dead_threads(session);
 	perf_session__delete_threads(session);
@@ -1022,11 +1024,11 @@ perf_session__deliver_sample(struct perf_session *session,
 					    &sample->read.one, machine);
 }
 
-static int perf_session_deliver_event(struct perf_session *session,
-				      union perf_event *event,
-				      struct perf_sample *sample,
-				      struct perf_tool *tool,
-				      u64 file_offset)
+static int __perf_session__deliver_event(struct perf_session *session,
+					 union perf_event *event,
+					 struct perf_sample *sample,
+					 struct perf_tool *tool,
+					 u64 file_offset)
 {
 	struct perf_evsel *evsel;
 	struct machine *machine;
@@ -1095,6 +1097,24 @@ static int perf_session_deliver_event(struct perf_session *session,
 	}
 }
 
+static int perf_session_deliver_event(struct perf_session *session,
+				      union perf_event *event,
+				      struct perf_sample *sample,
+				      struct perf_tool *tool,
+				      u64 file_offset)
+{
+	int ret;
+
+	ret = itrace__process_event(session, event, sample, tool);
+	if (ret < 0)
+		return ret;
+	if (ret > 0)
+		return 0;
+
+	return __perf_session__deliver_event(session, event, sample, tool,
+					     file_offset);
+}
+
 static s64 perf_session__process_user_event(struct perf_session *session,
 					    union perf_event *event,
 					    struct perf_tool *tool,
@@ -1146,7 +1166,7 @@ int perf_session__deliver_synth_event(struct perf_session *session,
 		return perf_session__process_user_event(session, event, tool,
 							0);
 
-	return perf_session_deliver_event(session, event, sample, tool, 0);
+	return __perf_session__deliver_event(session, event, sample, tool, 0);
 }
 
 static void event_swap(union perf_event *event, bool sample_id_all)
@@ -1258,6 +1278,11 @@ static void perf_session__warn_about_errors(const struct perf_session *session,
 			    "Do you have a KVM guest running and not using 'perf kvm'?\n",
 			    session->stats.nr_unprocessable_samples);
 	}
+
+	if (session->itrace && session->itrace->error_count) {
+		ui__warning("%llu instruction trace errors\n",
+			    session->itrace->error_count);
+	}
 }
 
 volatile int session_done;
@@ -1346,10 +1371,14 @@ done:
 	/* do the final flush for ordered samples */
 	session->ordered_samples.next_flush = ULLONG_MAX;
 	err = flush_sample_queue(session, tool);
+	if (err)
+		goto out_err;
+	err = itrace__flush_events(session, tool);
 out_err:
 	free(buf);
 	perf_session__warn_about_errors(session, tool);
 	perf_session_free_sample_buffers(session);
+	itrace__free_events(session);
 	return err;
 }
 
@@ -1492,10 +1521,14 @@ out:
 	/* do the final flush for ordered samples */
 	session->ordered_samples.next_flush = ULLONG_MAX;
 	err = flush_sample_queue(session, tool);
+	if (err)
+		goto out_err;
+	err = itrace__flush_events(session, tool);
 out_err:
 	ui_progress__finish();
 	perf_session__warn_about_errors(session, tool);
 	perf_session_free_sample_buffers(session);
+	itrace__free_events(session);
 	session->one_mmap = false;
 	return err;
 }
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 64d8145..9000193 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -29,10 +29,13 @@ struct ordered_samples {
 	unsigned int		nr_samples;
 };
 
+struct itrace;
+
 struct perf_session {
 	struct perf_header	header;
 	struct machines		machines;
 	struct perf_evlist	*evlist;
+	struct itrace		*itrace;
 	struct trace_event	tevent;
 	struct events_stats	stats;
 	bool			repipe;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread
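
A decoder would typically embed struct itrace in its own state and
register it on the session, roughly like this (the type and callbacks
are hypothetical):

	struct my_decoder {
		struct itrace	itrace;
		/* ... decoder state ... */
	};

	static int my_process_event(struct perf_session *session,
				    union perf_event *event,
				    struct perf_sample *sample,
				    struct perf_tool *tool)
	{
		/* consume PERF_RECORD_ITRACE data and synthesize samples;
		 * return > 0 to swallow the event, 0 to deliver it normally */
		return 0;
	}

	...
	d->itrace.process_event = my_process_event;
	d->itrace.flush_events  = my_flush_events;
	d->itrace.free_events   = my_free_events;
	d->itrace.free          = my_free;
	session->itrace         = &d->itrace;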

* [PATCH v0 47/71] perf session: Add Instruction Tracing options
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (45 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 46/71] perf session: Add Instruction Tracing hooks Alexander Shishkin
@ 2013-12-11 12:36 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 48/71] perf session: Make perf_event__itrace_swap() non-static Alexander Shishkin
                   ` (25 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:36 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Instruction Trace decoding synthesizes "instructions" and "branches"
events. Add options to control which events are synthesized and with
what period.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/itrace.c  | 100 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/itrace.h  |  33 +++++++++++++++
 tools/perf/util/session.h |   2 +
 3 files changed, 135 insertions(+)

diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index 9596cc2..b80411d 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -38,6 +38,7 @@
 
 #include "event.h"
 #include "debug.h"
+#include "parse-options.h"
 
 int itrace_mmap__mmap(struct itrace_mmap *mm, struct itrace_mmap_params *mp,
 		      int fd)
@@ -210,6 +211,105 @@ int perf_event__synthesize_itrace(struct perf_tool *tool,
 	return process(tool, &ev, NULL, NULL);
 }
 
+#define PERF_ITRACE_DEFAULT_PERIOD_TYPE	PERF_ITRACE_PERIOD_INSTRUCTIONS
+#define PERF_ITRACE_DEFAULT_PERIOD	1000
+
+void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts)
+{
+	synth_opts->instructions = true;
+	synth_opts->branches = true;
+	synth_opts->errors = true;
+	synth_opts->period_type = PERF_ITRACE_DEFAULT_PERIOD_TYPE;
+	synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
+}
+
+int itrace_parse_synth_opts(const struct option *opt, const char *str,
+			    int unset)
+{
+	struct itrace_synth_opts *synth_opts = opt->value;
+	const char *p;
+	char *endptr;
+
+	synth_opts->set = true;
+
+	if (unset) {
+		synth_opts->dont_decode = true;
+		return 0;
+	}
+
+	if (!str) {
+		itrace_synth_opts__set_default(synth_opts);
+		return 0;
+	}
+
+	for (p = str; *p;) {
+		switch (*p++) {
+		case 'i':
+			synth_opts->instructions = true;
+			while (*p == ' ' || *p == ',')
+				p += 1;
+			if (isdigit(*p)) {
+				synth_opts->period = strtoull(p, &endptr, 10);
+				p = endptr;
+				while (*p == ' ' || *p == ',')
+					p += 1;
+				switch (*p++) {
+				case 'i':
+					synth_opts->period_type =
+						PERF_ITRACE_PERIOD_INSTRUCTIONS;
+					break;
+				case 't':
+					synth_opts->period_type =
+						PERF_ITRACE_PERIOD_TICKS;
+					break;
+				case 'm':
+					synth_opts->period *= 1000;
+					/* Fall through */
+				case 'u':
+					synth_opts->period *= 1000;
+					/* Fall through */
+				case 'n':
+					if (*p++ != 's')
+						goto out_err;
+					synth_opts->period_type =
+						PERF_ITRACE_PERIOD_NANOSECS;
+					break;
+				case '\0':
+					goto out;
+				default:
+					goto out_err;
+				}
+			}
+			break;
+		case 'b':
+			synth_opts->branches = true;
+			break;
+		case 'e':
+			synth_opts->errors = true;
+			break;
+		case ' ':
+		case ',':
+			break;
+		default:
+			goto out_err;
+		}
+	}
+out:
+	if (synth_opts->instructions) {
+		if (!synth_opts->period_type)
+			synth_opts->period_type =
+					PERF_ITRACE_DEFAULT_PERIOD_TYPE;
+		if (!synth_opts->period)
+			synth_opts->period = PERF_ITRACE_DEFAULT_PERIOD;
+	}
+
+	return 0;
+
+out_err:
+	pr_err("Bad instruction trace options '%s'\n", str);
+	return -EINVAL;
+}
+
 int itrace_mmap__read(struct itrace_mmap *mm, struct itrace_record *itr,
 		      struct perf_tool *tool, process_itrace_t fn)
 {
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index e6b0cc0..8453351 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -33,9 +33,39 @@ union perf_event;
 struct perf_session;
 struct perf_evlist;
 struct perf_tool;
+struct option;
 struct perf_record_opts;
 struct itrace_info_event;
 
+enum itrace_period_type {
+	PERF_ITRACE_PERIOD_INSTRUCTIONS,
+	PERF_ITRACE_PERIOD_TICKS,
+	PERF_ITRACE_PERIOD_NANOSECS,
+};
+
+/**
+ * struct itrace_synth_opts - Instruction Tracing synthesis options.
+ * @set: indicates whether or not options have been set
+ * @inject: indicates the event (not just the sample) must be fully synthesized
+ *          because 'perf inject' will write it out
+ * @instructions: whether to synthesize 'instructions' events
+ * @branches: whether to synthesize 'branches' events
+ * @errors: whether to synthesize decoder error events
+ * @dont_decode: whether to skip decoding entirely
+ * @period: 'instructions' events period
+ * @period_type: 'instructions' events period type
+ */
+struct itrace_synth_opts {
+	bool			set;
+	bool			inject;
+	bool			instructions;
+	bool			branches;
+	bool			errors;
+	bool			dont_decode;
+	unsigned long long	period;
+	enum itrace_period_type	period_type;
+};
+
 struct itrace {
 	int (*process_event)(struct perf_session *session,
 			     union perf_event *event,
@@ -176,6 +206,9 @@ int perf_event__synthesize_itrace(struct perf_tool *tool,
 				  perf_event__handler_t process,
 				  size_t size, u64 offset, u64 ref, int idx,
 				  u32 tid, u32 cpu);
+int itrace_parse_synth_opts(const struct option *opt, const char *str,
+			    int unset);
+void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts);
 
 static inline int itrace__process_event(struct perf_session *session,
 					union perf_event *event,
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 9000193..a7873c0 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -30,12 +30,14 @@ struct ordered_samples {
 };
 
 struct itrace;
+struct itrace_synth_opts;
 
 struct perf_session {
 	struct perf_header	header;
 	struct machines		machines;
 	struct perf_evlist	*evlist;
 	struct itrace		*itrace;
+	struct itrace_synth_opts *itrace_synth_opts;
 	struct trace_event	tevent;
 	struct events_stats	stats;
 	bool			repipe;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread
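
Reading the parser above, valid option strings include, for example
(the "instructions" period defaults to 1000 instructions when not
given):

	i	synthesize "instructions" events
	b	synthesize "branches" events
	e	synthesize decoder error events
	i500i	"instructions" events every 500 instructions
	i100us	"instructions" events every 100 microseconds
	i10ms	"instructions" events every 10 milliseconds
	i1000t	"instructions" events every 1000 ticks

Flags combine, so e.g. "ibe" selects all three with default periods.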

* [PATCH v0 48/71] perf session: Make perf_event__itrace_swap() non-static
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (46 preceding siblings ...)
  2013-12-11 12:36 ` [PATCH v0 47/71] perf session: Add Instruction Tracing options Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 49/71] perf itrace: Add helpers for Instruction Tracing errors Alexander Shishkin
                   ` (24 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Make it possible for the Instruction Trace decoder to read and
byte-swap Instruction Tracing events directly from file.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/session.c | 4 ++--
 tools/perf/util/session.h | 1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 55aead5..49c89e7 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -505,8 +505,8 @@ static void perf_event__itrace_info_swap(union perf_event *event,
 	mem_bswap_64(event->itrace_info.priv, size);
 }
 
-static void perf_event__itrace_swap(union perf_event *event,
-				    bool sample_id_all __maybe_unused)
+void perf_event__itrace_swap(union perf_event *event,
+			     bool sample_id_all __maybe_unused)
 {
 	event->itrace.size      = bswap_64(event->itrace.size);
 	event->itrace.offset    = bswap_64(event->itrace.offset);
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index a7873c0..25aa9e7 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -61,6 +61,7 @@ struct perf_session *perf_session__new(struct perf_data_file *file,
 void perf_session__delete(struct perf_session *session);
 
 void perf_event_header__bswap(struct perf_event_header *hdr);
+void perf_event__itrace_swap(union perf_event *event, bool sample_id_all);
 
 int __perf_session__process_events(struct perf_session *session,
 				   u64 data_offset, u64 data_size, u64 size,
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 49/71] perf itrace: Add helpers for Instruction Tracing errors
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (47 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 48/71] perf session: Make perf_event__itrace_swap() non-static Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 50/71] perf itrace: Add helpers for queuing Instruction Tracing data Alexander Shishkin
                   ` (23 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add functions to synthesize, count and print Instruction Tracing error
events.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/itrace.c | 55 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/itrace.h | 16 ++++++++++++++
 2 files changed, 71 insertions(+)

diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index b80411d..865b584 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -25,6 +25,7 @@
 #include <linux/perf_event.h>
 
 #include <stdlib.h>
+#include <stdio.h>
 #include <string.h>
 #include <errno.h>
 
@@ -37,6 +38,7 @@
 #include "itrace.h"
 
 #include "event.h"
+#include "session.h"
 #include "debug.h"
 #include "parse-options.h"
 
@@ -162,6 +164,28 @@ struct itrace_record *__attribute__ ((weak)) itrace_record__init(int *err)
 	return NULL;
 }
 
+void itrace_synth_error(struct itrace_error_event *itrace_error, int type,
+			int code, int cpu, pid_t pid, pid_t tid, u64 ip,
+			const char *msg)
+{
+	size_t size;
+
+	memset(itrace_error, 0, sizeof(struct itrace_error_event));
+
+	itrace_error->header.type = PERF_RECORD_ITRACE_ERROR;
+	itrace_error->type = type;
+	itrace_error->code = code;
+	itrace_error->cpu = cpu;
+	itrace_error->pid = pid;
+	itrace_error->tid = tid;
+	itrace_error->ip = ip;
+	strncpy(itrace_error->msg, msg, MAX_ITRACE_ERROR_MSG - 1);
+
+	size = (void *)itrace_error->msg - (void *)itrace_error +
+	       strlen(itrace_error->msg) + 1; /* include the NUL terminator */
+	itrace_error->header.size = PERF_ALIGN(size, sizeof(u64));
+}
+
 int perf_event__synthesize_itrace_info(struct itrace_record *itr,
 				       struct perf_tool *tool,
 				       struct perf_session *session,
@@ -310,6 +334,37 @@ out_err:
 	return -EINVAL;
 }
 
+size_t perf_event__fprintf_itrace_error(union perf_event *event, FILE *fp)
+{
+	struct itrace_error_event *e = &event->itrace_error;
+	int ret;
+
+	ret = fprintf(fp, " Instruction trace error type %u", e->type);
+	ret += fprintf(fp, " cpu %d pid %d tid %d ip %#"PRIx64" code %u: %s\n",
+		       e->cpu, e->pid, e->tid, e->ip, e->code, e->msg);
+	return ret;
+}
+
+int perf_event__process_itrace_error(struct perf_tool *tool __maybe_unused,
+				     union perf_event *event,
+				     struct perf_session *session)
+{
+	if (session->itrace)
+		session->itrace->error_count += 1;
+
+	perf_event__fprintf_itrace_error(event, stdout);
+	return 0;
+}
+
+int perf_event__count_itrace_error(struct perf_tool *tool __maybe_unused,
+				   union perf_event *event __maybe_unused,
+				   struct perf_session *session)
+{
+	if (session->itrace)
+		session->itrace->error_count += 1;
+	return 0;
+}
+
 int itrace_mmap__read(struct itrace_mmap *mm, struct itrace_record *itr,
 		      struct perf_tool *tool, process_itrace_t fn)
 {
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index 8453351..08877d2 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -37,6 +37,10 @@ struct option;
 struct perf_record_opts;
 struct itrace_info_event;
 
+enum itrace_error_type {
+	PERF_ITRACE_DECODER_ERROR = 1,
+};
+
 enum itrace_period_type {
 	PERF_ITRACE_PERIOD_INSTRUCTIONS,
 	PERF_ITRACE_PERIOD_TICKS,
@@ -198,6 +202,10 @@ int itrace_record__info_fill(struct itrace_record *itr,
 void itrace_record__free(struct itrace_record *itr);
 u64 itrace_record__reference(struct itrace_record *itr);
 
+void itrace_synth_error(struct itrace_error_event *itrace_error, int type,
+			int code, int cpu, pid_t pid, pid_t tid, u64 ip,
+			const char *msg);
+
 int perf_event__synthesize_itrace_info(struct itrace_record *itr,
 				       struct perf_tool *tool,
 				       struct perf_session *session,
@@ -206,10 +214,18 @@ int perf_event__synthesize_itrace(struct perf_tool *tool,
 				  perf_event__handler_t process,
 				  size_t size, u64 offset, u64 ref, int idx,
 				  u32 tid, u32 cpu);
+int perf_event__process_itrace_error(struct perf_tool *tool,
+				     union perf_event *event,
+				     struct perf_session *session);
+int perf_event__count_itrace_error(struct perf_tool *tool __maybe_unused,
+				   union perf_event *event __maybe_unused,
+				   struct perf_session *session);
 int itrace_parse_synth_opts(const struct option *opt, const char *str,
 			    int unset);
 void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts);
 
+size_t perf_event__fprintf_itrace_error(union perf_event *event, FILE *fp);
+
 static inline int itrace__process_event(struct perf_session *session,
 					union perf_event *event,
 					struct perf_sample *sample,
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread
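
The synthesized event size is the offset of 'msg' within struct
itrace_error_event (40 bytes) plus the message string and its NUL
terminator, rounded up to a multiple of 8. For example, a 12-character
message gives:

	size = 40 + 12 + 1 = 53  ->  PERF_ALIGN(53, 8) = 56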

* [PATCH v0 50/71] perf itrace: Add helpers for queuing Instruction Tracing data
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (48 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 49/71] perf itrace: Add helpers for Instruction Tracing errors Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 51/71] perf itrace: Add a heap for sorting Instruction Tracing queues Alexander Shishkin
                   ` (22 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Provide functions to queue Instruction Tracing data buffers for
processing. There is one queue for each of the mmap buffers used for
recording.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/itrace.c | 278 +++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/itrace.h |  77 +++++++++++++
 2 files changed, 355 insertions(+)

diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index 865b584..f26d6cd 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -23,11 +23,15 @@
 
 #include <linux/kernel.h>
 #include <linux/perf_event.h>
+#include <linux/string.h>
 
+#include <sys/param.h>
 #include <stdlib.h>
 #include <stdio.h>
 #include <string.h>
+#include <limits.h>
 #include <errno.h>
+#include <linux/list.h>
 
 #include "../perf.h"
 #include "types.h"
@@ -113,6 +117,233 @@ void itrace_mmap_params__set_idx(struct itrace_mmap_params *mp,
 	}
 }
 
+#define ITRACE_INIT_NR_QUEUES	32
+
+static struct itrace_queue *itrace_alloc_queue_array(unsigned int nr_queues)
+{
+	struct itrace_queue *queue_array;
+	unsigned int max_nr_queues, i;
+
+	max_nr_queues = MIN(UINT_MAX, SIZE_MAX) / sizeof(struct itrace_queue);
+	if (nr_queues > max_nr_queues)
+		return NULL;
+
+	queue_array = calloc(nr_queues, sizeof(struct itrace_queue));
+	if (!queue_array)
+		return NULL;
+
+	for (i = 0; i < nr_queues; i++) {
+		INIT_LIST_HEAD(&queue_array[i].head);
+		queue_array[i].priv = NULL;
+	}
+
+	return queue_array;
+}
+
+int itrace_queues__init(struct itrace_queues *queues)
+{
+	queues->nr_queues = ITRACE_INIT_NR_QUEUES;
+	queues->queue_array = itrace_alloc_queue_array(queues->nr_queues);
+	if (!queues->queue_array)
+		return -ENOMEM;
+	return 0;
+}
+
+static int itrace_queues__grow(struct itrace_queues *queues,
+			       unsigned int new_nr_queues)
+{
+	unsigned int nr_queues = queues->nr_queues;
+	struct itrace_queue *queue_array;
+	unsigned int i;
+
+	if (!nr_queues)
+		nr_queues = ITRACE_INIT_NR_QUEUES;
+
+	while (nr_queues && nr_queues < new_nr_queues)
+		nr_queues <<= 1;
+
+	if (nr_queues < queues->nr_queues || nr_queues < new_nr_queues)
+		return -EINVAL;
+
+	queue_array = itrace_alloc_queue_array(nr_queues);
+	if (!queue_array)
+		return -ENOMEM;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		list_splice_tail(&queues->queue_array[i].head,
+				 &queue_array[i].head);
+		queue_array[i].tid = queues->queue_array[i].tid;
+		queue_array[i].cpu = queues->queue_array[i].cpu;
+		queue_array[i].set = queues->queue_array[i].set;
+		queue_array[i].priv = queues->queue_array[i].priv;
+	}
+
+	free(queues->queue_array);
+	queues->nr_queues = nr_queues;
+	queues->queue_array = queue_array;
+
+	return 0;
+}
+
+static void *itrace_event__copy_data(union perf_event *event,
+				     struct perf_session *session)
+{
+	int fd = perf_data_file__fd(session->file);
+	void *p;
+	ssize_t ret;
+
+	if (event->itrace.size > SSIZE_MAX)
+		return NULL;
+
+	p = malloc(event->itrace.size);
+	if (!p)
+		return NULL;
+
+	ret = readn(fd, p, event->itrace.size);
+	if (ret != (ssize_t)event->itrace.size) {
+		free(p);
+		return NULL;
+	}
+
+	return p;
+}
+
+static int itrace_queues__add_buffer(struct itrace_queues *queues,
+				     unsigned int idx,
+				     struct itrace_buffer *buffer)
+{
+	struct itrace_queue *queue;
+	int err;
+
+	if (idx >= queues->nr_queues) {
+		err = itrace_queues__grow(queues, idx + 1);
+		if (err)
+			goto out_err;
+	}
+
+	queue = &queues->queue_array[idx];
+
+	if (!queue->set) {
+		queue->set = true;
+		queue->tid = buffer->tid;
+		queue->cpu = buffer->cpu;
+	} else if (buffer->cpu != queue->cpu || buffer->tid != queue->tid) {
+		pr_err("itrace queue conflict: cpu %d, tid %d vs cpu %d, tid %d\n",
+		       queue->cpu, queue->tid, buffer->cpu, buffer->tid);
+		err = -EINVAL;
+		goto out_err;
+	}
+
+	list_add_tail(&buffer->list, &queue->head);
+
+	queues->new_data = true;
+
+	return 0;
+
+out_err:
+	if (buffer->data_needs_freeing)
+		free(buffer->data);
+	free(buffer);
+	return err;
+}
+
+/* Limit buffers to 32MiB on 32-bit */
+#define BUFFER_LIMIT_FOR_32_BIT (32 * 1024 * 1024)
+
+static int itrace_queues__split_buffer(struct itrace_queues *queues,
+				       union perf_event *event,
+				       struct itrace_buffer *buffer)
+{
+	u64 sz = event->itrace.size;
+	bool consecutive = false;
+	struct itrace_buffer *b;
+	int err;
+
+	while (sz > BUFFER_LIMIT_FOR_32_BIT) {
+		b = memdup(buffer, sizeof(struct itrace_buffer));
+		if (!b)
+			return -ENOMEM;
+		b->size = BUFFER_LIMIT_FOR_32_BIT;
+		b->consecutive = consecutive;
+		err = itrace_queues__add_buffer(queues, event->itrace.idx, b);
+		if (err)
+			return err;
+		buffer->data_offset += BUFFER_LIMIT_FOR_32_BIT;
+		sz -= BUFFER_LIMIT_FOR_32_BIT;
+		consecutive = true;
+	}
+
+	buffer->size = sz;
+	buffer->consecutive = consecutive;
+
+	return 0;
+}
+
+int itrace_queues__add_event(struct itrace_queues *queues,
+			     struct perf_session *session,
+			     union perf_event *event, off_t data_offset,
+			     struct itrace_buffer **buffer_ptr)
+{
+	struct itrace_buffer *buffer;
+	int err;
+
+	queues->populated = true;
+
+	buffer = zalloc(sizeof(struct itrace_buffer));
+	if (!buffer)
+		return -ENOMEM;
+
+	if (buffer_ptr)
+		*buffer_ptr = buffer;
+
+	buffer->tid = event->itrace.tid;
+	buffer->cpu = event->itrace.cpu;
+
+	buffer->offset = event->itrace.offset;
+	buffer->reference = event->itrace.reference;
+
+	buffer->size = event->itrace.size;
+
+	if (session->one_mmap) {
+		buffer->data = data_offset - session->one_mmap_offset +
+			       session->one_mmap_addr;
+	} else if (perf_data_file__is_pipe(session->file)) {
+		buffer->data = itrace_event__copy_data(event, session);
+		if (!buffer->data)
+			return -ENOMEM;
+		buffer->data_needs_freeing = true;
+	} else if (BITS_PER_LONG == 64 ||
+		   event->itrace.size <= BUFFER_LIMIT_FOR_32_BIT) {
+		buffer->data_offset = data_offset;
+	} else {
+		buffer->data_offset = data_offset;
+		err = itrace_queues__split_buffer(queues, event, buffer);
+		if (err)
+			return err;
+	}
+
+	return itrace_queues__add_buffer(queues, event->itrace.idx, buffer);
+}
+
+void itrace_queues__free(struct itrace_queues *queues)
+{
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		while (!list_empty(&queues->queue_array[i].head)) {
+			struct itrace_buffer *buffer;
+
+			buffer = list_entry(queues->queue_array[i].head.next,
+					     struct itrace_buffer, list);
+			itrace_buffer__put_data(buffer);
+			if (buffer->data_needs_freeing)
+				free(buffer->data);
+			list_del(&buffer->list);
+			free(buffer);
+		}
+	}
+
+	free(queues->queue_array);
+	queues->queue_array = NULL;
+	queues->nr_queues = 0;
+}
+
 size_t itrace_record__info_priv_size(struct itrace_record *itr)
 {
 	if (itr)
@@ -164,6 +395,53 @@ struct itrace_record *__attribute__ ((weak)) itrace_record__init(int *err)
 	return NULL;
 }
 
+struct itrace_buffer *itrace_buffer__next(struct itrace_queue *queue,
+					  struct itrace_buffer *buffer)
+{
+	if (buffer) {
+		if (list_is_last(&buffer->list, &queue->head))
+			return NULL;
+		return list_entry(buffer->list.next, struct itrace_buffer,
+				  list);
+	} else {
+		if (list_empty(&queue->head))
+			return NULL;
+		return list_entry(queue->head.next, struct itrace_buffer, list);
+	}
+}
+
+void *itrace_buffer__get_data(struct itrace_buffer *buffer, int fd)
+{
+	size_t adj = buffer->data_offset & (page_size - 1);
+	size_t size = buffer->size + adj;
+	off_t file_offset = buffer->data_offset - adj;
+	void *addr;
+
+	if (buffer->data)
+		return buffer->data;
+
+	addr = mmap(NULL, size, PROT_READ, MAP_SHARED, fd, file_offset);
+	if (addr == MAP_FAILED)
+		return NULL;
+
+	buffer->mmap_addr = addr;
+	buffer->mmap_size = size;
+
+	buffer->data = addr + adj;
+
+	return buffer->data;
+}
+
+void itrace_buffer__put_data(struct itrace_buffer *buffer)
+{
+	if (!buffer->data || !buffer->mmap_addr)
+		return;
+	munmap(buffer->mmap_addr, buffer->mmap_size);
+	buffer->mmap_addr = NULL;
+	buffer->mmap_size = 0;
+	buffer->data = NULL;
+}
+
 void itrace_synth_error(struct itrace_error_event *itrace_error, int type,
 			int code, int cpu, pid_t pid, pid_t tid, u64 ip,
 			const char *msg)
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index 08877d2..b4aca53 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -23,6 +23,7 @@
 #include <sys/types.h>
 #include <stdbool.h>
 #include <stddef.h>
+#include <linux/list.h>
 #include <linux/perf_event.h>
 
 #include "../perf.h"
@@ -83,6 +84,72 @@ struct itrace {
 };
 
 /**
+ * struct itrace_buffer - a buffer containing Instruction Tracing data.
+ * @list: buffers are queued in a list held by struct itrace_queue
+ * @size: size of the buffer in bytes
+ * @pid: in per-thread mode, the pid this buffer is associated with
+ * @tid: in per-thread mode, the tid this buffer is associated with
+ * @cpu: in per-cpu mode, the cpu this buffer is associated with
+ * @data: actual buffer data (can be null if the data has not been loaded)
+ * @data_offset: file offset at which the buffer can be read
+ * @mmap_addr: mmap address at which the buffer can be read
+ * @mmap_size: size of the mmap at @mmap_addr
+ * @data_needs_freeing: @data was malloc'd so free it when it is no longer
+ *                      needed
+ * @consecutive: the original data was split up and this buffer is consecutive
+ *               to the previous buffer
+ * @offset: offset as determined by data_head / data_tail members of struct
+ *          perf_event_mmap_page
+ * @reference: an implementation-specific reference determined when the data is
+ *             recorded
+ */
+struct itrace_buffer {
+	struct list_head	list;
+	size_t			size;
+	pid_t			pid;
+	pid_t			tid;
+	int			cpu;
+	void			*data;
+	off_t			data_offset;
+	void			*mmap_addr;
+	size_t			mmap_size;
+	bool			data_needs_freeing;
+	bool			consecutive;
+	u64			offset;
+	u64			reference;
+};
+
+/**
+ * struct itrace_queue - a queue of Instruction Tracing data buffers.
+ * @head: head of buffer list
+ * @tid: in per-thread mode, the tid this queue is associated with
+ * @cpu: in per-cpu mode, the cpu this queue is associated with
+ * @set: %true once this queue has been dedicated to a specific thread or cpu
+ * @priv: implementation-specific data
+ */
+struct itrace_queue {
+	struct list_head	head;
+	pid_t			tid;
+	int			cpu;
+	bool			set;
+	void			*priv;
+};
+
+/**
+ * struct itrace_queues - an array of Instruction Tracing queues.
+ * @queue_array: array of queues
+ * @nr_queues: number of queues
+ * @new_data: set whenever new data is queued
+ * @populated: queues have been fully populated using the itrace_index
+ */
+struct itrace_queues {
+	struct itrace_queue	*queue_array;
+	unsigned int		nr_queues;
+	bool			new_data;
+	bool			populated;
+};
+
+/**
  * struct itrace_mmap - records an mmap at PERF_EVENT_ITRACE_OFFSET.
  * @base: address of mapped area
  * @mask: %0 if @len is not a power of two, otherwise (@len - %1)
@@ -189,6 +256,16 @@ int itrace_mmap__read(struct itrace_mmap *mm,
 			    struct itrace_record *itr, struct perf_tool *tool,
 			    process_itrace_t fn);
 
+int itrace_queues__init(struct itrace_queues *queues);
+int itrace_queues__add_event(struct itrace_queues *queues,
+			     struct perf_session *session,
+			     union perf_event *event, off_t data_offset,
+			     struct itrace_buffer **buffer_ptr);
+void itrace_queues__free(struct itrace_queues *queues);
+struct itrace_buffer *itrace_buffer__next(struct itrace_queue *queue,
+					  struct itrace_buffer *buffer);
+void *itrace_buffer__get_data(struct itrace_buffer *buffer, int fd);
+void itrace_buffer__put_data(struct itrace_buffer *buffer);
 struct itrace_record *itrace_record__init(int *err);
 
 int itrace_record__options(struct itrace_record *itr,
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread
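
For illustration, a decoder might walk one queue's buffers like this
(fd is the perf.data file descriptor; the decode step is hypothetical
and error handling is elided):

	struct itrace_buffer *buffer = NULL;
	void *data;

	while ((buffer = itrace_buffer__next(queue, buffer))) {
		data = itrace_buffer__get_data(buffer, fd);
		if (!data)
			break;
		decode(data, buffer->size);
		itrace_buffer__put_data(buffer);
	}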

* [PATCH v0 51/71] perf itrace: Add a heap for sorting Instruction Tracing queues
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (49 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 50/71] perf itrace: Add helpers for queuing Instruction Tracing data Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 52/71] perf itrace: Add processing for Instruction Tracing events Alexander Shishkin
                   ` (21 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

In order to process Instruction Tracing data in time order, the queue
containing the data with the lowest timestamp must be processed first.
Provide a heap to keep track of which queue that is.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/itrace.c | 86 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/itrace.h | 29 ++++++++++++++++
 2 files changed, 115 insertions(+)

diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index f26d6cd..44214bc 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -344,6 +344,92 @@ void itrace_queues__free(struct itrace_queues *queues)
 	queues->nr_queues = 0;
 }
 
+static void itrace_heapify(struct itrace_heap_item *heap_array,
+			   unsigned int pos, unsigned int queue_nr,
+			   u64 ordinal)
+{
+	unsigned int parent;
+
+	while (pos) {
+		parent = (pos - 1) >> 1;
+		if (heap_array[parent].ordinal <= ordinal)
+			break;
+		heap_array[pos] = heap_array[parent];
+		pos = parent;
+	}
+	heap_array[pos].queue_nr = queue_nr;
+	heap_array[pos].ordinal = ordinal;
+}
+
+int itrace_heap__add(struct itrace_heap *heap, unsigned int queue_nr,
+		     u64 ordinal)
+{
+	struct itrace_heap_item *heap_array;
+
+	if (queue_nr >= heap->heap_sz) {
+		unsigned int heap_sz = ITRACE_INIT_NR_QUEUES;
+
+		while (heap_sz <= queue_nr)
+			heap_sz <<= 1;
+		heap_array = realloc(heap->heap_array,
+				     heap_sz * sizeof(struct itrace_heap_item));
+		if (!heap_array)
+			return -ENOMEM;
+		heap->heap_array = heap_array;
+		heap->heap_sz = heap_sz;
+	}
+
+	itrace_heapify(heap->heap_array, heap->heap_cnt++, queue_nr, ordinal);
+
+	return 0;
+}
+
+void itrace_heap__free(struct itrace_heap *heap)
+{
+	free(heap->heap_array);
+	heap->heap_array = NULL;
+	heap->heap_cnt = 0;
+	heap->heap_sz = 0;
+}
+
+void itrace_heap__pop(struct itrace_heap *heap)
+{
+	unsigned int pos, last, heap_cnt = heap->heap_cnt;
+	struct itrace_heap_item *heap_array;
+
+	if (!heap_cnt)
+		return;
+
+	heap->heap_cnt -= 1;
+
+	heap_array = heap->heap_array;
+
+	pos = 0;
+	while (1) {
+		unsigned int left, right;
+
+		left = (pos << 1) + 1;
+		if (left >= heap_cnt)
+			break;
+		right = left + 1;
+		if (right >= heap_cnt) {
+			heap_array[pos] = heap_array[left];
+			return;
+		}
+		if (heap_array[left].ordinal < heap_array[right].ordinal) {
+			heap_array[pos] = heap_array[left];
+			pos = left;
+		} else {
+			heap_array[pos] = heap_array[right];
+			pos = right;
+		}
+	}
+
+	last = heap_cnt - 1;
+	itrace_heapify(heap_array, pos, heap_array[last].queue_nr,
+		       heap_array[last].ordinal);
+}
+
 size_t itrace_record__info_priv_size(struct itrace_record *itr)
 {
 	if (itr)
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index b4aca53..304d377 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -150,6 +150,29 @@ struct itrace_queues {
 };
 
 /**
+ * struct itrace_heap_item - element of struct itrace_heap.
+ * @queue_nr: queue number
+ * @ordinal: value used for sorting (the lowest ordinal is at the top of the
+ *           heap); expected to be a timestamp
+ */
+struct itrace_heap_item {
+	unsigned int		queue_nr;
+	u64			ordinal;
+};
+
+/**
+ * struct itrace_heap - a heap suitable for sorting Instruction Tracing queues.
+ * @heap_array: the heap
+ * @heap_cnt: the number of elements in the heap
+ * @heap_sz: maximum number of elements (grows as needed)
+ */
+struct itrace_heap {
+	struct itrace_heap_item	*heap_array;
+	unsigned int		heap_cnt;
+	unsigned int		heap_sz;
+};
+
+/**
  * struct itrace_mmap - records an mmap at PERF_EVENT_ITRACE_OFFSET.
  * @base: address of mapped area
  * @mask: %0 if @len is not a power of two, otherwise (@len - %1)
@@ -266,6 +289,12 @@ struct itrace_buffer *itrace_buffer__next(struct itrace_queue *queue,
 					  struct itrace_buffer *buffer);
 void *itrace_buffer__get_data(struct itrace_buffer *buffer, int fd);
 void itrace_buffer__put_data(struct itrace_buffer *buffer);
+
+int itrace_heap__add(struct itrace_heap *heap, unsigned int queue_nr,
+		     u64 ordinal);
+void itrace_heap__pop(struct itrace_heap *heap);
+void itrace_heap__free(struct itrace_heap *heap);
+
 struct itrace_record *itrace_record__init(int *err);
 
 int itrace_record__options(struct itrace_record *itr,
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 52/71] perf itrace: Add processing for Instruction Tracing events
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (50 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 51/71] perf itrace: Add a heap for sorting Instruction Tracing queues Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 53/71] perf script: Add Instruction Tracing support Alexander Shishkin
                   ` (20 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Provide hooks so that an Instruction Trace decoder can process
Instruction Tracing events.
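
For illustration, a decoder is expected to hang itself off the
session and fill in the new hook, along these lines.  This is only a
sketch: the "my_decoder" names are hypothetical and not part of this
patch.

/* Sketch of hypothetical decoder glue for the hook added here. */
struct my_decoder {
	struct itrace	itrace;		/* embedded ops + decoder state */
	/* ... */
};

static int my_decoder__process_itrace_event(struct perf_session *session,
					    union perf_event *event,
					    struct perf_tool *tool)
{
	/*
	 * Locate the trace buffer the event refers to and queue it
	 * for decoding; the details are PMU-specific.
	 */
	return 0;
}

static int my_decoder__setup(struct perf_session *session,
			     struct my_decoder *md)
{
	md->itrace.process_itrace_event = my_decoder__process_itrace_event;
	session->itrace = &md->itrace;
	return 0;
}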

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/itrace.c | 52 ++++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/itrace.h | 13 ++++++++++++
 2 files changed, 65 insertions(+)

diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index 44214bc..91f1fb5 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -579,6 +579,31 @@ out_free:
 	return err;
 }
 
+static bool itrace__dont_decode(struct perf_session *session)
+{
+	return !session->itrace_synth_opts ||
+	       session->itrace_synth_opts->dont_decode;
+}
+
+int perf_event__process_itrace_info(struct perf_tool *tool __maybe_unused,
+				    union perf_event *event,
+				    struct perf_session *session)
+{
+	enum itrace_type type = event->itrace_info.type;
+
+	if (dump_trace)
+		fprintf(stdout, " type: %u\n", type);
+
+	if (itrace__dont_decode(session))
+		return 0;
+
+	switch (type) {
+	case PERF_ITRACE_UNKNOWN:
+	default:
+		return -EINVAL;
+	}
+}
+
 int perf_event__synthesize_itrace(struct perf_tool *tool,
 				  perf_event__handler_t process,
 				  size_t size, u64 offset, u64 ref, int idx,
@@ -599,6 +624,30 @@ int perf_event__synthesize_itrace(struct perf_tool *tool,
 	return process(tool, &ev, NULL, NULL);
 }
 
+s64 perf_event__process_itrace(struct perf_tool *tool, union perf_event *event,
+			       struct perf_session *session)
+{
+	s64 err;
+
+	if (dump_trace)
+		fprintf(stdout, " size: %"PRIu64"  offset: %"PRIx64"  ref: %"PRIx64"  idx: %u  tid: %d  cpu: %d\n",
+			event->itrace.size, event->itrace.offset,
+			event->itrace.reference, event->itrace.idx,
+			event->itrace.tid, event->itrace.cpu);
+
+	if (itrace__dont_decode(session))
+		return event->itrace.size;
+
+	if (!session->itrace || event->header.type != PERF_RECORD_ITRACE)
+		return -EINVAL;
+
+	err = session->itrace->process_itrace_event(session, event, tool);
+	if (err < 0)
+		return err;
+
+	return event->itrace.size;
+}
+
 #define PERF_ITRACE_DEFAULT_PERIOD_TYPE	PERF_ITRACE_PERIOD_INSTRUCTIONS
 #define PERF_ITRACE_DEFAULT_PERIOD	1000
 
@@ -713,6 +762,9 @@ int perf_event__process_itrace_error(struct perf_tool *tool __maybe_unused,
 				     union perf_event *event,
 				     struct perf_session *session)
 {
+	if (itrace__dont_decode(session))
+		return 0;
+
 	if (session->itrace)
 		session->itrace->error_count += 1;
 
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index 304d377..ec3b78a 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -38,6 +38,10 @@ struct option;
 struct perf_record_opts;
 struct itrace_info_event;
 
+enum itrace_type {
+	PERF_ITRACE_UNKNOWN,
+};
+
 enum itrace_error_type {
 	PERF_ITRACE_DECODER_ERROR = 1,
 };
@@ -76,6 +80,9 @@ struct itrace {
 			     union perf_event *event,
 			     struct perf_sample *sample,
 			     struct perf_tool *tool);
+	int (*process_itrace_event)(struct perf_session *session,
+				    union perf_event *event,
+				    struct perf_tool *tool);
 	int (*flush_events)(struct perf_session *session,
 			    struct perf_tool *tool);
 	void (*free_events)(struct perf_session *session);
@@ -316,10 +323,16 @@ int perf_event__synthesize_itrace_info(struct itrace_record *itr,
 				       struct perf_tool *tool,
 				       struct perf_session *session,
 				       perf_event__handler_t process);
+int perf_event__process_itrace_info(struct perf_tool *tool,
+				    union perf_event *event,
+				    struct perf_session *session);
 int perf_event__synthesize_itrace(struct perf_tool *tool,
 				  perf_event__handler_t process,
 				  size_t size, u64 offset, u64 ref, int idx,
 				  u32 tid, u32 cpu);
+s64 perf_event__process_itrace(struct perf_tool *tool,
+			       union perf_event *event,
+			       struct perf_session *session);
 int perf_event__process_itrace_error(struct perf_tool *tool,
 				     union perf_event *event,
 				     struct perf_session *session);
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 53/71] perf script: Add Instruction Tracing support
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (51 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 52/71] perf itrace: Add processing for Instruction Tracing events Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 54/71] perf script: Always allow fields 'addr' and 'cpu' for itrace Alexander Shishkin
                   ` (19 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for decoding an Instruction Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-script.txt | 21 +++++++++++++++++++++
 tools/perf/builtin-script.c              | 11 +++++++++++
 2 files changed, 32 insertions(+)

diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt
index cfdbb1e..f9ad25e 100644
--- a/tools/perf/Documentation/perf-script.txt
+++ b/tools/perf/Documentation/perf-script.txt
@@ -209,6 +209,27 @@ OPTIONS
 --show-mmap-events
 	Display mmap related events (e.g. MMAP, MMAP2).
 
+-Z::
+--itrace::
+	Options for decoding Instruction Tracing data. The options are:
+
+		i	synthesize instructions events
+		b	synthesize branches events
+		e	synthesize error events
+
+	The default is all events, i.e. the same as -Zibe.
+
+	In addition, the period (default 1000) for instructions events can be
+	specified in units of:
+
+		i	instructions (default)
+		t	ticks
+		ms	milliseconds
+		us	microseconds
+		ns	nanoseconds
+
+	To disable decoding entirely, use --no-itrace.
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-script-perl[1],
diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 4484886..96cdcd8 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -16,6 +16,7 @@
 #include "util/evsel.h"
 #include "util/sort.h"
 #include "util/data.h"
+#include "util/itrace.h"
 #include <linux/bitmap.h>
 
 static char const		*script_name;
@@ -1487,6 +1488,7 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 	char *rec_script_path = NULL;
 	char *rep_script_path = NULL;
 	struct perf_session *session;
+	struct itrace_synth_opts itrace_synth_opts = {0};
 	char *script_path = NULL;
 	const char **__argv;
 	int i, j, err;
@@ -1501,6 +1503,10 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 			.attr		 = process_attr,
 			.tracing_data	 = perf_event__process_tracing_data,
 			.build_id	 = perf_event__process_build_id,
+			.id_index	 = perf_event__process_id_index,
+			.itrace_info	 = perf_event__process_itrace_info,
+			.itrace		 = perf_event__process_itrace,
+			.itrace_error	 = perf_event__process_itrace_error,
 			.ordered_samples = true,
 			.ordering_requires_timestamps = true,
 		},
@@ -1550,6 +1556,9 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 		    "Show the fork/comm/exit events"),
 	OPT_BOOLEAN('\0', "show-mmap-events", &script.show_mmap_events,
 		    "Show the mmap events"),
+	OPT_CALLBACK_OPTARG('Z', "itrace", &itrace_synth_opts, NULL, "opts",
+			    "Instruction Tracing options",
+			    itrace_parse_synth_opts),
 	OPT_END()
 	};
 	const char * const script_usage[] = {
@@ -1740,6 +1749,8 @@ int cmd_script(int argc, const char **argv, const char *prefix __maybe_unused)
 
 	script.session = session;
 
+	session->itrace_synth_opts = &itrace_synth_opts;
+
 	if (cpu_list) {
 		if (perf_session__cpu_bitmap(session, cpu_list, cpu_bitmap))
 			return -1;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 54/71] perf script: Always allow fields 'addr' and 'cpu' for itrace
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (52 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 53/71] perf script: Add Instruction Tracing support Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 19:41   ` David Ahern
  2013-12-11 12:37 ` [PATCH v0 55/71] perf report: Add Instruction Tracing support Alexander Shishkin
                   ` (18 subsequent siblings)
  72 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

If a file contains Instruction Tracing data, then always allow
fields 'addr' and 'cpu' to be selected as options for perf
script.  This is necessary because Instruction Trace decoding
may synthesize events with that information.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-script.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 96cdcd8..15f4941 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -190,6 +190,7 @@ static int perf_evsel__check_attr(struct perf_evsel *evsel,
 	}
 
 	if (PRINT_FIELD(ADDR) &&
+		!perf_header__has_feat(&session->header, HEADER_ITRACE) &&
 		perf_evsel__check_stype(evsel, PERF_SAMPLE_ADDR, "ADDR",
 					PERF_OUTPUT_ADDR))
 		return -EINVAL;
@@ -223,6 +224,7 @@ static int perf_evsel__check_attr(struct perf_evsel *evsel,
 		return -EINVAL;
 
 	if (PRINT_FIELD(CPU) &&
+		!perf_header__has_feat(&session->header, HEADER_ITRACE) &&
 		perf_evsel__check_stype(evsel, PERF_SAMPLE_CPU, "CPU",
 					PERF_OUTPUT_CPU))
 		return -EINVAL;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 55/71] perf report: Add Instruction Tracing support
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (53 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 54/71] perf script: Always allow fields 'addr' and 'cpu' for itrace Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 56/71] perf tools: Add Instruction Trace sampling support Alexander Shishkin
                   ` (17 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for decoding an Instruction Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-report.txt | 21 +++++++++++++++++++++
 tools/perf/builtin-report.c              | 12 ++++++++++++
 2 files changed, 33 insertions(+)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index 10a2798..0c10bfb 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -237,6 +237,27 @@ OPTIONS
 	Do not show entries which have an overhead under that percent.
 	(Default: 0).
 
+-Z::
+--itrace::
+	Options for decoding Instruction Tracing data. The options are:
+
+		i	synthesize instructions events
+		b	synthesize branches events
+		e	synthesize error events
+
+	The default is all events, i.e. the same as -Zibe.
+
+	In addition, the period (default 1000) for instructions events can be
+	specified in units of:
+
+		i	instructions (default)
+		t	ticks
+		ms	milliseconds
+		us	microseconds
+		ns	nanoseconds
+
+	To disable decoding entirely, use --no-itrace.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-annotate[1]
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 8cf8e66..97e3ee6 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -36,6 +36,8 @@
 #include "util/data.h"
 #include "arch/common.h"
 
+#include "util/itrace.h"
+
 #include <dlfcn.h>
 #include <linux/bitmap.h>
 
@@ -768,6 +770,7 @@ parse_percent_limit(const struct option *opt, const char *str,
 int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 {
 	struct perf_session *session;
+	struct itrace_synth_opts itrace_synth_opts = {0};
 	struct stat st;
 	bool has_br_stack = false;
 	int branch_mode = -1;
@@ -790,6 +793,10 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 			.attr		 = perf_event__process_attr,
 			.tracing_data	 = perf_event__process_tracing_data,
 			.build_id	 = perf_event__process_build_id,
+			.id_index	 = perf_event__process_id_index,
+			.itrace_info	 = perf_event__process_itrace_info,
+			.itrace		 = perf_event__process_itrace,
+			.itrace_error	 = perf_event__count_itrace_error,
 			.ordered_samples = true,
 			.ordering_requires_timestamps = true,
 		},
@@ -884,6 +891,9 @@ int cmd_report(int argc, const char **argv, const char *prefix __maybe_unused)
 	OPT_BOOLEAN(0, "mem-mode", &report.mem_mode, "mem access profile"),
 	OPT_CALLBACK(0, "percent-limit", &report, "percent",
 		     "Don't show entries under that percent", parse_percent_limit),
+	OPT_CALLBACK_OPTARG('Z', "itrace", &itrace_synth_opts, NULL, "opts",
+			    "Instruction Tracing options",
+			    itrace_parse_synth_opts),
 	OPT_END()
 	};
 	struct perf_data_file file = {
@@ -919,6 +929,8 @@ repeat:
 	if (session == NULL)
 		return -ENOMEM;
 
+	session->itrace_synth_opts = &itrace_synth_opts;
+
 	report.session = session;
 
 	has_br_stack = perf_header__has_feat(&session->header,
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 56/71] perf tools: Add Instruction Trace sampling support
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (54 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 55/71] perf report: Add Instruction Tracing support Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 57/71] perf record: " Alexander Shishkin
                   ` (16 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add functions to configure Instruction Trace sampling and to queue
Instruction Trace samples for processing.
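
To illustrate the intended flow, a decoder's queue_event hook could
route each sample into the right queue roughly as follows.  This is a
sketch only: the "my_decoder" type, the my_decoder_from() accessor
and the next_ref counter are hypothetical, and error handling is
trimmed.

/* Sketch: routing ITRACE samples into queues with the new helper. */
static int my_decoder__queue_event(struct perf_session *session,
				   union perf_event *event __maybe_unused,
				   struct perf_sample *sample)
{
	struct my_decoder *md = my_decoder_from(session); /* hypothetical */
	unsigned int queue_nr;

	if (!sample->itrace_sample.size)
		return 0;

	/*
	 * Copies (pipe mode) or references (single mmap) the sample
	 * data onto the queue that matches the sample's id.
	 */
	return itrace_queues__add_sample(&md->queues, sample, session,
					 &queue_nr, ++md->next_ref);
}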

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/perf.h         |  10 ++++
 tools/perf/util/evsel.c   |   7 +++
 tools/perf/util/itrace.c  | 117 ++++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/itrace.h  |  36 ++++++++++++++
 tools/perf/util/record.c  |   2 +-
 tools/perf/util/session.c |   6 ++-
 6 files changed, 176 insertions(+), 2 deletions(-)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index b68b469..c748383 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -262,6 +262,7 @@ struct perf_record_opts {
 	bool	     sample_time;
 	bool	     period;
 	bool	     full_itrace;
+	bool	     sample_itrace;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int itrace_mmap_pages;
@@ -269,8 +270,17 @@ struct perf_record_opts {
 	u64          branch_stack;
 	u64	     default_interval;
 	u64	     user_interval;
+	u64	     itrace_sample_config;
+	u32	     itrace_sample_type;
+	size_t	     itrace_sample_size;
 	u16	     stack_dump_size;
 	bool	     sample_transaction;
 };
 
+static inline bool
+perf_record_opts_itracing(const struct perf_record_opts *opts)
+{
+	return opts->full_itrace || opts->sample_itrace;
+}
+
 #endif
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 88b7edd..0972b20 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -640,6 +640,13 @@ void perf_evsel__config(struct perf_evsel *evsel,
 	if (opts->sample_weight)
 		perf_evsel__set_sample_bit(evsel, WEIGHT);
 
+	if (opts->sample_itrace && !evsel->no_aux_samples) {
+		perf_evsel__set_sample_bit(evsel, ITRACE);
+		attr->itrace_config = opts->itrace_sample_config;
+		attr->itrace_sample_size = opts->itrace_sample_size;
+		attr->itrace_sample_type = opts->itrace_sample_type;
+	}
+
 	attr->mmap  = track;
 	attr->comm  = track;
 
diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index 91f1fb5..d64dcb1 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -321,6 +321,108 @@ int itrace_queues__add_event(struct itrace_queues *queues,
 	return itrace_queues__add_buffer(queues, event->itrace.idx, buffer);
 }
 
+struct itrace_queue *itrace_queues__sample_queue(struct itrace_queues *queues,
+						 struct perf_sample *sample,
+						 struct perf_session *session)
+{
+	struct perf_sample_id *sid;
+	unsigned int idx;
+	u64 id;
+
+	id = sample->id;
+	if (!id)
+		return NULL;
+
+	sid = perf_evlist__id2sid(session->evlist, id);
+	if (!sid)
+		return NULL;
+
+	idx = sid->idx;
+
+	if (idx >= queues->nr_queues)
+		return NULL;
+
+	return &queues->queue_array[idx];
+}
+
+int itrace_queues__add_sample(struct itrace_queues *queues,
+			      struct perf_sample *sample,
+			      struct perf_session *session,
+			      unsigned int *queue_nr, u64 ref)
+{
+	struct itrace_buffer *buffer;
+	struct itrace_queue *queue;
+	struct perf_sample_id *sid;
+	unsigned int idx;
+	int err;
+	u64 id;
+
+	queues->populated = true;
+
+	id = sample->id;
+	if (!id)
+		return -EINVAL;
+
+	sid = perf_evlist__id2sid(session->evlist, id);
+	if (!sid)
+		return -ENOENT;
+
+	idx = sid->idx;
+
+	if (idx >= queues->nr_queues) {
+		err = itrace_queues__grow(queues, idx);
+		if (err)
+			return err;
+	}
+
+	queue = &queues->queue_array[idx];
+
+	if (!queue->set) {
+		queue->set = true;
+		queue->cpu = sid->cpu;
+		queue->tid = sid->tid;
+	} else if (sid->cpu != queue->cpu || sid->tid != queue->tid) {
+		pr_err("itrace queue conflicts with event (id %"PRIu64"):", id);
+		pr_err(" cpu %d, tid %d vs cpu %d, tid %d\n",
+		       queue->cpu, queue->tid, sid->cpu, sid->tid);
+		return -EINVAL;
+	}
+
+	buffer = zalloc(sizeof(struct itrace_buffer));
+	if (!buffer)
+		return -ENOMEM;
+
+	buffer->cpu = sample->cpu;
+	buffer->pid = sample->pid;
+	buffer->tid = sample->tid;
+	buffer->reference = ref;
+
+	if (perf_data_file__is_pipe(session->file) || !session->one_mmap) {
+		void *data = memdup(sample->itrace_sample.data,
+				    sample->itrace_sample.size);
+
+		if (!data) {
+			free(buffer);
+			return -ENOMEM;
+		}
+		buffer->size = sample->itrace_sample.size;
+		buffer->data = data;
+		buffer->data_needs_freeing = true;
+	} else {
+		buffer->size = sample->itrace_sample.size;
+		buffer->data = sample->itrace_sample.data;
+	}
+
+	list_add_tail(&buffer->list, &queue->head);
+
+	queues->new_data = true;
+
+	if (queue_nr)
+		*queue_nr = idx;
+
+	return 0;
+}
+
 void itrace_queues__free(struct itrace_queues *queues)
 {
 	unsigned int i;
@@ -475,6 +577,21 @@ u64 itrace_record__reference(struct itrace_record *itr)
 	return 0;
 }
 
+int itrace_parse_sample_options(const struct option *opt, const char *str,
+				int unset)
+{
+	struct itrace_record *itr = *(struct itrace_record **)opt->data;
+	struct perf_record_opts *opts = opt->value;
+
+	if (unset)
+		return 0;
+
+	if (itr)
+		return itr->parse_sample_options(itr, opts, str);
+
+	return itrace_not_supported();
+}
+
 struct itrace_record *__attribute__ ((weak)) itrace_record__init(int *err)
 {
 	*err = 0;
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index ec3b78a..2ebcdec 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -80,9 +80,14 @@ struct itrace {
 			     union perf_event *event,
 			     struct perf_sample *sample,
 			     struct perf_tool *tool);
+	int (*queue_event)(struct perf_session *session,
+			   union perf_event *event,
+			   struct perf_sample *sample);
 	int (*process_itrace_event)(struct perf_session *session,
 				    union perf_event *event,
 				    struct perf_tool *tool);
+	void (*dump_itrace_sample)(struct perf_session *session,
+				   struct perf_sample *sample);
 	int (*flush_events)(struct perf_session *session,
 			    struct perf_tool *tool);
 	void (*free_events)(struct perf_session *session);
@@ -224,6 +229,9 @@ struct itrace_mmap_params {
 };
 
 struct itrace_record {
+	int (*parse_sample_options)(struct itrace_record *itr,
+				    struct perf_record_opts *opts,
+				    const char *str);
 	int (*recording_options)(struct itrace_record *itr,
 				 struct perf_evlist *evlist,
 				 struct perf_record_opts *opts);
@@ -291,6 +299,13 @@ int itrace_queues__add_event(struct itrace_queues *queues,
 			     struct perf_session *session,
 			     union perf_event *event, off_t data_offset,
 			     struct itrace_buffer **buffer_ptr);
+struct itrace_queue *itrace_queues__sample_queue(struct itrace_queues *queues,
+						 struct perf_sample *sample,
+						 struct perf_session *session);
+int itrace_queues__add_sample(struct itrace_queues *queues,
+			      struct perf_sample *sample,
+			      struct perf_session *session,
+			      unsigned int *queue_nr, u64 ref);
 void itrace_queues__free(struct itrace_queues *queues);
 struct itrace_buffer *itrace_buffer__next(struct itrace_queue *queue,
 					  struct itrace_buffer *buffer);
@@ -304,6 +319,8 @@ void itrace_heap__free(struct itrace_heap *heap);
 
 struct itrace_record *itrace_record__init(int *err);
 
+int itrace_parse_sample_options(const struct option *opt, const char *str,
+				int unset);
 int itrace_record__options(struct itrace_record *itr,
 			     struct perf_evlist *evlist,
 			     struct perf_record_opts *opts);
@@ -356,6 +373,25 @@ static inline int itrace__process_event(struct perf_session *session,
 	return session->itrace->process_event(session, event, sample, tool);
 }
 
+static inline int itrace__queue_event(struct perf_session *session,
+				      union perf_event *event,
+				      struct perf_sample *sample)
+{
+	if (!session->itrace)
+		return 0;
+
+	return session->itrace->queue_event(session, event, sample);
+}
+
+static inline void itrace__dump_itrace_sample(struct perf_session *session,
+					      struct perf_sample *sample)
+{
+	if (!session->itrace)
+		return;
+
+	session->itrace->dump_itrace_sample(session, sample);
+}
+
 static inline int itrace__flush_events(struct perf_session *session,
 				       struct perf_tool *tool)
 {
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index 86f980e..52d5bca 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -93,7 +93,7 @@ void perf_evlist__config(struct perf_evlist *evlist,
 	list_for_each_entry(evsel, &evlist->entries, node)
 		perf_evsel__config(evsel, opts);
 
-	if (opts->full_itrace) {
+	if (perf_record_opts_itracing(opts)) {
 		use_sample_identifier = true;
 		list_for_each_entry(evsel, &evlist->entries, node)
 			perf_evsel__set_sample_id(evsel, use_sample_identifier);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 49c89e7..c60238a 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -771,7 +771,7 @@ int perf_session_queue_event(struct perf_session *s, union perf_event *event,
 
 	__queue_event(new, s);
 
-	return 0;
+	return itrace__queue_event(s, event, sample);
 }
 
 static void callchain__printf(struct perf_sample *sample)
@@ -885,6 +885,10 @@ static void dump_event(struct perf_session *session, union perf_event *event,
 
 	trace_event(event);
 
+	/* Instruction Trace samples are so big they are better printed here */
+	if (sample && sample->itrace_sample.size)
+		itrace__dump_itrace_sample(session, sample);
+
 	if (sample)
 		perf_session__print_tstamp(session, event, sample);
 
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 57/71] perf record: Add Instruction Trace sampling support
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (55 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 56/71] perf tools: Add Instruction Trace sampling support Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 58/71] perf tools: Add Instruction Tracing Snapshot Mode Alexander Shishkin
                   ` (15 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for Instruction Trace sampling.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-record.txt |  5 +++++
 tools/perf/builtin-record.c              | 15 +++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index bb01df7..58bd3c1 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -211,6 +211,11 @@ overrides that and uses per-thread mmaps.  A side-effect of that is that
 inheritance is automatically disabled.  --per-thread is ignored with a warning
 if combined with -a or -C options.
 
+-I::
+--itrace::
+Enable Instruction Trace sampling. Each sample captures the specified number of
+bytes (default 4096) of trace. Instruction Tracing config can also be specified.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 344603f..e75a15e 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -401,7 +401,7 @@ static void perf_record__init_features(struct perf_record *rec)
 	if (!rec->opts.branch_stack)
 		perf_header__clear_feat(&session->header, HEADER_BRANCH_STACK);
 
-	if (!rec->opts.full_itrace)
+	if (!perf_record_opts_itracing(&rec->opts))
 		perf_header__clear_feat(&session->header, HEADER_ITRACE);
 }
 
@@ -507,13 +507,21 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 		}
 	}
 
-	if (rec->opts.full_itrace) {
+	if (perf_record_opts_itracing(&rec->opts)) {
 		err = perf_event__synthesize_itrace_info(rec->itr, tool,
 					session, process_synthesized_event);
 		if (err)
 			goto out_delete_session;
 	}
 
+	if (rec->opts.sample_itrace) {
+		err = perf_event__synthesize_id_index(tool,
+						      process_synthesized_event,
+						      session->evlist, machine);
+		if (err)
+			goto out_delete_session;
+	}
+
 	err = perf_event__synthesize_kernel_mmap(tool, process_synthesized_event,
 						 machine, "_text");
 	if (err < 0)
@@ -987,6 +995,9 @@ const struct option record_options[] = {
 		    "sample transaction flags (special events only)"),
 	OPT_BOOLEAN(0, "per-thread", &record.opts.target.per_thread,
 		    "use per-thread mmaps"),
+	OPT_CALLBACK_OPTARG('I', "itrace", &record.opts, &record.itr, "opts",
+			    "sample Instruction Trace",
+			    itrace_parse_sample_options),
 	OPT_END()
 };
 
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 58/71] perf tools: Add Instruction Tracing Snapshot Mode
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (56 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 57/71] perf record: " Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 59/71] perf record: Add Instruction Tracing Snapshot Mode support Alexander Shishkin
                   ` (14 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for making snapshots of Instruction Tracing data.
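
A PMU-specific itrace_record implementation opts in by providing the
new callbacks.  Roughly, and only as a sketch (the driver is
hypothetical and the bodies are reduced to comments):

/* Sketch: a hypothetical PMU driver wiring up snapshot support. */
static int my_pmu_snapshot_start(struct itrace_record *itr)
{
	/* Pause the hardware so the buffer contents stop moving. */
	return 0;
}

static int my_pmu_snapshot_finish(struct itrace_record *itr)
{
	/* Resume tracing once the snapshot has been copied out. */
	return 0;
}

static int my_pmu_find_snapshot(struct itrace_record *itr, int idx,
				struct itrace_mmap *mm, unsigned char *data,
				u64 *head, u64 *old)
{
	/*
	 * Work out where valid data starts and ends in a buffer that
	 * may have wrapped, fixing up *head and *old accordingly.
	 */
	return 0;
}

struct itrace_record my_pmu_itrace_record = {
	.snapshot_start		= my_pmu_snapshot_start,
	.snapshot_finish	= my_pmu_snapshot_finish,
	.find_snapshot		= my_pmu_find_snapshot,
	/* other callbacks omitted from this sketch */
};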

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/perf.h        |  2 ++
 tools/perf/util/itrace.c | 79 +++++++++++++++++++++++++++++++++++++++++++-----
 tools/perf/util/itrace.h | 20 ++++++++++++
 3 files changed, 93 insertions(+), 8 deletions(-)

diff --git a/tools/perf/perf.h b/tools/perf/perf.h
index c748383..531b258 100644
--- a/tools/perf/perf.h
+++ b/tools/perf/perf.h
@@ -263,6 +263,7 @@ struct perf_record_opts {
 	bool	     period;
 	bool	     full_itrace;
 	bool	     sample_itrace;
+	bool	     itrace_snapshot_mode;
 	unsigned int freq;
 	unsigned int mmap_pages;
 	unsigned int itrace_mmap_pages;
@@ -273,6 +274,7 @@ struct perf_record_opts {
 	u64	     itrace_sample_config;
 	u32	     itrace_sample_type;
 	size_t	     itrace_sample_size;
+	size_t	     itrace_snapshot_size;
 	u16	     stack_dump_size;
 	bool	     sample_transaction;
 };
diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index d64dcb1..da2f175 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -561,6 +561,29 @@ void itrace_record__free(struct itrace_record *itr)
 		itr->free(itr);
 }
 
+int itrace_record__snapshot_start(struct itrace_record *itr)
+{
+	if (itr && itr->snapshot_start)
+		return itr->snapshot_start(itr);
+	return 0;
+}
+
+int itrace_record__snapshot_finish(struct itrace_record *itr)
+{
+	if (itr && itr->snapshot_finish)
+		return itr->snapshot_finish(itr);
+	return 0;
+}
+
+int itrace_record__find_snapshot(struct itrace_record *itr, int idx,
+				 struct itrace_mmap *mm,
+				 unsigned char *data, u64 *head, u64 *old)
+{
+	if (itr && itr->find_snapshot)
+		return itr->find_snapshot(itr, idx, mm, data, head, old);
+	return 0;
+}
+
 int itrace_record__options(struct itrace_record *itr,
 			   struct perf_evlist *evlist,
 			   struct perf_record_opts *opts)
@@ -592,6 +615,21 @@ int itrace_parse_sample_options(const struct option *opt, const char *str,
 	return itrace_not_supported();
 }
 
+int itrace_parse_snapshot_options(const struct option *opt, const char *str,
+				  int unset)
+{
+	struct itrace_record *itr = *(struct itrace_record **)opt->data;
+	struct perf_record_opts *opts = opt->value;
+
+	if (unset)
+		return 0;
+
+	if (itr)
+		return itr->parse_snapshot_options(itr, opts, str);
+
+	return itrace_not_supported();
+}
+
 struct itrace_record *__attribute__ ((weak)) itrace_record__init(int *err)
 {
 	*err = 0;
@@ -898,8 +936,10 @@ int perf_event__count_itrace_error(struct perf_tool *tool __maybe_unused,
 	return 0;
 }
 
-int itrace_mmap__read(struct itrace_mmap *mm, struct itrace_record *itr,
-		      struct perf_tool *tool, process_itrace_t fn)
+static int __itrace_mmap__read(struct itrace_mmap *mm,
+			       struct itrace_record *itr,
+			       struct perf_tool *tool, process_itrace_t fn,
+			       bool snapshot, size_t snapshot_size)
 {
 	u64 head = itrace_mmap__read_head(mm);
 	u64 old = mm->prev, offset, ref;
@@ -908,6 +948,10 @@ int itrace_mmap__read(struct itrace_mmap *mm, struct itrace_record *itr,
 	union perf_event ev;
 	void *data1, *data2;
 
+	if (snapshot && itrace_record__find_snapshot(itr, mm->idx, mm, data,
+						     &head, &old))
+		return -1;
+
 	if (old == head)
 		return 0;
 
@@ -927,6 +971,9 @@ int itrace_mmap__read(struct itrace_mmap *mm, struct itrace_record *itr,
 	else
 		size = mm->len - (old_off - head_off);
 
+	if (snapshot && size > snapshot_size)
+		size = snapshot_size;
+
 	ref = itrace_record__reference(itr);
 
 	if (head > old || size <= head || mm->mask) {
@@ -969,14 +1016,30 @@ int itrace_mmap__read(struct itrace_mmap *mm, struct itrace_record *itr,
 
 	mm->prev = head;
 
-	itrace_mmap__write_tail(mm, head);
-	if (itr->read_finish) {
-		int err;
+	if (!snapshot) {
+		itrace_mmap__write_tail(mm, head);
+		if (itr->read_finish) {
+			int err;
 
-		err = itr->read_finish(itr, mm->idx);
-		if (err < 0)
-			return err;
+			err = itr->read_finish(itr, mm->idx);
+			if (err < 0)
+				return err;
+		}
 	}
 
 	return 1;
 }
+
+int itrace_mmap__read(struct itrace_mmap *mm, struct itrace_record *itr,
+		      struct perf_tool *tool, process_itrace_t fn)
+{
+	return __itrace_mmap__read(mm, itr, tool, fn, false, 0);
+}
+
+int itrace_mmap__read_snapshot(struct itrace_mmap *mm,
+			       struct itrace_record *itr,
+			       struct perf_tool *tool, process_itrace_t fn,
+			       size_t snapshot_size)
+{
+	return __itrace_mmap__read(mm, itr, tool, fn, true, snapshot_size);
+}
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index 2ebcdec..9ff633c 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -241,6 +241,14 @@ struct itrace_record {
 			 struct itrace_info_event *itrace_info,
 			 size_t priv_size);
 	void (*free)(struct itrace_record *itr);
+	int (*snapshot_start)(struct itrace_record *itr);
+	int (*snapshot_finish)(struct itrace_record *itr);
+	int (*find_snapshot)(struct itrace_record *itr, int idx,
+			     struct itrace_mmap *mm, unsigned char *data,
+			     u64 *head, u64 *old);
+	int (*parse_snapshot_options)(struct itrace_record *itr,
+				      struct perf_record_opts *opts,
+				      const char *str);
 	u64 (*reference)(struct itrace_record *itr);
 	int (*read_finish)(struct itrace_record *itr, int idx);
 };
@@ -294,6 +302,11 @@ int itrace_mmap__read(struct itrace_mmap *mm,
 			    struct itrace_record *itr, struct perf_tool *tool,
 			    process_itrace_t fn);
 
+int itrace_mmap__read_snapshot(struct itrace_mmap *mm,
+			       struct itrace_record *itr,
+			       struct perf_tool *tool, process_itrace_t fn,
+			       size_t snapshot_size);
+
 int itrace_queues__init(struct itrace_queues *queues);
 int itrace_queues__add_event(struct itrace_queues *queues,
 			     struct perf_session *session,
@@ -321,6 +334,8 @@ struct itrace_record *itrace_record__init(int *err);
 
 int itrace_parse_sample_options(const struct option *opt, const char *str,
 				int unset);
+int itrace_parse_snapshot_options(const struct option *opt, const char *str,
+				  int unset);
 int itrace_record__options(struct itrace_record *itr,
 			     struct perf_evlist *evlist,
 			     struct perf_record_opts *opts);
@@ -330,6 +345,11 @@ int itrace_record__info_fill(struct itrace_record *itr,
 			     struct itrace_info_event *itrace_info,
 			     size_t priv_size);
 void itrace_record__free(struct itrace_record *itr);
+int itrace_record__snapshot_start(struct itrace_record *itr);
+int itrace_record__snapshot_finish(struct itrace_record *itr);
+int itrace_record__find_snapshot(struct itrace_record *itr, int idx,
+				 struct itrace_mmap *mm,
+				 unsigned char *data, u64 *head, u64 *old);
 u64 itrace_record__reference(struct itrace_record *itr);
 
 void itrace_synth_error(struct itrace_error_event *itrace_error, int type,
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 59/71] perf record: Add Instruction Tracing Snapshot Mode support
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (57 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 58/71] perf tools: Add Instruction Tracing Snapshot Mode Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 60/71] perf inject: Re-pipe Instruction Tracing events Alexander Shishkin
                   ` (13 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a new option and support for Instruction Tracing Snapshot Mode.
When the new option is selected, no Instruction Tracing data is
captured until a signal (SIGUSR2) is received.
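
In practice the workflow would look something like this.  It is a
sketch: the event name depends on the Instruction Tracing PMU and is
not defined by this patch.

  perf record -S -e <itrace-pmu-event> -p <pid> &
  # ... wait for the moment of interest ...
  kill -USR2 %1    # capture a snapshot of the most recent trace data
  kill -INT %1     # stop recording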

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-record.txt |  7 +++
 tools/perf/builtin-record.c              | 90 +++++++++++++++++++++++++++++++-
 2 files changed, 95 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 58bd3c1..279c808 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -216,6 +216,13 @@ if combined with -a or -C options.
 Enable Instruction Trace sampling. Each sample captures the specified number of
 bytes (default 4096) of trace. Instruction Tracing config can also be specified.
 
+-S::
+--snapshot::
+Select Instruction Tracing Snapshot Mode. This option is valid only with an
+Instruction Tracing event. Optionally the number of bytes to capture per
+snapshot can be specified. In Snapshot Mode, trace data is captured only when
+signal SIGUSR2 is received.
+
 SEE ALSO
 --------
 linkperf:perf-stat[1], linkperf:perf-list[1]
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index e75a15e..46c451c 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -188,9 +188,29 @@ static int perf_record__itrace_mmap_read(struct perf_record *rec,
 	return 0;
 }
 
+static int perf_record__itrace_mmap_read_snapshot(struct perf_record *rec,
+						  struct itrace_mmap *mm)
+{
+	int ret;
+
+	ret = itrace_mmap__read_snapshot(mm, rec->itr, &rec->tool,
+					 perf_record__process_itrace,
+					 rec->opts.itrace_snapshot_size);
+	if (ret < 0)
+		return ret;
+
+	if (ret)
+		rec->samples++;
+
+	return 0;
+}
+
 static volatile int done = 0;
 static volatile int signr = -1;
 static volatile int child_finished = 0;
+static volatile int itrace_snapshot_enabled;
+static volatile int itrace_snapshot_err;
+static volatile int itrace_record__snapshot_started;
 
 static void sig_handler(int sig)
 {
@@ -258,7 +278,7 @@ try_again:
 
 	if (perf_evlist__mmap_ex(evlist, opts->mmap_pages, false,
 				 opts->itrace_mmap_pages,
-				 false) < 0) {
+				 opts->itrace_snapshot_mode) < 0) {
 		if (errno == EPERM) {
 			pr_err("Permission error mapping pages.\n"
 			       "Consider increasing "
@@ -368,7 +388,7 @@ static int perf_record__mmap_read_all(struct perf_record *rec)
 			}
 		}
 
-		if (mm->base &&
+		if (mm->base && !rec->opts.itrace_snapshot_mode &&
 		    perf_record__itrace_mmap_read(rec, mm) != 0) {
 			rc = -1;
 			goto out;
@@ -405,6 +425,41 @@ static void perf_record__init_features(struct perf_record *rec)
 		perf_header__clear_feat(&session->header, HEADER_ITRACE);
 }
 
+static int perf_record__itrace_read_snapshot_all(struct perf_record *rec)
+{
+	int i;
+	int rc = 0;
+
+	for (i = 0; i < rec->evlist->nr_mmaps; i++) {
+		struct itrace_mmap *mm =
+				&rec->evlist->mmap[i].itrace_mmap;
+
+		if (!mm->base)
+			continue;
+
+		if (perf_record__itrace_mmap_read_snapshot(rec, mm) != 0) {
+			rc = -1;
+			goto out;
+		}
+	}
+out:
+	return rc;
+}
+
+static void perf_record__read_itrace_snapshot(struct perf_record *rec)
+{
+	pr_debug("Recording instruction tracing snapshot\n");
+	if (perf_record__itrace_read_snapshot_all(rec) < 0) {
+		itrace_snapshot_err = -1;
+	} else {
+		itrace_snapshot_err = itrace_record__snapshot_finish(rec->itr);
+		if (!itrace_snapshot_err)
+			itrace_snapshot_enabled = 1;
+	}
+}
+
+static void snapshot_sig_handler(int sig);
+
 static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 {
 	int err;
@@ -425,6 +480,10 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 	signal(SIGINT, sig_handler);
 	signal(SIGUSR1, sig_handler);
 	signal(SIGTERM, sig_handler);
+	if (rec->opts.itrace_snapshot_mode)
+		signal(SIGUSR2, snapshot_sig_handler);
+	else
+		signal(SIGUSR2, SIG_IGN);
 
 	session = perf_session__new(file, false, NULL);
 	if (session == NULL) {
@@ -574,14 +633,27 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 	if (forks)
 		perf_evlist__start_workload(evsel_list);
 
+	itrace_snapshot_enabled = 1;
 	for (;;) {
 		int hits = rec->samples;
 
 		if (perf_record__mmap_read_all(rec) < 0) {
+			itrace_snapshot_enabled = 0;
 			err = -1;
 			goto out_delete_session;
 		}
 
+		if (itrace_record__snapshot_started) {
+			itrace_record__snapshot_started = 0;
+			if (!itrace_snapshot_err)
+				perf_record__read_itrace_snapshot(rec);
+			if (itrace_snapshot_err) {
+				pr_err("Instruction tracing snapshot failed\n");
+				err = -1;
+				goto out_delete_session;
+			}
+		}
+
 		if (hits == rec->samples) {
 			if (done)
 				break;
@@ -595,10 +667,12 @@ static int __cmd_record(struct perf_record *rec, int argc, const char **argv)
 		 * disable events in this case.
 		 */
 		if (done && !disabled && !target__none(&opts->target)) {
+			itrace_snapshot_enabled = 0;
 			perf_evlist__disable(evsel_list);
 			disabled = true;
 		}
 	}
+	itrace_snapshot_enabled = 0;
 
 	if (quiet || signr == SIGUSR1)
 		return 0;
@@ -998,6 +1072,9 @@ const struct option record_options[] = {
 	OPT_CALLBACK_OPTARG('I', "itrace", &record.opts, &record.itr, "opts",
 			    "sample Instruction Trace",
 			    itrace_parse_sample_options),
+	OPT_CALLBACK_OPTARG('S', "snapshot", &record.opts, &record.itr, "opts",
+			    "Instruction Tracing Snapshot Mode",
+			    itrace_parse_snapshot_options),
 	OPT_END()
 };
 
@@ -1096,3 +1173,12 @@ out_itrace_free:
 	itrace_record__free(rec->itr);
 	return err;
 }
+
+static void snapshot_sig_handler(int sig __maybe_unused)
+{
+	if (!itrace_snapshot_enabled)
+		return;
+	itrace_snapshot_enabled = 0;
+	itrace_snapshot_err = itrace_record__snapshot_start(record.itr);
+	itrace_record__snapshot_started = 1;
+}
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 60/71] perf inject: Re-pipe Instruction Tracing events
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (58 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 59/71] perf record: Add Instruction Tracing Snapshot Mode support Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 61/71] perf inject: Add Instruction Tracing support Alexander Shishkin
                   ` (12 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

New Instruction Tracing events must be re-piped by default.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-inject.c | 68 +++++++++++++++++++++++++++++++++++++++------
 1 file changed, 59 insertions(+), 9 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 78911a3..8c47982 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -38,20 +38,14 @@ struct event_entry {
 	union perf_event event[0];
 };
 
-static int perf_event__repipe_synth(struct perf_tool *tool,
-				    union perf_event *event)
+static int output_bytes(struct perf_inject *inject, void *buf, size_t size)
 {
-	struct perf_inject *inject = container_of(tool, struct perf_inject, tool);
-	uint32_t size;
-	void *buf = event;
-
-	size = event->header.size;
+	ssize_t ret;
 
 	while (size) {
-		int ret = write(inject->output, buf, size);
+		ret = write(inject->output, buf, size);
 		if (ret < 0)
 			return -errno;
-
 		size -= ret;
 		buf += ret;
 		inject->bytes_written += ret;
@@ -60,6 +54,34 @@ static int perf_event__repipe_synth(struct perf_tool *tool,
 	return 0;
 }
 
+static int copy_bytes(struct perf_inject *inject, int fd, size_t size)
+{
+	char buf[4096];
+	ssize_t ssz;
+	int ret;
+
+	while (size) {
+		ssz = read(fd, buf, min(size, sizeof(buf)));
+		if (ssz < 0)
+			return -errno;
+		ret = output_bytes(inject, buf, ssz);
+		if (ret)
+			return ret;
+		size -= ssz;
+	}
+
+	return 0;
+}
+
+static int perf_event__repipe_synth(struct perf_tool *tool,
+				    union perf_event *event)
+{
+	struct perf_inject *inject = container_of(tool, struct perf_inject,
+						  tool);
+
+	return output_bytes(inject, event, event->header.size);
+}
+
 static int perf_event__repipe_op2_synth(struct perf_tool *tool,
 					union perf_event *event,
 					struct perf_session *session
@@ -86,6 +108,31 @@ static int perf_event__repipe_attr(struct perf_tool *tool,
 	return perf_event__repipe_synth(tool, event);
 }
 
+static s64 perf_event__repipe_itrace(struct perf_tool *tool,
+				     union perf_event *event,
+				     struct perf_session *session
+				     __maybe_unused)
+{
+	struct perf_inject *inject = container_of(tool, struct perf_inject,
+						  tool);
+	int ret;
+
+	if (perf_data_file__is_pipe(session->file) || !session->one_mmap) {
+		ret = output_bytes(inject, event, event->header.size);
+		if (ret < 0)
+			return ret;
+		ret = copy_bytes(inject, perf_data_file__fd(session->file),
+				 event->itrace.size);
+	} else {
+		ret = output_bytes(inject, event,
+				   event->header.size + event->itrace.size);
+	}
+	if (ret < 0)
+		return ret;
+
+	return event->itrace.size;
+}
+
 static int perf_event__repipe(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample __maybe_unused,
@@ -423,6 +470,9 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 			.unthrottle	= perf_event__repipe,
 			.attr		= perf_event__repipe_attr,
 			.tracing_data	= perf_event__repipe_op2_synth,
+			.itrace_info	= perf_event__repipe_op2_synth,
+			.itrace		= perf_event__repipe_itrace,
+			.itrace_error	= perf_event__repipe_op2_synth,
 			.finished_round	= perf_event__repipe_op2_synth,
 			.build_id	= perf_event__repipe_op2_synth,
 			.id_index	= perf_event__repipe_op2_synth,
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 61/71] perf inject: Add Instruction Tracing support
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (59 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 60/71] perf inject: Re-pipe Instruction Tracing events Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 62/71] perf inject: Cut Instruction Tracing samples Alexander Shishkin
                   ` (11 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for decoding an Instruction Trace.  The Instruction
Tracing events are stripped and replaced by synthesized events.
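
Typical usage would then be along these lines (a sketch; the file
names are only examples):

  perf inject -Z -i perf.data -o perf.data.decoded
  perf report -i perf.data.decoded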

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/perf-inject.txt | 20 +++++++++
 tools/perf/builtin-inject.c              | 71 +++++++++++++++++++++++++++++++-
 2 files changed, 89 insertions(+), 2 deletions(-)

diff --git a/tools/perf/Documentation/perf-inject.txt b/tools/perf/Documentation/perf-inject.txt
index a00a342..c64cfea 100644
--- a/tools/perf/Documentation/perf-inject.txt
+++ b/tools/perf/Documentation/perf-inject.txt
@@ -41,6 +41,26 @@ OPTIONS
 	tasks slept. sched_switch contains a callchain where a task slept and
 	sched_stat contains a timeslice how long a task slept.
 
+-Z::
+--itrace::
+	Decode Instruction Tracing data, replacing it with synthesized events.
+	Options are:
+
+		i	synthesize instructions events
+		b	synthesize branches events
+		e	synthesize error events
+
+	The default is all events, i.e. the same as -Zibe.
+
+	In addition, the period (default 1000) for instructions events can be
+	specified in units of:
+
+		i	instructions (default)
+		t	ticks
+		ms	milliseconds
+		us	microseconds
+		ns	nanoseconds
+
 SEE ALSO
 --------
 linkperf:perf-record[1], linkperf:perf-report[1], linkperf:perf-archive[1]
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index 8c47982..feeeb56 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -16,6 +16,7 @@
 #include "util/debug.h"
 #include "util/build-id.h"
 #include "util/data.h"
+#include "util/itrace.h"
 
 #include "util/parse-options.h"
 
@@ -30,6 +31,7 @@ struct perf_inject {
 			 output;
 	u64		 bytes_written;
 	struct list_head samples;
+	struct itrace_synth_opts itrace_synth_opts;
 };
 
 struct event_entry {
@@ -202,6 +204,32 @@ static int perf_event__repipe_fork(struct perf_tool *tool,
 	return err;
 }
 
+static int perf_event__repipe_comm(struct perf_tool *tool,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct machine *machine)
+{
+	int err;
+
+	err = perf_event__process_comm(tool, event, sample, machine);
+	perf_event__repipe(tool, event, sample, machine);
+
+	return err;
+}
+
+static int perf_event__repipe_exit(struct perf_tool *tool,
+				   union perf_event *event,
+				   struct perf_sample *sample,
+				   struct machine *machine)
+{
+	int err;
+
+	err = perf_event__process_exit(tool, event, sample, machine);
+	perf_event__repipe(tool, event, sample, machine);
+
+	return err;
+}
+
 static int perf_event__repipe_tracing_data(struct perf_tool *tool,
 					   union perf_event *event,
 					   struct perf_session *session)
@@ -214,6 +242,18 @@ static int perf_event__repipe_tracing_data(struct perf_tool *tool,
 	return err;
 }
 
+static int perf_event__repipe_id_index(struct perf_tool *tool,
+				       union perf_event *event,
+				       struct perf_session *session)
+{
+	int err;
+
+	perf_event__repipe_synth(tool, event);
+	err = perf_event__process_id_index(tool, event, session);
+
+	return err;
+}
+
 static int dso__read_build_id(struct dso *dso)
 {
 	if (dso->has_build_id)
@@ -402,10 +442,12 @@ static int __cmd_inject(struct perf_inject *inject)
 		.path = inject->input_name,
 		.mode = PERF_DATA_MODE_READ,
 	};
+	u64 output_data_offset;
 
 	signal(SIGINT, sig_handler);
 
-	if (inject->build_ids || inject->sched_stat) {
+	if (inject->build_ids || inject->sched_stat ||
+	    inject->itrace_synth_opts.set) {
 		inject->tool.mmap	  = perf_event__repipe_mmap;
 		inject->tool.mmap2	  = perf_event__repipe_mmap2;
 		inject->tool.fork	  = perf_event__repipe_fork;
@@ -416,6 +458,8 @@ static int __cmd_inject(struct perf_inject *inject)
 	if (session == NULL)
 		return -ENOMEM;
 
+	output_data_offset = session->header.data_offset;
+
 	if (inject->build_ids) {
 		inject->tool.sample = perf_event__inject_buildid;
 	} else if (inject->sched_stat) {
@@ -436,14 +480,34 @@ static int __cmd_inject(struct perf_inject *inject)
 			else if (!strncmp(name, "sched:sched_stat_", 17))
 				evsel->handler = perf_inject__sched_stat;
 		}
+	} else if (inject->itrace_synth_opts.set) {
+		session->itrace_synth_opts = &inject->itrace_synth_opts;
+		inject->itrace_synth_opts.inject = true;
+		inject->tool.comm	    = perf_event__repipe_comm;
+		inject->tool.exit	    = perf_event__repipe_exit;
+		inject->tool.id_index	    = perf_event__repipe_id_index;
+		inject->tool.itrace_info    = perf_event__process_itrace_info;
+		inject->tool.itrace	    = perf_event__process_itrace;
+		inject->tool.ordered_samples = true;
+		inject->tool.ordering_requires_timestamps = true;
+		/* Allow space in the header for new attributes */
+		output_data_offset = 4096;
 	}
 
 	if (!inject->pipe_output)
-		lseek(inject->output, session->header.data_offset, SEEK_SET);
+		lseek(inject->output, output_data_offset, SEEK_SET);
 
 	ret = perf_session__process_events(session, &inject->tool);
 
 	if (!inject->pipe_output) {
+		/*
+		 * The instruction traces have been removed and replaced with
+		 * synthesized hardware events, so clear the feature flag.
+		 */
+		if (inject->itrace_synth_opts.set)
+			perf_header__clear_feat(&session->header,
+						HEADER_ITRACE);
+		session->header.data_offset = output_data_offset;
 		session->header.data_size = inject->bytes_written;
 		perf_session__write_header(session, session->evlist, inject->output, true);
 	}
@@ -493,6 +557,9 @@ int cmd_inject(int argc, const char **argv, const char *prefix __maybe_unused)
 			    "where and how long tasks slept"),
 		OPT_INCR('v', "verbose", &verbose,
 			 "be more verbose (show build ids, etc)"),
+		OPT_CALLBACK_OPTARG('Z', "itrace", &inject.itrace_synth_opts,
+				    NULL, "opts", "Instruction Tracing options",
+				    itrace_parse_synth_opts),
 		OPT_END()
 	};
 	const char * const inject_usage[] = {
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 62/71] perf inject: Cut Instruction Tracing samples
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (60 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 61/71] perf inject: Add Instruction Tracing support Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 63/71] perf tools: Add Instruction Tracing index Alexander Shishkin
                   ` (10 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

After decoding Instruction Tracing samples, the
Instruction Tracing data is no longer needed
(having been replaced by synthesized events),
so cut it out of the sample before repiping.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-inject.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index feeeb56..ed2b48f 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -32,6 +32,7 @@ struct perf_inject {
 	u64		 bytes_written;
 	struct list_head samples;
 	struct itrace_synth_opts itrace_synth_opts;
+	char		 event_copy[PERF_SAMPLE_MAX_SIZE];
 };
 
 struct event_entry {
@@ -143,6 +144,28 @@ static int perf_event__repipe(struct perf_tool *tool,
 	return perf_event__repipe_synth(tool, event);
 }
 
+static union perf_event *
+perf_inject__cut_itrace_sample(struct perf_inject *inject,
+			       union perf_event *event,
+			       struct perf_sample *sample)
+{
+	size_t sz1 = sample->itrace_sample.data - (void *)event;
+	size_t sz2 = event->header.size - sample->itrace_sample.size - sz1;
+	union perf_event *ev = (union perf_event *)inject->event_copy;
+
+	if (sz1 > event->header.size || sz2 > event->header.size ||
+	    sz1 + sz2 > event->header.size ||
+	    sz1 < sizeof(struct perf_event_header) + sizeof(u64))
+		return event;
+
+	memcpy(ev, event, sz1);
+	memcpy((void *)ev + sz1, (void *)event + event->header.size - sz2, sz2);
+	ev->header.size = sz1 + sz2;
+	((u64 *)((void *)ev + sz1))[-1] = 0;
+
+	return ev;
+}
+
 typedef int (*inject_handler)(struct perf_tool *tool,
 			      union perf_event *event,
 			      struct perf_sample *sample,
@@ -155,6 +178,9 @@ static int perf_event__repipe_sample(struct perf_tool *tool,
 				     struct perf_evsel *evsel,
 				     struct machine *machine)
 {
+	struct perf_inject *inject = container_of(tool, struct perf_inject,
+						  tool);
+
 	if (evsel->handler) {
 		inject_handler f = evsel->handler;
 		return f(tool, event, sample, evsel, machine);
@@ -162,6 +188,9 @@ static int perf_event__repipe_sample(struct perf_tool *tool,
 
 	build_id__mark_dso_hit(tool, event, sample, evsel, machine);
 
+	if (inject->itrace_synth_opts.set && sample->itrace_sample.size)
+		event = perf_inject__cut_itrace_sample(inject, event, sample);
+
 	return perf_event__repipe_synth(tool, event);
 }
 
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 63/71] perf tools: Add Instruction Tracing index
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (61 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 62/71] perf inject: Cut Instruction Tracing samples Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 64/71] perf tools: Hit all build ids when Instruction Tracing Alexander Shishkin
                   ` (9 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add an index of Instruction Tracing events within
a perf.data file.

Instruction Tracing events contain data that can
span back to the very beginning of the recording
period.  Consequently, decoding cannot begin until
all events are sorted.  By adding an index,
Instruction Tracing events can be found in advance
and decoding can begin earlier.
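
As a sketch of the write side (mirroring the hunks
in builtin-record.c and builtin-inject.c below;
the wrapper name is hypothetical):

	#include <unistd.h>
	#include <errno.h>

	/* Note the file offset at which one itrace event is about to be
	 * written, so readers can later seek straight to it. */
	static int note_itrace_event(struct perf_session *session, int fd,
				     union perf_event *event)
	{
		off_t offset = lseek(fd, 0, SEEK_CUR);

		if (offset == (off_t)-1)
			return -errno;
		return itrace_index__itrace_event(&session->itrace_index,
						  event, offset);
	}

itrace_index__write() then dumps the accumulated
entries into the HEADER_ITRACE feature section,
and itrace_index__process() reads them back when
the perf.data file is loaded.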

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-inject.c |  15 +++
 tools/perf/builtin-record.c |  18 +++-
 tools/perf/util/header.c    |  23 ++++-
 tools/perf/util/itrace.c    | 224 ++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/itrace.h    |  35 +++++++
 tools/perf/util/session.c   |   2 +
 tools/perf/util/session.h   |   1 +
 7 files changed, 313 insertions(+), 5 deletions(-)

diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index ed2b48f..ce1f298 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -120,6 +120,18 @@ static s64 perf_event__repipe_itrace(struct perf_tool *tool,
 						  tool);
 	int ret;
 
+	if (!inject->pipe_output) {
+		off_t offset;
+
+		offset = lseek(inject->output, 0, SEEK_CUR);
+		if (offset == -1)
+			return -errno;
+		ret = itrace_index__itrace_event(&session->itrace_index, event,
+						 offset);
+		if (ret < 0)
+			return ret;
+	}
+
 	if (perf_data_file__is_pipe(session->file) || !session->one_mmap) {
 		ret = output_bytes(inject, event, event->header.size);
 		if (ret < 0)
@@ -523,6 +535,9 @@ static int __cmd_inject(struct perf_inject *inject)
 		output_data_offset = 4096;
 	}
 
+	if (!inject->itrace_synth_opts.set)
+		itrace_index__free(&session->itrace_index);
+
 	if (!inject->pipe_output)
 		lseek(inject->output, output_data_offset, SEEK_SET);
 
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index 46c451c..b35963f 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -157,9 +157,24 @@ static int perf_record__process_itrace(struct perf_tool *tool,
 				       size_t len1, void *data2, size_t len2)
 {
 	struct perf_record *rec = container_of(tool, struct perf_record, tool);
+	struct perf_data_file *file = &rec->file;
 	size_t padding;
 	u8 pad[8] = {0};
 
+	if (!perf_data_file__is_pipe(file)) {
+		off_t file_offset;
+		int fd = perf_data_file__fd(file);
+		int err;
+
+		file_offset = lseek(fd, 0, SEEK_CUR);
+		if (file_offset == -1)
+			return -1;
+		err = itrace_index__itrace_event(&rec->session->itrace_index,
+						 event, file_offset);
+		if (err)
+			return err;
+	}
+
 	padding = (len1 + len2) & 7;
 	if (padding)
 		padding = 8 - padding;
@@ -395,7 +410,8 @@ static int perf_record__mmap_read_all(struct perf_record *rec)
 		}
 	}
 
-	if (perf_header__has_feat(&rec->session->header, HEADER_TRACING_DATA))
+	if (perf_header__has_feat(&rec->session->header, HEADER_TRACING_DATA) ||
+	    perf_header__has_feat(&rec->session->header, HEADER_ITRACE))
 		rc = perf_record__write(rec, &finished_round_event,
 					sizeof(finished_round_event));
 
diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 72bcca9..dd1a1f9 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -1188,11 +1188,14 @@ static int write_branch_stack(int fd __maybe_unused,
 	return 0;
 }
 
-static int write_itrace(int fd __maybe_unused,
-			struct perf_header *h __maybe_unused,
+static int write_itrace(int fd, struct perf_header *h,
 			struct perf_evlist *evlist __maybe_unused)
 {
-	return 0;
+	struct perf_session *session;
+
+	session = container_of(h, struct perf_session, header);
+
+	return itrace_index__write(fd, &session->itrace_index);
 }
 
 static void print_hostname(struct perf_header *ph, int fd __maybe_unused,
@@ -2168,6 +2171,18 @@ out_free:
 	return ret;
 }
 
+static int process_itrace(struct perf_file_section *section,
+			  struct perf_header *ph, int fd,
+			  void *data __maybe_unused)
+{
+	struct perf_session *session;
+
+	session = container_of(ph, struct perf_session, header);
+
+	return itrace_index__process(fd, section->size, session,
+				     ph->needs_swap);
+}
+
 struct feature_ops {
 	int (*write)(int fd, struct perf_header *h, struct perf_evlist *evlist);
 	void (*print)(struct perf_header *h, int fd, FILE *fp);
@@ -2208,7 +2223,7 @@ static const struct feature_ops feat_ops[HEADER_LAST_FEATURE] = {
 	FEAT_OPA(HEADER_BRANCH_STACK,	branch_stack),
 	FEAT_OPP(HEADER_PMU_MAPPINGS,	pmu_mappings),
 	FEAT_OPP(HEADER_GROUP_DESC,	group_desc),
-	FEAT_OPA(HEADER_ITRACE,		itrace),
+	FEAT_OPP(HEADER_ITRACE,		itrace),
 };
 
 struct header_print_data {
diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index da2f175..28455f8 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -321,6 +321,40 @@ int itrace_queues__add_event(struct itrace_queues *queues,
 	return itrace_queues__add_buffer(queues, event->itrace.idx, buffer);
 }
 
+static int itrace_queues__add_indexed_event(struct itrace_queues *queues,
+					    struct perf_session *session,
+					    off_t file_offset, size_t sz)
+{
+	union perf_event *event;
+	struct itrace_event buf;
+
+	if (session->one_mmap && !session->header.needs_swap) {
+		event = file_offset - session->one_mmap_offset +
+			session->one_mmap_addr;
+	} else {
+		int fd = perf_data_file__fd(session->file);
+
+		if (sz > sizeof(struct itrace_event))
+			sz = sizeof(struct itrace_event);
+		else if (sz < sizeof(struct itrace_event))
+			memset(&buf, 0, sizeof(struct itrace_event));
+
+		if (lseek(fd, file_offset, SEEK_SET) == (off_t)-1 ||
+		    readn(fd, &buf, sz) != (ssize_t)sz)
+			return -EINVAL;
+
+		event = (union perf_event *)&buf;
+
+		if (session->header.needs_swap) {
+			perf_event_header__bswap(&event->header);
+			perf_event__itrace_swap(event, true);
+		}
+	}
+
+	return itrace_queues__add_event(queues, session, event,
+					file_offset + event->header.size, NULL);
+}
+
 struct itrace_queue *itrace_queues__sample_queue(struct itrace_queues *queues,
 						 struct perf_sample *sample,
 						 struct perf_session *session)
@@ -636,6 +670,196 @@ struct itrace_record *__attribute__ ((weak)) itrace_record__init(int *err)
 	return NULL;
 }
 
+static int itrace_index__alloc(struct list_head *head)
+{
+	struct itrace_index *itrace_index;
+
+	itrace_index = malloc(sizeof(struct itrace_index));
+	if (!itrace_index)
+		return -ENOMEM;
+
+	itrace_index->nr = 0;
+	INIT_LIST_HEAD(&itrace_index->list);
+
+	list_add_tail(&itrace_index->list, head);
+
+	return 0;
+}
+
+void itrace_index__free(struct list_head *head)
+{
+	struct itrace_index *itrace_index, *n;
+
+	list_for_each_entry_safe(itrace_index, n, head, list) {
+		list_del(&itrace_index->list);
+		free(itrace_index);
+	}
+}
+
+static struct itrace_index *itrace_index__last(struct list_head *head)
+{
+	struct itrace_index *itrace_index;
+	int err;
+
+	if (list_empty(head)) {
+		err = itrace_index__alloc(head);
+		if (err)
+			return NULL;
+	}
+
+	itrace_index = list_entry(head->prev, struct itrace_index, list);
+
+	if (itrace_index->nr >= PERF_ITRACE_INDEX_ENTRY_COUNT) {
+		err = itrace_index__alloc(head);
+		if (err)
+			return NULL;
+		itrace_index = list_entry(head->prev, struct itrace_index,
+					  list);
+	}
+
+	return itrace_index;
+}
+
+int itrace_index__itrace_event(struct list_head *head, union perf_event *event,
+			       off_t file_offset)
+{
+	struct itrace_index *itrace_index;
+	size_t nr;
+
+	itrace_index = itrace_index__last(head);
+	if (!itrace_index)
+		return -ENOMEM;
+
+	nr = itrace_index->nr;
+	itrace_index->entries[nr].file_offset = file_offset;
+	itrace_index->entries[nr].sz = event->header.size;
+	itrace_index->nr += 1;
+
+	return 0;
+}
+
+static int itrace_index__do_write(int fd, struct itrace_index *itrace_index)
+{
+	struct itrace_index_entry index;
+	size_t i;
+	int err;
+
+	for (i = 0; i < itrace_index->nr; i++) {
+		index.file_offset = itrace_index->entries[i].file_offset;
+		index.sz = itrace_index->entries[i].sz;
+		err = writen(fd, &index, sizeof(index));
+		if (err)
+			return err;
+	}
+	return 0;
+}
+
+int itrace_index__write(int fd, struct list_head *head)
+{
+	struct itrace_index *itrace_index;
+	u64 total = 0;
+	int err;
+
+	list_for_each_entry(itrace_index, head, list)
+		total += itrace_index->nr;
+
+	err = writen(fd, &total, sizeof(total));
+	if (err)
+		return err;
+
+	list_for_each_entry(itrace_index, head, list) {
+		err = itrace_index__do_write(fd, itrace_index);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static int itrace_index__process_entry(int fd, struct list_head *head,
+				       bool needs_swap)
+{
+	struct itrace_index *itrace_index;
+	struct itrace_index_entry index;
+	size_t nr;
+
+	if (readn(fd, &index, sizeof(index)) != sizeof(index))
+		return -1;
+
+	itrace_index = itrace_index__last(head);
+	if (!itrace_index)
+		return -1;
+
+	nr = itrace_index->nr;
+	if (needs_swap) {
+		itrace_index->entries[nr].file_offset =
+						bswap_64(index.file_offset);
+		itrace_index->entries[nr].sz = bswap_64(index.sz);
+	} else {
+		itrace_index->entries[nr].file_offset = index.file_offset;
+		itrace_index->entries[nr].sz = index.sz;
+	}
+
+	itrace_index->nr = nr + 1;
+
+	return 0;
+}
+
+int itrace_index__process(int fd, u64 size, struct perf_session *session,
+			  bool needs_swap)
+{
+	struct list_head *head = &session->itrace_index;
+	u64 nr;
+
+	if (readn(fd, &nr, sizeof(u64)) != sizeof(u64))
+		return -1;
+
+	if (needs_swap)
+		nr = bswap_64(nr);
+
+	if (sizeof(u64) + nr * sizeof(struct itrace_index_entry) != size)
+		return -1;
+
+	while (nr--) {
+		int err;
+
+		err = itrace_index__process_entry(fd, head, needs_swap);
+		if (err)
+			return -1;
+	}
+
+	return 0;
+}
+
+static int itrace_queues__process_index_entry(struct itrace_queues *queues,
+					      struct perf_session *session,
+					      struct itrace_index_entry *index)
+{
+	return itrace_queues__add_indexed_event(queues, session,
+						index->file_offset, index->sz);
+}
+
+int itrace_queues__process_index(struct itrace_queues *queues,
+				 struct perf_session *session)
+{
+	struct itrace_index *itrace_index;
+	struct itrace_index_entry *index;
+	size_t i;
+	int err;
+
+	list_for_each_entry(itrace_index, &session->itrace_index, list) {
+		for (i = 0; i < itrace_index->nr; i++) {
+			index = &itrace_index->entries[i];
+			err = itrace_queues__process_index_entry(queues,
+								 session,
+								 index);
+			if (err)
+				return err;
+		}
+	}
+	return 0;
+}
+
 struct itrace_buffer *itrace_buffer__next(struct itrace_queue *queue,
 					  struct itrace_buffer *buffer)
 {
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index 9ff633c..1005715 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -75,6 +75,32 @@ struct itrace_synth_opts {
 	enum itrace_period_type	period_type;
 };
 
+/**
+ * struct itrace_index_entry - indexes an Instruction Tracing event within a
+ *                             perf.data file.
+ * @file_offset: offset within the perf.data file
+ * @sz: size of the event
+ */
+struct itrace_index_entry {
+	u64			file_offset;
+	u64			sz;
+};
+
+#define PERF_ITRACE_INDEX_ENTRY_COUNT 256
+
+/**
+ * struct itrace_index - index of Instruction Tracing events within a perf.data
+ *                       file.
+ * @list: linking a number of arrays of entries
+ * @nr: number of entries
+ * @entries: array of entries
+ */
+struct itrace_index {
+	struct list_head	list;
+	size_t			nr;
+	struct itrace_index_entry entries[PERF_ITRACE_INDEX_ENTRY_COUNT];
+};
+
 struct itrace {
 	int (*process_event)(struct perf_session *session,
 			     union perf_event *event,
@@ -320,6 +346,8 @@ int itrace_queues__add_sample(struct itrace_queues *queues,
 			      struct perf_session *session,
 			      unsigned int *queue_nr, u64 ref);
 void itrace_queues__free(struct itrace_queues *queues);
+int itrace_queues__process_index(struct itrace_queues *queues,
+				 struct perf_session *session);
 struct itrace_buffer *itrace_buffer__next(struct itrace_queue *queue,
 					  struct itrace_buffer *buffer);
 void *itrace_buffer__get_data(struct itrace_buffer *buffer, int fd);
@@ -352,6 +380,13 @@ int itrace_record__find_snapshot(struct itrace_record *itr, int idx,
 				 unsigned char *data, u64 *head, u64 *old);
 u64 itrace_record__reference(struct itrace_record *itr);
 
+int itrace_index__itrace_event(struct list_head *head, union perf_event *event,
+			       off_t file_offset);
+int itrace_index__write(int fd, struct list_head *head);
+int itrace_index__process(int fd, u64 size, struct perf_session *session,
+			  bool needs_swap);
+void itrace_index__free(struct list_head *head);
+
 void itrace_synth_error(struct itrace_error_event *itrace_error, int type,
 			int code, int cpu, pid_t pid, pid_t tid, u64 ip,
 			const char *msg);
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index c60238a..8d5f457 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -80,6 +80,7 @@ struct perf_session *perf_session__new(struct perf_data_file *file,
 	INIT_LIST_HEAD(&session->ordered_samples.samples);
 	INIT_LIST_HEAD(&session->ordered_samples.sample_cache);
 	INIT_LIST_HEAD(&session->ordered_samples.to_free);
+	INIT_LIST_HEAD(&session->itrace_index);
 	machines__init(&session->machines);
 
 	if (file) {
@@ -150,6 +151,7 @@ static void perf_session_env__delete(struct perf_session_env *env)
 void perf_session__delete(struct perf_session *session)
 {
 	itrace__free(session);
+	itrace_index__free(&session->itrace_index);
 	perf_session__destroy_kernel_maps(session);
 	perf_session__delete_dead_threads(session);
 	perf_session__delete_threads(session);
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 25aa9e7..9cf3840 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -38,6 +38,7 @@ struct perf_session {
 	struct perf_evlist	*evlist;
 	struct itrace		*itrace;
 	struct itrace_synth_opts *itrace_synth_opts;
+	struct list_head	itrace_index;
 	struct trace_event	tevent;
 	struct events_stats	stats;
 	bool			repipe;
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 64/71] perf tools: Hit all build ids when Instruction Tracing
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (62 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 63/71] perf tools: Add Instruction Tracing index Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 65/71] perf itrace: Add Intel PT as an Instruction Tracing type Alexander Shishkin
                   ` (8 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

We need to include all build ids when a perf.data
file contains Instruction Tracing data, because
decoding the trace just to find which dsos were
hit would take too long.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/builtin-buildid-list.c |  9 +++++++++
 tools/perf/builtin-inject.c       |  8 ++++++++
 tools/perf/builtin-record.c       | 10 +++++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/perf/builtin-buildid-list.c b/tools/perf/builtin-buildid-list.c
index ed3873b..d694309 100644
--- a/tools/perf/builtin-buildid-list.c
+++ b/tools/perf/builtin-buildid-list.c
@@ -69,6 +69,15 @@ static int perf_session__list_build_ids(bool force, bool with_hits)
 	session = perf_session__new(&file, false, &build_id__mark_dso_hit_ops);
 	if (session == NULL)
 		return -1;
+
+	/*
+	 * Take all build ids when the file contains Instruction Tracing data,
+	 * because decoding the trace to find dso hits would take too long.
+	 */
+	if (!perf_data_file__is_pipe(&file) &&
+	    perf_header__has_feat(&session->header, HEADER_ITRACE))
+		with_hits = false;
+
 	/*
 	 * in pipe-mode, the only way to get the buildids is to parse
 	 * the record stream. Buildids are stored as RECORD_HEADER_BUILD_ID
diff --git a/tools/perf/builtin-inject.c b/tools/perf/builtin-inject.c
index ce1f298..cafa6ab 100644
--- a/tools/perf/builtin-inject.c
+++ b/tools/perf/builtin-inject.c
@@ -26,6 +26,7 @@ struct perf_inject {
 	struct perf_tool tool;
 	bool		 build_ids;
 	bool		 sched_stat;
+	bool		 have_itrace;
 	const char	 *input_name;
 	int		 pipe_output,
 			 output;
@@ -120,6 +121,8 @@ static s64 perf_event__repipe_itrace(struct perf_tool *tool,
 						  tool);
 	int ret;
 
+	inject->have_itrace = true;
+
 	if (!inject->pipe_output) {
 		off_t offset;
 
@@ -544,6 +547,11 @@ static int __cmd_inject(struct perf_inject *inject)
 	ret = perf_session__process_events(session, &inject->tool);
 
 	if (!inject->pipe_output) {
+		if (inject->build_ids && inject->have_itrace) {
+			perf_header__set_feat(&session->header,
+					      HEADER_BUILD_ID);
+			dsos__hit_all(session);
+		}
 		/*
 		 * The instruction traces have been removed and replaced with
 		 * synthesized hardware events, so clear the feature flag.
diff --git a/tools/perf/builtin-record.c b/tools/perf/builtin-record.c
index b35963f..d51fba6 100644
--- a/tools/perf/builtin-record.c
+++ b/tools/perf/builtin-record.c
@@ -341,8 +341,16 @@ static void perf_record__exit(int status, void *arg)
 	if (!file->is_pipe) {
 		rec->session->header.data_size += rec->bytes_written;
 
-		if (!rec->no_buildid)
+		if (!rec->no_buildid) {
 			process_buildids(rec);
+			/*
+			 * Take all build ids when the file contains
+			 * Instruction Tracing data, because decoding the
+			 * trace to find dso hits would take too long.
+			 */
+			if (perf_record_opts_itracing(&rec->opts))
+				dsos__hit_all(rec->session);
+		}
 		perf_session__write_header(rec->session, rec->evlist,
 					   file->fd, true);
 		perf_session__delete(rec->session);
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 65/71] perf itrace: Add Intel PT as an Instruction Tracing type
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (63 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 64/71] perf tools: Hit all build ids when Instruction Tracing Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 66/71] perf tools: Add Intel PT packet decoder Alexander Shishkin
                   ` (7 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add the Intel Processor Trace type
constant PERF_ITRACE_INTEL_PT.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/util/itrace.c | 1 +
 tools/perf/util/itrace.h | 1 +
 2 files changed, 2 insertions(+)

diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index 28455f8..8ecbfb1 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -977,6 +977,7 @@ int perf_event__process_itrace_info(struct perf_tool *tool __maybe_unused,
 		return 0;
 
 	switch (type) {
+	case PERF_ITRACE_INTEL_PT:
 	case PERF_ITRACE_UNKNOWN:
 	default:
 		return -EINVAL;
diff --git a/tools/perf/util/itrace.h b/tools/perf/util/itrace.h
index 1005715..de4b7a0 100644
--- a/tools/perf/util/itrace.h
+++ b/tools/perf/util/itrace.h
@@ -40,6 +40,7 @@ struct itrace_info_event;
 
 enum itrace_type {
 	PERF_ITRACE_UNKNOWN,
+	PERF_ITRACE_INTEL_PT,
 };
 
 enum itrace_error_type {
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 66/71] perf tools: Add Intel PT packet decoder
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (64 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 65/71] perf itrace: Add Intel PT as an Instruction Tracing type Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 67/71] perf tools: Add Intel PT instruction decoder Alexander Shishkin
                   ` (6 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for decoding Intel Processor Trace
packets.
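
A usage sketch (the function below is hypothetical,
not part of the patch): walk a buffer of trace
bytes packet by packet using the declarations in
intel-pt-pkt-decoder.h:

	#include <stdio.h>
	#include "intel-pt-pkt-decoder.h"

	/* Decode and print each packet.  intel_pt_get_packet() returns
	 * the number of bytes consumed, or INTEL_PT_NEED_MORE_BYTES /
	 * INTEL_PT_BAD_PACKET (both negative) on failure. */
	static void dump_packets(const unsigned char *buf, size_t len)
	{
		struct intel_pt_pkt packet;
		char desc[INTEL_PT_PKT_DESC_MAX];
		int ret;

		while (len) {
			ret = intel_pt_get_packet(buf, len, &packet);
			if (ret <= 0)
				break;
			intel_pt_pkt_desc(&packet, desc, sizeof(desc));
			printf("%s\n", desc);
			buf += ret;
			len -= ret;
		}
	}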

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile.perf                           |   2 +
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.c   | 404 +++++++++++++++++++++
 .../util/intel-pt-decoder/intel-pt-pkt-decoder.h   |  68 ++++
 3 files changed, 474 insertions(+)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 6ef50f9..a006fac 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -294,6 +294,7 @@ LIB_H += util/unwind.h
 LIB_H += util/vdso.h
 LIB_H += util/tsc.h
 LIB_H += util/itrace.h
+LIB_H += util/intel-pt-decoder/intel-pt-pkt-decoder.h
 LIB_H += ui/helpline.h
 LIB_H += ui/progress.h
 LIB_H += ui/util.h
@@ -374,6 +375,7 @@ LIB_OBJS += $(OUTPUT)util/srcline.o
 LIB_OBJS += $(OUTPUT)util/data.o
 LIB_OBJS += $(OUTPUT)util/tsc.o
 LIB_OBJS += $(OUTPUT)util/itrace.o
+LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-pkt-decoder.o
 
 LIB_OBJS += $(OUTPUT)ui/setup.o
 LIB_OBJS += $(OUTPUT)ui/helpline.o
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
new file mode 100644
index 0000000..c15eaf3
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c
@@ -0,0 +1,404 @@
+/*
+ * intel_pt_pkt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#include "intel-pt-pkt-decoder.h"
+
+#define BIT(n)		(1 << (n))
+
+#define BIT63		((uint64_t)1 << 63)
+
+#if __BYTE_ORDER == __BIG_ENDIAN
+#define le16_to_cpu bswap_16
+#define le32_to_cpu bswap_32
+#define le64_to_cpu bswap_64
+#define memcpy_le64(d, s, n) do { \
+	memcpy((d), (s), (n));    \
+	*(d) = le64_to_cpu(*(d)); \
+} while (0)
+#else
+#define le16_to_cpu
+#define le32_to_cpu
+#define le64_to_cpu
+#define memcpy_le64 memcpy
+#endif
+
+static const char * const packet_name[] = {
+	[INTEL_PT_BAD]		= "Bad Packet!",
+	[INTEL_PT_PAD]		= "PAD",
+	[INTEL_PT_TNT]		= "TNT",
+	[INTEL_PT_TIP_PGD]	= "TIP.PGD",
+	[INTEL_PT_TIP_PGE]	= "TIP.PGE",
+	[INTEL_PT_TSC]		= "TSC",
+	[INTEL_PT_MODE_EXEC]	= "MODE.Exec",
+	[INTEL_PT_MODE_TSX]	= "MODE.TSX",
+	[INTEL_PT_TIP]		= "TIP",
+	[INTEL_PT_FUP]		= "FUP",
+	[INTEL_PT_PSB]		= "PSB",
+	[INTEL_PT_PSBEND]	= "PSBEND",
+	[INTEL_PT_CBR]		= "CBR",
+	[INTEL_PT_PIP]		= "PIP",
+	[INTEL_PT_OVF]		= "OVF",
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type type)
+{
+	return packet_name[type];
+}
+
+static int intel_pt_get_long_tnt(const unsigned char *buf, size_t len,
+				 struct intel_pt_pkt *packet)
+{
+	uint64_t payload;
+	int count;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	payload = le64_to_cpu(*(uint64_t *)buf);
+
+	for (count = 47; count; count--) {
+		if (payload & BIT63)
+			break;
+		payload <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = payload << 1;
+	return 8;
+}
+
+static int intel_pt_get_pip(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	uint64_t payload = 0;
+
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	packet->type = INTEL_PT_PIP;
+	memcpy_le64(&payload, buf + 2, 6);
+	packet->payload = payload >> 1;
+
+	return 8;
+}
+
+static int intel_pt_get_cbr(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 4)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_CBR;
+	packet->payload = buf[2];
+	return 4;
+}
+
+static int intel_pt_get_ovf(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_OVF;
+	return 2;
+}
+
+static int intel_pt_get_psb(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	int i;
+
+	if (len < 16)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	for (i = 2; i < 16; i += 2) {
+		if (buf[i] != 2 || buf[i + 1] != 0x82)
+			return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = INTEL_PT_PSB;
+	return 16;
+}
+
+static int intel_pt_get_psbend(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PSBEND;
+	return 2;
+}
+
+static int intel_pt_get_pad(struct intel_pt_pkt *packet)
+{
+	packet->type = INTEL_PT_PAD;
+	return 1;
+}
+
+static int intel_pt_get_ext(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1]) {
+	case 0xa3: /* Long TNT */
+		return intel_pt_get_long_tnt(buf, len, packet);
+	case 0x43: /* PIP */
+		return intel_pt_get_pip(buf, len, packet);
+	case 0x03: /* CBR */
+		return intel_pt_get_cbr(buf, len, packet);
+	case 0xf3: /* OVF */
+		return intel_pt_get_ovf(packet);
+	case 0x82: /* PSB */
+		return intel_pt_get_psb(buf, len, packet);
+	case 0x23: /* PSBEND */
+		return intel_pt_get_psbend(packet);
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+static int intel_pt_get_short_tnt(unsigned int byte,
+				  struct intel_pt_pkt *packet)
+{
+	int count;
+
+	for (count = 6; count; count--) {
+		if (byte & BIT(7))
+			break;
+		byte <<= 1;
+	}
+
+	packet->type = INTEL_PT_TNT;
+	packet->count = count;
+	packet->payload = (uint64_t)byte << 57;
+
+	return 1;
+}
+
+static int intel_pt_get_ip(enum intel_pt_pkt_type type, unsigned int byte,
+		       const unsigned char *buf, size_t len,
+		       struct intel_pt_pkt *packet)
+{
+	switch (byte >> 5) {
+	case 0:
+		packet->count = 0;
+		break;
+	case 1:
+		if (len < 3)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 2;
+		packet->payload = le16_to_cpu(*(uint16_t *)(buf + 1));
+		break;
+	case 2:
+		if (len < 5)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 4;
+		packet->payload = le32_to_cpu(*(uint32_t *)(buf + 1));
+		break;
+	case 3:
+	case 6:
+		if (len < 7)
+			return INTEL_PT_NEED_MORE_BYTES;
+		packet->count = 6;
+		memcpy_le64(&packet->payload, buf + 1, 6);
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	packet->type = type;
+
+	return packet->count + 1;
+}
+
+static int intel_pt_get_mode(const unsigned char *buf, size_t len,
+			     struct intel_pt_pkt *packet)
+{
+	if (len < 2)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	switch (buf[1] >> 5) {
+	case 0:
+		packet->type = INTEL_PT_MODE_EXEC;
+		switch (buf[1] & 3) {
+		case 0:
+			packet->payload = 16;
+			break;
+		case 1:
+			packet->payload = 64;
+			break;
+		case 2:
+			packet->payload = 32;
+			break;
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+		break;
+	case 1:
+		packet->type = INTEL_PT_MODE_TSX;
+		if ((buf[1] & 3) == 3)
+			return INTEL_PT_BAD_PACKET;
+		packet->payload = buf[1] & 3;
+		break;
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+
+	return 2;
+}
+
+static int intel_pt_get_tsc(const unsigned char *buf, size_t len,
+			    struct intel_pt_pkt *packet)
+{
+	if (len < 8)
+		return INTEL_PT_NEED_MORE_BYTES;
+	packet->type = INTEL_PT_TSC;
+	memcpy_le64(&packet->payload, buf + 1, 7);
+	return 8;
+}
+
+static int intel_pt_do_get_packet(const unsigned char *buf, size_t len,
+				  struct intel_pt_pkt *packet)
+{
+	unsigned int byte;
+
+	memset(packet, 0, sizeof(struct intel_pt_pkt));
+
+	if (!len)
+		return INTEL_PT_NEED_MORE_BYTES;
+
+	byte = buf[0];
+	if (!(byte & BIT(0))) {
+		if (byte == 0)
+			return intel_pt_get_pad(packet);
+		if (byte == 2)
+			return intel_pt_get_ext(buf, len, packet);
+		return intel_pt_get_short_tnt(byte, packet);
+	}
+
+	switch (byte & 0x3f) {
+	case 0x0D:
+		return intel_pt_get_ip(INTEL_PT_TIP, byte, buf, len, packet);
+	case 0x11:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGE, byte, buf, len,
+				       packet);
+	case 0x01:
+		return intel_pt_get_ip(INTEL_PT_TIP_PGD, byte, buf, len,
+				       packet);
+	case 0x1D:
+		return intel_pt_get_ip(INTEL_PT_FUP, byte, buf, len, packet);
+	case 0x19:
+		switch (byte) {
+		case 0x99:
+			return intel_pt_get_mode(buf, len, packet);
+		case 0x19:
+			return intel_pt_get_tsc(buf, len, packet);
+		default:
+			return INTEL_PT_BAD_PACKET;
+		}
+	default:
+		return INTEL_PT_BAD_PACKET;
+	}
+}
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet)
+{
+	int ret;
+
+	ret = intel_pt_do_get_packet(buf, len, packet);
+	if (ret > 0) {
+		while (ret < 8 && len > (size_t)ret && !buf[ret])
+			ret += 1;
+	}
+	return ret;
+}
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf,
+		      size_t buf_len)
+{
+	int ret, i;
+	unsigned long long payload = packet->payload;
+	const char *name = intel_pt_pkt_name(packet->type);
+
+	switch (packet->type) {
+	case INTEL_PT_BAD:
+	case INTEL_PT_PAD:
+	case INTEL_PT_PSB:
+	case INTEL_PT_PSBEND:
+	case INTEL_PT_OVF:
+		return snprintf(buf, buf_len, "%s", name);
+	case INTEL_PT_TNT: {
+		size_t blen = buf_len;
+
+		ret = snprintf(buf, blen, "%s ", name);
+		if (ret < 0)
+			return ret;
+		buf += ret;
+		blen -= ret;
+		for (i = 0; i < packet->count; i++) {
+			if (payload & BIT63)
+				ret = snprintf(buf, blen, "T");
+			else
+				ret = snprintf(buf, blen, "N");
+			if (ret < 0)
+				return ret;
+			buf += ret;
+			blen -= ret;
+			payload <<= 1;
+		}
+		ret = snprintf(buf, blen, " (%d)", packet->count);
+		if (ret < 0)
+			return ret;
+		blen -= ret;
+		return buf_len - blen;
+	}
+	case INTEL_PT_TIP_PGD:
+	case INTEL_PT_TIP_PGE:
+	case INTEL_PT_TIP:
+	case INTEL_PT_FUP:
+		if (!(packet->count))
+			return snprintf(buf, buf_len, "%s no ip", name);
+	case INTEL_PT_CBR:
+		return snprintf(buf, buf_len, "%s 0x%llx", name, payload);
+	case INTEL_PT_TSC:
+		if (packet->count)
+			return snprintf(buf, buf_len,
+					"%s 0x%llx CTC 0x%x FC 0x%x",
+					name, payload, packet->count & 0xffff,
+					(packet->count >> 16) & 0x1ff);
+		else
+			return snprintf(buf, buf_len, "%s 0x%llx",
+					name, payload);
+	case INTEL_PT_MODE_EXEC:
+		return snprintf(buf, buf_len, "%s %lld", name, payload);
+	case INTEL_PT_MODE_TSX:
+		return snprintf(buf, buf_len, "%s TXAbort:%u InTX:%u",
+				name, (unsigned)(payload >> 1) & 1,
+				(unsigned)payload & 1);
+	case INTEL_PT_PIP:
+		ret = snprintf(buf, buf_len, "%s 0x%llx",
+			       name, payload);
+		return ret;
+	default:
+		break;
+	}
+	return snprintf(buf, buf_len, "%s 0x%llx (%d)",
+			name, payload, packet->count);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
new file mode 100644
index 0000000..89a691f
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.h
@@ -0,0 +1,68 @@
+/*
+ * intel_pt_pkt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_PKT_DECODER_H__
+#define INCLUDE__INTEL_PT_PKT_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_PKT_DESC_MAX	256
+
+#define INTEL_PT_NEED_MORE_BYTES	-1
+#define INTEL_PT_BAD_PACKET		-2
+
+#define INTEL_PT_PSB_STR		"\002\202\002\202\002\202\002\202" \
+					"\002\202\002\202\002\202\002\202"
+#define INTEL_PT_PSB_LEN		16
+
+#define INTEL_PT_PKT_MAX_SZ		16
+
+enum intel_pt_pkt_type {
+	INTEL_PT_BAD,
+	INTEL_PT_PAD,
+	INTEL_PT_TNT,
+	INTEL_PT_TIP_PGD,
+	INTEL_PT_TIP_PGE,
+	INTEL_PT_TSC,
+	INTEL_PT_MODE_EXEC,
+	INTEL_PT_MODE_TSX,
+	INTEL_PT_TIP,
+	INTEL_PT_FUP,
+	INTEL_PT_PSB,
+	INTEL_PT_PSBEND,
+	INTEL_PT_CBR,
+	INTEL_PT_PIP,
+	INTEL_PT_OVF,
+};
+
+struct intel_pt_pkt {
+	enum intel_pt_pkt_type	type;
+	int			count;
+	uint64_t		payload;
+};
+
+const char *intel_pt_pkt_name(enum intel_pt_pkt_type);
+
+int intel_pt_get_packet(const unsigned char *buf, size_t len,
+			struct intel_pt_pkt *packet);
+
+int intel_pt_pkt_desc(const struct intel_pt_pkt *packet, char *buf, size_t len);
+
+#endif
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 67/71] perf tools: Add Intel PT instruction decoder
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (65 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 66/71] perf tools: Add Intel PT packet decoder Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 68/71] perf tools: Add Intel PT log Alexander Shishkin
                   ` (5 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for decoding instructions for Intel
Processor Trace, using the kernel's x86
instruction decoder.
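
A usage sketch (hypothetical wrapper, not from the
patch): classify one instruction and print a short
description such as "Jcc +12":

	#include <stdio.h>
	#include "intel-pt-insn-decoder.h"

	/* 'x86_64' selects 64-bit mode.  intel_pt_get_insn() returns 0
	 * on success, filling in the branch classification and length. */
	static void show_insn(const unsigned char *code, size_t len,
			      int x86_64)
	{
		struct intel_pt_insn insn;
		char desc[INTEL_PT_INSN_DESC_MAX];

		if (intel_pt_get_insn(code, len, x86_64, &insn))
			return;
		if (intel_pt_insn_desc(&insn, desc, sizeof(desc)) > 0)
			printf("%d-byte %s\n", insn.length, desc);
	}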

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile.perf                           |  18 +-
 .../util/intel-pt-decoder/intel-pt-insn-decoder.c  | 224 +++++++++++++++++++++
 .../util/intel-pt-decoder/intel-pt-insn-decoder.h  |  67 ++++++
 3 files changed, 308 insertions(+), 1 deletion(-)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index a006fac..77310c0 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -85,6 +85,7 @@ INSTALL = install
 FLEX    = flex
 BISON   = bison
 STRIP   = strip
+AWK     = awk
 
 LK_DIR          = $(srctree)/tools/lib/lk/
 TRACE_EVENT_DIR = $(srctree)/tools/lib/traceevent/
@@ -295,6 +296,7 @@ LIB_H += util/vdso.h
 LIB_H += util/tsc.h
 LIB_H += util/itrace.h
 LIB_H += util/intel-pt-decoder/intel-pt-pkt-decoder.h
+LIB_H += util/intel-pt-decoder/intel-pt-insn-decoder.h
 LIB_H += ui/helpline.h
 LIB_H += ui/progress.h
 LIB_H += ui/util.h
@@ -376,6 +378,7 @@ LIB_OBJS += $(OUTPUT)util/data.o
 LIB_OBJS += $(OUTPUT)util/tsc.o
 LIB_OBJS += $(OUTPUT)util/itrace.o
 LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-pkt-decoder.o
+LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o
 
 LIB_OBJS += $(OUTPUT)ui/setup.o
 LIB_OBJS += $(OUTPUT)ui/helpline.o
@@ -660,6 +663,18 @@ $(OUTPUT)tests/python-use.o: tests/python-use.c $(OUTPUT)PERF-CFLAGS
 $(OUTPUT)util/config.o: util/config.c $(OUTPUT)PERF-CFLAGS
 	$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DETC_PERFCONFIG='"$(ETC_PERFCONFIG_SQ)"' $<
 
+inat_tables_script = ../../arch/x86/tools/gen-insn-attr-x86.awk
+inat_tables_maps = ../../arch/x86/lib/x86-opcode-map.txt
+
+$(OUTPUT)util/intel-pt-decoder/inat-tables.c: $(inat_tables_script) $(inat_tables_maps)
+	$(QUIET_GEN)$(AWK) -f $(inat_tables_script) $(inat_tables_maps) > $@ || rm -f $@
+
+$(OUTPUT)util/intel-pt-decoder/inat.c:
+	$(QUIET_GEN)cp ../../arch/x86/lib/inat.c $(OUTPUT)util/intel-pt-decoder/inat.c
+
+$(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o: util/intel-pt-decoder/intel-pt-insn-decoder.c ../../arch/x86/include/asm/insn.h ../../arch/x86/lib/insn.c $(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c $(OUTPUT)PERF-CFLAGS
+	$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -I../../arch/x86/include -I$(OUTPUT)util/intel-pt-decoder -I../../arch/x86/lib $<
+
 $(OUTPUT)ui/setup.o: ui/setup.c $(OUTPUT)PERF-CFLAGS
 	$(QUIET_CC)$(CC) -o $@ -c $(CFLAGS) -DLIBDIR='"$(libdir_SQ)"' $<
 
@@ -885,7 +900,8 @@ config-clean:
 clean: $(LIBTRACEEVENT)-clean $(LIBLK)-clean config-clean
 	$(call QUIET_CLEAN, core-objs)  $(RM) $(LIB_OBJS) $(BUILTIN_OBJS) $(LIB_FILE) $(OUTPUT)perf-archive $(OUTPUT)perf.o $(LANG_BINDINGS) $(GTK_OBJS)
 	$(call QUIET_CLEAN, core-progs) $(RM) $(ALL_PROGRAMS) perf
-	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)PERF-CFLAGS $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex*
+	$(call QUIET_CLEAN, core-gen)   $(RM)  *.spec *.pyc *.pyo */*.pyc */*.pyo $(OUTPUT)common-cmds.h TAGS tags cscope* $(OUTPUT)PERF-VERSION-FILE $(OUTPUT)PERF-CFLAGS $(OUTPUT)util/*-bison* $(OUTPUT)util/*-flex* \
+		$(OUTPUT)util/intel-pt-decoder/inat.c $(OUTPUT)util/intel-pt-decoder/inat-tables.c
 	$(call QUIET_CLEAN, Documentation)
 	@$(MAKE) -C Documentation O=$(OUTPUT) clean >/dev/null
 	$(python-clean)
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
new file mode 100644
index 0000000..3a3c378
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.c
@@ -0,0 +1,224 @@
+/*
+ * intel_pt_insn_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#include <stdio.h>
+#include <string.h>
+#include <endian.h>
+#include <byteswap.h>
+
+#define unlikely(cond) (cond)
+
+#include <asm/insn.h>
+
+#include "inat.c"
+#include <insn.c>
+
+#include "intel-pt-insn-decoder.h"
+
+/* Based on branch_type() from perf_event_intel_lbr.c */
+static void intel_pt_insn_decoder(struct insn *insn,
+				  struct intel_pt_insn *intel_pt_insn)
+{
+	enum intel_pt_insn_op op = INTEL_PT_OP_OTHER;
+	enum intel_pt_insn_branch branch = INTEL_PT_BR_NO_BRANCH;
+	int ext;
+
+	if (insn_is_avx(insn)) {
+		intel_pt_insn->op = INTEL_PT_OP_OTHER;
+		intel_pt_insn->branch = INTEL_PT_BR_NO_BRANCH;
+		intel_pt_insn->length = insn->length;
+		return;
+	}
+
+	switch (insn->opcode.bytes[0]) {
+	case 0xf:
+		switch (insn->opcode.bytes[1]) {
+		case 0x05: /* syscall */
+		case 0x34: /* sysenter */
+			op = INTEL_PT_OP_SYSCALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x07: /* sysret */
+		case 0x35: /* sysexit */
+			op = INTEL_PT_OP_SYSRET;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 0x80 ... 0x8f: /* jcc */
+			op = INTEL_PT_OP_JCC;
+			branch = INTEL_PT_BR_CONDITIONAL;
+			break;
+		default:
+			break;
+		}
+		break;
+	case 0x70 ... 0x7f: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xc2: /* near ret */
+	case 0xc3: /* near ret */
+	case 0xca: /* far ret */
+	case 0xcb: /* far ret */
+		op = INTEL_PT_OP_RET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcf: /* iret */
+		op = INTEL_PT_OP_IRET;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xcc ... 0xce: /* int */
+		op = INTEL_PT_OP_INT;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe8: /* call near rel */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0x9a: /* call far absolute */
+		op = INTEL_PT_OP_CALL;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xe0 ... 0xe2: /* loop */
+		op = INTEL_PT_OP_LOOP;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe3: /* jcc */
+		op = INTEL_PT_OP_JCC;
+		branch = INTEL_PT_BR_CONDITIONAL;
+		break;
+	case 0xe9: /* jmp */
+	case 0xeb: /* jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_UNCONDITIONAL;
+		break;
+	case 0xea: /* far jmp */
+		op = INTEL_PT_OP_JMP;
+		branch = INTEL_PT_BR_INDIRECT;
+		break;
+	case 0xff: /* call near absolute, call far absolute ind */
+		ext = (insn->modrm.bytes[0] >> 3) & 0x7;
+		switch (ext) {
+		case 2: /* near ind call */
+		case 3: /* far ind call */
+			op = INTEL_PT_OP_CALL;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		case 4:
+		case 5:
+			op = INTEL_PT_OP_JMP;
+			branch = INTEL_PT_BR_INDIRECT;
+			break;
+		default:
+			break;
+		}
+		break;
+	default:
+		break;
+	}
+
+	intel_pt_insn->op = op;
+	intel_pt_insn->branch = branch;
+	intel_pt_insn->length = insn->length;
+
+	if (branch == INTEL_PT_BR_CONDITIONAL ||
+	    branch == INTEL_PT_BR_UNCONDITIONAL) {
+#if __BYTE_ORDER == __BIG_ENDIAN
+		switch (insn->immediate.nbytes) {
+		case 1:
+			intel_pt_insn->rel = insn->immediate.value;
+			break;
+		case 2:
+			intel_pt_insn->rel =
+					bswap_16((short)insn->immediate.value);
+			break;
+		case 4:
+			intel_pt_insn->rel = bswap_32(insn->immediate.value);
+			break;
+		}
+#else
+		intel_pt_insn->rel = insn->immediate.value;
+#endif
+	}
+}
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn)
+{
+	struct insn insn;
+	unsigned char sbuf[MAX_INSN_SIZE];
+
+	if (len < MAX_INSN_SIZE) {
+		memset(sbuf, 0, MAX_INSN_SIZE);
+		memcpy(sbuf, buf, len);
+		buf = sbuf;
+	}
+	insn_init(&insn, buf, x86_64);
+	insn_get_length(&insn);
+	if (!insn_complete(&insn) || insn.length > len)
+		return -1;
+	intel_pt_insn_decoder(&insn, intel_pt_insn);
+	if (insn.length < INTEL_PT_INSN_DBG_BUF_SZ)
+		memcpy(intel_pt_insn->buf, buf, insn.length);
+	else
+		memcpy(intel_pt_insn->buf, buf, INTEL_PT_INSN_DBG_BUF_SZ);
+	return 0;
+}
+
+const char *branch_name[] = {
+	[INTEL_PT_OP_OTHER]	= "Other",
+	[INTEL_PT_OP_CALL]	= "Call",
+	[INTEL_PT_OP_RET]	= "Ret",
+	[INTEL_PT_OP_JCC]	= "Jcc",
+	[INTEL_PT_OP_JMP]	= "Jmp",
+	[INTEL_PT_OP_LOOP]	= "Loop",
+	[INTEL_PT_OP_IRET]	= "IRet",
+	[INTEL_PT_OP_INT]	= "Int",
+	[INTEL_PT_OP_SYSCALL]	= "Syscall",
+	[INTEL_PT_OP_SYSRET]	= "Sysret",
+};
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op)
+{
+	return branch_name[op];
+}
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len)
+{
+	switch (intel_pt_insn->branch) {
+	case INTEL_PT_BR_CONDITIONAL:
+	case INTEL_PT_BR_UNCONDITIONAL:
+		return snprintf(buf, buf_len, "%s %s%d",
+				intel_pt_insn_name(intel_pt_insn->op),
+				intel_pt_insn->rel > 0 ? "+" : "",
+				intel_pt_insn->rel);
+	case INTEL_PT_BR_NO_BRANCH:
+	case INTEL_PT_BR_INDIRECT:
+		return snprintf(buf, buf_len, "%s",
+				intel_pt_insn_name(intel_pt_insn->op));
+	default:
+		break;
+	}
+	return 0;
+}
+
+size_t intel_pt_insn_max_size(void)
+{
+	return MAX_INSN_SIZE;
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
new file mode 100644
index 0000000..593ab37
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-insn-decoder.h
@@ -0,0 +1,67 @@
+/*
+ * intel_pt_insn_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_INSN_DECODER_H__
+#define INCLUDE__INTEL_PT_INSN_DECODER_H__
+
+#include <stddef.h>
+#include <stdint.h>
+
+#define INTEL_PT_INSN_DESC_MAX		32
+#define INTEL_PT_INSN_DBG_BUF_SZ	16
+
+enum intel_pt_insn_op {
+	INTEL_PT_OP_OTHER,
+	INTEL_PT_OP_CALL,
+	INTEL_PT_OP_RET,
+	INTEL_PT_OP_JCC,
+	INTEL_PT_OP_JMP,
+	INTEL_PT_OP_LOOP,
+	INTEL_PT_OP_IRET,
+	INTEL_PT_OP_INT,
+	INTEL_PT_OP_SYSCALL,
+	INTEL_PT_OP_SYSRET,
+};
+
+enum intel_pt_insn_branch {
+	INTEL_PT_BR_NO_BRANCH,
+	INTEL_PT_BR_INDIRECT,
+	INTEL_PT_BR_CONDITIONAL,
+	INTEL_PT_BR_UNCONDITIONAL,
+};
+
+struct intel_pt_insn {
+	enum intel_pt_insn_op		op;
+	enum intel_pt_insn_branch	branch;
+	int				length;
+	int32_t				rel;
+	unsigned char			buf[INTEL_PT_INSN_DBG_BUF_SZ];
+};
+
+int intel_pt_get_insn(const unsigned char *buf, size_t len, int x86_64,
+		      struct intel_pt_insn *intel_pt_insn);
+
+const char *intel_pt_insn_name(enum intel_pt_insn_op op);
+
+int intel_pt_insn_desc(const struct intel_pt_insn *intel_pt_insn, char *buf,
+		       size_t buf_len);
+
+size_t intel_pt_insn_max_size(void);
+
+#endif
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 68/71] perf tools: Add Intel PT log
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (66 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 67/71] perf tools: Add Intel PT instruction decoder Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 69/71] perf tools: Add Intel PT decoder Alexander Shishkin
                   ` (4 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add a facility to log Intel Processor Trace
decoding.  The log is intended for debugging
purposes only.  Logging proceeds only if the
log file already exists; otherwise nothing is
logged.

The log file name is "intel_pt.log" and is
opened in the current directory.
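
A usage sketch (hypothetical call site): because
logging only happens if the log file already
exists, create it first, e.g. "touch intel_pt.log":

	#include "intel-pt-log.h"

	static void log_example(void)
	{
		/* printf-style message */
		intel_pt_log("decoder synced at offset %d\n", 64);
		/* intel_pt_log_at() appends " at 0x<addr>" */
		intel_pt_log_at("TSC packet", 0x1000ULL);
	}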

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile.perf                        |   2 +
 tools/perf/util/intel-pt-decoder/intel-pt-log.c | 119 ++++++++++++++++++++++++
 tools/perf/util/intel-pt-decoder/intel-pt-log.h |  52 +++++++++++
 3 files changed, 173 insertions(+)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-log.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 77310c0..e4faac1 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -297,6 +297,7 @@ LIB_H += util/tsc.h
 LIB_H += util/itrace.h
 LIB_H += util/intel-pt-decoder/intel-pt-pkt-decoder.h
 LIB_H += util/intel-pt-decoder/intel-pt-insn-decoder.h
+LIB_H += util/intel-pt-decoder/intel-pt-log.h
 LIB_H += ui/helpline.h
 LIB_H += ui/progress.h
 LIB_H += ui/util.h
@@ -379,6 +380,7 @@ LIB_OBJS += $(OUTPUT)util/tsc.o
 LIB_OBJS += $(OUTPUT)util/itrace.o
 LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-pkt-decoder.o
 LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o
+LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-log.o
 
 LIB_OBJS += $(OUTPUT)ui/setup.o
 LIB_OBJS += $(OUTPUT)ui/helpline.o
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.c b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
new file mode 100644
index 0000000..b47d6c1
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.c
@@ -0,0 +1,119 @@
+/*
+ * intel_pt_log.c: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <inttypes.h>
+#include <stdarg.h>
+#include <string.h>
+
+#include "intel-pt-log.h"
+#include "intel-pt-insn-decoder.h"
+
+#include "intel-pt-pkt-decoder.h"
+
+#define MAX_LOG_NAME 256
+
+static FILE *f;
+static char log_name[MAX_LOG_NAME];
+
+void intel_pt_log_set_name(const char *name)
+{
+	strncpy(log_name, name, MAX_LOG_NAME - 5);
+	strcat(log_name, ".log");
+}
+
+static void intel_pt_print_data(const unsigned char *buf, int len, uint64_t pos,
+				int indent)
+{
+	int i;
+
+	for (i = 0; i < indent; i++)
+		fprintf(f, " ");
+
+	fprintf(f, "  %08" PRIx64 ": ", pos);
+	for (i = 0; i < len; i++)
+		fprintf(f, " %02x", buf[i]);
+	for (; i < 16; i++)
+		fprintf(f, "   ");
+	fprintf(f, " ");
+}
+
+static int intel_pt_log_open(void)
+{
+	if (f)
+		return 0;
+
+	if (!log_name[0])
+		return -1;
+
+	f = fopen(log_name, "r");
+	if (!f)
+		return -1;
+
+	fclose(f);
+
+	f = fopen(log_name, "w+");
+	if (!f)
+		return -1;
+
+	return 0;
+}
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf)
+{
+	char desc[INTEL_PT_PKT_DESC_MAX];
+
+	if (intel_pt_log_open())
+		return;
+
+	intel_pt_print_data(buf, pkt_len, pos, 0);
+	intel_pt_pkt_desc(packet, desc, INTEL_PT_PKT_DESC_MAX);
+	fprintf(f, "%s\n", desc);
+}
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	char desc[INTEL_PT_INSN_DESC_MAX];
+	size_t len = intel_pt_insn->length;
+
+	if (intel_pt_log_open())
+		return;
+
+	if (len > INTEL_PT_INSN_DBG_BUF_SZ)
+		len = INTEL_PT_INSN_DBG_BUF_SZ;
+	intel_pt_print_data(intel_pt_insn->buf, len, ip, 8);
+	if (intel_pt_insn_desc(intel_pt_insn, desc, INTEL_PT_INSN_DESC_MAX) > 0)
+		fprintf(f, "%s\n", desc);
+	else
+		fprintf(f, "Bad instruction!\n");
+}
+
+void intel_pt_log(const char *fmt, ...)
+{
+	va_list args;
+
+	if (intel_pt_log_open())
+		return;
+
+	va_start(args, fmt);
+	vfprintf(f, fmt, args);
+	va_end(args);
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-log.h b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
new file mode 100644
index 0000000..58c72c9
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-log.h
@@ -0,0 +1,52 @@
+/*
+ * intel_pt_log.h: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_LOG_H__
+#define INCLUDE__INTEL_PT_LOG_H__
+
+#include <stdint.h>
+#include <inttypes.h>
+
+struct intel_pt_pkt;
+
+void intel_pt_log_set_name(const char *name);
+
+void intel_pt_log_packet(const struct intel_pt_pkt *packet, int pkt_len,
+			 uint64_t pos, const unsigned char *buf);
+
+struct intel_pt_insn;
+
+void intel_pt_log_insn(struct intel_pt_insn *intel_pt_insn, uint64_t ip);
+
+__attribute__((format(printf, 1, 2)))
+void intel_pt_log(const char *fmt, ...);
+
+#define x64_fmt "0x%" PRIx64
+
+static inline void intel_pt_log_at(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s at " x64_fmt "\n", msg, u);
+}
+
+static inline void intel_pt_log_to(const char *msg, uint64_t u)
+{
+	intel_pt_log("%s to " x64_fmt "\n", msg, u);
+}
+
+#endif
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 69/71] perf tools: Add Intel PT decoder
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (67 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 68/71] perf tools: Add Intel PT log Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 70/71] perf tools: Add Intel PT support Alexander Shishkin
                   ` (3 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for decoding Intel Processor Trace data.
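
Here is a rough sketch (not part of this patch) of how a consumer
might drive the decoder; my_get_trace() and my_get_insn() are
hypothetical callbacks that would normally supply trace data and
decoded instructions:

#include <stdio.h>
#include <errno.h>
#include <inttypes.h>

#include "intel-pt-decoder/intel-pt-decoder.h"

/* Hypothetical callback: supplies no data, so decoding ends with -ENODATA */
static int my_get_trace(struct intel_pt_buffer *buffer, void *data)
{
	(void)data;
	buffer->buf = NULL;
	buffer->len = 0;
	buffer->consecutive = false;
	buffer->ref_timestamp = 0;
	return 0;
}

/* Hypothetical callback: would read and decode the instruction at ip */
static int my_get_insn(struct intel_pt_insn *insn, uint64_t ip,
		       uint64_t cr3, void *data)
{
	(void)insn; (void)ip; (void)cr3; (void)data;
	return -1;
}

int main(void)
{
	struct intel_pt_params params = {
		.get_trace = my_get_trace,
		.get_insn = my_get_insn,
	};
	struct intel_pt_decoder *decoder = intel_pt_decoder_new(&params);
	const struct intel_pt_state *state;

	if (!decoder)
		return 1;

	do {
		state = intel_pt_decode(decoder);
		if (state->err)
			fprintf(stderr, "%s\n",
				intel_pt_error_message(-state->err));
		else
			printf("branch %#" PRIx64 " -> %#" PRIx64 "\n",
			       state->from_ip, state->to_ip);
	} while (state->err != -ENODATA);

	intel_pt_decoder_free(decoder);
	return 0;
}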

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile.perf                           |    2 +
 .../perf/util/intel-pt-decoder/intel-pt-decoder.c  | 1678 ++++++++++++++++++++
 .../perf/util/intel-pt-decoder/intel-pt-decoder.h  |   83 +
 3 files changed, 1763 insertions(+)
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
 create mode 100644 tools/perf/util/intel-pt-decoder/intel-pt-decoder.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index e4faac1..41f8a97 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -298,6 +298,7 @@ LIB_H += util/itrace.h
 LIB_H += util/intel-pt-decoder/intel-pt-pkt-decoder.h
 LIB_H += util/intel-pt-decoder/intel-pt-insn-decoder.h
 LIB_H += util/intel-pt-decoder/intel-pt-log.h
+LIB_H += util/intel-pt-decoder/intel-pt-decoder.h
 LIB_H += ui/helpline.h
 LIB_H += ui/progress.h
 LIB_H += ui/util.h
@@ -381,6 +382,7 @@ LIB_OBJS += $(OUTPUT)util/itrace.o
 LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-pkt-decoder.o
 LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o
 LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-log.o
+LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-decoder.o
 
 LIB_OBJS += $(OUTPUT)ui/setup.o
 LIB_OBJS += $(OUTPUT)ui/helpline.o
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
new file mode 100644
index 0000000..11fb914
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.c
@@ -0,0 +1,1678 @@
+/*
+ * intel_pt_decoder.c: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef _GNU_SOURCE
+#define _GNU_SOURCE
+#endif
+#include <stdlib.h>
+#include <stdbool.h>
+#include <string.h>
+#include <errno.h>
+#include <stdint.h>
+#include <inttypes.h>
+
+#include "intel-pt-insn-decoder.h"
+#include "intel-pt-pkt-decoder.h"
+#include "intel-pt-decoder.h"
+#include "intel-pt-log.h"
+
+#define INTEL_PT_BLK_SIZE 1024
+
+#define BIT63 (((uint64_t)1 << 63))
+
+#define INTEL_PT_RETURN 1
+
+struct intel_pt_blk {
+	struct intel_pt_blk *prev;
+	uint64_t ip[INTEL_PT_BLK_SIZE];
+};
+
+struct intel_pt_stack {
+	struct intel_pt_blk *blk;
+	struct intel_pt_blk *spare;
+	int pos;
+};
+
+enum intel_pt_pkt_state {
+	INTEL_PT_STATE_NO_PSB,
+	INTEL_PT_STATE_NO_IP,
+	INTEL_PT_STATE_ERR_RESYNC,
+	INTEL_PT_STATE_IN_SYNC,
+	INTEL_PT_STATE_TNT,
+	INTEL_PT_STATE_TIP,
+	INTEL_PT_STATE_TIP_PGD,
+	INTEL_PT_STATE_FUP,
+	INTEL_PT_STATE_FUP_NO_TIP,
+};
+
+#ifdef INTEL_PT_STRICT
+#define INTEL_PT_STATE_ERR1	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_NO_PSB
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_NO_PSB
+#else
+#define INTEL_PT_STATE_ERR1	(decoder->pkt_state)
+#define INTEL_PT_STATE_ERR2	INTEL_PT_STATE_NO_IP
+#define INTEL_PT_STATE_ERR3	INTEL_PT_STATE_ERR_RESYNC
+#define INTEL_PT_STATE_ERR4	INTEL_PT_STATE_IN_SYNC
+#endif
+
+struct intel_pt_decoder {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*get_insn)(struct intel_pt_insn *intel_pt_insn, uint64_t ip,
+			uint64_t cr3, void *data);
+	void *data;
+	struct intel_pt_state state;
+	const unsigned char *buf;
+	size_t len;
+	bool return_compression;
+	bool pge;
+	uint64_t pos;
+	uint64_t last_ip;
+	uint64_t ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint64_t tsc_timestamp;
+	uint64_t ref_timestamp;
+	uint64_t ret_addr;
+	struct intel_pt_stack stack;
+	enum intel_pt_pkt_state pkt_state;
+	struct intel_pt_pkt packet;
+	struct intel_pt_pkt tnt;
+	int pkt_step;
+	int pkt_len;
+	unsigned int cbr;
+	int exec_mode;
+	unsigned int insn_bytes;
+	uint64_t sign_bit;
+	uint64_t sign_bits;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+	uint64_t period_insn_cnt;
+	uint64_t period_mask;
+	uint64_t last_masked_timestamp;
+	bool continuous_period;
+	bool overflow;
+	uint64_t timestamp_insn_cnt;
+	const unsigned char *next_buf;
+	size_t next_len;
+	unsigned char temp_buf[INTEL_PT_PKT_MAX_SZ];
+};
+
+/* Round x down to the nearest power of 2; 0 is returned unchanged */
+static uint64_t intel_pt_lower_power_of_2(uint64_t x)
+{
+	int i;
+
+	if (!x)
+		return 0;
+
+	for (i = 0; x != 1; i++)
+		x >>= 1;
+
+	return x << i;
+}
+
+static void intel_pt_setup_period(struct intel_pt_decoder *decoder)
+{
+	if (decoder->period_type == INTEL_PT_PERIOD_TICKS) {
+		uint64_t period;
+
+		period = intel_pt_lower_power_of_2(decoder->period);
+		decoder->period_mask = ~(period - 1);
+	}
+}
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params)
+{
+	struct intel_pt_decoder *decoder;
+
+	if (!params->get_trace || !params->get_insn)
+		return NULL;
+
+	decoder = malloc(sizeof(struct intel_pt_decoder));
+	if (!decoder)
+		return NULL;
+
+	memset(decoder, 0, sizeof(struct intel_pt_decoder));
+
+	decoder->get_trace = params->get_trace;
+	decoder->get_insn = params->get_insn;
+	decoder->data = params->data;
+	decoder->return_compression = params->return_compression;
+
+	decoder->sign_bit = (uint64_t)1 << 47;
+	decoder->sign_bits = ~(((uint64_t)1 << 48) - 1);
+
+	decoder->period = params->period;
+	decoder->period_type = params->period_type;
+
+	intel_pt_setup_period(decoder);
+
+	return decoder;
+}
+
+static void intel_pt_pop_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk;
+
+	blk = stack->blk;
+	stack->blk = blk->prev;
+	if (!stack->spare)
+		stack->spare = blk;
+	else
+		free(blk);
+}
+
+static uint64_t intel_pt_pop(struct intel_pt_stack *stack)
+{
+	if (!stack->pos) {
+		if (!stack->blk)
+			return 0;
+		intel_pt_pop_blk(stack);
+		if (!stack->blk)
+			return 0;
+		stack->pos = INTEL_PT_BLK_SIZE;
+	}
+	return stack->blk->ip[--stack->pos];
+}
+
+static int intel_pt_alloc_blk(struct intel_pt_stack *stack)
+{
+	struct intel_pt_blk *blk;
+
+	if (stack->spare) {
+		blk = stack->spare;
+		stack->spare = NULL;
+	} else {
+		blk = malloc(sizeof(struct intel_pt_blk));
+		if (!blk)
+			return -ENOMEM;
+	}
+
+	blk->prev = stack->blk;
+	stack->blk = blk;
+	stack->pos = 0;
+	return 0;
+}
+
+static int intel_pt_push(struct intel_pt_stack *stack, uint64_t ip)
+{
+	int err;
+
+	if (!stack->blk || stack->pos == INTEL_PT_BLK_SIZE) {
+		err = intel_pt_alloc_blk(stack);
+		if (err)
+			return err;
+	}
+
+	stack->blk->ip[stack->pos++] = ip;
+	return 0;
+}
+
+static void intel_pt_clear_stack(struct intel_pt_stack *stack)
+{
+	while (stack->blk)
+		intel_pt_pop_blk(stack);
+	stack->pos = 0;
+}
+
+static void intel_pt_free_stack(struct intel_pt_stack *stack)
+{
+	intel_pt_clear_stack(stack);
+	free(stack->blk);
+	free(stack->spare);
+}
+
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder)
+{
+	intel_pt_free_stack(&decoder->stack);
+	free(decoder);
+}
+
+const char *intel_pt_error_message(int code)
+{
+	switch (code) {
+	case ENOMEM:
+		return "Memory allocation failed";
+	case ENOSYS:
+		return "Internal error";
+	case EBADMSG:
+		return "Bad packet";
+	case ENODATA:
+		return "No more data";
+	case EILSEQ:
+		return "Failed to get instruction";
+	case ENOENT:
+		return "Trace doesn't match instruction";
+	case EOVERFLOW:
+		return "Overflow packet";
+	case ESHUTDOWN:
+		return "Trace stop packet";
+	default:
+		return "Unknown error!";
+	}
+}
+
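+/*
+ * IP packets use compression: a 2 or 4 byte payload replaces the low 16
+ * or 32 bits of the last IP, while a 6 byte payload carries the full
+ * 48-bit address.  Bit 47 is then sign-extended to form a canonical
+ * 64-bit address.
+ */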
+static uint64_t intel_pt_calc_ip(struct intel_pt_decoder *decoder,
+				 const struct intel_pt_pkt *packet,
+				 uint64_t last_ip)
+{
+	uint64_t ip;
+
+	switch (packet->count) {
+	case 2:
+		ip = (last_ip & (uint64_t)0xffffffffffff0000ULL) |
+		     packet->payload;
+		break;
+	case 4:
+		ip = (last_ip & (uint64_t)0xffffffff00000000ULL) |
+		     packet->payload;
+		break;
+	case 6:
+		ip = packet->payload;
+		break;
+	default:
+		return 0;
+	}
+
+	if (ip & decoder->sign_bit)
+		return ip | decoder->sign_bits;
+
+	return ip;
+}
+
+static inline void intel_pt_set_last_ip(struct intel_pt_decoder *decoder)
+{
+	decoder->last_ip = intel_pt_calc_ip(decoder, &decoder->packet,
+					    decoder->last_ip);
+}
+
+static inline void intel_pt_set_ip(struct intel_pt_decoder *decoder)
+{
+	intel_pt_set_last_ip(decoder);
+	decoder->ip = decoder->last_ip;
+}
+
+static void intel_pt_decoder_log_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log_packet(&decoder->packet, decoder->pkt_len, decoder->pos,
+			    decoder->buf);
+}
+
+static int intel_pt_bug(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Internal error\n");
+	decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+	return -ENOSYS;
+}
+
+static inline void intel_pt_update_tx_flags(struct intel_pt_decoder *decoder)
+{
+	decoder->state.flags &= ~(INTEL_PT_IN_TX | INTEL_PT_ABORT_TX);
+	decoder->state.flags |= decoder->packet.payload & (INTEL_PT_IN_TX |
+				INTEL_PT_ABORT_TX);
+}
+
+static inline void intel_pt_clear_tx_flags(struct intel_pt_decoder *decoder)
+{
+	decoder->state.flags &= ~(INTEL_PT_IN_TX | INTEL_PT_ABORT_TX);
+}
+
+static inline void intel_pt_clear_tx_abort(struct intel_pt_decoder *decoder)
+{
+	decoder->state.flags &= ~INTEL_PT_ABORT_TX;
+}
+
+static inline void intel_pt_update_in_tx(struct intel_pt_decoder *decoder)
+{
+	decoder->state.flags &= ~(INTEL_PT_IN_TX | INTEL_PT_ABORT_TX);
+	decoder->state.flags |= decoder->packet.payload & INTEL_PT_IN_TX;
+}
+
+static int intel_pt_bad_packet(struct intel_pt_decoder *decoder)
+{
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_len = 1;
+	decoder->pkt_step = 1;
+	intel_pt_decoder_log_packet(decoder);
+	if (decoder->pkt_state != INTEL_PT_STATE_NO_PSB) {
+		intel_pt_log("ERROR: Bad packet\n");
+		decoder->pkt_state = INTEL_PT_STATE_ERR1;
+	}
+	return -EBADMSG;
+}
+
+static int intel_pt_get_data(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_buffer buffer = {0};
+	int ret;
+
+	decoder->pkt_step = 0;
+
+	intel_pt_log("Getting more data\n");
+	ret = decoder->get_trace(&buffer, decoder->data);
+	if (ret)
+		return ret;
+	decoder->buf = buffer.buf;
+	decoder->len = buffer.len;
+	if (!decoder->len) {
+		intel_pt_log("No more data\n");
+		return -ENODATA;
+	}
+	if (!buffer.consecutive) {
+		decoder->ip = 0;
+		decoder->pkt_state = INTEL_PT_STATE_NO_PSB;
+		decoder->ref_timestamp = buffer.ref_timestamp;
+		decoder->timestamp = 0;
+		intel_pt_log("Reference timestamp 0x%" PRIx64 "\n",
+			     decoder->ref_timestamp);
+		return -ENOLINK;
+	}
+
+	return 0;
+}
+
+static int intel_pt_get_next_data(struct intel_pt_decoder *decoder)
+{
+	if (!decoder->next_buf)
+		return intel_pt_get_data(decoder);
+
+	decoder->buf = decoder->next_buf;
+	decoder->len = decoder->next_len;
+	decoder->next_buf = NULL;
+	decoder->next_len = 0;
+	return 0;
+}
+
+static int intel_pt_get_split_packet(struct intel_pt_decoder *decoder)
+{
+	unsigned char *buf = decoder->temp_buf;
+	size_t old_len, len, n;
+	int ret;
+
+	old_len = decoder->len;
+	len = decoder->len;
+	memcpy(buf, decoder->buf, len);
+
+	ret = intel_pt_get_data(decoder);
+	if (ret) {
+		decoder->pos += old_len;
+		return ret < 0 ? ret : -EINVAL;
+	}
+
+	n = INTEL_PT_PKT_MAX_SZ - len;
+	if (n > decoder->len)
+		n = decoder->len;
+	memcpy(buf + len, decoder->buf, n);
+	len += n;
+
+	ret = intel_pt_get_packet(buf, len, &decoder->packet);
+	if (ret < (int)old_len) {
+		decoder->next_buf = decoder->buf;
+		decoder->next_len = decoder->len;
+		decoder->buf = buf;
+		decoder->len = old_len;
+		return intel_pt_bad_packet(decoder);
+	}
+
+	decoder->next_buf = decoder->buf + (ret - old_len);
+	decoder->next_len = decoder->len - (ret - old_len);
+
+	decoder->buf = buf;
+	decoder->len = ret;
+
+	return ret;
+}
+
+static int intel_pt_get_next_packet(struct intel_pt_decoder *decoder)
+{
+	int ret;
+
+	do {
+		decoder->pos += decoder->pkt_step;
+		decoder->buf += decoder->pkt_step;
+		decoder->len -= decoder->pkt_step;
+
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		ret = intel_pt_get_packet(decoder->buf, decoder->len,
+					  &decoder->packet);
+		if (ret == INTEL_PT_NEED_MORE_BYTES &&
+		    decoder->len < INTEL_PT_PKT_MAX_SZ && !decoder->next_buf) {
+			ret = intel_pt_get_split_packet(decoder);
+			if (ret < 0)
+				return ret;
+		}
+		if (ret <= 0)
+			return intel_pt_bad_packet(decoder);
+
+		decoder->pkt_len = ret;
+		decoder->pkt_step = ret;
+		intel_pt_decoder_log_packet(decoder);
+	} while (decoder->packet.type == INTEL_PT_PAD);
+
+	return 0;
+}
+
+static int intel_pt_decoder_get_insn(struct intel_pt_decoder *decoder,
+				     struct intel_pt_insn *intel_pt_insn)
+{
+	int err;
+
+	err = decoder->get_insn(intel_pt_insn, decoder->ip, decoder->cr3,
+				decoder->data);
+	if (err) {
+		intel_pt_log_at("ERROR: Failed to get instruction",
+				decoder->ip);
+		decoder->pkt_state = INTEL_PT_STATE_ERR2;
+		return -EILSEQ;
+	}
+	intel_pt_log_insn(intel_pt_insn, decoder->ip);
+	return 0;
+}
+
+static inline bool intel_pt_sample_insn(struct intel_pt_decoder *decoder)
+{
+	if (decoder->period_type == INTEL_PT_PERIOD_INSTRUCTIONS &&
+	    ++decoder->period_insn_cnt >= decoder->period) {
+		decoder->period_insn_cnt = 0;
+		decoder->state.type |= INTEL_PT_INSTRUCTION;
+		return true;
+	}
+
+	if (decoder->period_type == INTEL_PT_PERIOD_TICKS) {
+		uint64_t timestamp, masked_timestamp;
+
+		timestamp = decoder->timestamp + ++decoder->timestamp_insn_cnt;
+		masked_timestamp = timestamp & decoder->period_mask;
+		if (masked_timestamp != decoder->last_masked_timestamp) {
+			decoder->last_masked_timestamp = masked_timestamp;
+			if (decoder->continuous_period) {
+				decoder->state.type |= INTEL_PT_INSTRUCTION;
+				return true;
+			}
+			decoder->continuous_period = true;
+		}
+	}
+
+	return false;
+}
+
+static int intel_pt_walk_insn(struct intel_pt_decoder *decoder,
+			      struct intel_pt_insn *intel_pt_insn, uint64_t ip)
+{
+	bool sample_insn = false;
+	int err;
+
+	while (1) {
+		if (decoder->ip == ip && ip)
+			return -EAGAIN;
+
+		err = intel_pt_decoder_get_insn(decoder, intel_pt_insn);
+		if (err)
+			return err;
+
+		sample_insn = intel_pt_sample_insn(decoder);
+
+		if (intel_pt_insn->branch == INTEL_PT_BR_NO_BRANCH) {
+			if (sample_insn) {
+				decoder->state.type = INTEL_PT_INSTRUCTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->ip += intel_pt_insn->length;
+				return INTEL_PT_RETURN;
+			}
+			decoder->ip += intel_pt_insn->length;
+			continue;
+		}
+
+		if (intel_pt_insn->op == INTEL_PT_OP_CALL) {
+			err = intel_pt_push(&decoder->stack, decoder->ip +
+					    intel_pt_insn->length);
+			if (err)
+				return err;
+		} else if (intel_pt_insn->op == INTEL_PT_OP_RET) {
+			decoder->ret_addr = intel_pt_pop(&decoder->stack);
+		}
+
+		if (intel_pt_insn->branch == INTEL_PT_BR_UNCONDITIONAL) {
+			decoder->state.from_ip = decoder->ip;
+			decoder->ip += intel_pt_insn->length +
+				       intel_pt_insn->rel;
+			decoder->state.to_ip = decoder->ip;
+			return INTEL_PT_RETURN;
+		}
+
+		return 0;
+	}
+}
+
+static int intel_pt_walk_fup(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	uint64_t ip;
+	int err;
+
+	ip = decoder->last_ip;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, ip);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			if (decoder->ip + intel_pt_insn.length == ip)
+				return -EAGAIN;
+			intel_pt_log_at("ERROR: Unexpected indirect branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			intel_pt_log_at("ERROR: Unexpected conditional branch",
+					decoder->ip);
+			decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+			return -ENOENT;
+		}
+
+		return intel_pt_bug(decoder);
+	}
+}
+
+static int intel_pt_walk_tip(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+	if (err == INTEL_PT_RETURN)
+		return 0;
+	if (err)
+		return err;
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+		if (decoder->pkt_state == INTEL_PT_STATE_TIP_PGD) {
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0)
+				decoder->ip = decoder->last_ip;
+		} else {
+			decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				decoder->state.to_ip = decoder->last_ip;
+				decoder->ip = decoder->last_ip;
+			}
+		}
+		return 0;
+	}
+
+	if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+		intel_pt_log_at("ERROR: Conditional branch when expecting indirect branch",
+				decoder->ip);
+		decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+		return -ENOENT;
+	}
+
+	return intel_pt_bug(decoder);
+}
+
+static int intel_pt_walk_tnt(struct intel_pt_decoder *decoder)
+{
+	struct intel_pt_insn intel_pt_insn;
+	int err;
+
+	while (1) {
+		err = intel_pt_walk_insn(decoder, &intel_pt_insn, 0);
+		if (err == INTEL_PT_RETURN)
+			return 0;
+		if (err)
+			return err;
+
+		if (intel_pt_insn.op == INTEL_PT_OP_RET) {
+			if (!decoder->return_compression) {
+				intel_pt_log_at("ERROR: RET when expecting conditional branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!decoder->ret_addr) {
+				intel_pt_log_at("ERROR: Bad RET compression (stack empty)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			if (!(decoder->tnt.payload & BIT63)) {
+				intel_pt_log_at("ERROR: Bad RET compression (TNT=N)",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				return -ENOENT;
+			}
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			decoder->tnt.payload <<= 1;
+			decoder->state.from_ip = decoder->ip;
+			decoder->ip = decoder->ret_addr;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_INDIRECT) {
+			/* Handle deferred TIPs */
+			err = intel_pt_get_next_packet(decoder);
+			if (err)
+				return err;
+			if (decoder->packet.type != INTEL_PT_TIP ||
+			    decoder->packet.count == 0) {
+				intel_pt_log_at("ERROR: Missing deferred TIP for indirect branch",
+						decoder->ip);
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+				decoder->pkt_step = 0;
+				return -ENOENT;
+			}
+			intel_pt_set_last_ip(decoder);
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = decoder->last_ip;
+			decoder->ip = decoder->last_ip;
+			return 0;
+		}
+
+		if (intel_pt_insn.branch == INTEL_PT_BR_CONDITIONAL) {
+			decoder->tnt.count -= 1;
+			if (!decoder->tnt.count)
+				decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+			if (decoder->tnt.payload & BIT63) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.from_ip = decoder->ip;
+				decoder->ip += intel_pt_insn.length +
+					       intel_pt_insn.rel;
+				decoder->state.to_ip = decoder->ip;
+				return 0;
+			}
+			/* Instruction sample for a non-taken branch */
+			if (decoder->state.type & INTEL_PT_INSTRUCTION) {
+				decoder->tnt.payload <<= 1;
+				decoder->state.type = INTEL_PT_INSTRUCTION;
+				decoder->state.from_ip = decoder->ip;
+				decoder->state.to_ip = 0;
+				decoder->ip += intel_pt_insn.length;
+				return 0;
+			}
+			decoder->ip += intel_pt_insn.length;
+			if (!decoder->tnt.count)
+				return -EAGAIN;
+			decoder->tnt.payload <<= 1;
+			continue;
+		}
+
+		return intel_pt_bug(decoder);
+	}
+}
+
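+/*
+ * TSC packets carry only the low 7 bytes of the TSC.  The top byte is
+ * taken from the reference (or previous) timestamp, and a 2^56
+ * correction is applied if the 56-bit value appears to have wrapped.
+ */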
+static void intel_pt_calc_tsc_timestamp(struct intel_pt_decoder *decoder)
+{
+	uint64_t timestamp;
+
+	if (decoder->ref_timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->ref_timestamp & (0xffULL << 56));
+		if (timestamp < decoder->ref_timestamp) {
+			if (decoder->ref_timestamp - timestamp > (1ULL << 55))
+				timestamp += (1ULL << 56);
+		} else {
+			if (timestamp - decoder->ref_timestamp > (1ULL << 55))
+				timestamp -= (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->ref_timestamp = 0;
+		decoder->timestamp_insn_cnt = 0;
+	} else if (decoder->timestamp) {
+		timestamp = decoder->packet.payload |
+			    (decoder->timestamp & (0xffULL << 56));
+		if (timestamp < decoder->timestamp &&
+		    decoder->timestamp - timestamp < 0x100) {
+			intel_pt_log_to("ERROR: Suppressing backwards timestamp",
+					timestamp);
+			timestamp = decoder->timestamp;
+		}
+		while (timestamp < decoder->timestamp) {
+			intel_pt_log_to("Wraparound timestamp", timestamp);
+			timestamp += (1ULL << 56);
+		}
+		decoder->tsc_timestamp = timestamp;
+		decoder->timestamp = timestamp;
+		decoder->timestamp_insn_cnt = 0;
+	}
+
+	intel_pt_log_to("Setting timestamp", decoder->timestamp);
+}
+
+static int intel_pt_overflow(struct intel_pt_decoder *decoder)
+{
+	intel_pt_log("ERROR: Buffer overflow\n");
+	intel_pt_clear_tx_flags(decoder);
+	decoder->pkt_state = INTEL_PT_STATE_ERR_RESYNC;
+	decoder->overflow = true;
+	return -EOVERFLOW;
+}
+
+/* Walk PSB+ packets when already in sync. */
+static int intel_pt_walk_psbend(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_TIP_PGD:
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+		case INTEL_PT_TNT:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSB:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -EAGAIN;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_fup_tip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+		case INTEL_PT_FUP:
+		case INTEL_PT_PSB:
+		case INTEL_PT_TSC:
+		case INTEL_PT_CBR:
+		case INTEL_PT_MODE_TSX:
+		case INTEL_PT_BAD:
+		case INTEL_PT_PSBEND:
+			intel_pt_log("ERROR: Missing TIP after FUP\n");
+			decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP_PGD:
+			decoder->state.from_ip = decoder->ip;
+			decoder->state.to_ip = 0;
+			if (decoder->packet.count != 0) {
+				intel_pt_set_ip(decoder);
+				intel_pt_log("Omitting PGD ip " x64_fmt "\n",
+					     decoder->ip);
+			}
+			decoder->pge = false;
+			decoder->continuous_period = false;
+			return 0;
+
+		case INTEL_PT_TIP_PGE:
+			decoder->pge = true;
+			intel_pt_log("Omitting PGE ip " x64_fmt "\n",
+				     decoder->ip);
+			decoder->state.from_ip = 0;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_TIP:
+			decoder->state.from_ip = decoder->ip;
+			if (decoder->packet.count == 0) {
+				decoder->state.to_ip = 0;
+			} else {
+				intel_pt_set_ip(decoder);
+				decoder->state.to_ip = decoder->ip;
+			}
+			return 0;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+static int intel_pt_walk_trace(struct intel_pt_decoder *decoder)
+{
+	bool no_tip = false;
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+next:
+		switch (decoder->packet.type) {
+		case INTEL_PT_TNT:
+			if (!decoder->packet.count)
+				break;
+			decoder->tnt = decoder->packet;
+			decoder->pkt_state = INTEL_PT_STATE_TNT;
+			err = intel_pt_walk_tnt(decoder);
+			if (err == -EAGAIN)
+				break;
+			return err;
+
+		case INTEL_PT_TIP_PGD:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP_PGD;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_TIP_PGE: {
+			decoder->pge = true;
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero TIP.PGE",
+						decoder->pos);
+				break;
+			}
+			intel_pt_set_ip(decoder);
+			decoder->state.from_ip = 0;
+			decoder->state.to_ip = decoder->ip;
+			return 0;
+		}
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_TIP:
+			if (decoder->packet.count != 0)
+				intel_pt_set_last_ip(decoder);
+			decoder->pkt_state = INTEL_PT_STATE_TIP;
+			return intel_pt_walk_tip(decoder);
+
+		case INTEL_PT_FUP:
+			if (decoder->packet.count == 0) {
+				intel_pt_log_at("Skipping zero FUP",
+						decoder->pos);
+				no_tip = false;
+				break;
+			}
+			intel_pt_set_last_ip(decoder);
+			err = intel_pt_walk_fup(decoder);
+			if (err != -EAGAIN) {
+				if (err)
+					return err;
+				if (no_tip)
+					decoder->pkt_state =
+						INTEL_PT_STATE_FUP_NO_TIP;
+				else
+					decoder->pkt_state = INTEL_PT_STATE_FUP;
+				return 0;
+			}
+			if (no_tip) {
+				no_tip = false;
+				break;
+			}
+			return intel_pt_walk_fup_tip(decoder);
+
+		case INTEL_PT_PSB:
+			intel_pt_clear_stack(&decoder->stack);
+			err = intel_pt_walk_psbend(decoder);
+			if (err == -EAGAIN)
+				goto next;
+			if (err)
+				return err;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_tx_flags(decoder);
+			/* MODE_TSX need not be followed by FUP */
+			if (!decoder->pge)
+				break;
+			err = intel_pt_get_next_packet(decoder);
+			if (err)
+				return err;
+			if (decoder->packet.type == INTEL_PT_FUP) {
+				if (!(decoder->state.flags & INTEL_PT_ABORT_TX))
+					no_tip = true;
+			} else {
+				intel_pt_log_at("ERROR: Missing FUP after MODE.TSX",
+						decoder->pos);
+			}
+			goto next;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+			break;
+
+		default:
+			return intel_pt_bug(decoder);
+		}
+	}
+}
+
+/* Walk PSB+ packets to get in sync. */
+static int intel_pt_walk_psb(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
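+			/* Fall through */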
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			return -ENOENT;
+
+		case INTEL_PT_FUP:
+			decoder->pge = true;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0) {
+				uint64_t current_ip = decoder->ip;
+
+				intel_pt_set_ip(decoder);
+				if (current_ip)
+					intel_pt_log_to("Setting IP",
+							decoder->ip);
+			}
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_in_tx(decoder);
+			break;
+
+		case INTEL_PT_TNT:
+			intel_pt_log("ERROR: Unexpected packet\n");
+			if (decoder->ip)
+				decoder->pkt_state = INTEL_PT_STATE_ERR4;
+			else
+				decoder->pkt_state = INTEL_PT_STATE_ERR3;
+			return -ENOENT;
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_PSBEND:
+			return 0;
+
+		case INTEL_PT_PSB:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_walk_to_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	while (1) {
+		err = intel_pt_get_next_packet(decoder);
+		if (err)
+			return err;
+
+		switch (decoder->packet.type) {
+		case INTEL_PT_TIP_PGD:
+			decoder->continuous_period = false;
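+			/* Fall through */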
+		case INTEL_PT_TIP_PGE:
+		case INTEL_PT_TIP:
+			decoder->pge = decoder->packet.type != INTEL_PT_TIP_PGD;
+			if (decoder->last_ip || decoder->packet.count == 6 ||
+			    decoder->packet.count == 0)
+				intel_pt_set_ip(decoder);
+			if (decoder->ip)
+				return 0;
+			break;
+
+		case INTEL_PT_FUP:
+			if (decoder->overflow) {
+				if (decoder->last_ip ||
+				    decoder->packet.count == 6 ||
+				    decoder->packet.count == 0)
+					intel_pt_set_ip(decoder);
+				if (decoder->ip)
+					return 0;
+			}
+			if (decoder->packet.count)
+				intel_pt_set_last_ip(decoder);
+			break;
+
+		case INTEL_PT_TSC:
+			intel_pt_calc_tsc_timestamp(decoder);
+			break;
+
+		case INTEL_PT_CBR:
+			decoder->cbr = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_PIP:
+			decoder->cr3 = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_EXEC:
+			decoder->exec_mode = decoder->packet.payload;
+			break;
+
+		case INTEL_PT_MODE_TSX:
+			intel_pt_update_tx_flags(decoder);
+			break;
+
+		case INTEL_PT_OVF:
+			return intel_pt_overflow(decoder);
+
+		case INTEL_PT_BAD: /* Does not happen */
+			return intel_pt_bug(decoder);
+
+		case INTEL_PT_PSB:
+			err = intel_pt_walk_psb(decoder);
+			if (err)
+				return err;
+			if (decoder->ip) {
+				/* Do not have a sample */
+				decoder->state.type = 0;
+				return 0;
+			}
+			break;
+
+		case INTEL_PT_TNT:
+		case INTEL_PT_PSBEND:
+		case INTEL_PT_PAD:
+		default:
+			break;
+		}
+	}
+}
+
+static int intel_pt_sync_ip(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	intel_pt_log("Scanning for full IP\n");
+	err = intel_pt_walk_to_ip(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	decoder->overflow = false;
+
+	decoder->state.from_ip = 0;
+	decoder->state.to_ip = decoder->ip;
+	intel_pt_log_to("Setting IP", decoder->ip);
+
+	return 0;
+}
+
+static int intel_pt_part_psb(struct intel_pt_decoder *decoder)
+{
+	const unsigned char *end = decoder->buf + decoder->len;
+	size_t i;
+
+	for (i = INTEL_PT_PSB_LEN - 1; i; i--) {
+		if (i > decoder->len)
+			continue;
+		if (!memcmp(end - i, INTEL_PT_PSB_STR, i))
+			return i;
+	}
+	return 0;
+}
+
+static int intel_pt_rest_psb(struct intel_pt_decoder *decoder, int part_psb)
+{
+	size_t rest_psb = INTEL_PT_PSB_LEN - part_psb;
+	const char *psb = INTEL_PT_PSB_STR;
+
+	if (rest_psb > decoder->len ||
+	    memcmp(decoder->buf, psb + part_psb, rest_psb))
+		return 0;
+
+	return rest_psb;
+}
+
+static int intel_pt_get_split_psb(struct intel_pt_decoder *decoder,
+				  int part_psb)
+{
+	int rest_psb, ret;
+
+	decoder->pos += decoder->len;
+	decoder->len = 0;
+
+	ret = intel_pt_get_next_data(decoder);
+	if (ret)
+		return ret;
+
+	rest_psb = intel_pt_rest_psb(decoder, part_psb);
+	if (!rest_psb)
+		return 0;
+
+	decoder->pos -= part_psb;
+	decoder->next_buf = decoder->buf + rest_psb;
+	decoder->next_len = decoder->len - rest_psb;
+	memcpy(decoder->temp_buf, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	decoder->buf = decoder->temp_buf;
+	decoder->len = INTEL_PT_PSB_LEN;
+
+	return 0;
+}
+
+static int intel_pt_scan_for_psb(struct intel_pt_decoder *decoder)
+{
+	unsigned char *next;
+	int ret;
+
+	intel_pt_log("Scanning for PSB\n");
+	while (1) {
+		if (!decoder->len) {
+			ret = intel_pt_get_next_data(decoder);
+			if (ret)
+				return ret;
+		}
+
+		next = memmem(decoder->buf, decoder->len, INTEL_PT_PSB_STR,
+			      INTEL_PT_PSB_LEN);
+		if (!next) {
+			int part_psb;
+
+			part_psb = intel_pt_part_psb(decoder);
+			if (part_psb) {
+				ret = intel_pt_get_split_psb(decoder, part_psb);
+				if (ret)
+					return ret;
+			} else {
+				decoder->pos += decoder->len;
+				decoder->len = 0;
+			}
+			continue;
+		}
+
+		decoder->pkt_step = next - decoder->buf;
+		return intel_pt_get_next_packet(decoder);
+	}
+}
+
+static int intel_pt_sync(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	decoder->pge = false;
+	decoder->continuous_period = false;
+	decoder->last_ip = 0;
+	decoder->ip = 0;
+	intel_pt_clear_stack(&decoder->stack);
+
+	err = intel_pt_scan_for_psb(decoder);
+	if (err)
+		return err;
+
+	decoder->pkt_state = INTEL_PT_STATE_NO_IP;
+
+	err = intel_pt_walk_psb(decoder);
+	if (err)
+		return err;
+
+	if (decoder->ip) {
+		decoder->state.type = 0; /* Do not have a sample */
+		decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+	} else {
+		return intel_pt_sync_ip(decoder);
+	}
+
+	return 0;
+}
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder)
+{
+	int err;
+
+	decoder->state.type = INTEL_PT_BRANCH;
+	intel_pt_clear_tx_abort(decoder);
+
+	switch (decoder->pkt_state) {
+	case INTEL_PT_STATE_NO_PSB:
+		err = intel_pt_sync(decoder);
+		break;
+	case INTEL_PT_STATE_NO_IP:
+		decoder->last_ip = 0;
+		/* Fall through */
+	case INTEL_PT_STATE_ERR_RESYNC:
+		err = intel_pt_sync_ip(decoder);
+		break;
+	case INTEL_PT_STATE_IN_SYNC:
+		err = intel_pt_walk_trace(decoder);
+		break;
+	case INTEL_PT_STATE_TNT:
+		err = intel_pt_walk_tnt(decoder);
+		if (err == -EAGAIN)
+			err = intel_pt_walk_trace(decoder);
+		break;
+	case INTEL_PT_STATE_TIP:
+	case INTEL_PT_STATE_TIP_PGD:
+		err = intel_pt_walk_tip(decoder);
+		break;
+	case INTEL_PT_STATE_FUP:
+		decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+		err = intel_pt_walk_fup(decoder);
+		if (err == -EAGAIN)
+			err = intel_pt_walk_fup_tip(decoder);
+		else if (!err)
+			decoder->pkt_state = INTEL_PT_STATE_FUP;
+		break;
+	case INTEL_PT_STATE_FUP_NO_TIP:
+		decoder->pkt_state = INTEL_PT_STATE_IN_SYNC;
+		err = intel_pt_walk_fup(decoder);
+		if (err == -EAGAIN)
+			err = intel_pt_walk_trace(decoder);
+		break;
+	default:
+		err = intel_pt_bug(decoder);
+		break;
+	}
+
+	if (err == -ENOLINK)
+		return intel_pt_decode(decoder);
+
+	decoder->state.err = err;
+	decoder->state.timestamp = decoder->timestamp;
+	decoder->state.cr3 = decoder->cr3;
+
+	if (err)
+		decoder->state.from_ip = decoder->ip;
+
+	return &decoder->state;
+}
+
+static bool intel_pt_at_psb(unsigned char *buf, size_t len)
+{
+	if (len < INTEL_PT_PSB_LEN)
+		return false;
+	return memmem(buf, INTEL_PT_PSB_LEN, INTEL_PT_PSB_STR,
+		      INTEL_PT_PSB_LEN);
+}
+
+/**
+ * intel_pt_next_psb - move buffer pointer to the start of the next PSB packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the next PSB packet if
+ * there is one, otherwise the buffer pointer is unchanged.  If @buf is updated,
+ * @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_next_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	next = memmem(*buf, *len, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_step_psb - move buffer pointer to the start of the following PSB
+ *                     packet.
+ * @buf: pointer to buffer pointer
+ * @len: size of buffer
+ *
+ * Updates the buffer pointer to point to the start of the following PSB packet
+ * (skipping the PSB at @buf itself) if there is one, otherwise the buffer
+ * pointer is unchanged.  If @buf is updated, @len is adjusted accordingly.
+ *
+ * Return: %true if a PSB packet is found, %false otherwise.
+ */
+static bool intel_pt_step_psb(unsigned char **buf, size_t *len)
+{
+	unsigned char *next;
+
+	if (!*len)
+		return false;
+
+	next = memmem(*buf + 1, *len - 1, INTEL_PT_PSB_STR, INTEL_PT_PSB_LEN);
+	if (next) {
+		*len -= next - *buf;
+		*buf = next;
+		return true;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_last_psb - find the last PSB packet in a buffer.
+ * @buf: buffer
+ * @len: size of buffer
+ *
+ * This function finds the last PSB in a buffer.
+ *
+ * Return: A pointer to the last PSB in @buf if found, %NULL otherwise.
+ */
+static unsigned char *intel_pt_last_psb(unsigned char *buf, size_t len)
+{
+	const char *n = INTEL_PT_PSB_STR;
+	unsigned char *p;
+	size_t k;
+
+	if (len < INTEL_PT_PSB_LEN)
+		return NULL;
+
+	k = len - INTEL_PT_PSB_LEN + 1;
+	while (1) {
+		p = memrchr(buf, n[0], k);
+		if (!p)
+			return NULL;
+		if (!memcmp(p + 1, n + 1, INTEL_PT_PSB_LEN - 1))
+			return p;
+		k = p - buf;
+		if (!k)
+			return NULL;
+	}
+}
+
+/**
+ * intel_pt_next_tsc - find and return next TSC.
+ * @buf: buffer
+ * @len: size of buffer
+ * @tsc: TSC value returned
+ *
+ * Find a TSC packet in @buf and return the TSC value.  This function assumes
+ * that @buf starts at a PSB and that PSB+ will contain TSC and so stops if a
+ * PSBEND packet is found.
+ *
+ * Return: %true if TSC is found, %false otherwise.
+ */
+static bool intel_pt_next_tsc(unsigned char *buf, size_t len, uint64_t *tsc)
+{
+	struct intel_pt_pkt packet;
+	int ret;
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret <= 0)
+			return false;
+		if (packet.type == INTEL_PT_TSC) {
+			*tsc = packet.payload;
+			return true;
+		}
+		if (packet.type == INTEL_PT_PSBEND)
+			return false;
+		buf += ret;
+		len -= ret;
+	}
+	return false;
+}
+
+/**
+ * intel_pt_tsc_cmp - compare 7-byte TSCs.
+ * @tsc1: first TSC to compare
+ * @tsc2: second TSC to compare
+ *
+ * This function compares 7-byte TSC values allowing for the possibility that
+ * TSC wrapped around.  Since it is generally not possible to know whether TSC
+ * wrapped, this function assumes the true (absolute) difference between the
+ * two values is less than half the maximum possible difference.
+ *
+ * Return: %-1 if @tsc1 is before @tsc2, %0 if @tsc1 == @tsc2, %1 if @tsc1 is
+ * after @tsc2.
+ */
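+/*
+ * Example (hypothetical values): tsc1 = 0xffffffffffffff (the 7-byte
+ * maximum) and tsc2 = 0x1 differ by almost 2^56, which is more than
+ * halfway, so TSC is assumed to have wrapped and tsc1 compares as
+ * before tsc2 (-1).
+ */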
+static int intel_pt_tsc_cmp(uint64_t tsc1, uint64_t tsc2)
+{
+	const uint64_t halfway = (1ULL << 55);
+
+	if (tsc1 == tsc2)
+		return 0;
+
+	if (tsc1 < tsc2) {
+		if (tsc2 - tsc1 < halfway)
+			return -1;
+		else
+			return 1;
+	} else {
+		if (tsc1 - tsc2 < halfway)
+			return 1;
+		else
+			return -1;
+	}
+}
+
+/**
+ * intel_pt_find_overlap_tsc - determine start of non-overlapped trace data
+ *                             using TSC.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ *
+ * If the trace contains TSC we can look at the last TSC of @buf_a and the
+ * first TSC of @buf_b in order to determine if the buffers overlap, and then
+ * walk forward in @buf_b until a later TSC is found.  A precondition is that
+ * @buf_a and @buf_b are positioned at a PSB.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+static unsigned char *intel_pt_find_overlap_tsc(unsigned char *buf_a,
+						size_t len_a,
+						unsigned char *buf_b,
+						size_t len_b)
+{
+	uint64_t tsc_a, tsc_b;
+	unsigned char *p;
+	size_t len;
+
+	p = intel_pt_last_psb(buf_a, len_a);
+	if (!p)
+		return buf_b; /* No PSB in buf_a => no overlap */
+
+	len = len_a - (p - buf_a);
+	if (!intel_pt_next_tsc(p, len, &tsc_a)) {
+		/* The last PSB+ in buf_a is incomplete, so go back one more */
+		len_a -= len;
+		p = intel_pt_last_psb(buf_a, len_a);
+		if (!p)
+			return buf_b; /* No full PSB+ => assume no overlap */
+		len = len_a - (p - buf_a);
+		if (!intel_pt_next_tsc(p, len, &tsc_a))
+			return buf_b; /* No TSC in buf_a => assume no overlap */
+	}
+
+	while (1) {
+		/* Ignore PSB+ with no TSC */
+		if (intel_pt_next_tsc(buf_b, len_b, &tsc_b) &&
+		    intel_pt_tsc_cmp(tsc_a, tsc_b) < 0)
+			return buf_b; /* tsc_a < tsc_b => no overlap */
+
+		if (!intel_pt_step_psb(&buf_b, &len_b))
+			return buf_b + len_b; /* No PSB in buf_b => no data */
+	}
+}
+
+/**
+ * intel_pt_find_overlap - determine start of non-overlapped trace data.
+ * @buf_a: first buffer
+ * @len_a: size of first buffer
+ * @buf_b: second buffer
+ * @len_b: size of second buffer
+ * @have_tsc: can use TSC packets to detect overlap
+ *
+ * When trace samples or snapshots are recorded there is the possibility that
+ * the data overlaps.  Note that, for the purposes of decoding, data is only
+ * useful if it begins with a PSB packet.  Note also that a precondition is that
+ * @buf_a starts with a PSB packet.
+ *
+ * Return: A pointer into @buf_b from where non-overlapped data starts, or
+ * @buf_b + @len_b if there is no non-overlapped data.
+ */
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc)
+{
+	unsigned char *found;
+
+	/* Buffer 'b' must start at PSB so throw away everything before that */
+	if (!intel_pt_next_psb(&buf_b, &len_b))
+		return buf_b + len_b; /* No PSB */
+
+	if (have_tsc) {
+		found = intel_pt_find_overlap_tsc(buf_a, len_a, buf_b, len_b);
+		if (found)
+			return found;
+	}
+
+	/*
+	 * Buffer 'b' cannot end within buffer 'a' so, for comparison purposes,
+	 * we can ignore the first part of buffer 'a'.
+	 */
+	while (len_b < len_a) {
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+	}
+
+	/* Now len_b >= len_a */
+	if (len_b > len_a) {
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+
+	while (1) {
+		/* Potential overlap so check the bytes */
+		found = memmem(buf_a, len_a, buf_b, len_a);
+		if (found)
+			return buf_b + len_a;
+
+		/* Try again at next PSB in buffer 'a' */
+		if (!intel_pt_step_psb(&buf_a, &len_a))
+			return buf_b; /* No overlap */
+
+		/* The leftover buffer 'b' must start at a PSB */
+		while (!intel_pt_at_psb(buf_b + len_a, len_b - len_a)) {
+			if (!intel_pt_step_psb(&buf_a, &len_a))
+				return buf_b; /* No overlap */
+		}
+	}
+}
diff --git a/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
new file mode 100644
index 0000000..dfda0f8
--- /dev/null
+++ b/tools/perf/util/intel-pt-decoder/intel-pt-decoder.h
@@ -0,0 +1,83 @@
+/*
+ * intel_pt_decoder.h: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef INCLUDE__INTEL_PT_DECODER_H__
+#define INCLUDE__INTEL_PT_DECODER_H__
+
+#include <stdint.h>
+#include <stddef.h>
+#include <stdbool.h>
+
+#define INTEL_PT_IN_TX		(1 << 0)
+#define INTEL_PT_ABORT_TX	(1 << 1)
+
+enum intel_pt_sample_type {
+	INTEL_PT_BRANCH		= 1 << 0,
+	INTEL_PT_INSTRUCTION	= 1 << 1,
+};
+
+enum intel_pt_period_type {
+	INTEL_PT_PERIOD_NONE,
+	INTEL_PT_PERIOD_INSTRUCTIONS,
+	INTEL_PT_PERIOD_TICKS,
+};
+
+struct intel_pt_state {
+	enum intel_pt_sample_type type;
+	int err;
+	uint64_t from_ip;
+	uint64_t to_ip;
+	uint64_t cr3;
+	uint64_t timestamp;
+	uint32_t flags;
+};
+
+struct intel_pt_insn;
+
+struct intel_pt_buffer {
+	const unsigned char *buf;
+	size_t len;
+	bool consecutive;
+	uint64_t ref_timestamp;
+};
+
+struct intel_pt_params {
+	int (*get_trace)(struct intel_pt_buffer *buffer, void *data);
+	int (*get_insn)(struct intel_pt_insn *intel_pt_insn, uint64_t ip,
+			uint64_t cr3, void *data);
+	void *data;
+	bool return_compression;
+	uint64_t period;
+	enum intel_pt_period_type period_type;
+};
+
+struct intel_pt_decoder;
+
+struct intel_pt_decoder *intel_pt_decoder_new(struct intel_pt_params *params);
+void intel_pt_decoder_free(struct intel_pt_decoder *decoder);
+
+const struct intel_pt_state *intel_pt_decode(struct intel_pt_decoder *decoder);
+
+unsigned char *intel_pt_find_overlap(unsigned char *buf_a, size_t len_a,
+				     unsigned char *buf_b, size_t len_b,
+				     bool have_tsc);
+
+const char *intel_pt_error_message(int code);
+
+#endif
-- 
1.8.5.1


^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [PATCH v0 70/71] perf tools: Add Intel PT support
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (68 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 69/71] perf tools: Add Intel PT decoder Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 12:37 ` [PATCH v0 71/71] perf tools: Take Intel PT into use Alexander Shishkin
                   ` (2 subsequent siblings)
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

Add support for Intel Processor Trace.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Makefile.perf   |    2 +
 tools/perf/util/intel-pt.c | 2193 ++++++++++++++++++++++++++++++++++++++++++++
 tools/perf/util/intel-pt.h |   40 +
 3 files changed, 2235 insertions(+)
 create mode 100644 tools/perf/util/intel-pt.c
 create mode 100644 tools/perf/util/intel-pt.h

diff --git a/tools/perf/Makefile.perf b/tools/perf/Makefile.perf
index 41f8a97..8ed9434 100644
--- a/tools/perf/Makefile.perf
+++ b/tools/perf/Makefile.perf
@@ -295,6 +295,7 @@ LIB_H += util/unwind.h
 LIB_H += util/vdso.h
 LIB_H += util/tsc.h
 LIB_H += util/itrace.h
+LIB_H += util/intel-pt.h
 LIB_H += util/intel-pt-decoder/intel-pt-pkt-decoder.h
 LIB_H += util/intel-pt-decoder/intel-pt-insn-decoder.h
 LIB_H += util/intel-pt-decoder/intel-pt-log.h
@@ -379,6 +380,7 @@ LIB_OBJS += $(OUTPUT)util/srcline.o
 LIB_OBJS += $(OUTPUT)util/data.o
 LIB_OBJS += $(OUTPUT)util/tsc.o
 LIB_OBJS += $(OUTPUT)util/itrace.o
+LIB_OBJS += $(OUTPUT)util/intel-pt.o
 LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-pkt-decoder.o
 LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-insn-decoder.o
 LIB_OBJS += $(OUTPUT)util/intel-pt-decoder/intel-pt-log.o
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
new file mode 100644
index 0000000..3223e40
--- /dev/null
+++ b/tools/perf/util/intel-pt.c
@@ -0,0 +1,2193 @@
+/*
+ * intel_pt.c: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#include <stdio.h>
+#include <stdint.h>
+#include <stdbool.h>
+#include <errno.h>
+#include <linux/kernel.h>
+
+#include "../perf.h"
+#include "session.h"
+#include "machine.h"
+#include "tool.h"
+#include "event.h"
+#include "evlist.h"
+#include "evsel.h"
+#include "map.h"
+#include "cpumap.h"
+#include "types.h"
+#include "color.h"
+#include "util.h"
+#include "thread.h"
+#include "symbol.h"
+#include "parse-options.h"
+#include "parse-events.h"
+#include "pmu.h"
+#include "dso.h"
+#include "debug.h"
+#include "itrace.h"
+#include "tsc.h"
+#include "intel-pt.h"
+
+#include "intel-pt-decoder/intel-pt-log.h"
+#include "intel-pt-decoder/intel-pt-decoder.h"
+#include "intel-pt-decoder/intel-pt-insn-decoder.h"
+#include "intel-pt-decoder/intel-pt-pkt-decoder.h"
+
+#define MAX_TIMESTAMP (~0ULL)
+
+#define KiB(x) ((x) * 1024)
+#define MiB(x) ((x) * 1024 * 1024)
+#define KiB_MASK(x) (KiB(x) - 1)
+#define MiB_MASK(x) (MiB(x) - 1)
+
+#define INTEL_PT_DEFAULT_SAMPLE_SIZE	KiB(4)
+
+#define INTEL_PT_MAX_SAMPLE_SIZE	KiB(60)
+
+#define INTEL_PT_PSB_PERIOD_NEAR	256
+
+struct intel_pt_snapshot_ref {
+	void *ref_buf;
+	size_t ref_offset;
+	bool wrapped;
+};
+
+struct intel_pt_recording {
+	struct itrace_record		itr;
+	struct perf_pmu			*intel_pt_pmu;
+	int				have_sched_switch;
+	struct perf_evlist		*evlist;
+	bool				snapshot_mode;
+	bool				snapshot_init_done;
+	size_t				snapshot_size;
+	size_t				snapshot_ref_buf_size;
+	int				snapshot_ref_cnt;
+	struct intel_pt_snapshot_ref	*snapshot_refs;
+};
+
+struct intel_pt {
+	struct itrace itrace;
+	struct itrace_queues queues;
+	struct itrace_heap heap;
+	u32 itrace_type;
+	struct perf_session *session;
+	struct machine *machine;
+	struct perf_evsel *switch_evsel;
+	bool timeless_decoding;
+	bool sampling_mode;
+	bool snapshot_mode;
+	bool per_cpu_mmaps;
+	bool have_tsc;
+	bool data_queued;
+	int have_sched_switch;
+	u32 pmu_type;
+
+	struct perf_tsc_conversion tc;
+	bool cap_user_time_zero;
+
+	struct itrace_synth_opts synth_opts;
+
+	bool sample_instructions;
+	u64 instructions_sample_type;
+	u64 instructions_sample_period;
+	u64 instructions_id;
+	size_t instructions_event_size;
+
+	bool sample_branches;
+	u64 branches_sample_type;
+	u64 branches_id;
+	size_t branches_event_size;
+
+	bool synth_needs_swap;
+
+	u64 tsc_bit;
+	u64 noretcomp_bit;
+};
+
+struct intel_pt_queue {
+	struct intel_pt *pt;
+	unsigned int queue_nr;
+	struct itrace_buffer *buffer;
+	void *decoder;
+	const struct intel_pt_state *state;
+	bool on_heap;
+	bool stop;
+	bool step_through_buffers;
+	bool use_buffer_pid_tid;
+	pid_t pid, tid;
+	int cpu;
+	bool exclude_kernel;
+	bool have_sample;
+	u64 time;
+};
+
+static void intel_pt_dump(struct intel_pt *pt __maybe_unused,
+			  unsigned char *buf, size_t len)
+{
+	struct intel_pt_pkt packet;
+	size_t pos = 0;
+	int ret, pkt_len, i;
+	char desc[INTEL_PT_PKT_DESC_MAX];
+	const char *color = PERF_COLOR_BLUE;
+
+	color_fprintf(stdout, color,
+		      ". ... Intel Processor Trace data: size %zu bytes\n",
+		      len);
+
+	while (len) {
+		ret = intel_pt_get_packet(buf, len, &packet);
+		if (ret > 0)
+			pkt_len = ret;
+		else
+			pkt_len = 1;
+		printf(".");
+		color_fprintf(stdout, color, "  %08zx: ", pos);
+		for (i = 0; i < pkt_len; i++)
+			color_fprintf(stdout, color, " %02x", buf[i]);
+		for (; i < 16; i++)
+			color_fprintf(stdout, color, "   ");
+		if (ret > 0) {
+			ret = intel_pt_pkt_desc(&packet, desc,
+						INTEL_PT_PKT_DESC_MAX);
+			if (ret > 0)
+				color_fprintf(stdout, color, " %s\n", desc);
+		} else {
+			color_fprintf(stdout, color, " Bad packet!\n");
+		}
+		pos += pkt_len;
+		buf += pkt_len;
+		len -= pkt_len;
+	}
+}
+
+static void intel_pt_dump_event(struct intel_pt *pt, unsigned char *buf,
+				size_t len)
+{
+	printf(".\n");
+	intel_pt_dump(pt, buf, len);
+}
+
+static void intel_pt_dump_sample(struct perf_session *session,
+				 struct perf_sample *sample)
+{
+	struct intel_pt *pt = container_of(session->itrace, struct intel_pt,
+					   itrace);
+
+	intel_pt_dump(pt, sample->itrace_sample.data,
+		      sample->itrace_sample.size);
+	printf(".\n");
+}
+
+static int intel_pt_fix_overlap(struct intel_pt *pt, unsigned int queue_nr)
+{
+	struct itrace_queue *queue = &pt->queues.queue_array[queue_nr];
+	struct itrace_buffer *a, *b;
+	void *start;
+
+	b = list_entry(queue->head.prev, struct itrace_buffer, list);
+	if (b->list.prev == &queue->head)
+		return 0;
+	a = list_entry(b->list.prev, struct itrace_buffer, list);
+	start = intel_pt_find_overlap(a->data, a->size, b->data,
+				      b->size, pt->have_tsc);
+	if (!start)
+		return -EINVAL;
+	b->size -= start - b->data;
+	b->data = start;
+	return 0;
+}
+
+static void intel_pt_drop_data(struct itrace_buffer *buffer)
+{
+	itrace_buffer__put_data(buffer);
+	if (buffer->data_needs_freeing) {
+		buffer->data_needs_freeing = false;
+		free(buffer->data);
+		buffer->data = NULL;
+		buffer->size = 0;
+	}
+}
+
+/* This function assumes data is processed sequentially only */
+static int intel_pt_get_trace(struct intel_pt_buffer *b, void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct itrace_buffer *buffer = ptq->buffer, *old_buffer = buffer;
+	struct itrace_queue *queue;
+
+	if (ptq->stop) {
+		b->len = 0;
+		return 0;
+	}
+
+	queue = &ptq->pt->queues.queue_array[ptq->queue_nr];
+
+	buffer = itrace_buffer__next(queue, buffer);
+	if (!buffer) {
+		if (old_buffer)
+			intel_pt_drop_data(old_buffer);
+		b->len = 0;
+		return 0;
+	}
+
+	ptq->buffer = buffer;
+
+	if (!buffer->data) {
+		int fd = perf_data_file__fd(ptq->pt->session->file);
+
+		buffer->data = itrace_buffer__get_data(buffer, fd);
+		if (!buffer->data)
+			return -ENOMEM;
+	}
+
+	if (ptq->pt->snapshot_mode && !buffer->consecutive) {
+		int err = intel_pt_fix_overlap(ptq->pt, ptq->queue_nr);
+
+		if (err)
+			return err;
+	}
+
+	if (old_buffer)
+		intel_pt_drop_data(old_buffer);
+
+	b->len = buffer->size;
+	b->buf = buffer->data;
+	b->ref_timestamp = buffer->reference;
+
+	if (!old_buffer || ptq->pt->sampling_mode || (ptq->pt->snapshot_mode &&
+						      !buffer->consecutive))
+		b->consecutive = false;
+	else
+		b->consecutive = true;
+
+	if (ptq->use_buffer_pid_tid && (ptq->pid != buffer->pid ||
+					ptq->tid != buffer->tid)) {
+		if (queue->cpu == -1 && buffer->cpu != -1)
+			ptq->cpu = buffer->cpu;
+		ptq->pid = buffer->pid;
+		ptq->tid = buffer->tid;
+		intel_pt_log("queue %u cpu %d pid %d tid %d\n",
+			     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+	}
+
+	if (ptq->step_through_buffers)
+		ptq->stop = true;
+
+	if (!b->len)
+		return intel_pt_get_trace(b, data);
+
+	return 0;
+}
+
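+/*
+ * Decoder callback to fetch object code: look up the thread's memory map for
+ * @ip, read up to the maximum instruction size from the dso, and decode one
+ * instruction.
+ */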
+static int intel_pt_get_next_insn(struct intel_pt_insn *intel_pt_insn,
+				  uint64_t ip, uint64_t cr3 __maybe_unused,
+				  void *data)
+{
+	struct intel_pt_queue *ptq = data;
+	struct machine *machine = ptq->pt->machine;
+	struct thread *thread;
+	struct addr_location al;
+	unsigned char buf[1024];
+	size_t bufsz;
+	ssize_t len;
+	int x86_64;
+	pid_t pid = ptq->pid;
+	uint8_t cpumode;
+
+	bufsz = intel_pt_insn_max_size();
+
+	/* Assume kernel addresses can be identified by "ip < 0" */
+	if ((int64_t)ip < 0)
+		cpumode = PERF_RECORD_MISC_KERNEL;
+	else
+		cpumode = PERF_RECORD_MISC_USER;
+
+	thread = machine__findnew_thread(machine, pid, pid);
+	if (!thread)
+		return -1;
+
+	thread__find_addr_map(thread, machine, cpumode, MAP__FUNCTION, ip, &al);
+	if (!al.map || !al.map->dso)
+		return -1;
+
+	len = dso__data_read_addr(al.map->dso, al.map, machine, ip, buf, bufsz);
+	if (len <= 0)
+		return -1;
+
+	x86_64 = al.map->dso->is_64_bit;
+
+	if (intel_pt_get_insn(buf, len, x86_64, intel_pt_insn))
+		return -1;
+
+	return 0;
+}
+
+static bool intel_pt_exclude_kernel(struct intel_pt *pt)
+{
+	struct perf_session *session = pt->session;
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if ((evsel->attr.type == pt->pmu_type ||
+		     (evsel->attr.sample_type & PERF_SAMPLE_ITRACE)) &&
+		    !evsel->attr.exclude_kernel)
+			return false;
+	}
+	return true;
+}
+
+static bool intel_pt_return_compression(struct intel_pt *pt)
+{
+	struct perf_session *session = pt->session;
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+
+	if (!pt->noretcomp_bit)
+		return true;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if (evsel->attr.itrace_config & pt->noretcomp_bit)
+			return false;
+	}
+	return true;
+}
+
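+/*
+ * Decoding is "timeless" when timestamps cannot be used: TSC packets are not
+ * selected, TSC-to-perf-time conversion is unavailable, or the events do not
+ * sample time.
+ */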
+static bool intel_pt_timeless_decoding(struct intel_pt *pt)
+{
+	struct perf_session *session = pt->session;
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	bool timeless_decoding = true;
+
+	if (!pt->tsc_bit || !pt->cap_user_time_zero)
+		return true;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if (!(evsel->attr.sample_type & PERF_SAMPLE_TIME))
+			return true;
+		if (evsel->attr.type == pt->pmu_type ||
+		    (evsel->attr.sample_type & PERF_SAMPLE_ITRACE)) {
+			if (evsel->attr.itrace_config & pt->tsc_bit)
+				timeless_decoding = false;
+			else
+				return true;
+		}
+	}
+	return timeless_decoding;
+}
+
+static bool intel_pt_have_tsc(struct intel_pt *pt)
+{
+	struct perf_session *session = pt->session;
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	bool have_tsc = false;
+
+	if (!pt->tsc_bit)
+		return false;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if (evsel->attr.type == pt->pmu_type ||
+		    (evsel->attr.sample_type & PERF_SAMPLE_ITRACE)) {
+			if (evsel->attr.itrace_config & pt->tsc_bit)
+				have_tsc = true;
+			else
+				return false;
+		}
+	}
+	return have_tsc;
+}
+
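+/*
+ * Sampling mode means trace data arrives attached to other events' samples
+ * (PERF_SAMPLE_ITRACE) instead of via a full-trace Intel PT PMU event.
+ */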
+static bool intel_pt_sampling_mode(struct intel_pt *pt)
+{
+	struct perf_session *session = pt->session;
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if (evsel->attr.type == pt->pmu_type)
+			return false;
+		if (evsel->attr.sample_type & PERF_SAMPLE_ITRACE)
+			return true;
+	}
+	return false;
+}
+
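+/*
+ * Invert the TSC-to-perf-time conversion (time = (tsc * time_mult) >>
+ * time_shift), i.e. calculate (ns << time_shift) / time_mult, splitting the
+ * division into quotient and remainder so the shift cannot overflow.
+ */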
+static u64 intel_pt_ns_to_ticks(const struct intel_pt *pt, u64 ns)
+{
+	u64 quot, rem;
+
+	quot = ns / pt->tc.time_mult;
+	rem  = ns % pt->tc.time_mult;
+	return (quot << pt->tc.time_shift) + (rem << pt->tc.time_shift) /
+		pt->tc.time_mult;
+}
+
+static struct intel_pt_queue *intel_pt_alloc_queue(struct intel_pt *pt,
+						   unsigned int queue_nr)
+{
+	struct intel_pt_params params = {0};
+	struct intel_pt_queue *ptq;
+
+	ptq = zalloc(sizeof(struct intel_pt_queue));
+	if (!ptq)
+		return NULL;
+
+	ptq->pt = pt;
+	ptq->queue_nr = queue_nr;
+	ptq->exclude_kernel = intel_pt_exclude_kernel(pt);
+	ptq->pid = -1;
+	ptq->tid = -1;
+	ptq->cpu = -1;
+
+	params.get_trace = intel_pt_get_trace;
+	params.get_insn = intel_pt_get_next_insn;
+	params.data = ptq;
+	params.return_compression = intel_pt_return_compression(pt);
+
+	if (pt->synth_opts.instructions) {
+		if (pt->synth_opts.period) {
+			switch (pt->synth_opts.period_type) {
+			case PERF_ITRACE_PERIOD_INSTRUCTIONS:
+				params.period_type =
+						INTEL_PT_PERIOD_INSTRUCTIONS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_TICKS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = pt->synth_opts.period;
+				break;
+			case PERF_ITRACE_PERIOD_NANOSECS:
+				params.period_type = INTEL_PT_PERIOD_TICKS;
+				params.period = intel_pt_ns_to_ticks(pt,
+							pt->synth_opts.period);
+				break;
+			default:
+				break;
+			}
+		}
+
+		if (!params.period) {
+			params.period_type = INTEL_PT_PERIOD_INSTRUCTIONS;
+			params.period = 1000;
+		}
+	}
+
+	ptq->decoder = intel_pt_decoder_new(&params);
+	if (!ptq->decoder) {
+		free(ptq);
+		return NULL;
+	}
+
+	return ptq;
+}
+
+static void intel_pt_free_queue(void *priv)
+{
+	struct intel_pt_queue *ptq = priv;
+
+	if (!ptq)
+		return;
+	intel_pt_decoder_free(ptq->decoder);
+	free(ptq);
+}
+
+static void intel_pt_set_pid_tid_cpu(struct intel_pt *pt,
+				     struct itrace_queue *queue)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (queue->cpu == -1) {
+		/* queue per-thread */
+		ptq->cpu = machine__get_thread_cpu(pt->machine, ptq->tid,
+						   &ptq->pid);
+	} else if (queue->tid != -1 && !pt->have_sched_switch) {
+		/* queue per-cpu workload only */
+		if (ptq->pid == -1)
+			ptq->pid = machine__get_thread_pid(pt->machine,
+							   ptq->tid);
+	} else {
+		/* queue per-cpu */
+		ptq->tid = machine__get_current_tid(pt->machine, ptq->cpu);
+		ptq->pid = machine__get_thread_pid(pt->machine, ptq->tid);
+	}
+}
+
+static int intel_pt_setup_queue(struct intel_pt *pt, struct itrace_queue *queue,
+				unsigned int queue_nr)
+{
+	struct intel_pt_queue *ptq = queue->priv;
+
+	if (list_empty(&queue->head))
+		return 0;
+
+	if (!ptq) {
+		ptq = intel_pt_alloc_queue(pt, queue_nr);
+		if (!ptq)
+			return -ENOMEM;
+		queue->priv = ptq;
+
+		if (queue->cpu != -1)
+			ptq->cpu = queue->cpu;
+		ptq->tid = queue->tid;
+
+		if (pt->sampling_mode) {
+			if (pt->timeless_decoding)
+				ptq->step_through_buffers = true;
+			if (pt->timeless_decoding || !pt->have_sched_switch)
+				ptq->use_buffer_pid_tid = true;
+		}
+	}
+
+	if (!ptq->on_heap) {
+		const struct intel_pt_state *state;
+		int ret;
+
+		if (pt->timeless_decoding)
+			return 0;
+
+		intel_pt_set_pid_tid_cpu(pt, queue);
+
+		intel_pt_log("queue %u getting timestamp\n", queue_nr);
+		intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+			     queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+		while (1) {
+			state = intel_pt_decode(ptq->decoder);
+			if (state->err) {
+				if (state->err == -ENODATA) {
+					intel_pt_log("queue %u has no timestamp\n",
+						     queue_nr);
+					return 0;
+				}
+				continue;
+			}
+			if (state->timestamp)
+				break;
+		}
+
+		intel_pt_log("queue %u timestamp 0x%" PRIx64 "\n",
+			     queue_nr, state->timestamp);
+		ptq->state = state;
+		ptq->have_sample = true;
+		ret = itrace_heap__add(&pt->heap, queue_nr, state->timestamp);
+		if (ret)
+			return ret;
+		ptq->on_heap = true;
+	}
+
+	return 0;
+}
+
+static int intel_pt_setup_queues(struct intel_pt *pt)
+{
+	unsigned int i;
+	int ret;
+
+	for (i = 0; i < pt->queues.nr_queues; i++) {
+		ret = intel_pt_setup_queue(pt, &pt->queues.queue_array[i], i);
+		if (ret)
+			return ret;
+	}
+	return 0;
+}
+
+static int intel_pt_synth_branch_sample(struct intel_pt_queue *ptq,
+					struct perf_tool *tool)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event event;
+	struct perf_sample sample = {0};
+
+	event.sample.header.type = PERF_RECORD_SAMPLE;
+	event.sample.header.misc = PERF_RECORD_MISC_USER;
+	event.sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->state->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->branches_id;
+	sample.stream_id = ptq->pt->branches_id;
+	sample.period = 1;
+	sample.cpu = ptq->cpu;
+
+	if (pt->synth_opts.inject) {
+		event.sample.header.size = pt->branches_event_size;
+		ret = perf_event__synthesize_sample(&event,
+						    pt->branches_sample_type, 0,
+						    0, &sample,
+						    pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, &event, &sample,
+						tool);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver branch event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_instruction_sample(struct intel_pt_queue *ptq,
+					     struct perf_tool *tool)
+{
+	int ret;
+	struct intel_pt *pt = ptq->pt;
+	union perf_event event;
+	struct perf_sample sample = {0};
+
+	event.sample.header.type = PERF_RECORD_SAMPLE;
+	event.sample.header.misc = PERF_RECORD_MISC_USER;
+	event.sample.header.size = sizeof(struct perf_event_header);
+
+	if (!pt->timeless_decoding)
+		sample.time = tsc_to_perf_time(ptq->state->timestamp, &pt->tc);
+
+	sample.ip = ptq->state->from_ip;
+	sample.pid = ptq->pid;
+	sample.tid = ptq->tid;
+	sample.addr = ptq->state->to_ip;
+	sample.id = ptq->pt->instructions_id;
+	sample.stream_id = ptq->pt->instructions_id;
+	sample.period = ptq->pt->instructions_sample_period;
+	sample.cpu = ptq->cpu;
+
+	if (pt->synth_opts.inject) {
+		event.sample.header.size = pt->instructions_event_size;
+		ret = perf_event__synthesize_sample(&event,
+						pt->instructions_sample_type, 0,
+						0, &sample,
+						pt->synth_needs_swap);
+		if (ret)
+			return ret;
+	}
+
+	ret = perf_session__deliver_synth_event(pt->session, &event, &sample,
+						tool);
+	if (ret)
+		pr_err("Intel Processor Trace: failed to deliver instruction event, error %d\n",
+		       ret);
+
+	return ret;
+}
+
+static int intel_pt_synth_error(struct intel_pt *pt, struct perf_tool *tool,
+				int code, int cpu, pid_t pid, pid_t tid, u64 ip)
+{
+	union perf_event event;
+	const char *msg;
+	int err;
+
+	msg = intel_pt_error_message(code);
+
+	itrace_synth_error(&event.itrace_error, PERF_ITRACE_DECODER_ERROR, code,
+			   cpu, pid, tid, ip, msg);
+
+	err = perf_session__deliver_synth_event(pt->session, &event, NULL,
+						tool);
+	if (err)
+		pr_err("Intel Processor Trace: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static int intel_pt_run_decoder(struct intel_pt_queue *ptq, u64 *timestamp,
+				struct perf_tool *tool)
+{
+	const struct intel_pt_state *state = ptq->state;
+	struct intel_pt *pt = ptq->pt;
+	int err;
+
+	intel_pt_log("queue %u decoding cpu %d pid %d tid %d\n",
+		     ptq->queue_nr, ptq->cpu, ptq->pid, ptq->tid);
+	while (1) {
+		if (ptq->have_sample) {
+			ptq->have_sample = false;
+
+			if (pt->sample_instructions &&
+			    (state->type & INTEL_PT_INSTRUCTION)) {
+				err = intel_pt_synth_instruction_sample(ptq,
+									tool);
+				if (err)
+					return err;
+			}
+
+			if (pt->sample_branches &&
+			    (state->type & INTEL_PT_BRANCH)) {
+				err = intel_pt_synth_branch_sample(ptq, tool);
+				if (err)
+					return err;
+			}
+		}
+
+		state = intel_pt_decode(ptq->decoder);
+		if (state->err) {
+			if (state->err == -ENODATA)
+				return 1;
+			if (pt->synth_opts.errors) {
+				err = intel_pt_synth_error(pt, tool,
+						-state->err, ptq->cpu, ptq->pid,
+						ptq->tid, state->from_ip);
+				if (err)
+					return err;
+			}
+			continue;
+		}
+
+		ptq->state = state;
+		ptq->have_sample = true;
+
+		if (!pt->timeless_decoding && state->timestamp >= *timestamp) {
+			*timestamp = state->timestamp;
+			return 0;
+		}
+	}
+	return 0;
+}
+
+static inline int intel_pt_update_queues(struct intel_pt *pt)
+{
+	if (pt->queues.new_data) {
+		pt->queues.new_data = false;
+		return intel_pt_setup_queues(pt);
+	}
+	return 0;
+}
+
+static int intel_pt_process_queues(struct intel_pt *pt, u64 timestamp,
+				   struct perf_tool *tool)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct itrace_queue *queue;
+		struct intel_pt_queue *ptq;
+
+		if (!pt->heap.heap_cnt)
+			return 0;
+
+		if (pt->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = pt->heap.heap_array[0].queue_nr;
+		queue = &pt->queues.queue_array[queue_nr];
+		ptq = queue->priv;
+
+		intel_pt_log("queue %u processing 0x%" PRIx64 " < 0x%" PRIx64 "\n",
+			     queue_nr, pt->heap.heap_array[0].ordinal,
+			     timestamp);
+
+		itrace_heap__pop(&pt->heap);
+
+		if (pt->heap.heap_cnt) {
+			ts = pt->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		intel_pt_set_pid_tid_cpu(pt, queue);
+
+		ret = intel_pt_run_decoder(ptq, &ts, tool);
+
+		if (ret < 0) {
+			itrace_heap__add(&pt->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (!ret) {
+			ret = itrace_heap__add(&pt->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		} else {
+			ptq->on_heap = false;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_pt_process_sample_queues(struct intel_pt *pt, u64 timestamp,
+				union perf_event *event __maybe_unused,
+				struct perf_sample *sample __maybe_unused,
+				struct perf_tool *tool)
+{
+	unsigned int queue_nr;
+	u64 ts;
+	int ret;
+
+	while (1) {
+		struct itrace_queue *queue;
+		struct intel_pt_queue *ptq;
+
+		if (!pt->heap.heap_cnt)
+			return 0;
+
+		if (pt->heap.heap_array[0].ordinal >= timestamp)
+			return 0;
+
+		queue_nr = pt->heap.heap_array[0].queue_nr;
+		queue = &pt->queues.queue_array[queue_nr];
+		ptq = queue->priv;
+
+		intel_pt_log("queue %u processing 0x%" PRIx64 " < 0x%" PRIx64 "\n",
+			     queue_nr, pt->heap.heap_array[0].ordinal,
+			     timestamp);
+
+		itrace_heap__pop(&pt->heap);
+
+		if (pt->heap.heap_cnt) {
+			ts = pt->heap.heap_array[0].ordinal + 1;
+			if (ts > timestamp)
+				ts = timestamp;
+		} else {
+			ts = timestamp;
+		}
+
+		if (!ptq->use_buffer_pid_tid)
+			intel_pt_set_pid_tid_cpu(pt, queue);
+
+		ret = intel_pt_run_decoder(ptq, &ts, tool);
+		if (ret < 0) {
+			itrace_heap__add(&pt->heap, queue_nr, ts);
+			return ret;
+		}
+
+		if (ret) {
+			ptq->on_heap = false;
+		} else {
+			ret = itrace_heap__add(&pt->heap, queue_nr, ts);
+			if (ret < 0)
+				return ret;
+		}
+	}
+
+	return 0;
+}
+
+static int intel_pt_process_timeless_queues(struct intel_pt *pt, pid_t tid,
+					    u64 time, struct perf_tool *tool)
+{
+	struct itrace_queues *queues = &pt->queues;
+	unsigned int i;
+	u64 ts = 0;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		struct itrace_queue *queue = &pt->queues.queue_array[i];
+		struct intel_pt_queue *ptq = queue->priv;
+
+		if (ptq && (tid == -1 || ptq->tid == tid)) {
+			ptq->time = time;
+
+			if (ptq->pid == -1 && ptq->tid != -1)
+				ptq->pid = machine__get_thread_pid(pt->machine,
+								   ptq->tid);
+
+			intel_pt_run_decoder(ptq, &ts, tool);
+		}
+	}
+	return 0;
+}
+
+static int intel_pt_process_timeless_sample(struct intel_pt *pt,
+					    struct perf_sample *sample,
+					    struct perf_tool *tool)
+{
+	struct itrace_queue *queue = itrace_queues__sample_queue(&pt->queues,
+								 sample,
+								 pt->session);
+	struct intel_pt_queue *ptq = queue->priv;
+	u64 ts = 0;
+
+	ptq->stop = false;
+	ptq->time = sample->time;
+	intel_pt_run_decoder(ptq, &ts, tool);
+	return 0;
+}
+
+static int intel_pt_lost(struct intel_pt *pt, struct perf_sample *sample,
+			 struct perf_tool *tool)
+{
+	union perf_event event;
+	int err;
+
+	itrace_synth_error(&event.itrace_error, PERF_ITRACE_DECODER_ERROR,
+			   ENOSPC, sample->cpu, sample->pid, sample->tid, 0,
+			   "Lost trace data");
+
+	err = perf_session__deliver_synth_event(pt->session, &event, NULL,
+						tool);
+	if (err)
+		pr_err("Intel Processor Trace: failed to deliver error event, error %d\n",
+		       err);
+
+	return err;
+}
+
+static int intel_pt_process_switch(struct intel_pt *pt,
+				   struct perf_sample *sample)
+{
+	struct perf_evsel *evsel;
+	pid_t tid;
+	int cpu;
+
+	evsel = perf_evlist__id2evsel(pt->session->evlist, sample->id);
+	if (evsel != pt->switch_evsel)
+		return 0;
+
+	tid = perf_evsel__intval(evsel, sample, "next_pid");
+	cpu = sample->cpu;
+
+	return machine__set_current_tid(pt->machine, cpu, 0, tid);
+}
+
+static int intel_pt_process_event(struct perf_session *session,
+				  union perf_event *event,
+				  struct perf_sample *sample,
+				  struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->itrace, struct intel_pt,
+					   itrace);
+	u64 timestamp;
+	int err = 0;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_samples) {
+		pr_err("Intel Processor Trace requires ordered samples\n");
+		return -EINVAL;
+	}
+
+	if (sample->time)
+		timestamp = perf_time_to_tsc(sample->time, &pt->tc);
+	else
+		timestamp = 0;
+
+	if (timestamp || pt->timeless_decoding) {
+		err = intel_pt_update_queues(pt);
+		if (err)
+			return err;
+	}
+
+	if (pt->timeless_decoding) {
+		if (pt->sampling_mode) {
+			if (sample->itrace_sample.size)
+				err = intel_pt_process_timeless_sample(pt,
+								sample, tool);
+		} else if (event->header.type == PERF_RECORD_EXIT) {
+			err = intel_pt_process_timeless_queues(pt,
+					event->fork.tid, sample->time, tool);
+		}
+	} else if (timestamp) {
+		if (pt->sampling_mode)
+			err = intel_pt_process_sample_queues(pt, timestamp,
+							event, sample, tool);
+		else
+			err = intel_pt_process_queues(pt, timestamp, tool);
+	}
+	if (err)
+		return err;
+
+	if (event->header.type == PERF_RECORD_ITRACE_LOST &&
+	    pt->synth_opts.errors)
+		err = intel_pt_lost(pt, sample, tool);
+
+	if (pt->switch_evsel && event->header.type == PERF_RECORD_SAMPLE)
+		err = intel_pt_process_switch(pt, sample);
+
+	return err;
+}
+
+static int intel_pt_flush(struct perf_session *session, struct perf_tool *tool)
+{
+	struct intel_pt *pt = container_of(session->itrace, struct intel_pt,
+					   itrace);
+	int ret;
+
+	if (dump_trace)
+		return 0;
+
+	if (!tool->ordered_samples)
+		return -EINVAL;
+
+	ret = intel_pt_update_queues(pt);
+	if (ret < 0)
+		return ret;
+
+	if (pt->timeless_decoding)
+		return intel_pt_process_timeless_queues(pt, -1,
+						MAX_TIMESTAMP - 1, tool);
+
+	return intel_pt_process_queues(pt, MAX_TIMESTAMP, tool);
+}
+
+static void intel_pt_free_events(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->itrace, struct intel_pt,
+					   itrace);
+	struct itrace_queues *queues = &pt->queues;
+	unsigned int i;
+
+	for (i = 0; i < queues->nr_queues; i++) {
+		intel_pt_free_queue(queues->queue_array[i].priv);
+		queues->queue_array[i].priv = NULL;
+	}
+	itrace_queues__free(queues);
+}
+
+static void intel_pt_free(struct perf_session *session)
+{
+	struct intel_pt *pt = container_of(session->itrace, struct intel_pt,
+					   itrace);
+
+	itrace_heap__free(&pt->heap);
+	intel_pt_free_events(session);
+	session->itrace = NULL;
+	free(pt);
+}
+
+static int intel_pt_process_itrace_event(struct perf_session *session,
+					 union perf_event *event,
+					 struct perf_tool *tool __maybe_unused)
+{
+	struct intel_pt *pt = container_of(session->itrace, struct intel_pt,
+					   itrace);
+
+	if (pt->sampling_mode)
+		return 0;
+
+	if (!pt->data_queued) {
+		struct itrace_buffer *buffer;
+		off_t data_offset;
+		int fd = perf_data_file__fd(session->file);
+		int err;
+
+		if (perf_data_file__is_pipe(session->file)) {
+			data_offset = 0;
+		} else {
+			data_offset = lseek(fd, 0, SEEK_CUR);
+			if (data_offset == -1)
+				return -errno;
+		}
+
+		err = itrace_queues__add_event(&pt->queues, session, event,
+					       data_offset, &buffer);
+		if (err)
+			return err;
+
+		/* Dump here now we have copied a piped trace out of the pipe */
+		if (dump_trace) {
+			if (itrace_buffer__get_data(buffer, fd)) {
+				intel_pt_dump_event(pt, buffer->data,
+						    buffer->size);
+				itrace_buffer__put_data(buffer);
+			}
+		}
+	}
+
+	return 0;
+}
+
+static int intel_pt_queue_event(struct perf_session *session,
+				union perf_event *event __maybe_unused,
+				struct perf_sample *sample)
+{
+	struct intel_pt *pt = container_of(session->itrace, struct intel_pt,
+					   itrace);
+	unsigned int queue_nr;
+	u64 timestamp;
+	int err;
+
+	if (!sample->itrace_sample.size)
+		return 0;
+
+	if (!pt->sampling_mode)
+		return 0;
+
+	if (sample->time)
+		timestamp = perf_time_to_tsc(sample->time, &pt->tc);
+	else
+		timestamp = 0;
+
+	err = itrace_queues__add_sample(&pt->queues, sample, session, &queue_nr,
+					timestamp);
+	if (err)
+		return err;
+
+	return intel_pt_fix_overlap(pt, queue_nr);
+}
+
+struct intel_pt_synth {
+	struct perf_tool dummy_tool;
+	struct perf_tool *tool;
+	struct perf_session *session;
+};
+
+static int intel_pt_event_synth(struct perf_tool *tool,
+				union perf_event *event,
+				struct perf_sample *sample __maybe_unused,
+				struct machine *machine __maybe_unused)
+{
+	struct intel_pt_synth *intel_pt_synth =
+			container_of(tool, struct intel_pt_synth, dummy_tool);
+
+	return perf_session__deliver_synth_event(intel_pt_synth->session, event,
+						 NULL, intel_pt_synth->tool);
+}
+
+static int intel_pt_synth_event(struct perf_session *session,
+				struct perf_tool *tool,
+				struct perf_event_attr *attr, u64 id)
+{
+	struct intel_pt_synth intel_pt_synth;
+
+	memset(&intel_pt_synth, 0, sizeof(struct intel_pt_synth));
+	intel_pt_synth.tool = tool;
+	intel_pt_synth.session = session;
+
+	return perf_event__synthesize_attr(&intel_pt_synth.dummy_tool, attr, 1,
+					   &id, intel_pt_event_synth);
+}
+
+static int intel_pt_synth_events(struct intel_pt *pt,
+				 struct perf_session *session,
+				 struct perf_tool *tool)
+{
+	struct perf_evlist *evlist = session->evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr;
+	bool found = false;
+	u64 id;
+	int err;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if ((evsel->attr.type == pt->pmu_type ||
+		     (evsel->attr.sample_type & PERF_SAMPLE_ITRACE)) &&
+		    evsel->ids) {
+			found = true;
+			break;
+		}
+	}
+
+	if (!found) {
+		pr_err("%s: failed\n", __func__);
+		return -EINVAL;
+	}
+
+	memset(&attr, 0, sizeof(struct perf_event_attr));
+	attr.size = sizeof(struct perf_event_attr);
+	attr.type = PERF_TYPE_HARDWARE;
+	attr.sample_type = evsel->attr.sample_type & PERF_SAMPLE_MASK;
+	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
+			    PERF_SAMPLE_PERIOD;
+	if (pt->timeless_decoding)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
+	else
+		attr.sample_type |= PERF_SAMPLE_TIME;
+	if (!pt->per_cpu_mmaps)
+		attr.sample_type &= ~(u64)PERF_SAMPLE_CPU;
+	attr.exclude_user = evsel->attr.exclude_user;
+	attr.exclude_kernel = evsel->attr.exclude_kernel;
+	attr.exclude_hv = evsel->attr.exclude_hv;
+	attr.exclude_host = evsel->attr.exclude_host;
+	attr.exclude_guest = evsel->attr.exclude_guest;
+	attr.sample_id_all = evsel->attr.sample_id_all;
+	attr.read_format = evsel->attr.read_format;
+
+	id = evsel->id[0] + 1000000000;
+	if (!id)
+		id = 1;
+
+	if (pt->synth_opts.instructions) {
+		attr.config = PERF_COUNT_HW_INSTRUCTIONS;
+		if (pt->synth_opts.period_type == PERF_ITRACE_PERIOD_NANOSECS)
+			attr.sample_period =
+				intel_pt_ns_to_ticks(pt, pt->synth_opts.period);
+		else
+			attr.sample_period = pt->synth_opts.period;
+		pt->instructions_sample_period = attr.sample_period;
+		pr_debug("Synthesizing 'instructions' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, tool, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'instructions' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_instructions = true;
+		pt->instructions_sample_type = attr.sample_type;
+		pt->instructions_id = id;
+		/*
+		 * We only use sample types from PERF_SAMPLE_MASK so we can use
+		 * __perf_evsel__sample_size() here.
+		 */
+		pt->instructions_event_size = sizeof(struct sample_event) +
+				__perf_evsel__sample_size(attr.sample_type);
+		id += 1;
+	}
+
+	if (pt->synth_opts.branches) {
+		attr.config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS;
+		attr.sample_period = 1;
+		attr.sample_type |= PERF_SAMPLE_ADDR;
+		pr_debug("Synthesizing 'branches' event with id %" PRIu64 " sample type %#" PRIx64 "\n",
+			 id, (u64)attr.sample_type);
+		err = intel_pt_synth_event(session, tool, &attr, id);
+		if (err) {
+			pr_err("%s: failed to synthesize 'branches' event type\n",
+			       __func__);
+			return err;
+		}
+		pt->sample_branches = true;
+		pt->branches_sample_type = attr.sample_type;
+		pt->branches_id = id;
+		/*
+		 * We only use sample types from PERF_SAMPLE_MASK so we can use
+		 * __perf_evsel__sample_size() here.
+		 */
+		pt->branches_event_size = sizeof(struct sample_event) +
+				__perf_evsel__sample_size(attr.sample_type);
+	}
+
+	pt->synth_needs_swap = evsel->needs_swap;
+
+	return 0;
+}
+
+static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+
+	list_for_each_entry_reverse(evsel, &evlist->entries, node) {
+		const char *name = perf_evsel__name(evsel);
+
+		if (!strcmp(name, "sched:sched_switch"))
+			return evsel;
+	}
+
+	return NULL;
+}
+
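+/*
+ * Layout of the itrace_info event priv[] array.  intel_pt_info_fill() writes
+ * these values at record time and intel_pt_process_itrace_info() reads them
+ * back at report time, so the two must stay in step.
+ */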
+enum {
+	INTEL_PT_PMU_TYPE,
+	INTEL_PT_TIME_SHIFT,
+	INTEL_PT_TIME_MULT,
+	INTEL_PT_TIME_ZERO,
+	INTEL_PT_CAP_USER_TIME_ZERO,
+	INTEL_PT_TSC_BIT,
+	INTEL_PT_NORETCOMP_BIT,
+	INTEL_PT_HAVE_SCHED_SWITCH,
+	INTEL_PT_SNAPSHOT_MODE,
+	INTEL_PT_PER_CPU_MMAPS,
+	INTEL_PT_ITRACE_PRIV_SIZE,
+};
+
+u64 intel_pt_itrace_info_priv[INTEL_PT_ITRACE_PRIV_SIZE];
+
+int intel_pt_process_itrace_info(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_session *session)
+{
+	struct itrace_info_event *itrace_info = &event->itrace_info;
+	size_t min_sz = sizeof(u64) * INTEL_PT_PER_CPU_MMAPS;
+	struct intel_pt *pt;
+	int err;
+
+	if (itrace_info->header.size < sizeof(struct itrace_info_event) +
+					min_sz)
+		return -EINVAL;
+
+	pt = zalloc(sizeof(struct intel_pt));
+	if (!pt)
+		return -ENOMEM;
+
+	err = itrace_queues__init(&pt->queues);
+	if (err)
+		goto err_free;
+
+	intel_pt_log_set_name(INTEL_PT_PMU_NAME);
+
+	pt->session = session;
+	pt->machine = &session->machines.host; /* No kvm support */
+	pt->itrace_type = itrace_info->type;
+	pt->pmu_type = itrace_info->priv[INTEL_PT_PMU_TYPE];
+	pt->tc.time_shift = itrace_info->priv[INTEL_PT_TIME_SHIFT];
+	pt->tc.time_mult = itrace_info->priv[INTEL_PT_TIME_MULT];
+	pt->tc.time_zero = itrace_info->priv[INTEL_PT_TIME_ZERO];
+	pt->cap_user_time_zero = itrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO];
+	pt->tsc_bit = itrace_info->priv[INTEL_PT_TSC_BIT];
+	pt->noretcomp_bit = itrace_info->priv[INTEL_PT_NORETCOMP_BIT];
+	pt->have_sched_switch = itrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH];
+	pt->snapshot_mode = itrace_info->priv[INTEL_PT_SNAPSHOT_MODE];
+	pt->per_cpu_mmaps = itrace_info->priv[INTEL_PT_PER_CPU_MMAPS];
+
+	pt->timeless_decoding = intel_pt_timeless_decoding(pt);
+	pt->have_tsc = intel_pt_have_tsc(pt);
+	pt->sampling_mode = intel_pt_sampling_mode(pt);
+
+	pt->itrace.process_event = intel_pt_process_event;
+	pt->itrace.queue_event = intel_pt_queue_event;
+	pt->itrace.process_itrace_event = intel_pt_process_itrace_event;
+	pt->itrace.dump_itrace_sample = intel_pt_dump_sample;
+	pt->itrace.flush_events = intel_pt_flush;
+	pt->itrace.free_events = intel_pt_free_events;
+	pt->itrace.free = intel_pt_free;
+	session->itrace = &pt->itrace;
+
+	if (dump_trace)
+		return 0;
+
+	if (pt->have_sched_switch == 1) {
+		pt->switch_evsel = intel_pt_find_sched_switch(session->evlist);
+		if (!pt->switch_evsel) {
+			pr_err("%s: missing sched_switch event\n", __func__);
+			goto err_free_queues;
+		}
+	}
+
+	if (session->itrace_synth_opts && session->itrace_synth_opts->set)
+		pt->synth_opts = *session->itrace_synth_opts;
+	else
+		itrace_synth_opts__set_default(&pt->synth_opts);
+
+	err = intel_pt_synth_events(pt, session, tool);
+	if (err)
+		goto err_free_queues;
+
+	err = itrace_queues__process_index(&pt->queues, session);
+	if (err)
+		goto err_free_queues;
+
+	if (pt->queues.populated)
+		pt->data_queued = true;
+
+	if (pt->timeless_decoding)
+		pr_debug2("Intel PT decoding without timestamps\n");
+
+	return 0;
+
+err_free_queues:
+	itrace_queues__free(&pt->queues);
+	session->itrace = NULL;
+err_free:
+	free(pt);
+	return err;
+}
+
+static bool intel_pt_has_topa_multiple_entries(struct perf_pmu *intel_pt_pmu)
+{
+	unsigned int topa_multiple_entries;
+
+	if (perf_pmu__scan_file(intel_pt_pmu,
+				"caps/topa_multiple_entries", "%u",
+				&topa_multiple_entries) == 1 &&
+	    topa_multiple_entries)
+		return true;
+
+	return false;
+}
+
+static int intel_pt_parse_terms_with_default(struct list_head *formats,
+					     const char *str,
+					     u64 *itrace_config)
+{
+	struct list_head *terms;
+	struct perf_event_attr attr = {0};
+	int err;
+
+	terms = malloc(sizeof(struct list_head));
+	if (!terms)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(terms);
+
+	err = parse_events_terms(terms, str);
+	if (err)
+		goto out_free;
+
+	attr.itrace_config = *itrace_config;
+	err = perf_pmu__config_terms(formats, &attr, terms, true);
+	if (err)
+		goto out_free;
+
+	*itrace_config = attr.itrace_config;
+out_free:
+	parse_events__free_terms(terms);
+	return err;
+}
+
+static int intel_pt_parse_terms(struct list_head *formats, const char *str,
+				u64 *itrace_config)
+{
+	*itrace_config = 0;
+	return intel_pt_parse_terms_with_default(formats, str, itrace_config);
+}
+
+static size_t intel_pt_psb_period(struct perf_pmu *intel_pt_pmu __maybe_unused,
+				  struct perf_evlist *evlist __maybe_unused)
+{
+	return 256;
+}
+
+static u64 intel_pt_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	u64 itrace_config;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &itrace_config);
+	return itrace_config;
+}
+
+static size_t intel_pt_sample_size(const char **str)
+{
+	char *endptr;
+	unsigned long sample_size;
+
+	sample_size = strtoul(*str, &endptr, 0);
+	if (sample_size)
+		*str = endptr;
+	return sample_size;
+}
+
+static int intel_pt_parse_sample_options(struct itrace_record *itr,
+					 struct perf_record_opts *opts,
+					 const char *str)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	u64 *itrace_config = &opts->itrace_sample_config;
+	int err;
+
+	opts->itrace_sample_size = str ? intel_pt_sample_size(&str) : 0;
+	if (opts->itrace_sample_size > INTEL_PT_MAX_SAMPLE_SIZE) {
+		pr_err("Intel Processor Trace: sample size too big\n");
+		return -1;
+	}
+
+	*itrace_config = intel_pt_default_config(intel_pt_pmu);
+	opts->itrace_sample_type = intel_pt_pmu->type;
+	opts->sample_itrace = true;
+
+	if (!str || !*str)
+		return 0;
+
+	err = intel_pt_parse_terms_with_default(&intel_pt_pmu->format, str,
+						itrace_config);
+	if (err)
+		goto bad_options;
+
+	return 0;
+
+bad_options:
+	pr_err("Intel Processor Trace: bad sampling options \"%s\"\n", str);
+	return -1;
+}
+
+static int intel_pt_parse_snapshot_options(struct itrace_record *itr,
+					   struct perf_record_opts *opts,
+					   const char *str)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	unsigned long long snapshot_size = 0;
+	char *endptr;
+
+	if (str) {
+		snapshot_size = strtoull(str, &endptr, 0);
+		if (*endptr || snapshot_size > SIZE_MAX)
+			return -1;
+	}
+
+	opts->itrace_snapshot_mode = true;
+	opts->itrace_snapshot_size = snapshot_size;
+
+	ptr->snapshot_size = snapshot_size;
+
+	return 0;
+}
+
+struct perf_event_attr *
+intel_pt_pmu_default_config(struct perf_pmu *intel_pt_pmu)
+{
+	struct perf_event_attr *attr;
+
+	attr = zalloc(sizeof(struct perf_event_attr));
+	if (!attr)
+		return NULL;
+
+	attr->itrace_config = intel_pt_default_config(intel_pt_pmu);
+
+	intel_pt_pmu->selectable = true;
+
+	return attr;
+}
+
+static size_t intel_pt_info_priv_size(struct itrace_record *itr __maybe_unused)
+{
+	return sizeof(intel_pt_itrace_info_priv);
+}
+
+static int intel_pt_info_fill(struct itrace_record *itr,
+			      struct perf_session *session,
+			      struct itrace_info_event *itrace_info,
+			      size_t priv_size)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	struct perf_event_mmap_page *pc;
+	struct perf_tsc_conversion tc;
+	bool cap_user_time_zero, per_cpu_mmaps;
+	u64 tsc_bit, noretcomp_bit;
+	int err;
+
+	if (priv_size != sizeof(intel_pt_itrace_info_priv))
+		return -EINVAL;
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+	intel_pt_parse_terms(&intel_pt_pmu->format, "noretcomp",
+			     &noretcomp_bit);
+
+	if (!session->evlist->nr_mmaps)
+		return -EINVAL;
+
+	pc = session->evlist->mmap[0].base;
+	err = perf_read_tsc_conversion(pc, &tc);
+	if (err) {
+		if (err != -EOPNOTSUPP)
+			return err;
+		cap_user_time_zero = false;
+	} else {
+		cap_user_time_zero = tc.time_mult != 0;
+	}
+
+	if (!cap_user_time_zero)
+		ui__warning("Intel Processor Trace: TSC not available\n");
+
+	per_cpu_mmaps = !cpu_map__empty(session->evlist->cpus);
+
+	itrace_info->type = PERF_ITRACE_INTEL_PT;
+	itrace_info->priv[INTEL_PT_PMU_TYPE] = intel_pt_pmu->type;
+	itrace_info->priv[INTEL_PT_TIME_SHIFT] = tc.time_shift;
+	itrace_info->priv[INTEL_PT_TIME_MULT] = tc.time_mult;
+	itrace_info->priv[INTEL_PT_TIME_ZERO] = tc.time_zero;
+	itrace_info->priv[INTEL_PT_CAP_USER_TIME_ZERO] = cap_user_time_zero;
+	itrace_info->priv[INTEL_PT_TSC_BIT] = tsc_bit;
+	itrace_info->priv[INTEL_PT_NORETCOMP_BIT] = noretcomp_bit;
+	itrace_info->priv[INTEL_PT_HAVE_SCHED_SWITCH] = ptr->have_sched_switch;
+	itrace_info->priv[INTEL_PT_SNAPSHOT_MODE] = ptr->snapshot_mode;
+	itrace_info->priv[INTEL_PT_PER_CPU_MMAPS] = per_cpu_mmaps;
+
+	return 0;
+}
+
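+/* Round @x up to a power of 2 that is not less than @min */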
+static size_t intel_pt_to_power_of_2(size_t x, size_t min)
+{
+	size_t y = x;
+	int i;
+
+	if (!x)
+		return min;
+	for (i = 0; y != 1; i++)
+		y >>= 1;
+	y <<= i;
+	if (x & (y - 1))
+		y <<= 1;
+	if (y < min)
+		return min;
+	return y;
+}
+
+static int intel_pt_track_switches(struct perf_evlist *evlist)
+{
+	const char *sched_switch = "sched:sched_switch";
+	struct perf_evsel *evsel;
+	int err;
+
+	if (!perf_evlist__can_select_event(evlist, sched_switch))
+		return -EPERM;
+
+	err = parse_events(evlist, sched_switch);
+	if (err) {
+		pr_debug2("%s: failed to parse %s, error %d\n",
+			  __func__, sched_switch, err);
+		return err;
+	}
+
+	evsel = perf_evlist__last(evlist);
+
+	perf_evsel__set_sample_bit(evsel, CPU);
+
+	evsel->system_wide = true;
+	evsel->no_aux_samples = true;
+	evsel->immediate = true;
+
+	return 0;
+}
+
+static int intel_pt_recording_options(struct itrace_record *itr,
+				      struct perf_evlist *evlist,
+				      struct perf_record_opts *opts)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_pmu *intel_pt_pmu = ptr->intel_pt_pmu;
+	bool have_timing_info, topa_multiple_entries;
+	struct perf_evsel *evsel, *intel_pt_evsel = NULL;
+	const struct cpu_map *cpus = evlist->cpus;
+	bool privileged = geteuid() == 0 || perf_event_paranoid() < 0;
+	u64 tsc_bit;
+
+	ptr->evlist = evlist;
+	ptr->snapshot_mode = opts->itrace_snapshot_mode;
+
+	list_for_each_entry(evsel, &evlist->entries, node) {
+		if (evsel->attr.type == intel_pt_pmu->type) {
+			if (intel_pt_evsel) {
+				pr_err("There may be only one " INTEL_PT_PMU_NAME " event\n");
+				return -EINVAL;
+			}
+			evsel->attr.freq = 0;
+			evsel->attr.sample_period = 1;
+			intel_pt_evsel = evsel;
+			opts->full_itrace = true;
+		}
+	}
+
+	if (opts->itrace_snapshot_mode && !opts->full_itrace) {
+		pr_err("Snapshot mode (-S option) requires " INTEL_PT_PMU_NAME " PMU event (-e " INTEL_PT_PMU_NAME ")\n");
+		return -EINVAL;
+	}
+
+	if (!opts->full_itrace && !opts->sample_itrace)
+		return 0;
+
+	if (opts->full_itrace && opts->sample_itrace) {
+		pr_err("Full trace (" INTEL_PT_PMU_NAME " PMU) and sample trace (-I option) cannot be used together\n");
+		return -EINVAL;
+	}
+
+	/* Set default size for sample mode */
+	if (opts->sample_itrace) {
+		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
+
+		if (!opts->itrace_sample_size)
+			opts->itrace_sample_size = INTEL_PT_DEFAULT_SAMPLE_SIZE;
+		pr_debug2("Intel PT sample size: %zu\n",
+			  opts->itrace_sample_size);
+		if (psb_period &&
+		    opts->itrace_sample_size <= psb_period +
+						INTEL_PT_PSB_PERIOD_NEAR)
+			ui__warning("Intel PT sample size (%zu) may be too small for PSB period (%zu)\n",
+				    opts->itrace_sample_size, psb_period);
+	}
+
+	topa_multiple_entries =
+			intel_pt_has_topa_multiple_entries(intel_pt_pmu);
+
+	/* Set default sizes for snapshot mode */
+	if (opts->itrace_snapshot_mode) {
+		size_t psb_period = intel_pt_psb_period(intel_pt_pmu, evlist);
+
+		if (!opts->itrace_snapshot_size && !opts->itrace_mmap_pages) {
+			if (privileged) {
+				opts->itrace_mmap_pages = MiB(4) / page_size;
+			} else {
+				opts->itrace_mmap_pages = KiB(128) / page_size;
+				if (opts->mmap_pages == UINT_MAX)
+					opts->mmap_pages = KiB(256) / page_size;
+			}
+		} else if (!opts->itrace_mmap_pages && !privileged &&
+			   opts->mmap_pages == UINT_MAX) {
+			opts->mmap_pages = KiB(256) / page_size;
+		}
+		if (!opts->itrace_snapshot_size)
+			opts->itrace_snapshot_size =
+					opts->itrace_mmap_pages * page_size;
+		if (!opts->itrace_mmap_pages) {
+			size_t sz = opts->itrace_snapshot_size;
+
+			if (topa_multiple_entries)
+				sz += page_size - 1;
+			else if (sz <= MiB(128))
+				sz = intel_pt_to_power_of_2(sz, 4096);
+			else
+				sz = roundup(sz, MiB(128));
+			opts->itrace_mmap_pages = sz / page_size;
+		}
+		if (opts->itrace_snapshot_size >
+					opts->itrace_mmap_pages * page_size) {
+			pr_err("Snapshot size %zu must not be greater than instruction tracing mmap size %zu\n",
+			       opts->itrace_snapshot_size,
+			       opts->itrace_mmap_pages * (size_t)page_size);
+			return -EINVAL;
+		}
+		if (!opts->itrace_snapshot_size || !opts->itrace_mmap_pages) {
+			pr_err("Failed to calculate default snapshot size and/or instruction tracing mmap pages\n");
+			return -EINVAL;
+		}
+		pr_debug2("Intel PT snapshot size: %zu\n",
+			  opts->itrace_snapshot_size);
+		if (psb_period &&
+		    opts->itrace_snapshot_size <= psb_period +
+						  INTEL_PT_PSB_PERIOD_NEAR)
+			ui__warning("Intel PT snapshot size (%zu) may be too small for PSB period (%zu)\n",
+				    opts->itrace_snapshot_size, psb_period);
+	}
+
+	/* Set default sizes for full trace mode */
+	if (opts->full_itrace && !opts->itrace_mmap_pages) {
+		if (privileged) {
+			opts->itrace_mmap_pages = MiB(4) / page_size;
+		} else {
+			opts->itrace_mmap_pages = KiB(128) / page_size;
+			if (opts->mmap_pages == UINT_MAX)
+				opts->mmap_pages = KiB(256) / page_size;
+		}
+	}
+
+	/* Validate itrace_mmap_pages */
+	if (opts->itrace_mmap_pages && !topa_multiple_entries) {
+		size_t sz = opts->itrace_mmap_pages * page_size;
+		size_t min_sz;
+
+		if (opts->itrace_snapshot_mode)
+			min_sz = KiB(4);
+		else
+			min_sz = KiB(8);
+
+		if (sz < min_sz ||
+		    (!is_power_of_2(sz) && (sz & MiB_MASK(128)))) {
+			pr_err("Invalid mmap size for Intel Processor Trace: must be at least %zuKiB and a power of 2 or a multiple of 128MiB\n",
+			       min_sz / 1024);
+			return -EINVAL;
+		}
+	}
+
+	intel_pt_parse_terms(&intel_pt_pmu->format, "tsc", &tsc_bit);
+
+	if ((opts->sample_itrace && (opts->itrace_sample_config & tsc_bit)) ||
+	    (opts->full_itrace &&
+		    (intel_pt_evsel->attr.itrace_config & tsc_bit)))
+		have_timing_info = true;
+	else
+		have_timing_info = false;
+
+	/*
+	 * Per-cpu recording needs sched_switch events to distinguish different
+	 * threads.
+	 */
+	if (have_timing_info && !cpu_map__empty(cpus)) {
+		int err;
+
+		err = intel_pt_track_switches(evlist);
+		if (err == -EPERM)
+			pr_debug2("Unable to select sched:sched_switch\n");
+		else if (err)
+			return err;
+		else
+			ptr->have_sched_switch = 1;
+	}
+
+	if (intel_pt_evsel) {
+		/*
+		 * To mmap the magic offset, the Intel PT event must come first.
+		 */
+		perf_evlist__to_front(evlist, intel_pt_evsel);
+		/*
+		 * In the case of per-cpu mmaps, we need the CPU on the
+		 * ITRACE_LOST event.
+		 */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(intel_pt_evsel, CPU);
+	}
+
+	/* Add dummy event to keep tracking */
+	if (opts->full_itrace) {
+		struct perf_evsel *tracking_evsel;
+		int err;
+
+		err = parse_events(evlist, "dummy:u");
+		if (err)
+			return err;
+
+		tracking_evsel = perf_evlist__last(evlist);
+
+		err = perf_evlist__set_tracking_event(evlist, tracking_evsel);
+		if (err)
+			return err;
+
+		tracking_evsel->attr.freq = 0;
+		tracking_evsel->attr.sample_period = 1;
+
+		/* In per-cpu case, always need the time of mmap events etc */
+		if (!cpu_map__empty(cpus))
+			perf_evsel__set_sample_bit(tracking_evsel, TIME);
+	}
+
+	/*
+	 * Warn the user when we do not have enough information to decode i.e.
+	 * per-cpu with no sched_switch (except workload-only).
+	 */
+	if (!ptr->have_sched_switch && !opts->sample_itrace &&
+	    !cpu_map__empty(cpus) && !target__none(&opts->target))
+		ui__warning("Intel Processor Trace decoding will not be possible except for kernel tracing!\n");
+
+	return 0;
+}
+
+static int intel_pt_snapshot_start(struct itrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	list_for_each_entry(evsel, &ptr->evlist->entries, node) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__disable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_snapshot_finish(struct itrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	list_for_each_entry(evsel, &ptr->evlist->entries, node) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event(ptr->evlist, evsel);
+	}
+	return -EINVAL;
+}
+
+static int intel_pt_alloc_snapshot_refs(struct intel_pt_recording *ptr, int idx)
+{
+	const size_t sz = sizeof(struct intel_pt_snapshot_ref);
+	int cnt = ptr->snapshot_ref_cnt, new_cnt = cnt * 2;
+	struct intel_pt_snapshot_ref *refs;
+
+	if (!new_cnt)
+		new_cnt = 16;
+
+	while (new_cnt <= idx)
+		new_cnt *= 2;
+
+	refs = calloc(new_cnt, sz);
+	if (!refs)
+		return -ENOMEM;
+
+	if (cnt)
+		memcpy(refs, ptr->snapshot_refs, cnt * sz);
+
+	free(ptr->snapshot_refs);
+	ptr->snapshot_refs = refs;
+	ptr->snapshot_ref_cnt = new_cnt;
+
+	return 0;
+}
+
+static void intel_pt_free_snapshot_refs(struct intel_pt_recording *ptr)
+{
+	int i;
+
+	for (i = 0; i < ptr->snapshot_ref_cnt; i++)
+		free(ptr->snapshot_refs[i].ref_buf);
+	free(ptr->snapshot_refs);
+}
+
+static void intel_pt_recording_free(struct itrace_record *itr)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+
+	intel_pt_free_snapshot_refs(ptr);
+	free(ptr);
+}
+
+static int intel_pt_alloc_snapshot_ref(struct intel_pt_recording *ptr, int idx,
+				       size_t snapshot_buf_size)
+{
+	size_t ref_buf_size = ptr->snapshot_ref_buf_size;
+	void *ref_buf;
+
+	ref_buf = zalloc(ref_buf_size);
+	if (!ref_buf)
+		return -ENOMEM;
+
+	ptr->snapshot_refs[idx].ref_buf = ref_buf;
+	ptr->snapshot_refs[idx].ref_offset = snapshot_buf_size - ref_buf_size;
+
+	return 0;
+}
+
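+/*
+ * Size of the reference region used to detect buffer wrap-around: about two
+ * PSB periods, capped at 256KiB.  Zero (i.e. no reference needed) for small
+ * snapshots, or if the region would be as big as the buffer or cover half the
+ * snapshot size or more.
+ */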
+static size_t intel_pt_snapshot_ref_buf_size(struct intel_pt_recording *ptr,
+					     size_t snapshot_buf_size)
+{
+	const size_t max_size = 256 * 1024;
+	size_t buf_size = 0, psb_period;
+
+	if (ptr->snapshot_size <= 64 * 1024)
+		return 0;
+
+	psb_period = intel_pt_psb_period(ptr->intel_pt_pmu, ptr->evlist);
+	if (psb_period)
+		buf_size = psb_period * 2;
+
+	if (!buf_size || buf_size > max_size)
+		buf_size = max_size;
+
+	if (buf_size >= snapshot_buf_size)
+		return 0;
+
+	if (buf_size >= ptr->snapshot_size / 2)
+		return 0;
+
+	return buf_size;
+}
+
+static int intel_pt_snapshot_init(struct intel_pt_recording *ptr,
+				  size_t snapshot_buf_size)
+{
+	if (ptr->snapshot_init_done)
+		return 0;
+
+	ptr->snapshot_init_done = true;
+
+	ptr->snapshot_ref_buf_size = intel_pt_snapshot_ref_buf_size(ptr,
+							snapshot_buf_size);
+
+	return 0;
+}
+
+/**
+ * intel_pt_compare_buffers - compare bytes in a buffer to a circular buffer.
+ * @buf1: first buffer
+ * @compare_size: number of bytes to compare
+ * @buf2: second buffer (a circular buffer)
+ * @offs2: offset in second buffer
+ * @buf2_size: size of second buffer
+ *
+ * The comparison allows for the possibility that the bytes to compare in the
+ * circular buffer are not contiguous.  It is assumed that @compare_size <=
+ * @buf2_size.  This function returns %false if the bytes are identical, %true
+ * otherwise.
+ */
+static bool intel_pt_compare_buffers(void *buf1, size_t compare_size,
+				     void *buf2, size_t offs2, size_t buf2_size)
+{
+	size_t end2 = offs2 + compare_size, part_size;
+
+	if (end2 <= buf2_size)
+		return memcmp(buf1, buf2 + offs2, compare_size);
+
+	part_size = end2 - buf2_size;
+	if (memcmp(buf1, buf2 + offs2, part_size))
+		return true;
+
+	compare_size -= part_size;
+
+	return memcmp(buf1 + part_size, buf2, compare_size);
+}
+
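+/*
+ * Return true if the buffer has wrapped: either the new head now lies within
+ * the reference region, or the region's bytes no longer match the saved
+ * reference.
+ */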
+static bool intel_pt_compare_ref(void *ref_buf, size_t ref_offset,
+				 size_t ref_size, size_t buf_size,
+				 void *data, size_t head)
+{
+	size_t ref_end = ref_offset + ref_size;
+
+	if (ref_end > buf_size) {
+		if (head > ref_offset || head < ref_end - buf_size)
+			return true;
+	} else if (head > ref_offset && head < ref_end) {
+		return true;
+	}
+
+	return intel_pt_compare_buffers(ref_buf, ref_size, data, ref_offset,
+					buf_size);
+}
+
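+/* Save a reference copy of the trace data around @head for later comparison */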
+static void intel_pt_copy_ref(void *ref_buf, size_t ref_size, size_t buf_size,
+			      void *data, size_t head)
+{
+	if (head >= ref_size) {
+		memcpy(ref_buf, data + head - ref_size, ref_size);
+	} else {
+		memcpy(ref_buf, data, head);
+		ref_size -= head;
+		memcpy(ref_buf + head, data + buf_size - ref_size, ref_size);
+	}
+}
+
+static bool intel_pt_wrapped(struct intel_pt_recording *ptr, int idx,
+			     struct itrace_mmap *mm, unsigned char *data,
+			     u64 head)
+{
+	struct intel_pt_snapshot_ref *ref = &ptr->snapshot_refs[idx];
+	bool wrapped;
+
+	wrapped = intel_pt_compare_ref(ref->ref_buf, ref->ref_offset,
+				       ptr->snapshot_ref_buf_size, mm->len,
+				       data, head);
+
+	intel_pt_copy_ref(ref->ref_buf, ptr->snapshot_ref_buf_size, mm->len,
+			  data, head);
+
+	return wrapped;
+}
+
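+/*
+ * The mmap buffer starts out zero-filled, so if any of the last 512 64-bit
+ * words are non-zero, the buffer must have wrapped at least once.
+ */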
+static bool intel_pt_first_wrap(u64 *data, size_t buf_size)
+{
+	int i, a, b;
+
+	b = buf_size >> 3;
+	a = b - 512;
+	if (a < 0)
+		a = 0;
+
+	for (i = a; i < b; i++) {
+		if (data[i])
+			return true;
+	}
+
+	return false;
+}
+
+static int intel_pt_find_snapshot(struct itrace_record *itr, int idx,
+				  struct itrace_mmap *mm, unsigned char *data,
+				  u64 *head, u64 *old)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	bool wrapped;
+	int err;
+
+	pr_debug3("%s: mmap index %d old head %zu new head %zu\n",
+		  __func__, idx, (size_t)*old, (size_t)*head);
+
+	err = intel_pt_snapshot_init(ptr, mm->len);
+	if (err)
+		goto out_err;
+
+	if (idx >= ptr->snapshot_ref_cnt) {
+		err = intel_pt_alloc_snapshot_refs(ptr, idx);
+		if (err)
+			goto out_err;
+	}
+
+	if (ptr->snapshot_ref_buf_size) {
+		if (!ptr->snapshot_refs[idx].ref_buf) {
+			err = intel_pt_alloc_snapshot_ref(ptr, idx, mm->len);
+			if (err)
+				goto out_err;
+		}
+		wrapped = intel_pt_wrapped(ptr, idx, mm, data, *head);
+	} else {
+		wrapped = ptr->snapshot_refs[idx].wrapped;
+		if (!wrapped && intel_pt_first_wrap((u64 *)data, mm->len)) {
+			ptr->snapshot_refs[idx].wrapped = true;
+			wrapped = true;
+		}
+	}
+
+	/*
+	 * In full trace mode 'head' continually increases.  However in snapshot
+	 * mode 'head' is an offset within the buffer.  Here 'old' and 'head'
+	 * are adjusted to match the full trace case which expects that 'old' is
+	 * always less than 'head'.
+	 */
+	if (wrapped) {
+		*old = *head;
+		*head += mm->len;
+	} else {
+		if (mm->mask)
+			*old &= mm->mask;
+		else
+			*old %= mm->len;
+		if (*old > *head)
+			*head += mm->len;
+	}
+
+	pr_debug3("%s: wrap-around %sdetected, adjusted old head %zu adjusted new head %zu\n",
+		  __func__, wrapped ? "" : "not ", (size_t)*old, (size_t)*head);
+
+	return 0;
+
+out_err:
+	pr_err("%s: failed, error %d\n", __func__, err);
+	return err;
+}
+
+static u64 intel_pt_reference(struct itrace_record *itr __maybe_unused)
+{
+	return rdtsc();
+}
+
+static int intel_pt_read_finish(struct itrace_record *itr, int idx)
+{
+	struct intel_pt_recording *ptr =
+			container_of(itr, struct intel_pt_recording, itr);
+	struct perf_evsel *evsel;
+
+	list_for_each_entry(evsel, &ptr->evlist->entries, node) {
+		if (evsel->attr.type == ptr->intel_pt_pmu->type)
+			return perf_evlist__enable_event_idx(ptr->evlist, evsel,
+							     idx);
+	}
+	return -EINVAL;
+}
+
+struct itrace_record *intel_pt_recording_init(int *err)
+{
+	struct perf_pmu *intel_pt_pmu = perf_pmu__find(INTEL_PT_PMU_NAME);
+	struct intel_pt_recording *ptr;
+
+	if (!intel_pt_pmu)
+		return NULL;
+
+	ptr = zalloc(sizeof(struct intel_pt_recording));
+	if (!ptr) {
+		*err = -ENOMEM;
+		return NULL;
+	}
+
+	ptr->intel_pt_pmu = intel_pt_pmu;
+	ptr->itr.parse_sample_options = intel_pt_parse_sample_options;
+	ptr->itr.recording_options = intel_pt_recording_options;
+	ptr->itr.info_priv_size = intel_pt_info_priv_size;
+	ptr->itr.info_fill = intel_pt_info_fill;
+	ptr->itr.free = intel_pt_recording_free;
+	ptr->itr.snapshot_start = intel_pt_snapshot_start;
+	ptr->itr.snapshot_finish = intel_pt_snapshot_finish;
+	ptr->itr.find_snapshot = intel_pt_find_snapshot;
+	ptr->itr.parse_snapshot_options = intel_pt_parse_snapshot_options;
+	ptr->itr.reference = intel_pt_reference;
+	ptr->itr.read_finish = intel_pt_read_finish;
+	return &ptr->itr;
+}
diff --git a/tools/perf/util/intel-pt.h b/tools/perf/util/intel-pt.h
new file mode 100644
index 0000000..99898ca
--- /dev/null
+++ b/tools/perf/util/intel-pt.h
@@ -0,0 +1,40 @@
+/*
+ * intel_pt.h: Intel Processor Trace support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#ifndef INCLUDE__PERF_INTEL_PT_H__
+#define INCLUDE__PERF_INTEL_PT_H__
+
+#define INTEL_PT_PMU_NAME "intel_pt"
+
+struct itrace_record;
+struct perf_tool;
+union perf_event;
+struct perf_session;
+struct perf_event_attr;
+struct perf_pmu;
+
+struct itrace_record *intel_pt_recording_init(int *err);
+
+int intel_pt_process_itrace_info(struct perf_tool *tool,
+				 union perf_event *event,
+				 struct perf_session *session);
+
+struct perf_event_attr *intel_pt_pmu_default_config(struct perf_pmu *pmu);
+
+#endif
-- 
1.8.5.1



* [PATCH v0 71/71] perf tools: Take Intel PT into use
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (69 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 70/71] perf tools: Add Intel PT support Alexander Shishkin
@ 2013-12-11 12:37 ` Alexander Shishkin
  2013-12-11 13:04 ` [PATCH v0 00/71] perf: Add support for Intel Processor Trace Ingo Molnar
  2013-12-11 13:52 ` Arnaldo Carvalho de Melo
  72 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 12:37 UTC (permalink / raw)
  To: Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Andi Kleen, Adrian Hunter

From: Adrian Hunter <adrian.hunter@intel.com>

To record an Instruction Trace, the weak function
itrace_record__init() must be implemented.

Equally to decode an Instruction Trace, the
Instruction Tracing type must be added to the
perf_event__process_itrace_info() function.

This patch makes those two changes plus hooks
up default config for the intel_pt PMU.  Also
some brief documentation is provided for
using the tools with intel_pt.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
---
 tools/perf/Documentation/intel-pt.txt | 581 ++++++++++++++++++++++++++++++++++
 tools/perf/arch/x86/Makefile          |   2 +
 tools/perf/arch/x86/util/itrace.c     |  41 +++
 tools/perf/arch/x86/util/pmu.c        |  13 +
 tools/perf/util/itrace.c              |   7 +-
 5 files changed, 642 insertions(+), 2 deletions(-)
 create mode 100644 tools/perf/Documentation/intel-pt.txt
 create mode 100644 tools/perf/arch/x86/util/itrace.c
 create mode 100644 tools/perf/arch/x86/util/pmu.c

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
new file mode 100644
index 0000000..977f5a0
--- /dev/null
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -0,0 +1,581 @@
+Intel Processor Trace
+=====================
+
+perf record
+===========
+
+new event
+---------
+
+The Intel PT kernel driver creates a new PMU for Intel PT.  PMU events are
+selected by providing the PMU name followed by the "config" enclosed in
+slashes.  An enhancement has been made to allow a default "config", e.g. the
+option
+
+	-e intel_pt//
+
+will use a default config value.  Currently that is the same as
+
+	-e intel_pt/tsc,noretcomp=0/
+
+which is the same as
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+The config terms are listed in /sys/devices/intel_pt/format.  They are bit
+fields within the itrace_config member of struct perf_event_attr, which is
+passed to the kernel by the perf_event_open system call.  They correspond to bit
+fields in the IA32_RTIT_CTL MSR.  Here is a list of them and their definitions:
+
+	$ for f in `ls /sys/devices/intel_pt/format`;do
+	> echo $f
+	> cat /sys/devices/intel_pt/format/$f
+	> done
+	noretcomp
+	itrace_config:11
+	tsc
+	itrace_config:10
+
+Note that the default config applies to every term that is not explicitly
+overridden, i.e.
+
+	-e intel_pt/noretcomp=0/
+
+is the same as:
+
+	-e intel_pt/tsc=1,noretcomp=0/
+
+So, to disable TSC packets use:
+
+	-e intel_pt/tsc=0/
+
+It is also possible to specify the itrace_config value explicitly:
+
+	-e intel_pt/itrace_config=0x400/
+
+Note that, as with all events, the event can be suffixed with event modifiers:
+
+	u	userspace
+	k	kernel
+	h	hypervisor
+	G	guest
+	H	host
+	p	precise ip
+
+'h', 'G' and 'H' are for virtualization, which is not supported by Intel PT.
+'p' is also not relevant to Intel PT.  So only options 'u' and 'k' are
+meaningful for Intel PT.
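+
+For example, to trace userspace only:
+
+	-e intel_pt//u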
+
+perf_event_attr is displayed if the -vv option is used e.g.
+
+	------------------------------------------------------------
+	perf_event_attr:
+	type                6
+	size                120
+	config              0
+	sample_period       1
+	sample_freq         1
+	sample_type         0x10087
+	read_format         0x4
+	disabled            1    inherit             1
+	pinned              0    exclusive           0
+	exclude_user        0    exclude_kernel      1
+	exclude_hv          1    exclude_idle        0
+	mmap                0    comm                0
+	freq                0    inherit_stat        0
+	enable_on_exec      1    task                0
+	watermark           0    precise_ip          0
+	mmap_data           0    sample_id_all       1
+	exclude_host        0    exclude_guest       0
+	excl.callchain_kern 0    excl.callchain_user 0
+	mmap2               0
+	wakeup_events       0
+	wakeup_watermark    0
+	bp_type             0
+	bp_addr             0
+	config1             0
+	bp_len              0
+	config2             0
+	branch_sample_type  0
+	sample_regs_user    0
+	sample_stack_user   0
+	itrace_config       0xc00
+	itrace_watermark    0
+	itrace_sample_type  0
+	itrace_sample_size  0
+	------------------------------------------------------------
+	perf_event_open: pid 20956  cpu 0  group_fd -1  flags 0
+	perf_event_open: pid 20956  cpu 1  group_fd -1  flags 0
+	perf_event_open: pid 20956  cpu 2  group_fd -1  flags 0
+	perf_event_open: pid 20956  cpu 3  group_fd -1  flags 0
+	------------------------------------------------------------
+
+
+new sampling option
+-------------------
+
+To select Intel PT "sampling" a new option has been added:
+
+	-I
+
+Optionally it can be followed by the sample size in bytes e.g.
+
+	-I4096
+
+It is important to select a sample size that is big enough to contain at least
+one PSB packet.  If not, a warning will be displayed:
+
+	Intel PT sample size (%zu) may be too small for PSB period (%zu)
+
+The calculation used for that is: if sample_size <= psb_period + 256, the
+warning is displayed.
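+
+As a rough illustration in C (not the actual tool source; names are
+illustrative):
+
+	if (sample_size <= psb_period + 256)
+		fprintf(stderr,
+			"Intel PT sample size (%zu) may be too small for PSB period (%zu)\n",
+			sample_size, psb_period);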
+
+The default sample size is currently 4KiB.
+
+The sample size is passed in itrace_sample_size in struct perf_event_attr.  The
+sample size is limited by the maximum event size which is 64KiB.  It is
+difficult to know how big the event might be without the trace sample attached,
+but the tool validates that the sample size is not greater than 60KiB.
+
+The sample size is displayed if the option -vv is used e.g.
+
+	Intel PT sample size: %zu
+
+Optionally the "config" can be specified in exactly the same fashion as the
+intel_pt event but without slashes e.g.
+
+	-Itsc=0,noretcomp=0
+
+or
+
+	-I1024tsc=0,noretcomp=0
+
+
+new snapshot option
+-------------------
+
+To select snapshot mode a new option has been added:
+
+	-S
+
+Optionally it can be followed by the snapshot size e.g.
+
+	-S0x100000
+
+The default snapshot size is the itrace mmap size.  If neither itrace mmap size
+nor snapshot size is specified, then the default is 4MiB for privileged users
+(or if /proc/sys/kernel/perf_event_paranoid < 0), and 128KiB for unprivileged
+users.
+If an unprivileged user does not specify mmap pages, the mmap pages will be
+reduced as described in the 'new itrace mmap size option' section below.
+
+The snapshot size is displayed if the option -vv is used e.g.
+
+	Intel PT snapshot size: %zu
+
+
+new itrace mmap size option
+---------------------------
+
+Intel PT buffer size is specified by an addition to the -m option e.g.
+
+	-m,16
+
+selects a buffer size of 16 pages i.e. 64KiB.
+
+Note that the existing functionality of -m is unchanged.  The itrace mmap size
+is specified by the optional addition of a comma and the value.
+
+The default itrace mmap size for Intel PT is 4MiB/page_size for privileged
+users (or if /proc/sys/kernel/perf_event_paranoid < 0), and 128KiB for
+unprivileged users.  If an unprivileged user does not specify mmap pages, the
+mmap pages will be reduced from the default 512KiB/page_size to
+256KiB/page_size; otherwise the user is likely to get an error by exceeding
+their mlock limit ("Max locked memory" as shown in /proc/self/limits).  Note
+that perf does not count the first 512KiB (actually
+/proc/sys/kernel/perf_event_mlock_kb minus 1 page) per cpu against the mlock
+limit, so an unprivileged user is allowed 512KiB per cpu plus their mlock
+limit (which defaults to 64KiB but is not multiplied by the number of cpus).
+For example, on a 4-cpu system, an unprivileged user with the default 64KiB
+mlock limit can mmap up to 4 x 512KiB plus 64KiB in total.
+
+In full-trace mode, powers of two and multiples of 128MiB are allowed for buffer
+size, with a minimum size of 1 page.  In snapshot mode, it is the same but the
+minimum size is 2 pages.  In sample mode, the driver manages the buffer size.
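+
+A minimal sketch of that validation in C (an assumption based on the rules
+above, not the actual driver code):
+
+	#include <stdbool.h>
+	#include <stddef.h>
+
+	static bool itrace_buf_size_ok(size_t size, size_t page_size, bool snapshot)
+	{
+		size_t min = snapshot ? 2 * page_size : page_size;
+
+		if (size < min)
+			return false;
+		/* a power of 2, or a multiple of 128MiB */
+		return !(size & (size - 1)) || !(size % (128 * 1024 * 1024));
+	}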
+
+The mmap size and itrace mmap size are displayed if the -vv option is used e.g.
+
+	mmap length 528384
+	itrace mmap length 4198400
+
+
+Intel PT modes of operation
+---------------------------
+
+Intel PT can be used in 3 modes:
+	full-trace mode
+	sample mode
+	snapshot mode
+
+Full-trace mode traces continuously e.g.
+
+	perf record -e intel_pt//u uname
+
+Sample mode attaches an Intel PT sample to other events e.g.
+
+	perf record -e branch-misses:u -I uname
+
+Snapshot mode captures the available data when a signal is sent e.g.
+
+	perf record -v -e intel_pt//u -S ./loopy 1000000000 &
+	[1] 11435
+	kill -USR2 11435
+	Recording instruction tracing snapshot
+
+Note that the signal sent is SIGUSR2.
+Note that "Recording instruction tracing snapshot" is displayed because the -v
+option is used.
+
+The 3 modes are mutually exclusive; only one can be used at a time.
+
+
+Buffer handling
+---------------
+
+There may be buffer limitations (i.e. a single ToPA entry) which mean that
+actual buffer sizes are limited to powers of 2 up to 128MiB.  In order to
+provide other sizes, and in particular an arbitrarily large size, multiple
+buffers are logically concatenated.  However, an interrupt must be used to
+switch between buffers.  That has two potential problems:
+	a) the interrupt may not be handled in time so that the current buffer
+	becomes full and some trace data is lost.
+	b) the interrupts may slow the system and affect the performance
+	results.
+
+If trace data is lost, the driver adds a PERF_RECORD_ITRACE_LOST event to the
+event stream which the tools report as an error.
+
+In full-trace mode, the driver waits for data to be copied out before allowing
+the (logical) buffer to wrap-around.  If data is not copied out quickly enough,
+again a PERF_RECORD_ITRACE_LOST event is added to the event stream.  If the
+driver has to wait, the intel_pt event gets disabled.  Because it is difficult
+to know when that happens, perf tools always re-enable the intel_pt event
+after copying out data.
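+
+For example, after advancing data_tail past the data just copied out, the
+tools re-enable the event (a sketch, assuming fd is an intel_pt event file
+descriptor):
+
+	ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);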
+
+Note that the choice of buffer size and output device (e.g. a fast SSD vs a slow
+disk) will affect the ability to capture a complete trace.
+
+
+Intel PT and build ids
+----------------------
+
+By default "perf record" post-processes the event stream to find all build ids
+for executables for all addresses sampled.  Deliberately, Intel PT is not
+decoded for that purpose (it would take too long).  Instead the build ids for
+all executables encountered (due to mmap, comm or task events) are included
+in the perf.data file.
+
+To see the build ids included in the perf.data file, use the command:
+
+	perf buildid-list
+
+If the perf.data file contains Intel PT data, that is the same as:
+
+	perf buildid-list --with-hits
+
+
+Snapshot mode and event disabling
+---------------------------------
+
+In order to make a snapshot, the intel_pt event is disabled using an IOCTL,
+namely PERF_EVENT_IOC_DISABLE.  However, doing that can also disable the
+collection of side-band information.  In order to prevent that, a dummy
+software event has been introduced that permits tracking events (like mmaps) to
+continue to be recorded while intel_pt is disabled.  That is important to ensure
+there is complete side-band information to allow the decoding of subsequent
+snapshots.
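+
+A sketch of what such a dummy event's attributes might look like (cf. the
+third perf_event_attr dump in the sched_switch section below, where type 1
+is PERF_TYPE_SOFTWARE and config 0x9 is PERF_COUNT_SW_DUMMY):
+
+	struct perf_event_attr attr = {
+		.type	= PERF_TYPE_SOFTWARE,
+		.config	= PERF_COUNT_SW_DUMMY,
+		.mmap	= 1,
+		.comm	= 1,
+		.mmap2	= 1,
+	};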
+
+A test has been created for that.  To find the test:
+
+	perf test list
+	...
+	23: Test using a dummy software event to keep tracking
+
+To run the test:
+
+	perf test 23
+	23: Test using a dummy software event to keep tracking     : Ok
+
+
+perf record modes (nothing new here)
+------------------------------------
+
+perf record essentially operates in one of three modes:
+	per thread
+	per cpu
+	workload only
+
+"per thread" mode is selected by -t or by --per-thread (with -p or -u or just a
+workload).
+"per cpu" is selected by -C or -a.
+"workload only" mode is selected by not using the other options but providing a
+command to run (i.e. the workload).
+
+In per-thread mode an exact list of threads is traced.  There is no inheritance.
+Each thread has its own event buffer.
+
+In per-cpu mode all processes (or processes from the selected cgroup i.e. -G
+option, or processes selected with -p or -u) are traced.  Each cpu has its own
+buffer. Inheritance is allowed.
+
+In workload-only mode, the workload is traced but with per-cpu buffers.
+Inheritance is allowed.  Note that you can now trace a workload in per-thread
+mode by using the --per-thread option.
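+
+For example:
+
+	perf record --per-thread -e intel_pt//u uname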
+
+
+Privileged vs non-privileged users
+----------------------------------
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users
+have memory limits imposed upon them.  That affects what buffer sizes they can
+have as outlined above.
+
+Unless /proc/sys/kernel/perf_event_paranoid is set to -1, unprivileged users are
+not permitted to use tracepoints which means there is insufficient side-band
+information to decode Intel PT in per-cpu mode, and potentially workload-only
+mode too if the workload creates new processes.
+
+Note also that, to use tracepoints, read-access to debugfs is required.  So if
+debugfs is not mounted or the user does not have read-access, it will again not
+be possible to decode Intel PT in per-cpu mode.
+
+Note, however, that Intel PT samples are always decoded because the sample is
+very likely to be mainly from the thread that was running when the sample was
+taken.
+Obviously, if the sample includes a context switch from another process the
+decoding will fail if tracepoints were not available.
+
+
+sched_switch tracepoint
+-----------------------
+
+The sched_switch tracepoint is used to provide side-band data for Intel PT
+decoding.  sched_switch events are automatically added, e.g. the second event
+shown below:
+
+	$ perf record -vv -e intel_pt//u uname
+	------------------------------------------------------------
+	perf_event_attr:
+	type                6
+	size                120
+	config              0
+	sample_period       1
+	sample_freq         1
+	sample_type         0x10087
+	read_format         0x4
+	disabled            1    inherit             1
+	pinned              0    exclusive           0
+	exclude_user        0    exclude_kernel      1
+	exclude_hv          1    exclude_idle        0
+	mmap                0    comm                0
+	freq                0    inherit_stat        0
+	enable_on_exec      1    task                0
+	watermark           0    precise_ip          0
+	mmap_data           0    sample_id_all       1
+	exclude_host        0    exclude_guest       0
+	excl.callchain_kern 0    excl.callchain_user 0
+	mmap2               0
+	wakeup_events       0
+	wakeup_watermark    0
+	bp_type             0
+	bp_addr             0
+	config1             0
+	bp_len              0
+	config2             0
+	branch_sample_type  0
+	sample_regs_user    0
+	sample_stack_user   0
+	itrace_config       0xc00
+	itrace_watermark    0
+	itrace_sample_type  0
+	itrace_sample_size  0
+	------------------------------------------------------------
+	perf_event_open: pid 21206  cpu 0  group_fd -1  flags 0
+	perf_event_open: pid 21206  cpu 1  group_fd -1  flags 0
+	perf_event_open: pid 21206  cpu 2  group_fd -1  flags 0
+	perf_event_open: pid 21206  cpu 3  group_fd -1  flags 0
+	------------------------------------------------------------
+	perf_event_attr:
+	type                2
+	size                120
+	config              0x143
+	sample_period       1
+	sample_freq         1
+	sample_type         0x10587
+	read_format         0x4
+	disabled            0    inherit             1
+	pinned              0    exclusive           0
+	exclude_user        0    exclude_kernel      0
+	exclude_hv          0    exclude_idle        0
+	mmap                0    comm                0
+	freq                0    inherit_stat        0
+	enable_on_exec      0    task                0
+	watermark           0    precise_ip          0
+	mmap_data           0    sample_id_all       1
+	exclude_host        0    exclude_guest       1
+	excl.callchain_kern 0    excl.callchain_user 0
+	mmap2               0
+	wakeup_events       0
+	wakeup_watermark    0
+	bp_type             0
+	bp_addr             0
+	config1             0
+	bp_len              0
+	config2             0
+	branch_sample_type  0
+	sample_regs_user    0
+	sample_stack_user   0
+	itrace_config       0
+	itrace_watermark    0
+	itrace_sample_type  0
+	itrace_sample_size  0
+	------------------------------------------------------------
+	perf_event_open: pid -1  cpu 0  group_fd -1  flags 0
+	perf_event_open: pid -1  cpu 1  group_fd -1  flags 0
+	perf_event_open: pid -1  cpu 2  group_fd -1  flags 0
+	perf_event_open: pid -1  cpu 3  group_fd -1  flags 0
+	------------------------------------------------------------
+	perf_event_attr:
+	type                1
+	size                120
+	config              0x9
+	sample_period       1
+	sample_freq         1
+	sample_type         0x10007
+	read_format         0x4
+	disabled            1    inherit             1
+	pinned              0    exclusive           0
+	exclude_user        0    exclude_kernel      1
+	exclude_hv          1    exclude_idle        0
+	mmap                1    comm                1
+	freq                0    inherit_stat        0
+	enable_on_exec      1    task                0
+	watermark           0    precise_ip          0
+	mmap_data           0    sample_id_all       1
+	exclude_host        0    exclude_guest       0
+	excl.callchain_kern 0    excl.callchain_user 0
+	mmap2               1
+	wakeup_events       0
+	wakeup_watermark    0
+	bp_type             0
+	bp_addr             0
+	config1             0
+	bp_len              0
+	config2             0
+	branch_sample_type  0
+	sample_regs_user    0
+	sample_stack_user   0
+	itrace_config       0
+	itrace_watermark    0
+	itrace_sample_type  0
+	itrace_sample_size  0
+	------------------------------------------------------------
+	perf_event_open: pid 21206  cpu 0  group_fd -1  flags 0
+	perf_event_open: pid 21206  cpu 1  group_fd -1  flags 0
+	perf_event_open: pid 21206  cpu 2  group_fd -1  flags 0
+	perf_event_open: pid 21206  cpu 3  group_fd -1  flags 0
+	mmap length 528384
+	itrace mmap length 4198400
+	perf event ring buffer mmapped per cpu
+	Synthesizing itrace information
+	Linux
+	[ perf record: Woken up 1 times to write data ]
+	[ perf record: Captured and wrote 0.060 MB perf.data ]
+
+Note, the sched_switch event is only added if the user is permitted to use it
+and only in per-cpu mode.
+
+Note also, the sched_switch event is only added if TSC packets are requested.
+That is because, in the absence of timing information, the sched_switch events
+cannot be matched against the Intel PT trace.
+
+
+perf script
+===========
+
+By default, perf script will decode trace data found in the perf.data file.
+This can be further controlled by the new option -Z.
+
+
+New instruction trace option
+----------------------------
+
+Having no option is the same as
+
+	-Z
+
+which, in turn, is the same as
+
+	-Zibe
+
+The letters are:
+
+	i	synthesize "instructions" events
+	b	synthesize "branches" events
+	e	synthesize tracing error events
+
+"Instructions" events look like they were recorded by "perf record -e
+instructions".
+
+"Branches" events look like they were recorded by "perf record -e branches".
+
+Error events are new.  They show where the decoder lost the trace.  Error
+events are quite important: users must know whether what they are seeing is a
+complete picture or not.
+
+In addition, the period of the "instructions" event can be specified. e.g.
+
+	-Zi10us
+
+sets the period to 10us, i.e. one instruction sample is synthesized for each 10
+microseconds of trace.  Alternatives to "us" are "ms" (milliseconds),
+"ns" (nanoseconds), "t" (TSC ticks) or "i" (instructions).
+
+"ms", "us" and "ns" are converted to TSC ticks.
+
+The timing information included with Intel PT does not give the time of every
+instruction.  Consequently, for the purpose of sampling, the decoder estimates
+the time since the last timing packet based on 1 tick per instruction.  The time
+on the sample is *not* adjusted and reflects the last known value of TSC.
+
+For Intel PT, the default period is 1000 instructions.
+
+To disable trace decoding entirely, use the option --no-itrace.
+
+
+dump option
+-----------
+
+perf script has an option (-D) to "dump" the events, i.e. display the binary
+data.
+
+When -D is used, Intel PT packets are displayed.  The packet decoder does not
+pay attention to PSB packets, but just decodes the bytes - so the packets seen
+by the actual decoder may not be identical in places where the data is corrupt.
+One example of that would be when the buffer-switching interrupt has been too
+slow, and the buffer has been filled completely.  In that case, the last packet
+in the buffer might be truncated and immediately followed by a PSB as the trace
+continues in the next buffer.
+
+To disable the display of Intel PT packets, combine the -D option with
+--no-itrace.
+
+
+perf report
+===========
+
+By default, perf report will decode trace data found in the perf.data file.
+This can be further controlled by the new option -Z, exactly the same as for
+perf script.
+
+
+perf inject
+===========
+
+perf inject also accepts the -Z option, in which case tracing data is removed
+and replaced with the synthesized events, e.g.
+
+	perf inject -Z -i perf.data -o perf.data.new
diff --git a/tools/perf/arch/x86/Makefile b/tools/perf/arch/x86/Makefile
index 8801fe0..2700698 100644
--- a/tools/perf/arch/x86/Makefile
+++ b/tools/perf/arch/x86/Makefile
@@ -7,4 +7,6 @@ LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/unwind.o
 endif
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/header.o
 LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/tsc.o
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/pmu.o
+LIB_OBJS += $(OUTPUT)arch/$(ARCH)/util/itrace.o
 LIB_H += arch/$(ARCH)/util/tsc.h
diff --git a/tools/perf/arch/x86/util/itrace.c b/tools/perf/arch/x86/util/itrace.c
new file mode 100644
index 0000000..507eee4
--- /dev/null
+++ b/tools/perf/arch/x86/util/itrace.c
@@ -0,0 +1,41 @@
+/*
+ * itrace.c: instruction tracing support
+ * Copyright (c) 2013, Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program; if not, write to the Free Software Foundation, Inc.,
+ * 51 Franklin St - Fifth Floor, Boston, MA 02110-1301 USA.
+ *
+ */
+
+#include "../../util/header.h"
+#include "../../util/itrace.h"
+#include "../../util/intel-pt.h"
+
+struct itrace_record *itrace_record__init(int *err)
+{
+	char buffer[64];
+	int ret;
+
+	*err = 0;
+
+	ret = get_cpuid(buffer, sizeof(buffer));
+	if (ret) {
+		*err = ret;
+		return NULL;
+	}
+
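+	/*
+	 * get_cpuid() fills the buffer with a string starting with the CPU
+	 * vendor, e.g. "GenuineIntel,<family>,<model>,<stepping>", so
+	 * matching the vendor prefix selects Intel PT recording on Intel
+	 * CPUs.
+	 */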
+	if (!strncmp(buffer, "GenuineIntel,", 13))
+		return intel_pt_recording_init(err);
+
+	return NULL;
+}
diff --git a/tools/perf/arch/x86/util/pmu.c b/tools/perf/arch/x86/util/pmu.c
new file mode 100644
index 0000000..699a7c2
--- /dev/null
+++ b/tools/perf/arch/x86/util/pmu.c
@@ -0,0 +1,13 @@
+#include <string.h>
+
+#include <linux/perf_event.h>
+
+#include "../../util/intel-pt.h"
+#include "../../util/pmu.h"
+
+struct perf_event_attr *perf_pmu__get_default_config(struct perf_pmu *pmu)
+{
+	if (!strcmp(pmu->name, INTEL_PT_PMU_NAME))
+		return intel_pt_pmu_default_config(pmu);
+	return NULL;
+}
diff --git a/tools/perf/util/itrace.c b/tools/perf/util/itrace.c
index 8ecbfb1..bea3cf7 100644
--- a/tools/perf/util/itrace.c
+++ b/tools/perf/util/itrace.c
@@ -46,6 +46,8 @@
 #include "debug.h"
 #include "parse-options.h"
 
+#include "intel-pt.h"
+
 int itrace_mmap__mmap(struct itrace_mmap *mm, struct itrace_mmap_params *mp,
 		      int fd)
 {
@@ -964,9 +966,9 @@ static bool itrace__dont_decode(struct perf_session *session)
 	       session->itrace_synth_opts->dont_decode;
 }
 
-int perf_event__process_itrace_info(struct perf_tool *tool __maybe_unused,
+int perf_event__process_itrace_info(struct perf_tool *tool,
 				    union perf_event *event,
-				    struct perf_session *session __maybe_unused)
+				    struct perf_session *session)
 {
 	enum itrace_type type = event->itrace_info.type;
 
@@ -978,6 +980,7 @@ int perf_event__process_itrace_info(struct perf_tool *tool __maybe_unused,
 
 	switch (type) {
 	case PERF_ITRACE_INTEL_PT:
+		return intel_pt_process_itrace_info(tool, event, session);
 	case PERF_ITRACE_UNKNOWN:
 	default:
 		return -EINVAL;
-- 
1.8.5.1



* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (70 preceding siblings ...)
  2013-12-11 12:37 ` [PATCH v0 71/71] perf tools: Take Intel PT into use Alexander Shishkin
@ 2013-12-11 13:04 ` Ingo Molnar
  2013-12-11 13:14   ` Alexander Shishkin
  2013-12-11 13:52 ` Arnaldo Carvalho de Melo
  72 siblings, 1 reply; 163+ messages in thread
From: Ingo Molnar @ 2013-12-11 13:04 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter


* Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:

> Hi,
> 
> This patchset adds support for Intel Processor Trace (PT) extension 
> [1] of Intel Architecture that allows the capture of information 
> about software execution flow, to the perf kernel and userspace 
> infrastructure. We provide an abstraction for it called "itrace" for 
> "instruction trace" ([2]).

Ok, this feature looks rather interesting.

On the hardware side this is essentially BTS (Branch Trace Store) on 
steroids (with many extensions), right?

Thanks,

	Ingo


* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-11 13:04 ` [PATCH v0 00/71] perf: Add support for Intel Processor Trace Ingo Molnar
@ 2013-12-11 13:14   ` Alexander Shishkin
  2013-12-11 13:47     ` Ingo Molnar
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-11 13:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter

Ingo Molnar <mingo@kernel.org> writes:

> * Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
>
>> Hi,
>> 
>> This patchset adds support for Intel Processor Trace (PT) extension 
>> [1] of Intel Architecture that allows the capture of information 
>> about software execution flow, to the perf kernel and userspace 
>> infrastructure. We provide an abstraction for it called "itrace" for 
>> "instruction trace" ([2]).
>
> Ok, this feature looks rather interesting.
>
> On the hardware side this is essentially BTS (Branch Trace Store) on 
> steroids (with many extensions), right?

Yes, you get timestamps and all sorts of other useful data in the trace
and the performance intrusion is much less than that of BTS.

Regards,
--
Alex


* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-11 13:14   ` Alexander Shishkin
@ 2013-12-11 13:47     ` Ingo Molnar
  2013-12-16 11:08       ` Alexander Shishkin
  0 siblings, 1 reply; 163+ messages in thread
From: Ingo Molnar @ 2013-12-11 13:47 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter


* Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:

> Ingo Molnar <mingo@kernel.org> writes:
> 
> > * Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
> >
> >> Hi,
> >> 
> >> This patchset adds support for Intel Processor Trace (PT) extension 
> >> [1] of Intel Architecture that allows the capture of information 
> >> about software execution flow, to the perf kernel and userspace 
> >> infrastructure. We provide an abstraction for it called "itrace" for 
> >> "instruction trace" ([2]).
> >
> > Ok, this feature looks rather interesting.
> >
> > On the hardware side this is essentially BTS (Branch Trace Store) 
> > on steroids (with many extensions), right?
> 
> Yes, you get timestamps and all sorts of other useful data in the 
> trace and the performance intrusion is much less than that of BTS.

So the problem I see here right now is that BTS is rarely used and
AFAICS close to unmaintained. It has some very minimal support in 'perf
script' but that's all I can see.

So one necessary precondition to merging PT support would be to have a 
convincing case that this kind of stuff is generally useful.

One good approach to do that would be to unify the BTS and PT tooling 
(the kernel side can be unified as well, to the extent it makes 
sense), and to prove it via actual functionality that this stuff 
matters. BTS is available widely, so the tooling can be tested by 
anyone who's interested.

Allow people to record crashes in core dumps, allow them to look at 
histograms/spectrograms of BTS/PT traces, zoom in on actual traces, 
etc. - make it easier to handle this huge amount of data and visualize 
traces in other ways you find useful, etc.

None of that is done right now via BTS so nobody uses it.

Thanks,

	Ingo


* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
                   ` (71 preceding siblings ...)
  2013-12-11 13:04 ` [PATCH v0 00/71] perf: Add support for Intel Processor Trace Ingo Molnar
@ 2013-12-11 13:52 ` Arnaldo Carvalho de Melo
  72 siblings, 0 replies; 163+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-12-11 13:52 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Em Wed, Dec 11, 2013 at 02:36:12PM +0200, Alexander Shishkin escreveu:
> Hi,
> 
> This patchset adds support for Intel Processor Trace (PT) extension [1] of
> Intel Architecture that allows the capture of information about software
> execution flow, to the perf kernel and userspace infrastructure. We
> provide an abstraction for it called "itrace" for "instruction
> trace" ([2]).

Nice stuff! And a big patchset; I'll start by cherry-picking the tooling
chunks that I find OK, then comment on whatever is left.

While that happens I'm sure the new kernel bits will be discussed.

- Arnaldo


* Re: [PATCH v0 14/71] perf tools: Add cpu to struct thread
  2013-12-11 12:36 ` [PATCH v0 14/71] perf tools: Add cpu to struct thread Alexander Shishkin
@ 2013-12-11 14:19   ` Arnaldo Carvalho de Melo
  2013-12-12 14:14     ` Adrian Hunter
  2013-12-11 19:30   ` David Ahern
  1 sibling, 1 reply; 163+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-12-11 14:19 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen, Adrian Hunter

Em Wed, Dec 11, 2013 at 02:36:26PM +0200, Alexander Shishkin escreveu:
> From: Adrian Hunter <adrian.hunter@intel.com>
> 
> Tools may wish to track on which cpu a thread
> is running.  Add 'cpu' to struct thread for
> that purpose.  Also add machine functions to
> get / set the cpu for a tid.
> 
> This will be used to determine the cpu when
> decoding a per-thread Instruction Trace.
> 
> 
> +++ b/tools/perf/util/machine.c
> @@ -1412,3 +1412,29 @@ pid_t machine__get_thread_pid(struct machine *machine, pid_t tid)
>  
>  	return thread->pid_;
>  }
> +
> +int machine__get_thread_cpu(struct machine *machine, pid_t tid, pid_t *pid)
> +{
> +	struct thread *thread = machine__find_thread(machine, tid);
> +
> +	if (!thread)
> +		return -1;
> +
> +	if (pid)
> +		*pid = thread->pid_;
> +
> +	return thread->cpu;
> +}

What is the problem with:

	struct thread *thread = machine__find_thread(machine, tid);
	pid_t pid = thread->pid_;
	int cpu = thread->cpu;

In your case you'll have:

	int pid;
	int cpu = machine__get_thread_cpu(machine, tid, &pid);

Which is slightly more compact, but then we end up with a function that
from its name should just get a 'cpu' but also asks for the pid.

I think it is better to just use what we have (machine__find_thread),
have a 'thread' variable and then use any of its members, directly.

- ARnaldo


* Re: [PATCH v0 21/71] perf tools: Add feature test for __sync_val_compare_and_swap
  2013-12-11 12:36 ` [PATCH v0 21/71] perf tools: Add feature test for __sync_val_compare_and_swap Alexander Shishkin
@ 2013-12-11 19:24   ` Arnaldo Carvalho de Melo
  2013-12-11 20:07     ` Andi Kleen
  2013-12-12 13:42     ` Adrian Hunter
  0 siblings, 2 replies; 163+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-12-11 19:24 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen, Adrian Hunter

Em Wed, Dec 11, 2013 at 02:36:33PM +0200, Alexander Shishkin escreveu:
> From: Adrian Hunter <adrian.hunter@intel.com>
> 
> Add a feature test for __sync_val_compare_and_swap()
> and __sync_bool_compare_and_swap()

This makes the global feature tests be rebuilt all the time, i.e. no
more caching on a relatively recent system:

[acme@ssdandy linux]$ gcc -v
Using built-in specs.
COLLECT_GCC=/usr/bin/gcc
COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.7.2/lto-wrapper
Target: x86_64-redhat-linux
Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
--infodir=/usr/share/info
--with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap
--enable-shared --enable-threads=posix --enable-checking=release
--disable-build-with-cxx --disable-build-poststage1-with-cxx
--with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
--enable-gnu-unique-object --enable-linker-build-id
--with-linker-hash-style=gnu
--enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto
--enable-plugin --enable-initfini-array --enable-java-awt=gtk
--disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
--enable-libgcj-multifile --enable-java-maintainer-mode
--with-ecj-jar=/usr/share/java/eclipse-ecj.jar
--disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic
--with-arch_32=i686 --build=x86_64-redhat-linux
Thread model: posix
gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC) 
[acme@ssdandy linux]$

[acme@ssdandy linux]$ cat /etc/fedora-release 
Fedora release 18 (Spherical Cow)

Can you provide more info about these gcc builtins and what is the
minimum system where this test will succeed?

In this system it works, as I can see:

...         sync-compare-and-swap: [ on  ]

[acme@ssdandy linux]$ time make O=/tmp/build/perf -C tools/perf/
install-bin
make: Entering directory `/home/acme/git/linux/tools/perf'
  BUILD:   Doing 'make -j8' parallel build

Auto-detecting system features:
...                     backtrace: [ on  ]
...                         dwarf: [ on  ]
...                fortify-source: [ on  ]
...         sync-compare-and-swap: [ on  ]
...                         glibc: [ on  ]
...                          gtk2: [ on  ]
...                  gtk2-infobar: [ on  ]
...                      libaudit: [ on  ]
...                        libbfd: [ on  ]
...                        libelf: [ on  ]
...             libelf-getphdrnum: [ on  ]
...                   libelf-mmap: [ on  ]
...                       libnuma: [ on  ]
...                       libperl: [ on  ]
...                     libpython: [ on  ]
...             libpython-version: [ on  ]
...                      libslang: [ on  ]
...                     libunwind: [ on  ]
...                       on-exit: [ on  ]
...            stackprotector-all: [ on  ]
...                       timerfd: [ on  ]

  GEN      perf-archive

Please check the recent changes from Jean Pihet, I think he had similar
problems, i.e. caching stopped working.

- Arnaldo

 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/config/Makefile                                 |  5 +++++
>  tools/perf/config/feature-checks/Makefile                  |  4 ++++
>  tools/perf/config/feature-checks/test-all.c                |  5 +++++
>  .../config/feature-checks/test-sync-compare-and-swap.c     | 14 ++++++++++++++
>  4 files changed, 28 insertions(+)
>  create mode 100644 tools/perf/config/feature-checks/test-sync-compare-and-swap.c
> 
> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
> index bae1072..43a2879 100644
> --- a/tools/perf/config/Makefile
> +++ b/tools/perf/config/Makefile
> @@ -126,6 +126,7 @@ CORE_FEATURE_TESTS =			\
>  	backtrace			\
>  	dwarf				\
>  	fortify-source			\
> +	sync-compare-and-swap		\
>  	glibc				\
>  	gtk2				\
>  	gtk2-infobar			\
> @@ -234,6 +235,10 @@ CFLAGS += -I$(LIB_INCLUDE)
>  
>  CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
>  
> +ifeq ($(feature-sync-compare-and-swap), 1)
> +  CFLAGS += -DHAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
> +endif
> +
>  ifndef NO_BIONIC
>    $(call feature_check,bionic)
>    ifeq ($(feature-bionic), 1)
> diff --git a/tools/perf/config/feature-checks/Makefile b/tools/perf/config/feature-checks/Makefile
> index b8bb749..b4b7bb2 100644
> --- a/tools/perf/config/feature-checks/Makefile
> +++ b/tools/perf/config/feature-checks/Makefile
> @@ -5,6 +5,7 @@ FILES=					\
>  	test-bionic			\
>  	test-dwarf			\
>  	test-fortify-source		\
> +	test-sync-compare-and-swap	\
>  	test-glibc			\
>  	test-gtk2			\
>  	test-gtk2-infobar		\
> @@ -140,6 +141,9 @@ test-backtrace:
>  test-timerfd:
>  	$(BUILD)
>  
> +test-sync-compare-and-swap:
> +	$(BUILD)
> +
>  -include *.d
>  
>  ###############################
> diff --git a/tools/perf/config/feature-checks/test-all.c b/tools/perf/config/feature-checks/test-all.c
> index 9b8a544..5cfec18 100644
> --- a/tools/perf/config/feature-checks/test-all.c
> +++ b/tools/perf/config/feature-checks/test-all.c
> @@ -89,6 +89,10 @@
>  # include "test-stackprotector-all.c"
>  #undef main
>  
> +#define main main_test_sync_compare_and_swap
> +# include "test-sync-compare-and-swap.c"
> +#undef main
> +
>  int main(int argc, char *argv[])
>  {
>  	main_test_libpython();
> @@ -111,6 +115,7 @@ int main(int argc, char *argv[])
>  	main_test_libnuma();
>  	main_test_timerfd();
>  	main_test_stackprotector_all();
> +	main_test_sync_compare_and_swap();
>  
>  	return 0;
>  }
> diff --git a/tools/perf/config/feature-checks/test-sync-compare-and-swap.c b/tools/perf/config/feature-checks/test-sync-compare-and-swap.c
> new file mode 100644
> index 0000000..c34d4ca
> --- /dev/null
> +++ b/tools/perf/config/feature-checks/test-sync-compare-and-swap.c
> @@ -0,0 +1,14 @@
> +#include <stdint.h>
> +
> +volatile uint64_t x;
> +
> +int main(int argc, char *argv[])
> +{
> +	uint64_t old, new = argc;
> +
> +	argv = argv;
> +	do {
> +		old = __sync_val_compare_and_swap(&x, 0, 0);
> +	} while (!__sync_bool_compare_and_swap(&x, old, new));
> +	return old == new;
> +}
> -- 
> 1.8.5.1


* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-11 12:36 ` [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit Alexander Shishkin
@ 2013-12-11 19:26   ` David Ahern
  2013-12-11 19:54     ` Arnaldo Carvalho de Melo
  2013-12-12 12:05     ` Adrian Hunter
  2013-12-16  3:16   ` David Ahern
  1 sibling, 2 replies; 163+ messages in thread
From: David Ahern @ 2013-12-11 19:26 UTC (permalink / raw)
  To: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter

On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index a0c7c59..80817ec 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -446,6 +446,7 @@ struct dso *dso__new(const char *name)
>   		dso->cache = RB_ROOT;
>   		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
>   		dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
> +		dso->is_64_bit = (sizeof(void *) == 8);

Isnt' that going to record the bitness of perf when it is compiled?

>   		dso->loaded = 0;
>   		dso->rel = 0;
>   		dso->sorted_by_name = 0;
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index 384f2d9..62680e1 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -91,6 +91,7 @@ struct dso {
>   	u8		 annotate_warned:1;
>   	u8		 sname_alloc:1;
>   	u8		 lname_alloc:1;
> +	u8		 is_64_bit:1;

The is_64_bit name seems a bit hardcoded. We need something similar for 
perf-trace to set the audit machine type for resolving syscalls. How 
about having this field set a machine type rather than a "64-bit" flag?

David


* Re: [PATCH v0 13/71] perf tools: Add machine__get_thread_pid()
  2013-12-11 12:36 ` [PATCH v0 13/71] perf tools: Add machine__get_thread_pid() Alexander Shishkin
@ 2013-12-11 19:28   ` David Ahern
  2013-12-11 21:18     ` Andi Kleen
  2013-12-12 13:56     ` Adrian Hunter
  0 siblings, 2 replies; 163+ messages in thread
From: David Ahern @ 2013-12-11 19:28 UTC (permalink / raw)
  To: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter

On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
> From: Adrian Hunter <adrian.hunter@intel.com>
>
> Add a function to get the pid from the tid.
>
> This is needed when using the sched_switch
> tracepoint to follow object code execution.
> sched_switch identifies the thread but, to
> find the process mmaps, we need the process
> pid.

Are you looking up the current task or the next one? If the former, why not
use sample->pid rather than parsing the sched_switch tracepoint?

David



* Re: [PATCH v0 14/71] perf tools: Add cpu to struct thread
  2013-12-11 12:36 ` [PATCH v0 14/71] perf tools: Add cpu to struct thread Alexander Shishkin
  2013-12-11 14:19   ` Arnaldo Carvalho de Melo
@ 2013-12-11 19:30   ` David Ahern
  2013-12-11 19:55     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 163+ messages in thread
From: David Ahern @ 2013-12-11 19:30 UTC (permalink / raw)
  To: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter

On 12/11/13, 5:36 AM, Alexander Shishkin wrote:

> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 55f3608..52fbfb6 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1412,3 +1412,29 @@ pid_t machine__get_thread_pid(struct machine *machine, pid_t tid)
>
>   	return thread->pid_;
>   }
> +
> +int machine__get_thread_cpu(struct machine *machine, pid_t tid, pid_t *pid)
> +{
> +	struct thread *thread = machine__find_thread(machine, tid);
> +
> +	if (!thread)
> +		return -1;
> +
> +	if (pid)
> +		*pid = thread->pid_;

Why is a 'get' function modifying the thread data?

David


* Re: [PATCH v0 27/71] perf evlist: Add 'system_wide' option
  2013-12-11 12:36 ` [PATCH v0 27/71] perf evlist: Add 'system_wide' option Alexander Shishkin
@ 2013-12-11 19:37   ` David Ahern
  2013-12-12 12:22     ` Adrian Hunter
  0 siblings, 1 reply; 163+ messages in thread
From: David Ahern @ 2013-12-11 19:37 UTC (permalink / raw)
  To: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter

On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
> From: Adrian Hunter <adrian.hunter@intel.com>
>
> Add an option to cause a selected event
> to be opened always without a pid when
> configured by perf_evsel__config().
>
> This is needed when using the sched_switch
> tracepoint to follow object code execution.
> sched_switch occurs before the task
> switch and so it cannot record it in a
> context limited to that task.  Note
> that also means that sched_switch is
> useless when capturing data per-thread,
> as is the 'context-switches' software
> event for the same reason.

This seems like a tailored solution for what is really a generic 
problem: you need events to have different attributes -- a mix of system 
wide, task based, with or without callchains and other sample options.

David


* Re: [PATCH v0 23/71] perf evlist: Add perf_evlist__to_front()
  2013-12-11 12:36 ` [PATCH v0 23/71] perf evlist: Add perf_evlist__to_front() Alexander Shishkin
@ 2013-12-11 19:38   ` Arnaldo Carvalho de Melo
  2013-12-12 14:09     ` Adrian Hunter
  2013-12-16 15:27   ` [tip:perf/core] " tip-bot for Adrian Hunter
  1 sibling, 1 reply; 163+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-12-11 19:38 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen, Adrian Hunter

Em Wed, Dec 11, 2013 at 02:36:35PM +0200, Alexander Shishkin escreveu:
> From: Adrian Hunter <adrian.hunter@intel.com>
 
> Add a function to move a selected event to the front of the list.
 
> This is needed because it is not possible to use the
> PERF_EVENT_IOC_SET_OUTPUT IOCTL from an Instruction Tracing event to a
> non-Instruction Tracing event.  Thus the Instruction Tracing event
> must come first.


The description doesn't match what the code is doing, as it is moving
a _group_, not an event.

Also I wonder if you can't do this more efficiently by finding where the
group starts and ends, and then doing some splice-like operations instead
of moving member by member to a temp list, setting the (next, prev) fields
of the various sublists to the right places.

There is even list_cut_position() already in list.h; used with
list_move() and list_splice(), I think you can do it more efficiently.
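
Something along these lines, perhaps (an untested sketch, assuming the
group's entries are contiguous and 'first'/'last' are the group's first
and last evsels):

	LIST_HEAD(before);
	LIST_HEAD(group);

	/* entries preceding the group */
	list_cut_position(&before, &evlist->entries, first->node.prev);
	/* the group itself */
	list_cut_position(&group, &evlist->entries, &last->node);
	/* put the preceding entries back, then the group in front */
	list_splice(&before, &evlist->entries);
	list_splice(&group, &evlist->entries);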

- Arnaldo
 
> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
> ---
>  tools/perf/util/evlist.c | 17 +++++++++++++++++
>  tools/perf/util/evlist.h |  3 +++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index f9dbf5f..93683bc 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -1216,3 +1216,20 @@ int perf_evlist__strerror_open(struct perf_evlist *evlist __maybe_unused,
>  
>  	return 0;
>  }
> +
> +void perf_evlist__to_front(struct perf_evlist *evlist,
> +			   struct perf_evsel *move_evsel)
> +{
> +	struct perf_evsel *evsel, *n;
> +	LIST_HEAD(move);
> +
> +	if (move_evsel == perf_evlist__first(evlist))
> +		return;
> +
> +	list_for_each_entry_safe(evsel, n, &evlist->entries, node) {
> +		if (evsel->leader == move_evsel->leader)
> +			list_move_tail(&evsel->node, &move);
> +	}
> +
> +	list_splice(&move, &evlist->entries);
> +}
> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
> index 8a04aae..9f64ede 100644
> --- a/tools/perf/util/evlist.h
> +++ b/tools/perf/util/evlist.h
> @@ -194,5 +194,8 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md,
>  }
>  
>  bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
> +void perf_evlist__to_front(struct perf_evlist *evlist,
> +			   struct perf_evsel *move_evsel);
> +
>  
>  #endif /* __PERF_EVLIST_H */
> -- 
> 1.8.5.1


* Re: [PATCH v0 54/71] perf script: Always allow fields 'addr' and 'cpu' for itrace
  2013-12-11 12:37 ` [PATCH v0 54/71] perf script: Always allow fields 'addr' and 'cpu' for itrace Alexander Shishkin
@ 2013-12-11 19:41   ` David Ahern
  2013-12-12 12:35     ` Adrian Hunter
  0 siblings, 1 reply; 163+ messages in thread
From: David Ahern @ 2013-12-11 19:41 UTC (permalink / raw)
  To: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter

On 12/11/13, 5:37 AM, Alexander Shishkin wrote:
> From: Adrian Hunter <adrian.hunter@intel.com>
>
> If a file contains Instruction Tracing data then always allow
> fields 'addr' and 'cpu' to be selected as options for perf
> script.  This is necessary because Instruction Trace decoding
> may synthesize events with that information.

Why hardcode it? If it is present and the user opts for it then it will 
be printed. Why is the itrace check needed?

David


* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-11 19:26   ` David Ahern
@ 2013-12-11 19:54     ` Arnaldo Carvalho de Melo
  2013-12-12 12:07       ` Adrian Hunter
  2013-12-12 12:05     ` Adrian Hunter
  1 sibling, 1 reply; 163+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-12-11 19:54 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexander Shishkin, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen, Adrian Hunter

Em Wed, Dec 11, 2013 at 12:26:21PM -0700, David Ahern escreveu:
> On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
> >diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> >index a0c7c59..80817ec 100644
> >--- a/tools/perf/util/dso.c
> >+++ b/tools/perf/util/dso.c
> >@@ -446,6 +446,7 @@ struct dso *dso__new(const char *name)
> >  		dso->cache = RB_ROOT;
> >  		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
> >  		dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
> >+		dso->is_64_bit = (sizeof(void *) == 8);
> 
> Isnt' that going to record the bitness of perf when it is compiled?

Right, it will. It's just a default; I assume this will be reset after
the DSO is loaded, i.e. the ELF file gets accessed, no?

Which would then make this init just a distraction, no?

I wonder if we couldn't extend PERF_RECORD_MMAP to have the binary type
encoded there somehow...

> >  		dso->loaded = 0;
> >  		dso->rel = 0;
> >  		dso->sorted_by_name = 0;
> >diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> >index 384f2d9..62680e1 100644
> >--- a/tools/perf/util/dso.h
> >+++ b/tools/perf/util/dso.h
> >@@ -91,6 +91,7 @@ struct dso {
> >  	u8		 annotate_warned:1;
> >  	u8		 sname_alloc:1;
> >  	u8		 lname_alloc:1;
> >+	u8		 is_64_bit:1;
> 
> The is_64_bit name seems a bit hardcoded. We need something similar
> for perf-trace to set the audit machine type for resolving syscalls.
> How about having this field set a machine type rather than a
> "64-bit" flag?
> 
> David


* Re: [PATCH v0 14/71] perf tools: Add cpu to struct thread
  2013-12-11 19:30   ` David Ahern
@ 2013-12-11 19:55     ` Arnaldo Carvalho de Melo
  2013-12-11 19:57       ` David Ahern
  0 siblings, 1 reply; 163+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-12-11 19:55 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexander Shishkin, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen, Adrian Hunter

Em Wed, Dec 11, 2013 at 12:30:40PM -0700, David Ahern escreveu:
> On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
> 
> >diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> >index 55f3608..52fbfb6 100644
> >--- a/tools/perf/util/machine.c
> >+++ b/tools/perf/util/machine.c
> >@@ -1412,3 +1412,29 @@ pid_t machine__get_thread_pid(struct machine *machine, pid_t tid)
> >
> >  	return thread->pid_;
> >  }
> >+
> >+int machine__get_thread_cpu(struct machine *machine, pid_t tid, pid_t *pid)
> >+{
> >+	struct thread *thread = machine__find_thread(machine, tid);
> >+
> >+	if (!thread)
> >+		return -1;
> >+
> >+	if (pid)
> >+		*pid = thread->pid_;
> 
> Why is a 'get' function modifying the thread data?

Where is this happening? :-)

My main complaint here was that we would be getting more things than
what the function name implies, see another message with my reply to
this patch.

- Arnaldo


* Re: [PATCH v0 14/71] perf tools: Add cpu to struct thread
  2013-12-11 19:55     ` Arnaldo Carvalho de Melo
@ 2013-12-11 19:57       ` David Ahern
  0 siblings, 0 replies; 163+ messages in thread
From: David Ahern @ 2013-12-11 19:57 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Peter Zijlstra, Ingo Molnar, linux-kernel,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen, Adrian Hunter

On 12/11/13, 12:55 PM, Arnaldo Carvalho de Melo wrote:
> Em Wed, Dec 11, 2013 at 12:30:40PM -0700, David Ahern escreveu:
>> On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
>>
>>> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
>>> index 55f3608..52fbfb6 100644
>>> --- a/tools/perf/util/machine.c
>>> +++ b/tools/perf/util/machine.c
>>> @@ -1412,3 +1412,29 @@ pid_t machine__get_thread_pid(struct machine *machine, pid_t tid)
>>>
>>>   	return thread->pid_;
>>>   }
>>> +
>>> +int machine__get_thread_cpu(struct machine *machine, pid_t tid, pid_t *pid)
>>> +{
>>> +	struct thread *thread = machine__find_thread(machine, tid);
>>> +
>>> +	if (!thread)
>>> +		return -1;
>>> +
>>> +	if (pid)
>>> +		*pid = thread->pid_;
>>
>> Why is a 'get' function modifying the thread data?
>
> Where is this happening? :-)

D'oh. code review while on a call ... multitasking failed.

David


* Re: [PATCH v0 21/71] perf tools: Add feature test for __sync_val_compare_and_swap
  2013-12-11 19:24   ` Arnaldo Carvalho de Melo
@ 2013-12-11 20:07     ` Andi Kleen
  2013-12-12 13:45       ` Adrian Hunter
  2013-12-12 13:42     ` Adrian Hunter
  1 sibling, 1 reply; 163+ messages in thread
From: Andi Kleen @ 2013-12-11 20:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Peter Zijlstra, Ingo Molnar, linux-kernel,
	David Ahern, Frederic Weisbecker, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Stephane Eranian, Adrian Hunter

> Can you provide more info about these gcc builtins and what is the
> minimum system where this test will succeed?

CMPXCHG for x86 is available since the 486 or so, so practically
everywhere.

I think that's mainly for other architectures.

It would be reasonable to just use
#if defined(__x86_64__) || defined(__i386__) instead.
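
E.g. (sketch):

	#if defined(__x86_64__) || defined(__i386__)
	#define HAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
	#endif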

-Andi


* Re: [PATCH v0 01/71] perf: Disable all pmus on unthrottling and rescheduling
  2013-12-11 12:36 ` [PATCH v0 01/71] perf: Disable all pmus on unthrottling and rescheduling Alexander Shishkin
@ 2013-12-11 20:53   ` Andi Kleen
  2013-12-13 18:06   ` Peter Zijlstra
  1 sibling, 0 replies; 163+ messages in thread
From: Andi Kleen @ 2013-12-11 20:53 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian

On Wed, Dec 11, 2013 at 02:36:13PM +0200, Alexander Shishkin wrote:
> Currently, only one pmu in a context gets disabled during unthrottling
> and event_sched_{out,in}, however, events in one context may belong to
> different pmus, which results in pmus being reprogrammed while they are
> still enabled. This patch temporarily disables pmus that correspond to
> each event in the context while these events are being modified.

This is a generic bug fix and should be merged independent of the PT code.
This affects any setup using multiple PMUs at the same time.

Reviewed-by: Andi Kleen <ak@linux.intel.com>

-Andi
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> ---
>  kernel/events/core.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 403b781..d656cd6 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1396,6 +1396,9 @@ event_sched_out(struct perf_event *event,
>  	if (event->state != PERF_EVENT_STATE_ACTIVE)
>  		return;
>  
> +	if (event->pmu != ctx->pmu)
> +		perf_pmu_disable(event->pmu);
> +
>  	event->state = PERF_EVENT_STATE_INACTIVE;
>  	if (event->pending_disable) {
>  		event->pending_disable = 0;
> @@ -1412,6 +1415,9 @@ event_sched_out(struct perf_event *event,
>  		ctx->nr_freq--;
>  	if (event->attr.exclusive || !cpuctx->active_oncpu)
>  		cpuctx->exclusive = 0;
> +
> +	if (event->pmu != ctx->pmu)
> +		perf_pmu_enable(event->pmu);
>  }
>  
>  static void
> @@ -1652,6 +1658,7 @@ event_sched_in(struct perf_event *event,
>  		 struct perf_event_context *ctx)
>  {
>  	u64 tstamp = perf_event_time(event);
> +	int ret = 0;
>  
>  	if (event->state <= PERF_EVENT_STATE_OFF)
>  		return 0;
> @@ -1674,10 +1681,14 @@ event_sched_in(struct perf_event *event,
>  	 */
>  	smp_wmb();
>  
> +	if (event->pmu != ctx->pmu)
> +		perf_pmu_disable(event->pmu);
> +
>  	if (event->pmu->add(event, PERF_EF_START)) {
>  		event->state = PERF_EVENT_STATE_INACTIVE;
>  		event->oncpu = -1;
> -		return -EAGAIN;
> +		ret = -EAGAIN;
> +		goto out;
>  	}
>  
>  	event->tstamp_running += tstamp - event->tstamp_stopped;
> @@ -1693,7 +1704,11 @@ event_sched_in(struct perf_event *event,
>  	if (event->attr.exclusive)
>  		cpuctx->exclusive = 1;
>  
> -	return 0;
> +out:
> +	if (event->pmu != ctx->pmu)
> +		perf_pmu_enable(event->pmu);
> +
> +	return ret;
>  }
>  
>  static int
> @@ -2743,6 +2758,9 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
>  		if (!event_filter_match(event))
>  			continue;
>  
> +		if (ctx->pmu != event->pmu)
> +			perf_pmu_disable(event->pmu);
> +
>  		hwc = &event->hw;
>  
>  		if (hwc->interrupts == MAX_INTERRUPTS) {
> @@ -2752,7 +2770,7 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
>  		}
>  
>  		if (!event->attr.freq || !event->attr.sample_freq)
> -			continue;
> +			goto next;
>  
>  		/*
>  		 * stop the event and update event->count
> @@ -2774,6 +2792,9 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
>  			perf_adjust_period(event, period, delta, false);
>  
>  		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
> +	next:
> +		if (ctx->pmu != event->pmu)
> +			perf_pmu_enable(event->pmu);
>  	}
>  
>  	perf_pmu_enable(ctx->pmu);
> -- 
> 1.8.5.1
> 

-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 13/71] perf tools: Add machine__get_thread_pid()
  2013-12-11 19:28   ` David Ahern
@ 2013-12-11 21:18     ` Andi Kleen
  2013-12-11 21:49       ` David Ahern
  2013-12-12 13:56     ` Adrian Hunter
  1 sibling, 1 reply; 163+ messages in thread
From: Andi Kleen @ 2013-12-11 21:18 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Adrian Hunter

David Ahern <dsahern@gmail.com> writes:
>
> Are you looking up the current or next task? If the former, why not use
> sample->pid rather than parsing the sched_switch tracepoint?

The itrace stream doesn't have a pid field, and it needs the exact
time stamp of the switch. There may not actually be any samples
before decoding.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 13/71] perf tools: Add machine__get_thread_pid()
  2013-12-11 21:18     ` Andi Kleen
@ 2013-12-11 21:49       ` David Ahern
  0 siblings, 0 replies; 163+ messages in thread
From: David Ahern @ 2013-12-11 21:49 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Adrian Hunter

On 12/11/13, 2:18 PM, Andi Kleen wrote:
> David Ahern <dsahern@gmail.com> writes:
>>
>> Are you looking up the current or next task? If the former, why not use
>> sample->pid rather than parsing the sched_switch tracepoint?
>
> The itrace stream doesn't have a pid field, and it needs the exact
> time stamp of the switch. There may not actually be any samples
> before decoding.
>
> -Andi
>

What I meant is this:

perf record -e sched:sched_switch -a -- sleep 1
perf script -f comm,tid,pid,event,trace

qemu-system-x86  8688/8692  sched:sched_switch: prev_comm=qemu-system-x86 prev_pid=8692 prev_prio=120 prev_state=S ==> next_comm=swapper/15 next_pid=0 next_prio=120

8688/8692 are the pid and tid of the running task. If you are monitoring 
sched_switch events and looking at the running task -- the one getting 
scheduled out -- you don't need to parse the tracepoint. But if you 
want to know the next task, then you do need to parse it. I was wondering 
which task is being looked up.
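
For illustration, a minimal sketch of the difference, modeled on what
builtin-sched.c does; perf_evsel__intval() is the tools' generic
tracepoint-field accessor, and the helper name here is made up:

static void switch_pids(struct perf_evsel *evsel, struct perf_sample *sample,
			pid_t *out_pid, pid_t *in_pid)
{
	/* the outgoing task is identified by the sample itself */
	*out_pid = sample->tid;
	/* the incoming task is only available in the tracepoint payload */
	*in_pid = (pid_t)perf_evsel__intval(evsel, sample, "next_pid");
}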

David

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-11 19:26   ` David Ahern
  2013-12-11 19:54     ` Arnaldo Carvalho de Melo
@ 2013-12-12 12:05     ` Adrian Hunter
  2013-12-12 16:45       ` David Ahern
  1 sibling, 1 reply; 163+ messages in thread
From: Adrian Hunter @ 2013-12-12 12:05 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen

On 11/12/13 21:26, David Ahern wrote:
> On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
>> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
>> index a0c7c59..80817ec 100644
>> --- a/tools/perf/util/dso.c
>> +++ b/tools/perf/util/dso.c
>> @@ -446,6 +446,7 @@ struct dso *dso__new(const char *name)
>>           dso->cache = RB_ROOT;
>>           dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
>>           dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
>> +        dso->is_64_bit = (sizeof(void *) == 8);
> 
> Isn't that going to record the bitness of perf when it is compiled?
> 
>>           dso->loaded = 0;
>>           dso->rel = 0;
>>           dso->sorted_by_name = 0;
>> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
>> index 384f2d9..62680e1 100644
>> --- a/tools/perf/util/dso.h
>> +++ b/tools/perf/util/dso.h
>> @@ -91,6 +91,7 @@ struct dso {
>>       u8         annotate_warned:1;
>>       u8         sname_alloc:1;
>>       u8         lname_alloc:1;
>> +    u8         is_64_bit:1;
> 
> The is_64_bit name seems a bit hardcoded. We need something similar for
> perf-trace to set the audit machine type for resolving syscalls. How about
> having this field set a machine type rather than a "64-bit" flag?

I am not sure what you mean by "machine type".  For itrace the
implementation only deals with its own architecture (e.g. the intel_pt
pmu is only on Intel architecture) so it is not necessary to record
the architecture.

is_64_bit corresponds to ELFCLASS64 (vs ELFCLASS32) which is needed
to determine whether the instruction set is 64-bit.  That should
work for other architectures too.


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-11 19:54     ` Arnaldo Carvalho de Melo
@ 2013-12-12 12:07       ` Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: Adrian Hunter @ 2013-12-12 12:07 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: David Ahern, Alexander Shishkin, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Frederic Weisbecker, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Stephane Eranian, Andi Kleen

On 11/12/13 21:54, Arnaldo Carvalho de Melo wrote:
> Em Wed, Dec 11, 2013 at 12:26:21PM -0700, David Ahern escreveu:
>> On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
>>> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
>>> index a0c7c59..80817ec 100644
>>> --- a/tools/perf/util/dso.c
>>> +++ b/tools/perf/util/dso.c
>>> @@ -446,6 +446,7 @@ struct dso *dso__new(const char *name)
>>>  		dso->cache = RB_ROOT;
>>>  		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
>>>  		dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
>>> +		dso->is_64_bit = (sizeof(void *) == 8);
>>
>> Isn't that going to record the bitness of perf when it is compiled?
> 
> Right, it will. It's just a default; I assume this will be reset after
> the DSO is loaded, i.e. the ELF file gets accessed, no?

Yes

> 
> Which then would make this init to be just a distraction, no?

Yes

> 
> I wonder if we couldn't extend PERF_RECORD_MMAP to have the binary type
> encoded there somehow...
> 
>>>  		dso->loaded = 0;
>>>  		dso->rel = 0;
>>>  		dso->sorted_by_name = 0;
>>> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
>>> index 384f2d9..62680e1 100644
>>> --- a/tools/perf/util/dso.h
>>> +++ b/tools/perf/util/dso.h
>>> @@ -91,6 +91,7 @@ struct dso {
>>>  	u8		 annotate_warned:1;
>>>  	u8		 sname_alloc:1;
>>>  	u8		 lname_alloc:1;
>>> +	u8		 is_64_bit:1;
>>
>> The is_64_bit name seems a bit hardcoded. We need something similar
>> for perf-trace to set the audit machine type for resolving syscalls.
>> How about having this field set a machine type rather than a
>> "64-bit" flag?
>>
>> David
> 
> 


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 27/71] perf evlist: Add 'system_wide' option
  2013-12-11 19:37   ` David Ahern
@ 2013-12-12 12:22     ` Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: Adrian Hunter @ 2013-12-12 12:22 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen

On 11/12/13 21:37, David Ahern wrote:
> On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
>> From: Adrian Hunter <adrian.hunter@intel.com>
>>
>> Add an option to cause a selected event
>> to be opened always without a pid when
>> configured by perf_evsel__config().
>>
>> This is needed when using the sched_switch
>> tracepoint to follow object code execution.
>> sched_switch occurs before the task
>> switch and so it cannot record it in a
>> context limited to that task.  Note
>> that also means that sched_switch is
>> useless when capturing data per-thread,
>> as is the 'context-switches' software
>> event for the same reason.
> 
> This seems like a tailored solution for what is really a generic problem:
> you need events to have different attributes -- a mix of system wide, task
> based, with or without callchains and other sample options.

Actually in this case it is not the attribute but another parameter of the
perf_event_open syscall, namely the pid.  The effect of that is that
potentially fewer file descriptors are needed for that event, i.e. just
one per cpu compared with one per cpu per thread.

This is a generic solution for the case where you want to mix an event that
is not tied to a process with other events that are.

If it were the attribute it would be easy because 'attr' is a member of
'struct evsel' so it can simply be changed directly.
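
For what it's worth, a rough sketch of the two open styles at the
syscall level (attr setup and error handling omitted; open_event() is
just an illustrative wrapper, not the evsel code):

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

static int open_event(struct perf_event_attr *attr, pid_t pid, int cpu)
{
	/* pid == -1 with cpu >= 0 measures everything on that cpu */
	return syscall(__NR_perf_event_open, attr, pid, cpu,
		       -1 /* group_fd */, 0 /* flags */);
}

So a per-thread event costs one fd per cpu per tid
(open_event(&attr, tid, cpu)), while the pid-less variant costs just one
fd per cpu (open_event(&attr, -1, cpu)).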


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 54/71] perf script: Always allow fields 'addr' and 'cpu' for itrace
  2013-12-11 19:41   ` David Ahern
@ 2013-12-12 12:35     ` Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: Adrian Hunter @ 2013-12-12 12:35 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen

On 11/12/13 21:41, David Ahern wrote:
> On 12/11/13, 5:37 AM, Alexander Shishkin wrote:
>> From: Adrian Hunter <adrian.hunter@intel.com>
>>
>> If a file contains Instruction Tracing data then always allow
>> fields 'addr' and 'cpu' to be selected as options for perf
>> script.  This is necessary because Instruction Trace decoding
>> may synthesize events with that information.
> 
> Why hardcode it? If it is present and the user opts for it then it will be
> printed. Why is the itrace check needed?

itrace synthesizes events, so the events do not exist until decoding
starts; without this change, the validation would reject 'addr' and
'cpu' even though those fields will show up in the synthesized events
later.


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 21/71] perf tools: Add feature test for __sync_val_compare_and_swap
  2013-12-11 19:24   ` Arnaldo Carvalho de Melo
  2013-12-11 20:07     ` Andi Kleen
@ 2013-12-12 13:42     ` Adrian Hunter
  1 sibling, 0 replies; 163+ messages in thread
From: Adrian Hunter @ 2013-12-12 13:42 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Peter Zijlstra, Ingo Molnar, linux-kernel,
	David Ahern, Frederic Weisbecker, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Stephane Eranian, Andi Kleen

On 11/12/13 21:24, Arnaldo Carvalho de Melo wrote:
> Em Wed, Dec 11, 2013 at 02:36:33PM +0200, Alexander Shishkin escreveu:
>> From: Adrian Hunter <adrian.hunter@intel.com>
>>
>> Add a feature test for __sync_val_compare_and_swap()
>> and __sync_bool_compare_and_swap()
> 
> This makes the global feature tests be rebuilt all the time, i.e. no
> more caching, on a relatively recent system:
> 
> [acme@ssdandy linux]$ gcc -v
> Using built-in specs.
> COLLECT_GCC=/usr/bin/gcc
> COLLECT_LTO_WRAPPER=/usr/libexec/gcc/x86_64-redhat-linux/4.7.2/lto-wrapper
> Target: x86_64-redhat-linux
> Configured with: ../configure --prefix=/usr --mandir=/usr/share/man
> --infodir=/usr/share/info
> --with-bugurl=http://bugzilla.redhat.com/bugzilla --enable-bootstrap
> --enable-shared --enable-threads=posix --enable-checking=release
> --disable-build-with-cxx --disable-build-poststage1-with-cxx
> --with-system-zlib --enable-__cxa_atexit --disable-libunwind-exceptions
> --enable-gnu-unique-object --enable-linker-build-id
> --with-linker-hash-style=gnu
> --enable-languages=c,c++,objc,obj-c++,java,fortran,ada,go,lto
> --enable-plugin --enable-initfini-array --enable-java-awt=gtk
> --disable-dssi --with-java-home=/usr/lib/jvm/java-1.5.0-gcj-1.5.0.0/jre
> --enable-libgcj-multifile --enable-java-maintainer-mode
> --with-ecj-jar=/usr/share/java/eclipse-ecj.jar
> --disable-libjava-multilib --with-ppl --with-cloog --with-tune=generic
> --with-arch_32=i686 --build=x86_64-redhat-linux
> Thread model: posix
> gcc version 4.7.2 20121109 (Red Hat 4.7.2-8) (GCC) 
> [acme@ssdandy linux]$
> 
> [acme@ssdandy linux]$ cat /etc/fedora-release 
> Fedora release 18 (Spherical Cow)
> 
> Can you provide more info about these gcc builtins and what is the
> minimum system where this test will succeed?

The first reference in the gcc manuals is in gcc version 4.1.2

	http://gcc.gnu.org/onlinedocs/gcc-4.1.2/gcc/

However I am not sure what will happen on all architectures.  The gcc manual
says:

	Not all operations are supported by all target processors. If a
	particular operation cannot be implemented on the target processor,
	a warning is generated and a call to an external function is
	generated. The external function carries the same name as the
	built-in version, with an additional suffix ‘_n’ where n is the
	size of the data type.



> 
> In this system it works, as I can see:
> 
> ...         sync-compare-and-swap: [ on  ]
> 
> [acme@ssdandy linux]$ time make O=/tmp/build/perf -C tools/perf/
> install-bin
> make: Entering directory `/home/acme/git/linux/tools/perf'
>   BUILD:   Doing 'make -j8' parallel build
> 
> Auto-detecting system features:
> ...                     backtrace: [ on  ]
> ...                         dwarf: [ on  ]
> ...                fortify-source: [ on  ]
> ...         sync-compare-and-swap: [ on  ]
> ...                         glibc: [ on  ]
> ...                          gtk2: [ on  ]
> ...                  gtk2-infobar: [ on  ]
> ...                      libaudit: [ on  ]
> ...                        libbfd: [ on  ]
> ...                        libelf: [ on  ]
> ...             libelf-getphdrnum: [ on  ]
> ...                   libelf-mmap: [ on  ]
> ...                       libnuma: [ on  ]
> ...                       libperl: [ on  ]
> ...                     libpython: [ on  ]
> ...             libpython-version: [ on  ]
> ...                      libslang: [ on  ]
> ...                     libunwind: [ on  ]
> ...                       on-exit: [ on  ]
> ...            stackprotector-all: [ on  ]
> ...                       timerfd: [ on  ]
> 
>   GEN      perf-archive
> 
> Please check the recent changes from Jean Pihet, I think he had similar
> problems, i.e. caching stopped working.

The problem is that argc and argv must be passed to
main_test_sync_compare_and_swap().  This will be fixed in the next
version.

> 
> - Arnaldo
> 
>  
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/config/Makefile                                 |  5 +++++
>>  tools/perf/config/feature-checks/Makefile                  |  4 ++++
>>  tools/perf/config/feature-checks/test-all.c                |  5 +++++
>>  .../config/feature-checks/test-sync-compare-and-swap.c     | 14 ++++++++++++++
>>  4 files changed, 28 insertions(+)
>>  create mode 100644 tools/perf/config/feature-checks/test-sync-compare-and-swap.c
>>
>> diff --git a/tools/perf/config/Makefile b/tools/perf/config/Makefile
>> index bae1072..43a2879 100644
>> --- a/tools/perf/config/Makefile
>> +++ b/tools/perf/config/Makefile
>> @@ -126,6 +126,7 @@ CORE_FEATURE_TESTS =			\
>>  	backtrace			\
>>  	dwarf				\
>>  	fortify-source			\
>> +	sync-compare-and-swap		\
>>  	glibc				\
>>  	gtk2				\
>>  	gtk2-infobar			\
>> @@ -234,6 +235,10 @@ CFLAGS += -I$(LIB_INCLUDE)
>>  
>>  CFLAGS += -D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64 -D_GNU_SOURCE
>>  
>> +ifeq ($(feature-sync-compare-and-swap), 1)
>> +  CFLAGS += -DHAVE_SYNC_COMPARE_AND_SWAP_SUPPORT
>> +endif
>> +
>>  ifndef NO_BIONIC
>>    $(call feature_check,bionic)
>>    ifeq ($(feature-bionic), 1)
>> diff --git a/tools/perf/config/feature-checks/Makefile b/tools/perf/config/feature-checks/Makefile
>> index b8bb749..b4b7bb2 100644
>> --- a/tools/perf/config/feature-checks/Makefile
>> +++ b/tools/perf/config/feature-checks/Makefile
>> @@ -5,6 +5,7 @@ FILES=					\
>>  	test-bionic			\
>>  	test-dwarf			\
>>  	test-fortify-source		\
>> +	test-sync-compare-and-swap	\
>>  	test-glibc			\
>>  	test-gtk2			\
>>  	test-gtk2-infobar		\
>> @@ -140,6 +141,9 @@ test-backtrace:
>>  test-timerfd:
>>  	$(BUILD)
>>  
>> +test-sync-compare-and-swap:
>> +	$(BUILD)
>> +
>>  -include *.d
>>  
>>  ###############################
>> diff --git a/tools/perf/config/feature-checks/test-all.c b/tools/perf/config/feature-checks/test-all.c
>> index 9b8a544..5cfec18 100644
>> --- a/tools/perf/config/feature-checks/test-all.c
>> +++ b/tools/perf/config/feature-checks/test-all.c
>> @@ -89,6 +89,10 @@
>>  # include "test-stackprotector-all.c"
>>  #undef main
>>  
>> +#define main main_test_sync_compare_and_swap
>> +# include "test-sync-compare-and-swap.c"
>> +#undef main
>> +
>>  int main(int argc, char *argv[])
>>  {
>>  	main_test_libpython();
>> @@ -111,6 +115,7 @@ int main(int argc, char *argv[])
>>  	main_test_libnuma();
>>  	main_test_timerfd();
>>  	main_test_stackprotector_all();
>> +	main_test_sync_compare_and_swap();
>>  
>>  	return 0;
>>  }
>> diff --git a/tools/perf/config/feature-checks/test-sync-compare-and-swap.c b/tools/perf/config/feature-checks/test-sync-compare-and-swap.c
>> new file mode 100644
>> index 0000000..c34d4ca
>> --- /dev/null
>> +++ b/tools/perf/config/feature-checks/test-sync-compare-and-swap.c
>> @@ -0,0 +1,14 @@
>> +#include <stdint.h>
>> +
>> +volatile uint64_t x;
>> +
>> +int main(int argc, char *argv[])
>> +{
>> +	uint64_t old, new = argc;
>> +
>> +	argv = argv;
>> +	do {
>> +		old = __sync_val_compare_and_swap(&x, 0, 0);
>> +	} while (!__sync_bool_compare_and_swap(&x, old, new));
>> +	return old == new;
>> +}
>> -- 
>> 1.8.5.1
> 
> 


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 21/71] perf tools: Add feature test for __sync_val_compare_and_swap
  2013-12-11 20:07     ` Andi Kleen
@ 2013-12-12 13:45       ` Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: Adrian Hunter @ 2013-12-12 13:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Arnaldo Carvalho de Melo, Alexander Shishkin, Peter Zijlstra,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

On 11/12/13 22:07, Andi Kleen wrote:
>> Can you provide more info about these gcc builtins and what is the
>> minimum system where this test will succeed?
> 
> CMPXCHG for x86 is available since the 486 or so, so practically
> everywhere.
> 
> I think that's mainly for other architectures.
> 
> it would be reasonable to just use #if defined(__x86_64__) || defined(__i386__)
> instead.

__sync_val_compare_and_swap() is being used in the itrace abstraction,
which is architecture-neutral, so I can't use x86 defines.


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 13/71] perf tools: Add machine__get_thread_pid()
  2013-12-11 19:28   ` David Ahern
  2013-12-11 21:18     ` Andi Kleen
@ 2013-12-12 13:56     ` Adrian Hunter
  1 sibling, 0 replies; 163+ messages in thread
From: Adrian Hunter @ 2013-12-12 13:56 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen

On 11/12/13 21:28, David Ahern wrote:
> On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
>> From: Adrian Hunter <adrian.hunter@intel.com>
>>
>> Add a function to get the pid from the tid.
>>
>> This is needed when using the sched_switch
>> tracepoint to follow object code execution.
>> sched_switch identifies the thread but, to
>> find the process mmaps, we need the process
>> pid.
> 
> Are you looking up the current or next task? If the former, why not use
> sample->pid rather than parsing the sched_switch tracepoint?

Next pid unfortunately.


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 23/71] perf evlist: Add perf_evlist__to_front()
  2013-12-11 19:38   ` Arnaldo Carvalho de Melo
@ 2013-12-12 14:09     ` Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: Adrian Hunter @ 2013-12-12 14:09 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Peter Zijlstra, Ingo Molnar, linux-kernel,
	David Ahern, Frederic Weisbecker, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Stephane Eranian, Andi Kleen

On 11/12/13 21:38, Arnaldo Carvalho de Melo wrote:
> Em Wed, Dec 11, 2013 at 02:36:35PM +0200, Alexander Shishkin escreveu:
>> From: Adrian Hunter <adrian.hunter@intel.com>
>  
>> Add a function to move a selected event to the front of the list.
>  
>> This is needed because it is not possible to use the
>> PERF_EVENT_IOC_SET_OUTPUT IOCTL from an Instruction Tracing event to a
>> non-Instruction Tracing event.  Thus the Instruction Tracing event
>> must come first.
> 
> 
> The description doesn't match what the code is doing, as it is moving
> a _group_, not an event.

OK I will rename it

> 
> Also I wonder if you can't do this more efficiently by finding where the
> group starts and ends and then doing some splice-like operations instead
> of moving member by member to a temp list, i.e. setting the (next, prev)
> fields of the various sublists to the right places.
> 
> There is even list_cut_position() already in list.h; used together with
> list_move() and list_splice(), I think you can do it more efficiently.

OK
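
Something along these lines, perhaps -- a rough, untested sketch of the
splice-based variant, assuming the group's members sit contiguously in
evlist->entries:

void perf_evlist__to_front(struct perf_evlist *evlist,
			   struct perf_evsel *move_evsel)
{
	struct perf_evsel *evsel, *first = NULL, *last = NULL;
	LIST_HEAD(group);
	LIST_HEAD(before);

	if (move_evsel == perf_evlist__first(evlist))
		return;

	/* find the boundaries of the group being moved */
	list_for_each_entry(evsel, &evlist->entries, node) {
		if (evsel->leader == move_evsel->leader) {
			if (!first)
				first = evsel;
			last = evsel;
		}
	}

	/* detach everything up to and including the group... */
	list_cut_position(&group, &evlist->entries, &last->node);
	/* ...cut off what preceded the group... */
	list_cut_position(&before, &group, first->node.prev);
	/* ...and reassemble with the group in front */
	list_splice(&before, &evlist->entries);
	list_splice(&group, &evlist->entries);
}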

> 
> - Arnaldo
>  
>> Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
>> ---
>>  tools/perf/util/evlist.c | 17 +++++++++++++++++
>>  tools/perf/util/evlist.h |  3 +++
>>  2 files changed, 20 insertions(+)
>>
>> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
>> index f9dbf5f..93683bc 100644
>> --- a/tools/perf/util/evlist.c
>> +++ b/tools/perf/util/evlist.c
>> @@ -1216,3 +1216,20 @@ int perf_evlist__strerror_open(struct perf_evlist *evlist __maybe_unused,
>>  
>>  	return 0;
>>  }
>> +
>> +void perf_evlist__to_front(struct perf_evlist *evlist,
>> +			   struct perf_evsel *move_evsel)
>> +{
>> +	struct perf_evsel *evsel, *n;
>> +	LIST_HEAD(move);
>> +
>> +	if (move_evsel == perf_evlist__first(evlist))
>> +		return;
>> +
>> +	list_for_each_entry_safe(evsel, n, &evlist->entries, node) {
>> +		if (evsel->leader == move_evsel->leader)
>> +			list_move_tail(&evsel->node, &move);
>> +	}
>> +
>> +	list_splice(&move, &evlist->entries);
>> +}
>> diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
>> index 8a04aae..9f64ede 100644
>> --- a/tools/perf/util/evlist.h
>> +++ b/tools/perf/util/evlist.h
>> @@ -194,5 +194,8 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md,
>>  }
>>  
>>  bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
>> +void perf_evlist__to_front(struct perf_evlist *evlist,
>> +			   struct perf_evsel *move_evsel);
>> +
>>  
>>  #endif /* __PERF_EVLIST_H */
>> -- 
>> 1.8.5.1
> 
> 


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 14/71] perf tools: Add cpu to struct thread
  2013-12-11 14:19   ` Arnaldo Carvalho de Melo
@ 2013-12-12 14:14     ` Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: Adrian Hunter @ 2013-12-12 14:14 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Alexander Shishkin, Peter Zijlstra, Ingo Molnar, linux-kernel,
	David Ahern, Frederic Weisbecker, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Stephane Eranian, Andi Kleen

On 11/12/13 16:19, Arnaldo Carvalho de Melo wrote:
> Em Wed, Dec 11, 2013 at 02:36:26PM +0200, Alexander Shishkin escreveu:
>> From: Adrian Hunter <adrian.hunter@intel.com>
>>
>> Tools may wish to track on which cpu a thread
>> is running.  Add 'cpu' to struct thread for
>> that purpose.  Also add machine functions to
>> get / set the cpu for a tid.
>>
>> This will be used to determine the cpu when
>> decoding a per-thread Instruction Trace.
>>
>>
>> +++ b/tools/perf/util/machine.c
>> @@ -1412,3 +1412,29 @@ pid_t machine__get_thread_pid(struct machine *machine, pid_t tid)
>>  
>>  	return thread->pid_;
>>  }
>> +
>> +int machine__get_thread_cpu(struct machine *machine, pid_t tid, pid_t *pid)
>> +{
>> +	struct thread *thread = machine__find_thread(machine, tid);
>> +
>> +	if (!thread)
>> +		return -1;
>> +
>> +	if (pid)
>> +		*pid = thread->pid_;
>> +
>> +	return thread->cpu;
>> +}
> 
> What is the problem with:
> 
> 	struct thread *thread = machine__find_thread(machine, tid);
> 	pid_t pid = thread->pid_;
> 	int cpu = thread->cpu;
> 
> In your case you'll have:
> 
> 	int pid;
> 	int cpu = machine__get_thread_cpu(machine, tid, &pid);
> 
> Which is slightly more compact, but then we end up with a function that
> from its name should just get a 'cpu' but also asks for the pid.
> 
> I think it is better to just use what we have (machine__find_thread),
> have a 'thread' variable and then use any of its members directly.

OK




^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-12 12:05     ` Adrian Hunter
@ 2013-12-12 16:45       ` David Ahern
  2013-12-12 19:05         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 163+ messages in thread
From: David Ahern @ 2013-12-12 16:45 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen

On 12/12/13, 5:05 AM, Adrian Hunter wrote:

>>> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
>>> index 384f2d9..62680e1 100644
>>> --- a/tools/perf/util/dso.h
>>> +++ b/tools/perf/util/dso.h
>>> @@ -91,6 +91,7 @@ struct dso {
>>>        u8         annotate_warned:1;
>>>        u8         sname_alloc:1;
>>>        u8         lname_alloc:1;
>>> +    u8         is_64_bit:1;
>>
>> The is_64_bit name seems a bit hardcoded. We need something similar for
>> perf-trace to set the audit machine type for resolving syscalls. How about
>> having this field set a machine type rather than a "64-bit" flag?
>
> I am not sure what you mean by "machine type".  For itrace the
> implementation only deals with its own architecture (e.g. the intel_pt
> pmu is only on Intel architecture) so it is not necessary to record
> the architecture.
>
> is_64_bit corresponds to ELFCLASS64 (vs ELFCLASS32) which is needed
> to determine whether the instruction set is 64-bit.  That should
> work for other architectures too.
>

perf-trace needs something similar -- an audit machine type to know how 
to convert syscall numbers to functions. One of the following per task:

typedef enum {
     MACH_X86=0,
     MACH_86_64,
     MACH_IA64,
     MACH_PPC64,
     MACH_PPC,
     MACH_S390X,
     MACH_S390,
     MACH_ALPHA,
     MACH_ARMEB
} machine_t;

I was pondering how the two could be combined into a common flag.

David

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-12 16:45       ` David Ahern
@ 2013-12-12 19:05         ` Arnaldo Carvalho de Melo
  2013-12-12 19:16           ` David Ahern
  0 siblings, 1 reply; 163+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-12-12 19:05 UTC (permalink / raw)
  To: David Ahern
  Cc: Adrian Hunter, Alexander Shishkin, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Frederic Weisbecker, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Stephane Eranian, Andi Kleen

Em Thu, Dec 12, 2013 at 09:45:26AM -0700, David Ahern escreveu:
> On 12/12/13, 5:05 AM, Adrian Hunter wrote:
> 
> >>>diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> >>>index 384f2d9..62680e1 100644
> >>>--- a/tools/perf/util/dso.h
> >>>+++ b/tools/perf/util/dso.h
> >>>@@ -91,6 +91,7 @@ struct dso {
> >>>       u8         annotate_warned:1;
> >>>       u8         sname_alloc:1;
> >>>       u8         lname_alloc:1;
> >>>+    u8         is_64_bit:1;
> >>
> >>The is_64_bit name seems a bit hardcoded. We need something similar for
> >>perf-trace to set the audit machine type for resolving syscalls. How about
> >>having this field set a machine type rather than a "64-bit" flag?
> >
> >I am not sure what you mean by "machine type".  For itrace the
> >implementation only deals with its own architecture (e.g. the intel_pt
> >pmu is only on Intel architecture) so it is not necessary to record
> >the architecture.
> >
> >is_64_bit corresponds to ELFCLASS64 (vs ELFCLASS32) which is needed
> >to determine whether the instruction set is 64-bit.  That should
> >work for other architectures too.
> >
> 
> perf-trace needs something similar -- an audit machine type to know
> how to convert syscall numbers to functions. One of the following
> per task:
> 
> typedef enum {
>     MACH_X86=0,
>     MACH_86_64,
>     MACH_IA64,
>     MACH_PPC64,
>     MACH_PPC,
>     MACH_S390X,
>     MACH_S390,
>     MACH_ALPHA,
>     MACH_ARMEB
> } machine_t;
> 
> I was pondering how the 2 can be combined into a common flag.

Well, if we can somehow pass the magic number of an executable mmap
in the PERF_RECORD_MMAP2 record, we would be able, together with the
data we already have in the perf.data header (uname in a live session),
to figure that out, no?

I.e. we wouldn't be limiting ourselves to the ELF executable format.

Time to read the kernel loader for the various formats we support...

- Arnaldo

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-12 19:05         ` Arnaldo Carvalho de Melo
@ 2013-12-12 19:16           ` David Ahern
  2013-12-12 20:01             ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 163+ messages in thread
From: David Ahern @ 2013-12-12 19:16 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: Adrian Hunter, Alexander Shishkin, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Frederic Weisbecker, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Stephane Eranian, Andi Kleen

On 12/12/13, 12:05 PM, Arnaldo Carvalho de Melo wrote:

> Well, if we can pass somehow the magic number of an executable mmap
> in the PERF_RECORD_MMAP2 record, we would be able, together with the
> data we already have in the perf.data header (uname in a live session),
> to figure that out, no?

Sure, but any kernel-side-only solution will be extremely limited in 
user base for years.

David

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-12 19:16           ` David Ahern
@ 2013-12-12 20:01             ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 163+ messages in thread
From: Arnaldo Carvalho de Melo @ 2013-12-12 20:01 UTC (permalink / raw)
  To: David Ahern
  Cc: Adrian Hunter, Alexander Shishkin, Peter Zijlstra, Ingo Molnar,
	linux-kernel, Frederic Weisbecker, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Stephane Eranian, Andi Kleen

Em Thu, Dec 12, 2013 at 12:16:16PM -0700, David Ahern escreveu:
> On 12/12/13, 12:05 PM, Arnaldo Carvalho de Melo wrote:
> 
> >Well, if we can pass somehow the magic number of an executable mmap
> >in the PERF_RECORD_MMAP2 record, we would be able, together with the
> >data we already have in the perf.data header (uname in a live session),
> >to figure that out, no?
> 
> Sure, but any kernel-side only solution will be extremely limited in
> user base for years.

You mean it will take time for the kernel with this feature to become
widespread?

Sure, but how do you propose to properly implement this using existing
facilities?

I can't think of any way that doesn't require having access, in
userspace, to the file referenced via PERF_RECORD_{MMAP,MMAP2}, and that
is racy.

For older kernels that don't support this, we can do as I think you
envision, but that doesn't preclude trying to put a more robust solution
in place.

- Arnaldo

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 01/71] perf: Disable all pmus on unthrottling and rescheduling
  2013-12-11 12:36 ` [PATCH v0 01/71] perf: Disable all pmus on unthrottling and rescheduling Alexander Shishkin
  2013-12-11 20:53   ` Andi Kleen
@ 2013-12-13 18:06   ` Peter Zijlstra
  2013-12-16 11:00     ` Alexander Shishkin
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-13 18:06 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Wed, Dec 11, 2013 at 02:36:13PM +0200, Alexander Shishkin wrote:
> Currently, only one pmu in a context gets disabled during unthrottling
> and event_sched_{out,in}, however, events in one context may belong to
> different pmus, which results in pmus being reprogrammed while they are
> still enabled. This patch temporarily disables pmus that correspond to
> each event in the context while these events are being modified.
> 
> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> ---
>  kernel/events/core.c | 27 ++++++++++++++++++++++++---
>  1 file changed, 24 insertions(+), 3 deletions(-)
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 403b781..d656cd6 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1396,6 +1396,9 @@ event_sched_out(struct perf_event *event,
>  	if (event->state != PERF_EVENT_STATE_ACTIVE)
>  		return;
>  
> +	if (event->pmu != ctx->pmu)
> +		perf_pmu_disable(event->pmu);
> +
>  	event->state = PERF_EVENT_STATE_INACTIVE;
>  	if (event->pending_disable) {
>  		event->pending_disable = 0;
> @@ -1412,6 +1415,9 @@ event_sched_out(struct perf_event *event,
>  		ctx->nr_freq--;
>  	if (event->attr.exclusive || !cpuctx->active_oncpu)
>  		cpuctx->exclusive = 0;
> +
> +	if (event->pmu != ctx->pmu)
> +		perf_pmu_enable(event->pmu);
>  }
>  
>  static void

Hmm, indeed. Does it make sense to drop the conditional?
perf_pmu_{en,dis}able() is recursive, and the thinking is that if it's
the same PMU the cacheline is hot because we touched it recently anyway,
so the unconditional inc/dec might actually be faster... dunno.
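
For reference, the recursion guard boils down to a per-cpu counter, so
nested disables only reach the hardware once -- slightly simplified from
kernel/events/core.c:

void perf_pmu_disable(struct pmu *pmu)
{
	int *count = this_cpu_ptr(pmu->pmu_disable_count);

	if (!(*count)++)
		pmu->pmu_disable(pmu);	/* only on the 0 -> 1 transition */
}

void perf_pmu_enable(struct pmu *pmu)
{
	int *count = this_cpu_ptr(pmu->pmu_disable_count);

	if (!--(*count))
		pmu->pmu_enable(pmu);	/* only on the 1 -> 0 transition */
}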


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-11 12:36 ` [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit Alexander Shishkin
  2013-12-11 19:26   ` David Ahern
@ 2013-12-16  3:16   ` David Ahern
  2013-12-16  7:55     ` Adrian Hunter
  1 sibling, 1 reply; 163+ messages in thread
From: David Ahern @ 2013-12-16  3:16 UTC (permalink / raw)
  To: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo
  Cc: Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter

On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
> index a0c7c59..80817ec 100644
> --- a/tools/perf/util/dso.c
> +++ b/tools/perf/util/dso.c
> @@ -446,6 +446,7 @@ struct dso *dso__new(const char *name)
>   		dso->cache = RB_ROOT;
>   		dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
>   		dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
> +		dso->is_64_bit = (sizeof(void *) == 8);
>   		dso->loaded = 0;
>   		dso->rel = 0;
>   		dso->sorted_by_name = 0;
> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
> index 384f2d9..62680e1 100644
> --- a/tools/perf/util/dso.h
> +++ b/tools/perf/util/dso.h
> @@ -91,6 +91,7 @@ struct dso {
>   	u8		 annotate_warned:1;
>   	u8		 sname_alloc:1;
>   	u8		 lname_alloc:1;
> +	u8		 is_64_bit:1;
>   	u8		 sorted_by_name;
>   	u8		 loaded;
>   	u8		 rel;
> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
> index eed0b96..a0fc81b 100644
> --- a/tools/perf/util/symbol-elf.c
> +++ b/tools/perf/util/symbol-elf.c
> @@ -595,6 +595,8 @@ int symsrc__init(struct symsrc *ss, struct dso *dso, const char *name,
>   			goto out_elf_end;
>   	}
>
> +	ss->is_64_bit = (gelf_getclass(elf) == ELFCLASS64);
> +
>   	ss->symtab = elf_section_by_name(elf, &ehdr, &ss->symshdr, ".symtab",
>   			NULL);
>   	if (ss->symshdr.sh_type != SHT_SYMTAB)
> @@ -694,6 +696,7 @@ int dso__load_sym(struct dso *dso, struct map *map,
>   	bool remap_kernel = false, adjust_kernel_syms = false;
>
>   	dso->symtab_type = syms_ss->type;
> +	dso->is_64_bit = syms_ss->is_64_bit;
>   	dso->rel = syms_ss->ehdr.e_type == ET_REL;
>
>   	/*
> diff --git a/tools/perf/util/symbol-minimal.c b/tools/perf/util/symbol-minimal.c
> index ac7070a..b9d1119 100644
> --- a/tools/perf/util/symbol-minimal.c
> +++ b/tools/perf/util/symbol-minimal.c
> @@ -1,3 +1,4 @@
> +#include "util.h"
>   #include "symbol.h"
>
>   #include <stdio.h>
> @@ -287,6 +288,23 @@ int dso__synthesize_plt_symbols(struct dso *dso __maybe_unused,
>   	return 0;
>   }
>
> +static int fd__is_64_bit(int fd)
> +{
> +	u8 e_ident[EI_NIDENT];
> +
> +	if (lseek(fd, 0, SEEK_SET))
> +		return -1;
> +
> +	if (readn(fd, e_ident, sizeof(e_ident)) != sizeof(e_ident))
> +		return -1;
> +
> +	if (memcmp(e_ident, ELFMAG, SELFMAG) ||
> +	    e_ident[EI_VERSION] != EV_CURRENT)
> +		return -1;
> +
> +	return e_ident[EI_CLASS] == ELFCLASS64;
> +}
> +
>   int dso__load_sym(struct dso *dso, struct map *map __maybe_unused,
>   		  struct symsrc *ss,
>   		  struct symsrc *runtime_ss __maybe_unused,
> @@ -294,6 +312,11 @@ int dso__load_sym(struct dso *dso, struct map *map __maybe_unused,
>   		  int kmodule __maybe_unused)
>   {
>   	unsigned char *build_id[BUILD_ID_SIZE];
> +	int ret;
> +
> +	ret = fd__is_64_bit(ss->fd);
> +	if (ret >= 0)
> +		dso->is_64_bit = ret;
>
>   	if (filename__read_build_id(ss->name, build_id, BUILD_ID_SIZE) > 0) {
>   		dso__set_build_id(dso, build_id);


Here's what is wrong with this API: you are determining DSO bitness at 
symbol load time, not when the DSO is created and added to the maps.

As I pointed out in a prior comment, you are initializing dso->is_64_bit 
to perf's bitness when the dso object is created (dso__new), but the 
value is not correctly set until dso__load time. Some tools (perf-trace) 
never load symbols, so for them the value is always wrong, in the sense 
that it has no correlation to the dso object.

David

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit
  2013-12-16  3:16   ` David Ahern
@ 2013-12-16  7:55     ` Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: Adrian Hunter @ 2013-12-16  7:55 UTC (permalink / raw)
  To: David Ahern
  Cc: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen

On 16/12/13 05:16, David Ahern wrote:
> On 12/11/13, 5:36 AM, Alexander Shishkin wrote:
>> diff --git a/tools/perf/util/dso.c b/tools/perf/util/dso.c
>> index a0c7c59..80817ec 100644
>> --- a/tools/perf/util/dso.c
>> +++ b/tools/perf/util/dso.c
>> @@ -446,6 +446,7 @@ struct dso *dso__new(const char *name)
>>           dso->cache = RB_ROOT;
>>           dso->symtab_type = DSO_BINARY_TYPE__NOT_FOUND;
>>           dso->data_type   = DSO_BINARY_TYPE__NOT_FOUND;
>> +        dso->is_64_bit = (sizeof(void *) == 8);
>>           dso->loaded = 0;
>>           dso->rel = 0;
>>           dso->sorted_by_name = 0;
>> diff --git a/tools/perf/util/dso.h b/tools/perf/util/dso.h
>> index 384f2d9..62680e1 100644
>> --- a/tools/perf/util/dso.h
>> +++ b/tools/perf/util/dso.h
>> @@ -91,6 +91,7 @@ struct dso {
>>       u8         annotate_warned:1;
>>       u8         sname_alloc:1;
>>       u8         lname_alloc:1;
>> +    u8         is_64_bit:1;
>>       u8         sorted_by_name;
>>       u8         loaded;
>>       u8         rel;
>> diff --git a/tools/perf/util/symbol-elf.c b/tools/perf/util/symbol-elf.c
>> index eed0b96..a0fc81b 100644
>> --- a/tools/perf/util/symbol-elf.c
>> +++ b/tools/perf/util/symbol-elf.c
>> @@ -595,6 +595,8 @@ int symsrc__init(struct symsrc *ss, struct dso *dso,
>> const char *name,
>>               goto out_elf_end;
>>       }
>>
>> +    ss->is_64_bit = (gelf_getclass(elf) == ELFCLASS64);
>> +
>>       ss->symtab = elf_section_by_name(elf, &ehdr, &ss->symshdr, ".symtab",
>>               NULL);
>>       if (ss->symshdr.sh_type != SHT_SYMTAB)
>> @@ -694,6 +696,7 @@ int dso__load_sym(struct dso *dso, struct map *map,
>>       bool remap_kernel = false, adjust_kernel_syms = false;
>>
>>       dso->symtab_type = syms_ss->type;
>> +    dso->is_64_bit = syms_ss->is_64_bit;
>>       dso->rel = syms_ss->ehdr.e_type == ET_REL;
>>
>>       /*
>> diff --git a/tools/perf/util/symbol-minimal.c
>> b/tools/perf/util/symbol-minimal.c
>> index ac7070a..b9d1119 100644
>> --- a/tools/perf/util/symbol-minimal.c
>> +++ b/tools/perf/util/symbol-minimal.c
>> @@ -1,3 +1,4 @@
>> +#include "util.h"
>>   #include "symbol.h"
>>
>>   #include <stdio.h>
>> @@ -287,6 +288,23 @@ int dso__synthesize_plt_symbols(struct dso *dso
>> __maybe_unused,
>>       return 0;
>>   }
>>
>> +static int fd__is_64_bit(int fd)
>> +{
>> +    u8 e_ident[EI_NIDENT];
>> +
>> +    if (lseek(fd, 0, SEEK_SET))
>> +        return -1;
>> +
>> +    if (readn(fd, e_ident, sizeof(e_ident)) != sizeof(e_ident))
>> +        return -1;
>> +
>> +    if (memcmp(e_ident, ELFMAG, SELFMAG) ||
>> +        e_ident[EI_VERSION] != EV_CURRENT)
>> +        return -1;
>> +
>> +    return e_ident[EI_CLASS] == ELFCLASS64;
>> +}
>> +
>>   int dso__load_sym(struct dso *dso, struct map *map __maybe_unused,
>>             struct symsrc *ss,
>>             struct symsrc *runtime_ss __maybe_unused,
>> @@ -294,6 +312,11 @@ int dso__load_sym(struct dso *dso, struct map *map
>> __maybe_unused,
>>             int kmodule __maybe_unused)
>>   {
>>       unsigned char *build_id[BUILD_ID_SIZE];
>> +    int ret;
>> +
>> +    ret = fd__is_64_bit(ss->fd);
>> +    if (ret >= 0)
>> +        dso->is_64_bit = ret;
>>
>>       if (filename__read_build_id(ss->name, build_id, BUILD_ID_SIZE) > 0) {
>>           dso__set_build_id(dso, build_id);
> 
> 
> Here's what is wrong with this API: you are determining DSO bitness at
> symbol load time, not when the DSO is created and added to the maps.
> 
> As I pointed out in a prior comment you are initializing dso->is_64_bit to
> perf's bitness when the dso object is created (dso__new) but the value is
> not correctly set until dso__load time. Some tools (perf-trace) never load
> symbols, so that value is always wrong in the sense that its value has no
> correlation to the dso object.

That is a very good point.  I think Arnaldo has previously noted that symbol
loading needs to be split from dso (map) loading.  I don't have the time to
do that right now.


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 01/71] perf: Disable all pmus on unthrottling and rescheduling
  2013-12-13 18:06   ` Peter Zijlstra
@ 2013-12-16 11:00     ` Alexander Shishkin
  2013-12-16 11:07       ` Peter Zijlstra
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-16 11:00 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Dec 11, 2013 at 02:36:13PM +0200, Alexander Shishkin wrote:
>> Currently, only one pmu in a context gets disabled during unthrottling
>> and event_sched_{out,in}, however, events in one context may belong to
>> different pmus, which results in pmus being reprogrammed while they are
>> still enabled. This patch temporarily disables pmus that correspond to
>> each event in the context while these events are being modified.
>> 
>> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
>> ---
>>  kernel/events/core.c | 27 ++++++++++++++++++++++++---
>>  1 file changed, 24 insertions(+), 3 deletions(-)
>> 
>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>> index 403b781..d656cd6 100644
>> --- a/kernel/events/core.c
>> +++ b/kernel/events/core.c
>> @@ -1396,6 +1396,9 @@ event_sched_out(struct perf_event *event,
>>  	if (event->state != PERF_EVENT_STATE_ACTIVE)
>>  		return;
>>  
>> +	if (event->pmu != ctx->pmu)
>> +		perf_pmu_disable(event->pmu);
>> +
>>  	event->state = PERF_EVENT_STATE_INACTIVE;
>>  	if (event->pending_disable) {
>>  		event->pending_disable = 0;
>> @@ -1412,6 +1415,9 @@ event_sched_out(struct perf_event *event,
>>  		ctx->nr_freq--;
>>  	if (event->attr.exclusive || !cpuctx->active_oncpu)
>>  		cpuctx->exclusive = 0;
>> +
>> +	if (event->pmu != ctx->pmu)
>> +		perf_pmu_enable(event->pmu);
>>  }
>>  
>>  static void
>
> Hmm, indeed. Does it make sense to drop the conditional?
> perf_pmu_{en,dis}able() is recursive, and the thinking is that if it's
> the same PMU the cacheline is hot because we touched it recently anyway,
> so the unconditional inc/dec might actually be faster... dunno.

Well, given the disable_count check in perf_pmu_{en,dis}able, this one
indeed looks redundant to me. Should I resend this one separately?

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 01/71] perf: Disable all pmus on unthrottling and rescheduling
  2013-12-16 11:00     ` Alexander Shishkin
@ 2013-12-16 11:07       ` Peter Zijlstra
  0 siblings, 0 replies; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-16 11:07 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Mon, Dec 16, 2013 at 01:00:36PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Wed, Dec 11, 2013 at 02:36:13PM +0200, Alexander Shishkin wrote:
> >> Currently, only one pmu in a context gets disabled during unthrottling
> >> and event_sched_{out,in}, however, events in one context may belong to
> >> different pmus, which results in pmus being reprogrammed while they are
> >> still enabled. This patch temporarily disables pmus that correspond to
> >> each event in the context while these events are being modified.
> >> 
> >> Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
> >> ---
> >>  kernel/events/core.c | 27 ++++++++++++++++++++++++---
> >>  1 file changed, 24 insertions(+), 3 deletions(-)
> >> 
> >> diff --git a/kernel/events/core.c b/kernel/events/core.c
> >> index 403b781..d656cd6 100644
> >> --- a/kernel/events/core.c
> >> +++ b/kernel/events/core.c
> >> @@ -1396,6 +1396,9 @@ event_sched_out(struct perf_event *event,
> >>  	if (event->state != PERF_EVENT_STATE_ACTIVE)
> >>  		return;
> >>  
> >> +	if (event->pmu != ctx->pmu)
> >> +		perf_pmu_disable(event->pmu);
> >> +
> >>  	event->state = PERF_EVENT_STATE_INACTIVE;
> >>  	if (event->pending_disable) {
> >>  		event->pending_disable = 0;
> >> @@ -1412,6 +1415,9 @@ event_sched_out(struct perf_event *event,
> >>  		ctx->nr_freq--;
> >>  	if (event->attr.exclusive || !cpuctx->active_oncpu)
> >>  		cpuctx->exclusive = 0;
> >> +
> >> +	if (event->pmu != ctx->pmu)
> >> +		perf_pmu_enable(event->pmu);
> >>  }
> >>  
> >>  static void
> >
> > Hmm, indeed. Does it make sense to drop the conditional?
> > perf_pmu_{en,dis}able() is recursive, and the thinking is that if it's
> > the same PMU the cacheline is hot because we touched it recently anyway,
> > so the unconditional inc/dec might actually be faster... dunno.
> 
> Well, given the disable_count check in perf_pmu_{en,dis}able, this one
> indeed looks redundant to me. Should I resend this one separately?

Yes, it seems to be an unrelated bugfix, like Andi said.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-11 13:47     ` Ingo Molnar
@ 2013-12-16 11:08       ` Alexander Shishkin
  2013-12-16 14:37         ` Ingo Molnar
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-16 11:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter

Ingo Molnar <mingo@kernel.org> writes:

> * Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
>
>> Ingo Molnar <mingo@kernel.org> writes:
>> 
>> > * Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:
>> >
>> >> Hi,
>> >> 
>> >> This patchset adds support for Intel Processor Trace (PT) extension 
>> >> [1] of Intel Architecture that allows the capture of information 
>> >> about software execution flow, to the perf kernel and userspace 
>> >> infrastructure. We provide an abstraction for it called "itrace" for 
>> >> "instruction trace" ([2]).
>> >
>> > Ok, this feature looks rather interesting.
>> >
>> > On the hardware side this is essentially BTS (Branch Trace Store) 
>> > on steroids (with many extensions), right?
>> 
>> Yes, you get timestamps and all sorts of other useful data in the 
>> trace and the performance intrusion is much less than that of BTS.
>
> So the problem I see here right now that BTS is rarely used and AFAICS 
> close to unmaintained. It has some very minimal support in 'perf 
> script' but that's all I can see.
>
> So one necessary precondition to merging PT support would be to have a 
> convincing case that this kind of stuff is generally useful.

This is not unreasonable. We can have some of this functionality with
BTS.

> One good approach to do that would be to unify the BTS and PT tooling 
> (the kernel side can be unified as well, to the extent it makes 
> sense), and to prove it via actual functionality that this stuff 
> matters. BTS is available widely, so the tooling can be tested by 
> anyone who's interested.
>
> Allow people to record crashes in core dumps, allow them to look at 
> histograms/spectrograms of BTS/PT traces, zoom in on actual traces, 
> etc. - make it easier to handle this huge amount of data and visualize 
> traces in other ways you find useful, etc.
>
> None of that is done right now via BTS so nobody uses it.

So I can make BTS appear as an "itrace" pmu, similarly to PT. One
question that comes to mind is whether we should then dispose of the old
interface used for accessing BTS functionality or make it coexist with
the new one.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-16 11:08       ` Alexander Shishkin
@ 2013-12-16 14:37         ` Ingo Molnar
  2013-12-16 15:18           ` Andi Kleen
  0 siblings, 1 reply; 163+ messages in thread
From: Ingo Molnar @ 2013-12-16 14:37 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen, Adrian Hunter


* Alexander Shishkin <alexander.shishkin@linux.intel.com> wrote:

> > One good approach to do that would be to unify the BTS and PT 
> > tooling (the kernel side can be unified as well, to the extent it 
> > makes sense), and to prove it via actual functionality that this 
> > stuff matters. BTS is available widely, so the tooling can be 
> > tested by anyone who's interested.
> >
> > Allow people to record crashes in core dumps, allow them to look 
> > at histograms/spectrograms of BTS/PT traces, zoom in on actual 
> > traces, etc. - make it easier to handle this huge amount of data 
> > and visualize traces in other ways you find useful, etc.
> >
> > None of that is done right now via BTS so nobody uses it.
> 
> So I can make BTS appear as an "itrace" pmu, similarly to PT. One 
> question that comes to mind is whether we should then dispose of the 
> old interface used for accessing BTS functionality or make it coexist 
> with the new one.

So we could make the old ABI a CONFIG_PERF_EVENTS_COMPAT_X86_BTS kind 
of legacy option, turned off by default. That would allow us to phase it 
out eventually.

It all depends on how useful the new tooling becomes: if interesting 
things can be done with it via an obvious, powerful interface then 
people might start using it.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-16 14:37         ` Ingo Molnar
@ 2013-12-16 15:18           ` Andi Kleen
  2013-12-16 15:30             ` Frederic Weisbecker
  0 siblings, 1 reply; 163+ messages in thread
From: Andi Kleen @ 2013-12-16 15:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Alexander Shishkin, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Adrian Hunter

> So we could make the old ABI a CONFIG_PERF_EVENTS_COMPAT_X86_BTS kind 
> of legacy option, turned off by default. That allows us its eventual 
> future phasing out.
> 
> It all depends on how useful the new tooling becomes: if interesting 
> things can be done with it via an obvious, powerful interface then 
> people might start using it.

The thing to keep in mind is that BTS is really, really slow.

It's unlikely it'll ever be all that useful no matter what the API
looks like.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 163+ messages in thread

* [tip:perf/core] perf tools: Add perf_event_paranoid()
  2013-12-11 12:36 ` [PATCH v0 11/71] perf tools: Add perf_event_paranoid() Alexander Shishkin
@ 2013-12-16 15:26   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: tip-bot for Adrian Hunter @ 2013-12-16 15:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, eranian, mingo, mingo, a.p.zijlstra, efault, jolsa,
	fweisbec, ak, dsahern, tglx, hpa, paulus, linux-kernel, namhyung,
	adrian.hunter

Commit-ID:  1a47245d2f3bf6276c95cd37901b562962d6ae47
Gitweb:     http://git.kernel.org/tip/1a47245d2f3bf6276c95cd37901b562962d6ae47
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 11 Dec 2013 14:36:23 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 13 Dec 2013 10:30:20 -0300

perf tools: Add perf_event_paranoid()

Add a function to return the value of
/proc/sys/kernel/perf_event_paranoid.

This will be used to determine default values for mmap size because perf
is not subject to mmap limits when perf_event_paranoid is less than
zero.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1386765443-26966-12-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c |  3 +--
 tools/perf/util/util.c   | 19 +++++++++++++++++++
 tools/perf/util/util.h   |  1 +
 3 files changed, 21 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index af25055..2eb7378 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1191,8 +1191,7 @@ int perf_evlist__strerror_open(struct perf_evlist *evlist __maybe_unused,
 				    "Error:\t%s.\n"
 				    "Hint:\tCheck /proc/sys/kernel/perf_event_paranoid setting.", emsg);
 
-		if (filename__read_int("/proc/sys/kernel/perf_event_paranoid", &value))
-			break;
+		value = perf_event_paranoid();
 
 		printed += scnprintf(buf + printed, size - printed, "\nHint:\t");
 
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 4a57609..8f63dba 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -1,5 +1,6 @@
 #include "../perf.h"
 #include "util.h"
+#include "fs.h"
 #include <sys/mman.h>
 #ifdef HAVE_BACKTRACE_SUPPORT
 #include <execinfo.h>
@@ -8,6 +9,7 @@
 #include <stdlib.h>
 #include <string.h>
 #include <errno.h>
+#include <limits.h>
 #include <linux/kernel.h>
 
 /*
@@ -496,3 +498,20 @@ const char *get_filename_for_perf_kvm(void)
 
 	return filename;
 }
+
+int perf_event_paranoid(void)
+{
+	char path[PATH_MAX];
+	const char *procfs = procfs__mountpoint();
+	int value;
+
+	if (!procfs)
+		return INT_MAX;
+
+	scnprintf(path, PATH_MAX, "%s/sys/kernel/perf_event_paranoid", procfs);
+
+	if (filename__read_int(path, &value))
+		return INT_MAX;
+
+	return value;
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 0171213..1e7d413 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -321,6 +321,7 @@ void free_srcline(char *srcline);
 
 int filename__read_int(const char *filename, int *value);
 int filename__read_str(const char *filename, char **buf, size_t *sizep);
+int perf_event_paranoid(void);
 
 const char *get_filename_for_perf_kvm(void);
 #endif /* GIT_COMPAT_UTIL_H */
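
As an aside, the intended use (per the commit message above) is of this
sort -- an illustrative sketch only, not actual perf code; the function
and the page counts here are hypothetical:

	/* pick a default mmap size; perf is not subject to mmap limits
	 * when perf_event_paranoid is less than zero */
	static unsigned int default_mmap_pages(void)
	{
		if (perf_event_paranoid() < 0)
			return 1024;	/* hypothetical larger default */
		return 128;		/* hypothetical limited default */
	}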

^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [tip:perf/core] perf header: Allow header->data_offset to be predetermined
  2013-12-11 12:36 ` [PATCH v0 16/71] perf tools: Allow header->data_offset to be predetermined Alexander Shishkin
@ 2013-12-16 15:26   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: tip-bot for Adrian Hunter @ 2013-12-16 15:26 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, eranian, mingo, mingo, a.p.zijlstra, efault, jolsa,
	fweisbec, ak, dsahern, tglx, hpa, paulus, linux-kernel, namhyung,
	adrian.hunter

Commit-ID:  d645c442e68d24e64c46845bc8bb5d5a0a70b249
Gitweb:     http://git.kernel.org/tip/d645c442e68d24e64c46845bc8bb5d5a0a70b249
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 11 Dec 2013 14:36:28 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 13 Dec 2013 10:30:20 -0300

perf header: Allow header->data_offset to be predetermined

It will be necessary to predetermine header->data_offset to allow space
for attributes that are added later.  Consequently, do not change
header->data_offset if it is non-zero.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1386765443-26966-17-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/header.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/header.c b/tools/perf/util/header.c
index 0bb830f..61c5421 100644
--- a/tools/perf/util/header.c
+++ b/tools/perf/util/header.c
@@ -2327,7 +2327,8 @@ int perf_session__write_header(struct perf_session *session,
 		}
 	}
 
-	header->data_offset = lseek(fd, 0, SEEK_CUR);
+	if (!header->data_offset)
+		header->data_offset = lseek(fd, 0, SEEK_CUR);
 	header->feat_offset = header->data_offset + header->data_size;
 
 	if (at_exit) {

^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [tip:perf/core] perf evlist: Add can_select_event() method
  2013-12-11 12:36 ` [PATCH v0 17/71] perf tools: Add perf_evlist__can_select_event() Alexander Shishkin
@ 2013-12-16 15:27   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: tip-bot for Adrian Hunter @ 2013-12-16 15:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, eranian, mingo, mingo, a.p.zijlstra, efault, jolsa,
	fweisbec, ak, dsahern, tglx, hpa, paulus, linux-kernel, namhyung,
	adrian.hunter

Commit-ID:  c09ec622629eeb4b7877646a42852e7156363425
Gitweb:     http://git.kernel.org/tip/c09ec622629eeb4b7877646a42852e7156363425
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 11 Dec 2013 14:36:29 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 13 Dec 2013 10:30:20 -0300

perf evlist: Add can_select_event() method

Add a function to determine whether an event can be selected.

This function is needed to allow a tool to automatically select
additional events, but only if they are available.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1386765443-26966-18-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.h |  2 ++
 tools/perf/util/record.c | 37 +++++++++++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+)

diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 649d6ea..8a04aae 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -193,4 +193,6 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md,
 	pc->data_tail = tail;
 }
 
+bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
+
 #endif /* __PERF_EVLIST_H */
diff --git a/tools/perf/util/record.c b/tools/perf/util/record.c
index c8845b1..e510453 100644
--- a/tools/perf/util/record.c
+++ b/tools/perf/util/record.c
@@ -177,3 +177,40 @@ int perf_record_opts__config(struct perf_record_opts *opts)
 {
 	return perf_record_opts__config_freq(opts);
 }
+
+bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str)
+{
+	struct perf_evlist *temp_evlist;
+	struct perf_evsel *evsel;
+	int err, fd, cpu;
+	bool ret = false;
+
+	temp_evlist = perf_evlist__new();
+	if (!temp_evlist)
+		return false;
+
+	err = parse_events(temp_evlist, str);
+	if (err)
+		goto out_delete;
+
+	evsel = perf_evlist__last(temp_evlist);
+
+	if (!evlist || cpu_map__empty(evlist->cpus)) {
+		struct cpu_map *cpus = cpu_map__new(NULL);
+
+		cpu =  cpus ? cpus->map[0] : 0;
+		cpu_map__delete(cpus);
+	} else {
+		cpu = evlist->cpus->map[0];
+	}
+
+	fd = sys_perf_event_open(&evsel->attr, -1, cpu, -1, 0);
+	if (fd >= 0) {
+		close(fd);
+		ret = true;
+	}
+
+out_delete:
+	perf_evlist__delete(temp_evlist);
+	return ret;
+}
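
A caller would use this along the following lines -- an illustrative
sketch; the event string is just an example:

	/* only ask for the extra event if the kernel can open it */
	if (perf_evlist__can_select_event(evlist, "sched:sched_switch"))
		parse_events(evlist, "sched:sched_switch");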

^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [tip:perf/core] perf tools: Move mem_bswap32/64 to util.c
  2013-12-11 12:36 ` [PATCH v0 20/71] perf tools: Move mem_bswap32/64 to util.c Alexander Shishkin
@ 2013-12-16 15:27   ` tip-bot for Adrian Hunter
  0 siblings, 0 replies; 163+ messages in thread
From: tip-bot for Adrian Hunter @ 2013-12-16 15:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, eranian, mingo, mingo, a.p.zijlstra, efault, jolsa,
	fweisbec, ak, dsahern, tglx, hpa, paulus, linux-kernel, namhyung,
	adrian.hunter

Commit-ID:  71db07b12eace6a3619335d03eaf3cbe2de131ed
Gitweb:     http://git.kernel.org/tip/71db07b12eace6a3619335d03eaf3cbe2de131ed
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 11 Dec 2013 14:36:32 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 13 Dec 2013 10:30:21 -0300

perf tools: Move mem_bswap32/64 to util.c

Move functions mem_bswap_32() and mem_bswap_64() so they can be reused.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1386765443-26966-21-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/session.c | 21 ---------------------
 tools/perf/util/session.h |  2 --
 tools/perf/util/util.c    | 22 ++++++++++++++++++++++
 tools/perf/util/util.h    |  3 +++
 4 files changed, 25 insertions(+), 23 deletions(-)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index e748f29..989b2e3 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -247,27 +247,6 @@ void perf_tool__fill_defaults(struct perf_tool *tool)
 	}
 }
  
-void mem_bswap_32(void *src, int byte_size)
-{
-	u32 *m = src;
-	while (byte_size > 0) {
-		*m = bswap_32(*m);
-		byte_size -= sizeof(u32);
-		++m;
-	}
-}
-
-void mem_bswap_64(void *src, int byte_size)
-{
-	u64 *m = src;
-
-	while (byte_size > 0) {
-		*m = bswap_64(*m);
-		byte_size -= sizeof(u64);
-		++m;
-	}
-}
-
 static void swap_sample_id_all(union perf_event *event, void *data)
 {
 	void *end = (void *) event + event->header.size;
diff --git a/tools/perf/util/session.h b/tools/perf/util/session.h
index 2a3955e..9c25d49 100644
--- a/tools/perf/util/session.h
+++ b/tools/perf/util/session.h
@@ -74,8 +74,6 @@ int perf_session__resolve_callchain(struct perf_session *session,
 
 bool perf_session__has_traces(struct perf_session *session, const char *msg);
 
-void mem_bswap_64(void *src, int byte_size);
-void mem_bswap_32(void *src, int byte_size);
 void perf_event__attr_swap(struct perf_event_attr *attr);
 
 int perf_session__create_kernel_maps(struct perf_session *session);
diff --git a/tools/perf/util/util.c b/tools/perf/util/util.c
index 8f63dba..42ad667 100644
--- a/tools/perf/util/util.c
+++ b/tools/perf/util/util.c
@@ -10,6 +10,7 @@
 #include <string.h>
 #include <errno.h>
 #include <limits.h>
+#include <byteswap.h>
 #include <linux/kernel.h>
 
 /*
@@ -515,3 +516,24 @@ int perf_event_paranoid(void)
 
 	return value;
 }
+
+void mem_bswap_32(void *src, int byte_size)
+{
+	u32 *m = src;
+	while (byte_size > 0) {
+		*m = bswap_32(*m);
+		byte_size -= sizeof(u32);
+		++m;
+	}
+}
+
+void mem_bswap_64(void *src, int byte_size)
+{
+	u64 *m = src;
+
+	while (byte_size > 0) {
+		*m = bswap_64(*m);
+		byte_size -= sizeof(u64);
+		++m;
+	}
+}
diff --git a/tools/perf/util/util.h b/tools/perf/util/util.h
index 1e7d413..a1eea3e 100644
--- a/tools/perf/util/util.h
+++ b/tools/perf/util/util.h
@@ -323,5 +323,8 @@ int filename__read_int(const char *filename, int *value);
 int filename__read_str(const char *filename, char **buf, size_t *sizep);
 int perf_event_paranoid(void);
 
+void mem_bswap_64(void *src, int byte_size);
+void mem_bswap_32(void *src, int byte_size);
+
 const char *get_filename_for_perf_kvm(void);
 #endif /* GIT_COMPAT_UTIL_H */
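
Usage is unchanged, only the home of the helpers moves -- an
illustrative sketch:

	u64 vals[4];

	/* after reading from a perf.data file written on an
	 * opposite-endian host, swap every u64 in place */
	mem_bswap_64(vals, sizeof(vals));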

^ permalink raw reply related	[flat|nested] 163+ messages in thread

* [tip:perf/core] perf evlist: Add perf_evlist__to_front()
  2013-12-11 12:36 ` [PATCH v0 23/71] perf evlist: Add perf_evlist__to_front() Alexander Shishkin
  2013-12-11 19:38   ` Arnaldo Carvalho de Melo
@ 2013-12-16 15:27   ` tip-bot for Adrian Hunter
  1 sibling, 0 replies; 163+ messages in thread
From: tip-bot for Adrian Hunter @ 2013-12-16 15:27 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: acme, eranian, mingo, mingo, a.p.zijlstra, efault, jolsa,
	fweisbec, ak, dsahern, tglx, hpa, paulus, linux-kernel, namhyung,
	adrian.hunter

Commit-ID:  a025e4f0d8a92b38539d39b495b530015296b4d9
Gitweb:     http://git.kernel.org/tip/a025e4f0d8a92b38539d39b495b530015296b4d9
Author:     Adrian Hunter <adrian.hunter@intel.com>
AuthorDate: Wed, 11 Dec 2013 14:36:35 +0200
Committer:  Arnaldo Carvalho de Melo <acme@redhat.com>
CommitDate: Fri, 13 Dec 2013 10:30:21 -0300

perf evlist: Add perf_evlist__to_front()

Add a function to move a selected event to the
front of the list.

This is needed because it is not possible
to use the PERF_EVENT_IOC_SET_OUTPUT IOCTL
from an Instruction Tracing event to a
non-Instruction Tracing event.  Thus the
Instruction Tracing event must come first.

Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1386765443-26966-24-git-send-email-alexander.shishkin@linux.intel.com
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
---
 tools/perf/util/evlist.c | 17 +++++++++++++++++
 tools/perf/util/evlist.h |  3 +++
 2 files changed, 20 insertions(+)

diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index 2eb7378..0b31cee 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -1212,3 +1212,20 @@ int perf_evlist__strerror_open(struct perf_evlist *evlist __maybe_unused,
 
 	return 0;
 }
+
+void perf_evlist__to_front(struct perf_evlist *evlist,
+			   struct perf_evsel *move_evsel)
+{
+	struct perf_evsel *evsel, *n;
+	LIST_HEAD(move);
+
+	if (move_evsel == perf_evlist__first(evlist))
+		return;
+
+	list_for_each_entry_safe(evsel, n, &evlist->entries, node) {
+		if (evsel->leader == move_evsel->leader)
+			list_move_tail(&evsel->node, &move);
+	}
+
+	list_splice(&move, &evlist->entries);
+}
diff --git a/tools/perf/util/evlist.h b/tools/perf/util/evlist.h
index 8a04aae..9f64ede 100644
--- a/tools/perf/util/evlist.h
+++ b/tools/perf/util/evlist.h
@@ -194,5 +194,8 @@ static inline void perf_mmap__write_tail(struct perf_mmap *md,
 }
 
 bool perf_evlist__can_select_event(struct perf_evlist *evlist, const char *str);
+void perf_evlist__to_front(struct perf_evlist *evlist,
+			   struct perf_evsel *move_evsel);
+
 
 #endif /* __PERF_EVLIST_H */
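
Per the commit message, a tool would use this along the following lines
-- an illustrative sketch; the evsel lookup is hypothetical:

	/* the Instruction Tracing event must come first, so that
	 * PERF_EVENT_IOC_SET_OUTPUT can be used from it */
	struct perf_evsel *itrace_evsel = find_itrace_evsel(evlist);

	perf_evlist__to_front(evlist, itrace_evsel);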

^ permalink raw reply related	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-16 15:18           ` Andi Kleen
@ 2013-12-16 15:30             ` Frederic Weisbecker
  2013-12-16 15:45               ` Andi Kleen
  0 siblings, 1 reply; 163+ messages in thread
From: Frederic Weisbecker @ 2013-12-16 15:30 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Alexander Shishkin, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Adrian Hunter

On Mon, Dec 16, 2013 at 07:18:52AM -0800, Andi Kleen wrote:
> > So we could make the old ABI a CONFIG_PERF_EVENTS_COMPAT_X86_BTS kind 
> > of legacy option, turned off by default. That would allow us to
> > eventually phase it out.
> > 
> > It all depends on how useful the new tooling becomes: if interesting 
> > things can be done with it via an obvious, powerful interface then 
> > people might start using it.
> 
> The thing to keep in mind is that BTS is really really slow.
> 
> It's unlikely it'll ever be all that useful no matter what the API
> looks like.

You're right, it's extremely slow. But it can still be relevant for debugging,
at least for apps that don't do too much CPU-bound stuff.

My hope has always been that we can make a userspace function graph tracer
out of its dumps. And I think we can, I'm pretty sure that would be a useful tool.

Even better would be to allow for some perf timehist that we could use to
navigate through the execution flow, including all branches. But that's quite
sophisticated (although possibly very useful); still, a function graph would
be a good beginning.

Now if we find a faster replacement that can dump similar sources, or even
better if we can filter by branch type (call and ret is all we need for a
function graph tracer), I'm all for it. But I agree with Ingo that some
useful tooling should come along.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-16 15:30             ` Frederic Weisbecker
@ 2013-12-16 15:45               ` Andi Kleen
  2013-12-16 15:57                 ` Frederic Weisbecker
  2013-12-18  4:03                 ` Namhyung Kim
  0 siblings, 2 replies; 163+ messages in thread
From: Andi Kleen @ 2013-12-16 15:45 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Ingo Molnar, Alexander Shishkin, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Adrian Hunter

> You're right, it's extremely slow. But it can still be relevant for debugging,
> at least for apps that don't do too much CPU-bound stuff.

There are patches from Markus already for gdb to use it (using the old
BTS perf interface). I'm not sure they have been merged into gdb
mainline yet though.

> My hope has always been that we can make a userspace function graph tracer
> out of its dumps. And I think we can, I'm pretty sure that would be a useful tool.

I wrote one, based on the __fentry__, like the kernel:
http://github.com/andikleen/ftracer

BTS has no timing information, so you could at best do a function tracer
without timing.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-16 15:45               ` Andi Kleen
@ 2013-12-16 15:57                 ` Frederic Weisbecker
  2013-12-18  4:03                 ` Namhyung Kim
  1 sibling, 0 replies; 163+ messages in thread
From: Frederic Weisbecker @ 2013-12-16 15:57 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Alexander Shishkin, Peter Zijlstra,
	Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian, Adrian Hunter

On Mon, Dec 16, 2013 at 07:45:27AM -0800, Andi Kleen wrote:
> > You're right, it's extremely slow. But it can still be relevant for debugging,
> > at least for apps that don't do too much CPU-bound stuff.
> 
> There are patches from Markus already for gdb to use it (using the old
> BTS perf interface). I'm not sure they have been merged into gdb
> mainline yet though.

Ok.

> 
> > My hope has always been that we can make a userspace function graph tracer
> > out of its dumps. And I think we can, I'm pretty sure that would be a useful tool.
> 
> I wrote one, based on the __fentry__, like the kernel:
> http://github.com/andikleen/ftracer

Sounds like nice stuff, but that implies building with the gcc option, I think.

> 
> BTS has no timing information, so you could at best do a function tracer
> without timing.

Right. Although the function timing was the initial purpose of the function
graph tracer, the graph itself proved to be much more useful :)

But yeah, the timing is nice too when we chase hotspots, though perf report
has probably deprecated it.

> 
> -Andi
> -- 
> ak@linux.intel.com -- Speaking for myself only

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-11 12:36 ` [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units Alexander Shishkin
@ 2013-12-17 16:11   ` Peter Zijlstra
  2013-12-18 13:23     ` Alexander Shishkin
  0 siblings, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-17 16:11 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Wed, Dec 11, 2013 at 02:36:16PM +0200, Alexander Shishkin wrote:
> Instruction tracing PMUs are capable of recording a log of instruction
> execution flow on a cpu core, which can be useful for profiling and crash
> analysis. This patch adds itrace infrastructure for perf events and the
> rest of the kernel to use.
> 
> Since such PMUs can produce copious amounts of trace data, it may be
> impractical to process it inside the kernel in real time, but instead export
> raw trace streams to userspace for subsequent analysis. Thus, itrace PMUs
> may export their trace buffers, which can be mmap()ed to userspace from a
> perf event fd with a PERF_EVENT_ITRACE_OFFSET offset. To that end, perf
> is extended to work with multiple ring buffers per event, reusing the
> ring_buffer code in an attempt to reduce complexity.

Please read the thread here: https://lkml.org/lkml/2008/12/4/64

It contains my thoughts on this creative mmap() usage.

tl;dr: no f*cking way.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 00/71] perf: Add support for Intel Processor Trace
  2013-12-16 15:45               ` Andi Kleen
  2013-12-16 15:57                 ` Frederic Weisbecker
@ 2013-12-18  4:03                 ` Namhyung Kim
  1 sibling, 0 replies; 163+ messages in thread
From: Namhyung Kim @ 2013-12-18  4:03 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Frederic Weisbecker, Ingo Molnar, Alexander Shishkin,
	Peter Zijlstra, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, David Ahern, Jiri Olsa, Mike Galbraith,
	Paul Mackerras, Stephane Eranian, Adrian Hunter

Hi Andi,

On Mon, 16 Dec 2013 07:45:27 -0800, Andi Kleen wrote:
>> My hope has always been that we can make a userspace function graph tracer
>> out of its dumps. And I think we can, I'm pretty sure that would be a useful tool.
>
> I wrote one, based on the __fentry__, like the kernel:
> http://github.com/andikleen/ftracer

Oh, I'm writing a similar one too.  I'll take a look at yours.

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-17 16:11   ` Peter Zijlstra
@ 2013-12-18 13:23     ` Alexander Shishkin
  2013-12-18 13:34       ` Peter Zijlstra
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-18 13:23 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Dec 11, 2013 at 02:36:16PM +0200, Alexander Shishkin wrote:
>> Instruction tracing PMUs are capable of recording a log of instruction
>> execution flow on a cpu core, which can be useful for profiling and crash
>> analysis. This patch adds itrace infrastructure for perf events and the
>> rest of the kernel to use.
>> 
>> Since such PMUs can produce copious amounts of trace data, it may be
>> impractical to process it inside the kernel in real time, but instead export
>> raw trace streams to userspace for subsequent analysis. Thus, itrace PMUs
>> may export their trace buffers, which can be mmap()ed to userspace from a
>> perf event fd with a PERF_EVENT_ITRACE_OFFSET offset. To that end, perf
>> is extended to work with multiple ring buffers per event, reusing the
>> ring_buffer code in an attempt to reduce complexity.
>
> Please read the thread here: https://lkml.org/lkml/2008/12/4/64
>
> It contains my thoughts on this creative mmap() usage.

That's unfortunate; it made sense to me. But let's then have a look at
the alternative approaches. Bearing in mind that it is crucial for us to
export trace buffers to userspace as opposed to processing the trace
data in the kernel, the fact that we still need the normal perf data
stream, and your dislike for mmap trickery, we need two separate file
descriptors: one for the perf data and one for the trace data.

One way of doing this would be to call sys_perf_event_open() once for
each. The first call would return a file descriptor, which provides the
good old perf data buffer; the second call would use this file
descriptor as the group leader and return another descriptor (thus
creating another perf_event), which, when mmap()ed, would provide a
trace buffer.
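
From the userspace side, that would look something like this -- a sketch
of the proposal only, with error handling omitted (the trace-buffer
meaning of the second mmap() is exactly what is being proposed here, not
an existing ABI; the attrs are assumed to be set up already):

	#include <linux/perf_event.h>
	#include <sys/mman.h>
	#include <sys/syscall.h>
	#include <unistd.h>

	/* the usual raw syscall wrapper; glibc has no perf_event_open() */
	static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
				   int cpu, int group_fd, unsigned long flags)
	{
		return syscall(__NR_perf_event_open, attr, pid, cpu,
			       group_fd, flags);
	}

	static void *open_trace_buffer(struct perf_event_attr *attr,
				       struct perf_event_attr *itrace_attr,
				       pid_t pid, int cpu, size_t len)
	{
		/* first fd: the normal perf data stream */
		int perf_fd = perf_event_open(attr, pid, cpu, -1, 0);
		/* second fd: uses the first as group leader; mmap()ing
		 * it would yield the trace buffer under this proposal */
		int itrace_fd = perf_event_open(itrace_attr, pid, cpu,
						perf_fd, 0);

		return mmap(NULL, len, PROT_READ, MAP_SHARED, itrace_fd, 0);
	}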

Or, we could introduce a new PERF_FLAG_XXX to mean that we want a
descriptor with a trace buffer. And then, of course, one could always
add an ioctl(), but that'd probably be a bit over the top.

Do any of these sound reasonable? Any other possibilities that I'm
missing here?

Thanks,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-18 13:23     ` Alexander Shishkin
@ 2013-12-18 13:34       ` Peter Zijlstra
  2013-12-18 14:01         ` Alexander Shishkin
  0 siblings, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-18 13:34 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Wed, Dec 18, 2013 at 03:23:41PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Wed, Dec 11, 2013 at 02:36:16PM +0200, Alexander Shishkin wrote:
> >> Instruction tracing PMUs are capable of recording a log of instruction
> >> execution flow on a cpu core, which can be useful for profiling and crash
> >> analysis. This patch adds itrace infrastructure for perf events and the
> >> rest of the kernel to use.
> >> 
> >> Since such PMUs can produce copious amounts of trace data, it may be
> >> impractical to process it inside the kernel in real time, but instead export
> >> raw trace streams to userspace for subsequent analysis. Thus, itrace PMUs
> >> may export their trace buffers, which can be mmap()ed to userspace from a
> >> perf event fd with a PERF_EVENT_ITRACE_OFFSET offset. To that end, perf
> >> is extended to work with multiple ring buffers per event, reusing the
> >> ring_buffer code in an attempt to reduce complexity.
> >
> > Please read the thread here: https://lkml.org/lkml/2008/12/4/64
> >
> > It contains my thoughts on this creative mmap() usage.
> 
> That's unfortunate; it made sense to me. But let's then have a look at
> the alternative approaches. Bearing in mind that it is crucial for us to
> export trace buffers to userspace as opposed to processing the trace
> data in the kernel, the fact that we still need the normal perf data
> stream, and your dislike for mmap trickery, we need two separate file
> descriptors: one for the perf data and one for the trace data.

Why don't you start by explaining _why_ you need a second stream to
begin with?

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-18 13:34       ` Peter Zijlstra
@ 2013-12-18 14:01         ` Alexander Shishkin
  2013-12-18 14:11           ` Peter Zijlstra
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-18 14:01 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Dec 18, 2013 at 03:23:41PM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > On Wed, Dec 11, 2013 at 02:36:16PM +0200, Alexander Shishkin wrote:
>> >> Instruction tracing PMUs are capable of recording a log of instruction
>> >> execution flow on a cpu core, which can be useful for profiling and crash
>> >> analysis. This patch adds itrace infrastructure for perf events and the
>> >> rest of the kernel to use.
>> >> 
>> >> Since such PMUs can produce copious amounts of trace data, it may be
>> >> impractical to process it inside the kernel in real time, but instead export
>> >> raw trace streams to userspace for subsequent analysis. Thus, itrace PMUs
>> >> may export their trace buffers, which can be mmap()ed to userspace from a
>> >> perf event fd with a PERF_EVENT_ITRACE_OFFSET offset. To that end, perf
>> >> is extended to work with multiple ring buffers per event, reusing the
>> >> ring_buffer code in an attempt to reduce complexity.
>> >
>> > Please read the thread here: https://lkml.org/lkml/2008/12/4/64
>> >
>> > It contains my thoughts on this creative mmap() usage.
>> 
>> That's unfortunate; it made sense to me. But let's then have a look at
>> the alternative approaches. Bearing in mind that it is crucial for us to
>> export trace buffers to userspace as opposed to processing the trace
>> data in the kernel, the fact that we still need the normal perf data
>> stream, and your dislike for mmap trickery, we need two separate file
>> descriptors: one for the perf data and one for the trace data.
>
> Why don't you start by explaining _why_ you need a second stream to
> begin with?

Oh, I'm sure I've explained it earlier ([1], [2]), but why not. The data
in the second stream is generated at a rate which is hundreds of
megabytes per second per core. Decoding this data is ~1000 times slower
than generating it. Ergo, can't be done in kernel, needs to be exported
as-is to userspace for later retrieval and decoding. Doing it via perf
stream means an extra copy, which at these rates is a waste. Ergo, a
second buffer.

[1] https://lkml.org/lkml/2013/12/11/213
[2] https://lkml.org/lkml/2013/12/11/358

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-18 14:01         ` Alexander Shishkin
@ 2013-12-18 14:11           ` Peter Zijlstra
  2013-12-18 14:22             ` Alexander Shishkin
  0 siblings, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-18 14:11 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Wed, Dec 18, 2013 at 04:01:04PM +0200, Alexander Shishkin wrote:
> > Why don't you start by explaining _why_ you need a second stream to
> > begin with?
> 
> Oh, I'm sure I've explained it earlier ([1], [2])

See, I didn't read 0 because that information gets lost and patches
should be self explanatory, and i didn't get to the Intel driver yet
because well, I got stuck in the generic code.

> but why not. The data
> in the second stream is generated at a rate which is hundreds of
> megabytes per second per core. Decoding this data is ~1000 times slower
> than generating it. Ergo, can't be done in kernel, needs to be exported
> as-is to userspace for later retrieval and decoding. Doing it via perf
> stream means an extra copy, which at these rates is a waste. Ergo, a
> second buffer.

Still confused: if you cannot copy it into one buffer, then why can you
copy it into a second buffer?


^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-18 14:11           ` Peter Zijlstra
@ 2013-12-18 14:22             ` Alexander Shishkin
  2013-12-18 15:09               ` Peter Zijlstra
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-18 14:22 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Dec 18, 2013 at 04:01:04PM +0200, Alexander Shishkin wrote:
>> > Why don't you start by explaining _why_ you need a second stream to
>> > begin with?
>> 
>> Oh, I'm sure I've explained it earlier ([1], [2])
>
See, I didn't read 0 because that information gets lost and patches
should be self-explanatory, and I didn't get to the Intel driver yet
because, well, I got stuck in the generic code.

Sure. The general concept is more important than the actual driver at
this point anyway.

>> but why not. The data
>> in the second stream is generated at a rate which is hundreds of
>> megabytes per second per core. Decoding this data is ~1000 times slower
>> than generating it. Ergo, can't be done in kernel, needs to be exported
>> as-is to userspace for later retrieval and decoding. Doing it via perf
>> stream means an extra copy, which at these rates is a waste. Ergo, a
>> second buffer.
>
> Still confused: if you cannot copy it into one buffer, then why can you
> copy it into a second buffer?

It's not copied; the hardware writes directly into that second buffer.

I've done the same with BTS now (as Ingo suggested) and it also benefits
from this approach.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-18 14:22             ` Alexander Shishkin
@ 2013-12-18 15:09               ` Peter Zijlstra
  2013-12-19  7:53                 ` Alexander Shishkin
  0 siblings, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-18 15:09 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Wed, Dec 18, 2013 at 04:22:36PM +0200, Alexander Shishkin wrote:
> > Still confused: if you cannot copy it into one buffer, then why can you
> > copy it into a second buffer?
> 
> It's not copied; the hardware writes directly into that second buffer.

Where's the PT documentation? I can't find it in the SDM and your ISA
extensions link is a generic Intel website which is friggin useless
(like all corporate websites strive to be).

Your actual PT patch doesn't describe how the things works either, and
while I could go read the code, I'm too lazy.

The thing is; why can't you zero-copy whatever buffer the hardware
writes into, into the normal buffer?

Machinery like that would also be useful to zero-copy bits out of the
buffer right into the page-cache.

> I've done the same with BTS now (as Ingo suggested) and it also benefits
> from this approach.

The problem with DS is that it needs physically contiguous pages is it
not? So you cannot really allocate a large buffer, and you end up
needing to copy or swizzle stuff.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-18 15:09               ` Peter Zijlstra
@ 2013-12-19  7:53                 ` Alexander Shishkin
  2013-12-19 10:26                   ` Peter Zijlstra
  2013-12-19 10:31                   ` Peter Zijlstra
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-19  7:53 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Wed, Dec 18, 2013 at 04:22:36PM +0200, Alexander Shishkin wrote:
>> > Still confused: if you cannot copy it into one buffer, then why can you
>> > copy it into a second buffer?
>> 
>> It's not copied; the hardware writes directly into that second buffer.
>
> Where's the PT documentation? I can't find it in the SDM and your ISA
> extensions link is a generic Intel website which is friggin useless
> (like all corporate websites strive to be).

[1]

> Your actual PT patch doesn't describe how the things works either, and
> while I could go read the code, I'm too lazy.
>
> The thing is; why can't you zero-copy whatever buffer the hardware
> writes into, into the normal buffer?

I'm not sure I understand. You mean, have the buffer split between perf
data and trace data?

> Machinery like that would also be useful to zero-copy bits out of the
> buffer right into the page-cache.

Please elaborate.

>> I've done the same with BTS now (as Ingo suggested) and it also benefits
>> from this approach.
>
> The problem with DS is that it needs physically contiguous pages is it
> not? So you cannot really allocate a large buffer, and you end up
> needing to copy or swizzle stuff.

Yes, and some implementations of PT have the same issue, but you can do
a sufficiently large high-order allocation and map it to userspace, and
still no copying (or parsing/decoding) in kernel space is required.

[1] http://download-software.intel.com/sites/default/files/managed/71/2e/319433-017.pdf

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19  7:53                 ` Alexander Shishkin
@ 2013-12-19 10:26                   ` Peter Zijlstra
  2013-12-19 11:14                     ` Alexander Shishkin
  2013-12-19 10:31                   ` Peter Zijlstra
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 10:26 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> > The thing is; why can't you zero-copy whatever buffer the hardware
> > writes into, into the normal buffer?
> 
> I'm not sure I understand. You mean, have the buffer split between perf
> data and trace data?

Yep, I don't see any reason why this wouldn't work.

When the hardware thing sends an interrupt to notify us its buffer is
'full', stop the recorder, try to create a single record in the buffer
that's big enough + 1 page, then swizzle the hardware pages and the
buffer pages for that record, using the +1 page to page align the actual
data. Then (re)start the hardware on the 'new' pages.
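
Reading that as code, roughly (every name below is made up for
illustration; this is just the shape of the scheme, not working code):

	/* on the 'buffer full' interrupt from the trace hardware: */
	static void hw_buffer_full(struct trace_hw *hw, struct ring_buffer *rb)
	{
		void *rec;

		hw_trace_stop(hw);		/* stop the recorder */
		/* one record big enough for the data, +1 page so that
		 * the payload can be page aligned inside the record */
		rec = rb_reserve(rb, hw->buf_size + PAGE_SIZE);
		/* swap page frames: the hardware's pages become the
		 * record's payload pages and vice versa */
		swizzle_pages(rb, rec, hw->pages, hw->nr_pages);
		hw_trace_start(hw);	/* (re)start on the 'new' pages */
	}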



^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19  7:53                 ` Alexander Shishkin
  2013-12-19 10:26                   ` Peter Zijlstra
@ 2013-12-19 10:31                   ` Peter Zijlstra
  2013-12-19 11:17                     ` Alexander Shishkin
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 10:31 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> Yes, and some implementations of PT have the same issue, but you can do
> a sufficiently large high-order allocation and map it to userspace, and
> still no copying (or parsing/decoding) in kernel space is required.

What's sufficiently large? The largest we could possibly allocate is
something like 4k * 2^11, which is 8M or so. That's not all that big given
you keep saying it generates in the order of 100 MB/s.

Also, 'some implementations', that sounds like a fail right there. Why
are there already different implementations, and some with such stupid
design, of something this new?

How about just saying NO to the ones that requires physically contiguous
allocations?

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 10:26                   ` Peter Zijlstra
@ 2013-12-19 11:14                     ` Alexander Shishkin
  2013-12-19 11:25                       ` Peter Zijlstra
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-19 11:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> > The thing is; why can't you zero-copy whatever buffer the hardware
>> > writes into, into the normal buffer?
>> 
>> I'm not sure I understand. You mean, have the buffer split between perf
>> data and trace data?
>
> Yep, I don't see any reason why this wouldn't work.
>
> When the hardware thing sends an interrupt to notify us its buffer is
> 'full', stop the recorder, try to create a single record in the buffer
> that's big enough + 1 page, then swizzle the hardware pages and the
> buffer pages for that record, using the +1 page to page align the actual
> data. Then (re)start the hardware on the 'new' pages.

We configure the hardware thing to send an interrupt *before* the buffer
is full, keep the recorder running while userspace saves stuff to
perf.data file. Recording only stops if perf fails to read the trace
data out fast enough and the buffer fills up. So you'd have a complete
trace.

Also, we have what we call a "snapshot" mode, where we keep the hardware
thing running, writing data to a circular buffer till it's stopped, in
case we're only interested in the most recent trace data to see what it
is that takes too long to respond, etc. And while it is running, we're
getting new records in the perf stream all the time (mmaps, etc).

Put simply: perf data and trace data are two separate types of
information that originate from two different sources, can exist and
make sense separately from one another, and should not be mixed.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 10:31                   ` Peter Zijlstra
@ 2013-12-19 11:17                     ` Alexander Shishkin
  2013-12-19 11:28                       ` Peter Zijlstra
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-19 11:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> Yes, and some implementations of PT have the same issue, but you can do
>> a sufficiently large high-order allocation and map it to userspace, and
>> still no copying (or parsing/decoding) in kernel space is required.
>
> What's sufficiently large? The largest we could possibly allocate is
> something like 4k * 2^11, which is 8M or so. That's not all that big given
> you keep saying it generates in the order of 100 MB/s.

One chunk is 8M. You can have as many as the buddy allocator permits you
to have. When you get a PMI, you simply switch one chunk for another and
on the tracing goes.

> Also, 'some implementations', that sounds like a fail right there. Why
> are there already different implementations, and some with such stupid
> design, of something this new?
>
> How about just saying NO to the ones that requires physically contiguous
> allocations?

No reason to leave those out, because they are still extremely useful
for tracing and fit perfectly fine in a model with two buffers.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 11:14                     ` Alexander Shishkin
@ 2013-12-19 11:25                       ` Peter Zijlstra
  2013-12-19 11:57                         ` Alexander Shishkin
  0 siblings, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 11:25 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Thu, Dec 19, 2013 at 01:14:09PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> >> Peter Zijlstra <peterz@infradead.org> writes:
> >> > The thing is; why can't you zero-copy whatever buffer the hardware
> >> > writes into, into the normal buffer?
> >> 
> >> I'm not sure I understand. You mean, have the buffer split between perf
> >> data and trace data?
> >
> > Yep, I don't see any reason why this wouldn't work.
> >
> > When the hardware thing sends an interrupt to notify us its buffer is
> > 'full', stop the recorder, try to create a single record in the buffer
> > that's big enough + 1 page, then swizzle the hardware pages and the
> > buffer pages for that record, using the +1 page to page align the actual
> > data. Then (re)start the hardware on the 'new' pages.
> 
> We configure the hardware thing to send an interrupt *before* the buffer
> is full, keep the recorder running while userspace saves stuff to
> perf.data file. Recording only stops if perf fails to read the trace
> data out fast enough and the buffer fills up. So you'd have a complete
> trace.
> 
> Also, we have what we call a "snapshot" mode, where we keep the hardware
> thing running, writing data to a circular buffer till it's stopped, in
> case we're only interested in the most recent trace data to see what it
> is that takes too long to respond, etc. And while it is running, we're
> getting new records in the perf stream all the time (mmaps, etc).
> 
> Put simply: perf data and trace data are two separate types of
> information that originate from two different sources, can exist and
> make sense separately from one another, and should not be mixed.

Well, you're either going to have to change your stance or we're done
talking right now.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 11:17                     ` Alexander Shishkin
@ 2013-12-19 11:28                       ` Peter Zijlstra
  2013-12-19 11:57                         ` Peter Zijlstra
                                           ` (2 more replies)
  0 siblings, 3 replies; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 11:28 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Thu, Dec 19, 2013 at 01:17:51PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> >> Yes, and some implementations of PT have the same issue, but you can do
> >> a sufficiently large high-order allocation and map it to userspace, and
> >> still no copying (or parsing/decoding) in kernel space is required.
> >
> > What's sufficiently large? The largest we could possibly allocate is
> > something like 4k * 2^11, which is 8M or so. That's not all that big given
> > you keep saying it generates in the order of 100 MB/s.
> 
> One chunk is 8M. You can have as many as the buddy allocator permits you
> to have. When you get a PMI, you simply switch one chunk for another and
> on the tracing goes.

This document you referred me to looks to specify something with a
proper s/g implementation, called ToPA. There doesn't appear to be a
limit to the linked entries and you can specify a size per entry, and I
don't see anywhere why 4k would be bad.

That said, I'm still reading..

> > Also, 'some implementations', that sounds like a fail right there. Why
> > are there already different implementations, and some with such stupid
> > design, of something this new?
> >
> > How about just saying NO to the ones that requires physically contiguous
> > allocations?
> 
> No reason to leave those out, because they are still extremely useful
> for tracing and fit perfectly fine in a model with two buffers.

Maybe; but let's start with the sane hardware. Then we'll look at the
amount of pain needed to support these broken pieces of crap and decide
later.

So drop all support for crappy hardware now.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 11:25                       ` Peter Zijlstra
@ 2013-12-19 11:57                         ` Alexander Shishkin
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-19 11:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Dec 19, 2013 at 01:14:09PM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> >> Peter Zijlstra <peterz@infradead.org> writes:
>> >> > The thing is; why can't you zero-copy whatever buffer the hardware
>> >> > writes into, into the normal buffer?
>> >> 
>> >> I'm not sure I understand. You mean, have the buffer split between perf
>> >> data and trace data?
>> >
>> > Yep, I don't see any reason why this wouldn't work.
>> >
>> > When the hardware thing sends an interrupt to notify us its buffer is
>> > 'full', stop the recorder, try to create a single record in the buffer
>> > that's big enough + 1 page, then swizzle the hardware pages and the
>> > buffer pages for that record, using the +1 page to page align the actual
>> > data. Then (re)start the hardware on the 'new' pages.
>> 
>> We configure the hardware thing to send an interrupt *before* the buffer
>> is full, keep the recorder running while userspace saves stuff to
>> perf.data file. Recording only stops if perf fails to read the trace
>> data out fast enough and the buffer fills up. So you'd have a complete
>> trace.
>> 
>> Also, we have what we call a "snapshot" mode, where we keep the hardware
>> thing running, writing data to a circular buffer till it's stopped, in
>> case we're only interested in the most recent trace data to see what it
>> is that takes too long to respond, etc. And while it is running, we're
>> getting new records in the perf stream all the time (mmaps, etc).
>> 
>> Put simply: perf data and trace data are two separate types of
>> information that originate from two different sources, can exist and
>> make sense separately from one another, and should not be mixed.
>
> Well, you're either going to have to change your stance or we're done
> talking right now.

I'm making a case in favor of two separate buffers, just like you asked
in one of the previous emails. It's backed by some very real use cases.
That said, I'm not personally attached to any one design, only to what
makes sense. There is no 'stance'.

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 11:28                       ` Peter Zijlstra
@ 2013-12-19 11:57                         ` Peter Zijlstra
  2013-12-19 12:52                           ` Peter Zijlstra
  2013-12-19 12:57                           ` Peter Zijlstra
  2013-12-19 11:58                         ` Alexander Shishkin
  2013-12-19 12:39                         ` Ingo Molnar
  2 siblings, 2 replies; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 11:57 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Thu, Dec 19, 2013 at 12:28:12PM +0100, Peter Zijlstra wrote:
> This document you referred me to looks to specify something with a
> proper s/g implementation, called ToPA. There doesn't appear to be a
> limit to the linked entries and you can specify a size per entry, and I
> don't see anywhere why 4k would be bad.
> 
> That said, I'm still reading..

Found it:

"Single Output Region ToPA Implementation

The first processor generation to implement Intel PT supports only ToPA
configurations with a single ToPA entry followed by an END entry that
points back to the first entry (creating one circular output buffer).
Such processors enumerate CPUID.(EAX=14H,ECX=0):EBX[bit 1] as 0."
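
For reference, that enumeration check would look something like the
following from userspace -- a sketch using gcc's cpuid.h:

	#include <cpuid.h>
	#include <stdbool.h>

	/* false on the single-output-region parts described above */
	static bool pt_multiple_topa_regions(void)
	{
		unsigned int eax, ebx, ecx, edx;

		if (!__get_cpuid_count(0x14, 0, &eax, &ebx, &ecx, &edx))
			return false;
		return ebx & (1u << 1);		/* EBX[bit 1] */
	}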

So basically you guys buggered the hardware.

More specifically, what actual hardware is this? Is this first
generation HSW or so?

Please enumerate the actual hardware that supports this PT stuff and
which hardware has it fixed.

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 11:28                       ` Peter Zijlstra
  2013-12-19 11:57                         ` Peter Zijlstra
@ 2013-12-19 11:58                         ` Alexander Shishkin
  2013-12-19 12:39                         ` Ingo Molnar
  2 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-19 11:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Dec 19, 2013 at 01:17:51PM +0200, Alexander Shishkin wrote:
>> Peter Zijlstra <peterz@infradead.org> writes:
>> 
>> > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> >> Yes, and some implementations of PT have the same issue, but you can do
>> >> a sufficiently large high-order allocation and map it to userspace, and
>> >> still no copying (or parsing/decoding) in kernel space is required.
>> >
>> > What's sufficiently large? The largest we could possibly allocate is
>> > something like 4k * 2^11, which is 8M or so. That's not all that big given
>> > you keep saying it generates in the order of 100 MB/s.
>> 
>> One chunk is 8M. You can have as many as the buddy allocator permits you
>> to have. When you get a PMI, you simply switch one chunk for another and
>> on the tracing goes.
>
> This document you referred me to looks to specify something with a
> proper s/g implementation, called ToPA. There doesn't appear to be a
> limit to the linked entries and you can specify a size per entry, and I
> don't see anywhere why 4k would be bad.

JFYI, 11.2.4.1, "Single Output Region ToPA Implementation".

Regards,
--
Alex

^ permalink raw reply	[flat|nested] 163+ messages in thread

* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 11:28                       ` Peter Zijlstra
  2013-12-19 11:57                         ` Peter Zijlstra
  2013-12-19 11:58                         ` Alexander Shishkin
@ 2013-12-19 12:39                         ` Ingo Molnar
  2013-12-19 14:30                           ` Alexander Shishkin
  2 siblings, 1 reply; 163+ messages in thread
From: Ingo Molnar @ 2013-12-19 12:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Shishkin, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Thu, Dec 19, 2013 at 01:17:51PM +0200, Alexander Shishkin wrote:
> > Peter Zijlstra <peterz@infradead.org> writes:
> > 
> > > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
> > >> Yes, and some implementations of PT have the same issue, but you can do
> > >> a sufficiently large high-order allocation and map it to userspace, and
> > >> still no copying (or parsing/decoding) in kernel space is required.
> > >
> > > What's sufficiently large? The largest we could possibly allocate is
> > > something like 4k * 2^11, which is 8M or so. That's not all that big given
> > > you keep saying it generates in the order of 100 MB/s.
> > 
> > One chunk is 8M. You can have as many as the buddy allocator permits you
> > to have. When you get a PMI, you simply switch one chunk for another and
> > on the tracing goes.
> 
> This document you referred me to looks to specify something with a
> proper s/g implementation; called ToPA. There doesn't appear to be a
> limit to the linked entries and you can specify a size per entry, and I
> don't see anywhere why 4k would be bad.
> 
> That said, I'm still reading..
> 
> > > Also, 'some implementations', that sounds like a fail right there. Why
> > > are there already different implementations, and some with such stupid
> > > design, of something this new?
> > >
> > > How about just saying NO to the ones that requires physically contiguous
> > > allocations?
> > 
> > No reason to leave those out, because they are still extremely useful
> > for tracing and fit perfectly fine in a model with two buffers.
> 
> Maybe; but lets start with the sane hardware. Then we'll look at the 
> amount of pain needed to support these broken pieces of crap and 
> decide later.
> 
> So drop all support for crappy hardware now.

Absolutely agreed ...

The thing is, BTS itself is rarely used (and not primarily because 
it's slow, but because its tooling and thus its utility is poor), so 
the last thing we want is another piece of broken hardware with a 
quirky software interface to it that tooling has trouble utilizing.

Sigh, when will Intel learn to talk to Linux PMU experts _before_ 
committing to a hardware interface??

Thanks,

	Ingo


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 11:57                         ` Peter Zijlstra
@ 2013-12-19 12:52                           ` Peter Zijlstra
  2013-12-19 12:57                           ` Peter Zijlstra
  1 sibling, 0 replies; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 12:52 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen



Found more:

"Note that no “freezing” takes place with the ToPA PMI. Thus, packet
generation is not frozen, and the interrupt handler will be traced
(though filtering can prevent this). Further, the setting of
IA32_DEBUGCTL.Freeze_Perfmon_on_PMI is ignored and performance counters
are not frozen by a ToPA PMI."


Can someone confirm with the hardware people what happens when an actual
PMU counter overflows and tries to raise the PMI while we're in one that
ignores the 'Freeze_perfmon_on_PMI' bit?

You cannot assert an interrupt that is already asserted, but the
handler that is running can see the overflow status bit set and will
likely process it, assuming the PMU is actually frozen.

Also, this just smells ripe for errata and ugly bugs.



* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 11:57                         ` Peter Zijlstra
  2013-12-19 12:52                           ` Peter Zijlstra
@ 2013-12-19 12:57                           ` Peter Zijlstra
  2013-12-19 14:54                             ` Alexander Shishkin
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 12:57 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Thu, Dec 19, 2013 at 12:57:59PM +0100, Peter Zijlstra wrote:
> On Thu, Dec 19, 2013 at 12:28:12PM +0100, Peter Zijlstra wrote:
> > This document you referred me to looks to specify something with a
> > proper s/g implementation; called ToPA. There doesn't appear to be a
> > limit to the linked entries and you can specify a size per entry, and I
> > don't see anywhere why 4k would be bad.
> > 
> > That said, I'm still reading..
> 
> Found it:
> 
> "Single Output Region ToPA Implementation
> 
> The first processor generation to implement Intel PT supports only ToPA
> configurations with a single ToPA entry followed by an END entry that
> points back to the first entry (creating one circular output buffer).
> Such processors enumerate CPUID.(EAX=14H,ECX=0):EBX[bit 1] as 0."
> 
> So basically you guys buggered the hardware.
> 

"ToPA PMI and Single Output Region ToPA Implementation

A processor that supports only a single ToPA output region
implementation (such that only one output region is supported; see
above) will attempt to signal a ToPA PMI interrupt before the output
wraps and overwrites the top of the buffer. To support this
functionality, the PMI handler should disable packet generation as soon
as possible.  Due to PMI skid, it is possible, in rare cases, that the
wrap will have occurred before the PMI is delivered. Software can avoid
this by setting the STOP bit in the ToPA entry (see Table 11-3); this
will disable tracing once the region is filled, and no wrap will occur.
This approach has the downside of disabling packet generation so that
some of the instructions that led up to the PMI will not be traced. If
the PMI skid is significant enough to cause the region to fill and
tracing to be disabled, the PMI handler will need to clear the
IA32_RTIT_STATUS.Stopped indication before tracing can resume."


So you're basically forced to stop the tracing on PMI anyhow; so your
continuous tracing argument goes out the window.

Also, what a complete clusterfuck. I think we're far better off
pretending PT doesn't exist until it's fixed.
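
For concreteness, a minimal sketch of the single-region PMI dance the
text above forces on the handler (MSR numbers per the SDM; the
buffer-swap helper is hypothetical):

    #include <asm/msr.h>

    #define MSR_IA32_RTIT_CTL       0x00000570
    #define MSR_IA32_RTIT_STATUS    0x00000571
    #define RTIT_CTL_TRACEEN        (1ULL << 0)
    #define RTIT_STATUS_STOPPED     (1ULL << 5)

    static void pt_single_region_pmi(void)
    {
            u64 ctl, status;

            /* packet generation is not frozen by the PMI: stop it ASAP */
            rdmsrl(MSR_IA32_RTIT_CTL, ctl);
            wrmsrl(MSR_IA32_RTIT_CTL, ctl & ~RTIT_CTL_TRACEEN);

            /*
             * If the ToPA STOP bit was used to guard against PMI skid,
             * tracing may already have stopped; the Stopped indication
             * must be cleared before tracing can resume.
             */
            rdmsrl(MSR_IA32_RTIT_STATUS, status);
            if (status & RTIT_STATUS_STOPPED)
                    wrmsrl(MSR_IA32_RTIT_STATUS,
                           status & ~RTIT_STATUS_STOPPED);

            pt_swap_output_region();        /* hypothetical buffer swap */

            wrmsrl(MSR_IA32_RTIT_CTL, ctl | RTIT_CTL_TRACEEN);
    }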


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 12:39                         ` Ingo Molnar
@ 2013-12-19 14:30                           ` Alexander Shishkin
  2013-12-19 14:49                             ` Frederic Weisbecker
  2013-12-19 15:10                             ` Peter Zijlstra
  0 siblings, 2 replies; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-19 14:30 UTC (permalink / raw)
  To: Ingo Molnar, Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Ingo Molnar <mingo@kernel.org> writes:

> * Peter Zijlstra <peterz@infradead.org> wrote:
>
>> On Thu, Dec 19, 2013 at 01:17:51PM +0200, Alexander Shishkin wrote:
>> > Peter Zijlstra <peterz@infradead.org> writes:
>> > 
>> > > On Thu, Dec 19, 2013 at 09:53:44AM +0200, Alexander Shishkin wrote:
>> > >> Yes and some implementations of PT have the same issue, but you can do a
>> > >> sufficiently large high order allocation and map it to userspace and
>> > >> still no copying (or parsing/decoding) in kernel space required.
>> > >
>> > > What's sufficiently large? The largest we could possibly allocate is
>> > > something like 4k^11 which is 8M or so. That's not all that big given
>> > > you keep saying it generates in the order of 100 MB/s.
>> > 
>> > One chunk is 8M. You can have as many as the buddy allocator permits you
>> > to have. When you get a PMI, you simply switch one chunk for another and
>> > on the tracing goes.
>> 
>> This document you referred me to looks to specify something with a
>> proper s/g implementation; called ToPA. There doesn't appear to be a
>> limit to the linked entries and you can specify a size per entry, and I
>> don't see anywhere why 4k would be bad.
>> 
>> That said, I'm still reading..
>> 
>> > > Also, 'some implementations', that sounds like a fail right there. Why
>> > > are there already different implementations, and some with such stupid
>> > > design, of something this new?
>> > >
>> > > How about just saying NO to the ones that require physically contiguous
>> > > allocations?
>> > 
>> > No reason to leave those out, because they are still extremely useful
>> > for tracing and fit perfectly fine in a model with two buffers.
>> 
>> Maybe; but lets start with the sane hardware. Then we'll look at the 
>> amount of pain needed to support these broken pieces of crap and 
>> decide later.
>> 
>> So drop all support for crappy hardware now.
>
> Absolutely agreed ...
>
> The thing is, BTS itself is rarely used (and not primarily because 
> it's slow, but because its tooling and thus its utility is poor), so 
> the last thing we want is another piece of broken hardware with a 
> quirky software interface to it that tooling has trouble utilizing.

Or the interface and implementation of BTS support in the kernel
discourage its use and that is why it is so rarely used.

What I'm proposing is a unified interface for trace units to export
their traces and not only the "non-crappy" ones, in a way that won't
discourage its use from day one.

So I'd like to steer away from the ways in which hardware can be broken
and talk about a usable interface, to begin with.

Regards,
--
Alex


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 14:30                           ` Alexander Shishkin
@ 2013-12-19 14:49                             ` Frederic Weisbecker
  2013-12-19 15:02                               ` Peter Zijlstra
  2013-12-19 15:10                             ` Peter Zijlstra
  1 sibling, 1 reply; 163+ messages in thread
From: Frederic Weisbecker @ 2013-12-19 14:49 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, Peter Zijlstra, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen

On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
> Or the interface and implementation of BTS support in the kernel
> discourage its use and that is why it is so rarely used.

I never heard complaints about it. It's a simple dump of from/to address couples.
I just think nobody took the time to develop userspace tooling to exploit it.
But its famous slowness might have had a bad influence on this. And maybe
also the fact that it's very architecture specific. AMD doesn't support BTS if I recall
correctly. Or maybe it has its own different implementation?
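
(For reference, a BTS record really is that simple; on 64-bit parts it
is roughly three u64s per branch:

    #include <linux/types.h>

    struct bts_record {
            u64 from;   /* branch source address */
            u64 to;     /* branch target address */
            u64 flags;  /* misc bits, e.g. prediction info */
    };

so the record format was never the hard part, only the tooling.)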


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 12:57                           ` Peter Zijlstra
@ 2013-12-19 14:54                             ` Alexander Shishkin
  2013-12-19 15:14                               ` Peter Zijlstra
  0 siblings, 1 reply; 163+ messages in thread
From: Alexander Shishkin @ 2013-12-19 14:54 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Dec 19, 2013 at 12:57:59PM +0100, Peter Zijlstra wrote:
> So you're basically forced to stop the tracing on PMI anyhow; so your
> continuous tracing argument goes out the window.

It's only stopped inside the PMI handler to set up another buffer, and
is then started again, so no useful trace is lost. PMI handler is not
traced. What you're proposing is stopping it for good till perf collects
the previous data, which will lose us a lot of trace. So my argument
stands.

Regards,
--
Alex


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 14:49                             ` Frederic Weisbecker
@ 2013-12-19 15:02                               ` Peter Zijlstra
  0 siblings, 0 replies; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 15:02 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian,
	Andi Kleen

On Thu, Dec 19, 2013 at 03:49:42PM +0100, Frederic Weisbecker wrote:
> On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
> > Or the interface and implementation of BTS support in the kernel
> > discourage its use and that is why it is so rarely used.
> 
> I never heard complaints about it. It's a simple dump of from/to address couples.
> I just think nobody took the time to develop userspace tooling to exploit it.
> But its famous slowness might have had a bad influence on this. And maybe
> also the fact that it's very architecture specific. AMD doesn't support BTS if I recall
> correctly. Or maybe it has its own different implementation?

No, AMD doesn't do anything like that.

There was some attempt to cure some of the wobblies:

  https://lkml.org/lkml/2013/7/8/154

But people never pursued that.

That said, if people want overwrite mode to work for PT we'd need to fix
the same thing.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 14:30                           ` Alexander Shishkin
  2013-12-19 14:49                             ` Frederic Weisbecker
@ 2013-12-19 15:10                             ` Peter Zijlstra
  2014-01-06 21:25                               ` Andi Kleen
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 15:10 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel,
	David Ahern, Frederic Weisbecker, Jiri Olsa, Mike Galbraith,
	Namhyung Kim, Paul Mackerras, Stephane Eranian, Andi Kleen

On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
> So I'd like to steer away from the ways in which hardware can be broken
> and talk about a usable interface, to begin with.

Just dump it into the regular one buffer like I outlined.

That said; we very much need to have at least two architectures
implemented for any of this code to move.

But we cannot ignore the hardware trainwreck; we cannot shape our
interface around something that's utterly broken.

Some hardware is just too broken to support.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 14:54                             ` Alexander Shishkin
@ 2013-12-19 15:14                               ` Peter Zijlstra
  0 siblings, 0 replies; 163+ messages in thread
From: Peter Zijlstra @ 2013-12-19 15:14 UTC (permalink / raw)
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian, Andi Kleen

On Thu, Dec 19, 2013 at 04:54:27PM +0200, Alexander Shishkin wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Thu, Dec 19, 2013 at 12:57:59PM +0100, Peter Zijlstra wrote:
> > So you're basically forced to stop the tracing on PMI anyhow; so your
> > continuous tracing argument goes out the window.
> 
> It's only stopped inside the PMI handler to set up another buffer, and
> is then started again, so no useful trace is lost. PMI handler is not
> traced. What you're proposing is stopping it for good till perf collects
> the previous data, which will lose us a lot of trace. So my argument
> stands.

That is not what I proposed at all.

The PMI will swizzle the pages and resume recording. If there is no
space in the output buffer, we'll simply re-use the existing pages and
overwrite data.





* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2013-12-19 15:10                             ` Peter Zijlstra
@ 2014-01-06 21:25                               ` Andi Kleen
  2014-01-06 22:05                                 ` Peter Zijlstra
  2014-01-06 22:15                                 ` Peter Zijlstra
  0 siblings, 2 replies; 163+ messages in thread
From: Andi Kleen @ 2014-01-06 21:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
>> So I'd like to steer away from the ways in which hardware can be broken
>> and talk about a usable interface, to begin with.
>
> Just dump it into the regular one buffer like I outlined.

Just getting back to this. 

Do you realize that PT buffers have to be page aligned?

So mixing it with a regular perf buffer would mean padding every PT
message out to 4K, which wastes a lot of memory. The sideband messages
are usually only a few bytes (e.g. context switch).

If the sideband is frequent it could even take up more than half of
the buffer, mostly as padding.

Is that what you intended?

perf doesn't support gaps today, so your proposal wouldn't even
seem to fit into the current perf design.
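
To put numbers on the padding point, back-of-the-envelope with an
illustrative 40-byte sideband record (the record size is made up, the
page size is not):

    #include <stdio.h>

    int main(void)
    {
            const unsigned int page = 4096;   /* PT alignment */
            const unsigned int record = 40;   /* illustrative sideband record */

            /* padding each record out to a page wastes ~99% of it */
            printf("%u of %u bytes are padding (%.1f%%)\n",
                   page - record, page, 100.0 * (page - record) / page);
            return 0;
    }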

Also of course it requires disabling/enabling PT explicitly for 
every perf message, which is slow. So you add at least 2*WRMSR cost
(thousands of cycles).

> That said; we very much need to have at least two architectures
> implemented for any of this code to move.
>
> But we cannot ignore the hardware trainwreck; we cannot shape our
> interface around something that's utterly broken.
>
> Some hardware is just too broken to support.

I don't think the PT design is broken in any way; it's straightforward
and simple.

Trying to mix hardware tracing and software tracing in the same buffer
on the other hand ...

Anyway, if perf is not flexible enough to support this, I suppose
it could switch to a simple device driver and only run perf with
separate fds for sideband purposes.

Would you prefer that?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-06 21:25                               ` Andi Kleen
@ 2014-01-06 22:05                                 ` Peter Zijlstra
  2014-01-07  0:52                                   ` Andi Kleen
  2014-01-06 22:15                                 ` Peter Zijlstra
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2014-01-06 22:05 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

On Mon, Jan 06, 2014 at 01:25:02PM -0800, Andi Kleen wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> 
> > On Thu, Dec 19, 2013 at 04:30:53PM +0200, Alexander Shishkin wrote:
> >> So I'd like to steer away from the ways in which hardware can be broken
> >> and talk about a usable interface, to begin with.
> >
> > Just dump it into the regular one buffer like I outlined.
> 
> Just getting back to this. 
> 
> Do you realize that PT buffers have to be page aligned?
> 
> So mixing it with a regular perf buffer would mean padding every PT
> message out to 4K, which wastes a lot of memory. The sideband messages
> are usually only a few bytes (e.g. context switch).
> 
> If the sideband is frequent it could even take up more than half of
> the buffer, mostly as padding.
> 
> Is that what you intended?
> 
> perf doesn't support gaps today, so your proposal wouldn't even
> seem to fit into the current perf design.

That would be a really trivial addition.

> Also of course it requires disabling/enabling PT explicitly for 
> every perf message, which is slow. So you add at least 2*WRMSR cost
> (thousands of cycles).

That's just dumb; no, flush the entire PT buffer into a few large
records.
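
Roughly what such a record could look like on the wire; the record type
and layout here are hypothetical, and note that perf_event_header.size
is a __u16, so a single record tops out just below 64K:

    #include <linux/perf_event.h>

    #define PERF_RECORD_PT_DATA 70  /* hypothetical record type */

    struct pt_data_event {
            struct perf_event_header header; /* header.size is a __u16 */
            __u64 offset;                    /* position in the PT stream */
            /*
             * header.size - sizeof(struct pt_data_event) bytes of raw
             * PT data follow.
             */
    };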

> > That said; we very much need to have at least two architectures
> > implemented for any of this code to move.
> >
> > But we cannot ignore the hardware trainwreck; we cannot shape our
> > interface around something that's utterly broken.
> >
> > Some hardware is just too broken to support.
> 
> I don't think the PT design is broken in any way; it's straightforward
> and simple.

If it were actually implemented like the spec says and didn't have this
crappy S/G limitation, then maybe.

> Trying to mix hardware tracing and software tracing in the same buffer
> on the other hand ...
> 
> Anyway, if perf is not flexible enough to support this, I suppose
> it could switch to a simple device driver and only run perf with
> separate fds for sideband purposes.
> 
> Would you prefer that?

Don't be stupid.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-06 21:25                               ` Andi Kleen
  2014-01-06 22:05                                 ` Peter Zijlstra
@ 2014-01-06 22:15                                 ` Peter Zijlstra
  2014-01-06 23:10                                   ` Andi Kleen
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2014-01-06 22:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

> I don't think the PT design is broken in any way; it's straightforward
> and simple.

Also, do clarify the other points I asked about. Esp. the non
FREEZE_ON_PMI behaviour of the PT PMI is worrying me immensely.

To me it seems very weird that PT is hooked to the same PMI as the
normal PMU, it really should have been a different interrupt.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-06 22:15                                 ` Peter Zijlstra
@ 2014-01-06 23:10                                   ` Andi Kleen
  2014-01-07  8:38                                     ` Peter Zijlstra
  2014-01-07  8:41                                     ` Peter Zijlstra
  0 siblings, 2 replies; 163+ messages in thread
From: Andi Kleen @ 2014-01-06 23:10 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

Peter Zijlstra <peterz@infradead.org> writes:

Can you please clarify your position on the interleaved buffer?

I still can't see how it is an efficient design.

It's generally true in scatter-gather (be it software or hardware)
that each additional SG entry increases the cost. So to make things
efficient you always want to minimize entries as much as possible.

>> I don't think the PT design is broken in any way; it's straightforward
>> and simple.
>
> Also, do clarify the other points I asked about. Esp. the non
> FREEZE_ON_PMI behaviour of the PT PMI is worrying me immensely.

The only reason for a hardware freeze is when you have only a few
entries (like with LBRs), where the interrupt entry code could
overwhelm it.

But PT is not small, it's gigantic: even with the smallest buffer you
have many thousands of entries.

So you will get a few branches in the interrupt entry, but it's not a problem
because everything you really wanted to trace is still there.

Eventually the handler disables PT, so there's no risk of racing with
the update or anything like that.

Did I miss anything?

> To me it seems very weird that PT is hooked to the same PMI as the
> normal PMU, it really should have been a different interrupt.

It's in the same STATUS register, so it's cheap to check both.

It shouldn't add any new spurious problems (or at least nothing
worse than what we already have).
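
Concretely, it's one more bit test on a register the PMI handler
already reads (bit positions per the SDM; both handlers below are
placeholders):

    #include <asm/msr.h>

    #define MSR_CORE_PERF_GLOBAL_STATUS     0x0000038e
    #define GLOBAL_STATUS_TRACE_TOPA_PMI    (1ULL << 55)

    static void pmi_dispatch(void)
    {
            u64 status;

            rdmsrl(MSR_CORE_PERF_GLOBAL_STATUS, status);

            if (status & GLOBAL_STATUS_TRACE_TOPA_PMI)
                    pt_handle_topa_pmi();           /* placeholder */
            if (status & 0xfULL)                    /* GP counters 0-3 */
                    pmu_handle_overflow(status);    /* placeholder */
    }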

I understand that it would be nice to separate other NMI users
from all of PMI, but that would be an orthogonal problem.

Any other issues?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-06 22:05                                 ` Peter Zijlstra
@ 2014-01-07  0:52                                   ` Andi Kleen
  2014-01-07  1:01                                     ` Andi Kleen
  2014-01-07  8:42                                     ` Peter Zijlstra
  0 siblings, 2 replies; 163+ messages in thread
From: Andi Kleen @ 2014-01-07  0:52 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Alexander Shishkin, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

> > Also of course it requires disabling/enabling PT explicitly for 
> > every perf message, which is slow. So you add at least 2*WRMSR cost
> > (thousands of cycles).
> 
> > That's just dumb; no, flush the entire PT buffer into a few large
> > records.

How would that work?

You mean a separate buffer and then copy or map?

------

Also here are some more problems with interleaving: 

A common PT config is to just run it as a ring buffer in the background
and only take the data out when something happens (sample, crash etc.)

But the sideband still needs to be logged, and at arbitrary times.

So the PT wrapping will happen much more often than the perf wrapping.

If you interleave you may actually end up with lots of small rings 
in a single buffer, unless you stop every time the buffer fills up
(which would add a lot more overhead)

I suppose it could be parsed somehow, but it would be very different
from what perf does today.

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-07  0:52                                   ` Andi Kleen
@ 2014-01-07  1:01                                     ` Andi Kleen
  2014-01-07  8:42                                     ` Peter Zijlstra
  1 sibling, 0 replies; 163+ messages in thread
From: Andi Kleen @ 2014-01-07  1:01 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Peter Zijlstra, Alexander Shishkin, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

On Tue, Jan 07, 2014 at 01:52:31AM +0100, Andi Kleen wrote:
> > > Also of course it requires disabling/enabling PT explicitly for 
> > > every perf message, which is slow. So you add at least 2*WRMSR cost
> > > (thousands of cycles).
> > 
> > That's just dumb; no, flush the entire PT buffer into a few large
> > records.
> 
> How would that work?
> 
> You mean a separate buffer and then copy or map?
> 
> ------
> 
> Also here are some more problems with interleaving: 
> 
> A common PT config is to just run it as a ring buffer in the background
> and only take the data out when something happens (sample, crash etc.)
> 
> But the sideband still needs to be logged, and at arbitrary times.
> 
> So the PT wrapping will happen much more often than the perf wrapping.
> 
> If you interleave you may actually end up with lots of small rings 
> in a single buffer, unless you stop every time the buffer fills up
> (which would add a lot more overhead)
> 
> I suppose it could be parsed somehow, but it would be very different
> from what perf does today.

Thinking about it more, it's likely very hard to parse. Dropping
instructions is fine; dropping perf metadata is not (or only as a last
resort).

If we miss an MMAP we may never be able to parse that code region.
If we miss a context switch we may also be completely lost until the
next switch.

That means PT couldn't overwrite perf metadata normally.

So you could easily get into situations where the interleaved PT buffer
sits between two perf metadata records and ends up really small, while
large parts of the buffer elsewhere go unused.

The only way around it would likely be to move entries around -- to
garbage collect, so to speak -- but doing that non-blocking from an NMI
will be challenging.

With the separate buffers we don't have any of these problems.

-Andi


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-06 23:10                                   ` Andi Kleen
@ 2014-01-07  8:38                                     ` Peter Zijlstra
  2014-01-07 15:42                                       ` Andi Kleen
  2014-01-07  8:41                                     ` Peter Zijlstra
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2014-01-07  8:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

On Mon, Jan 06, 2014 at 03:10:28PM -0800, Andi Kleen wrote:
> Peter Zijlstra <peterz@infradead.org> writes:
> > Also, do clarify the other points I asked about. Esp. the non
> > FREEZE_ON_PMI behaviour of the PT PMI is worrying me immensely.
> 
> The only reason for a hardware freeze is when you have only a few
> entries (like with LBRs), where the interrupt entry code could
> overwhelm it.
> 
> But PT is not small, it's gigantic: even with the smallest buffer you
> have many thousands of entries.
> 
> So you will get a few branches in the interrupt entry, but it's not a problem
> because everything you really wanted to trace is still there.
> 
> Eventually the handler disables PT, so there's no risk of racing with
> the update or anything like that.
> 
> Did I miss anything?

Yes; go read this:

 lkml.kernel.org/r/20131219125205.GT3694@twins.programming.kicks-ass.net


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-06 23:10                                   ` Andi Kleen
  2014-01-07  8:38                                     ` Peter Zijlstra
@ 2014-01-07  8:41                                     ` Peter Zijlstra
  2014-01-07 15:46                                       ` Andi Kleen
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2014-01-07  8:41 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

On Mon, Jan 06, 2014 at 03:10:28PM -0800, Andi Kleen wrote:
> > To me it seems very weird that PT is hooked to the same PMI as the
> > normal PMU, it really should have been a different interrupt.
> 
> It's in the same STATUS register, so it's cheap to check both.
> 
> It shouldn't add any new spurious problems (or at least nothing
> worse than what we already have).
> 
> I understand that it would be nice to separate other NMI users
> from all of PMI, but that would be an orthogonal problem.
> 
> Any other issues?

Aside from the fact that PT and the PMU are otherwise unrelated, so it
being in the global status register is weird too.

Also, the PT interrupt doesn't actually need to be an NMI; when the
proposed S/G implementation would actually work as stated there can be
plenty room left when we trigger the interrupt.

But again, see the other email I referenced; the PMU triggering a PMI
while we're in one PT triggered is my biggest concern; esp. since both
have different FREEZE semantics.



* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-07  0:52                                   ` Andi Kleen
  2014-01-07  1:01                                     ` Andi Kleen
@ 2014-01-07  8:42                                     ` Peter Zijlstra
  2014-01-07 15:48                                       ` Andi Kleen
  1 sibling, 1 reply; 163+ messages in thread
From: Peter Zijlstra @ 2014-01-07  8:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

On Tue, Jan 07, 2014 at 01:52:31AM +0100, Andi Kleen wrote:
> > > Also of course it requires disabling/enabling PT explicitly for 
> > > every perf message, which is slow. So you add at least 2*WRMSR cost
> > > (thousands of cycles).
> > 
> > That's just dumb; no, flush the entire PT buffer into a few large
> > records.
> 
> How would that work?
> 
> You mean a separate buffer and then copy or map?
> 
> ------
> 
> Also here are some more problems with interleaving: 
> 
> A common PT config is to just run it as a ring buffer in the background
> and only take the data out when something happens (sample, crash etc.)
> 
> But the sideband still needs to be logged, and at arbitrary times.
> 
> So the PT wrapping will happen much more often than the perf wrapping.

So create two events, one for the PT stuff and one to track the
side-band stuff. We have a NOP event for just this purpose.
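
A rough userspace sketch of that two-event setup (the PT PMU type is
dynamic and would be read from sysfs; error handling and the mmap of
the two buffers are omitted):

    #include <linux/perf_event.h>
    #include <string.h>
    #include <sys/syscall.h>
    #include <sys/types.h>
    #include <unistd.h>

    static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
                               int cpu, int group_fd, unsigned long flags)
    {
            return syscall(__NR_perf_event_open, attr, pid, cpu,
                           group_fd, flags);
    }

    int open_pt_and_sideband(int pt_pmu_type, pid_t pid, int cpu,
                             int *pt_fd, int *sb_fd)
    {
            struct perf_event_attr pt, sb;

            memset(&pt, 0, sizeof(pt));
            pt.size = sizeof(pt);
            pt.type = pt_pmu_type;  /* from /sys/bus/event_source/devices/ */

            /* NOP event whose only job is to carry the sideband */
            memset(&sb, 0, sizeof(sb));
            sb.size = sizeof(sb);
            sb.type = PERF_TYPE_SOFTWARE;
            sb.config = PERF_COUNT_SW_DUMMY;
            sb.mmap = 1;    /* MMAP records needed for decoding */
            sb.comm = 1;    /* COMM records likewise */

            *pt_fd = perf_event_open(&pt, pid, cpu, -1, 0);
            *sb_fd = perf_event_open(&sb, pid, cpu, -1, 0);
            return (*pt_fd < 0 || *sb_fd < 0) ? -1 : 0;
    }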


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-07  8:38                                     ` Peter Zijlstra
@ 2014-01-07 15:42                                       ` Andi Kleen
  2014-01-07 20:51                                         ` Peter Zijlstra
  0 siblings, 1 reply; 163+ messages in thread
From: Andi Kleen @ 2014-01-07 15:42 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Alexander Shishkin, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

> Yes; go read this:
> 
>  lkml.kernel.org/r/20131219125205.GT3694@twins.programming.kicks-ass.net

Hmm, but AFAIK we're not using freeze counters on PMI today.
We just rely on the explicit disabling in the counters through the global
ctrl.

So it should be the same as with any other PMI which also does not
automatically freeze. Not true?

Or do you mean interaction with the LBRs here?
(currently LBRs and PT are mutually exclusive)

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-07  8:41                                     ` Peter Zijlstra
@ 2014-01-07 15:46                                       ` Andi Kleen
  0 siblings, 0 replies; 163+ messages in thread
From: Andi Kleen @ 2014-01-07 15:46 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Alexander Shishkin, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

> Also, the PT interrupt doesn't actually need to be an NMI; when the
> proposed S/G implementation would actually work as stated there can be
> plenty room left when we trigger the interrupt.

That's true.

-andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-07  8:42                                     ` Peter Zijlstra
@ 2014-01-07 15:48                                       ` Andi Kleen
  2014-01-08 11:53                                         ` Alexander Shishkin
  0 siblings, 1 reply; 163+ messages in thread
From: Andi Kleen @ 2014-01-07 15:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Alexander Shishkin, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

> So create two events, one for the PT stuff and one to track the
> side-band stuff. We have a NOP event for just this purpose.

Ok I guess that could work.

Essentially replace the magic mmap offset with a second fd.

Alex, what do you think?

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-07 15:42                                       ` Andi Kleen
@ 2014-01-07 20:51                                         ` Peter Zijlstra
  2014-01-07 23:34                                           ` Andi Kleen
       [not found]                                           ` <20140107212322.GE20765@two.firstfloor.org>
  0 siblings, 2 replies; 163+ messages in thread
From: Peter Zijlstra @ 2014-01-07 20:51 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

On Tue, Jan 07, 2014 at 04:42:55PM +0100, Andi Kleen wrote:
> > Yes; go read this:
> > 
> >  lkml.kernel.org/r/20131219125205.GT3694@twins.programming.kicks-ass.net
> 
> Hmm, but AFAIK we're not using freeze counters on PMI today.
> We just rely on the explicit disabling in the counters through the global
> ctrl.
> 
> So it should be the same as with any other PMI which also does not
> automatically freeze. Not true?

Regardless of whether it's used or not, I'd very much like that answered.

> Or do you mean interaction with the LBRs here?
> (currently LBRs and PT are mutually exclusive)

Yes we very much rely on the FREEZE bits for LBR. PT and LBR being
mutually exclusive wasn't documented (or I missed it) and completely
blows.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-07 20:51                                         ` Peter Zijlstra
@ 2014-01-07 23:34                                           ` Andi Kleen
       [not found]                                           ` <20140107212322.GE20765@two.firstfloor.org>
  1 sibling, 0 replies; 163+ messages in thread
From: Andi Kleen @ 2014-01-07 23:34 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, Alexander Shishkin, Ingo Molnar,
	Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, David Ahern,
	Frederic Weisbecker, Jiri Olsa, Mike Galbraith, Namhyung Kim,
	Paul Mackerras, Stephane Eranian

On Tue, Jan 07, 2014 at 09:51:45PM +0100, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 04:42:55PM +0100, Andi Kleen wrote:
> > > Yes; go read this:
> > > 
> > >  lkml.kernel.org/r/20131219125205.GT3694@twins.programming.kicks-ass.net
> > 
> > Hmm, but AFAIK we're not using freeze counters on PMI today.
> > We just rely on the explicit disabling in the counters through the global
> > ctrl.
> > 
> > So it should be the same as with any other PMI which also does not
> > automatically freeze. Not true?
> 
> Regardless of whether it's used or not, I'd very much like that answered.

The freeze always starts with the counter overflow, independent of
whether the interrupt is blocked or not. So everything should be ok.

-Andi


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
       [not found]                                             ` <20140108082840.GH2480@laptop.programming.kicks-ass.net>
@ 2014-01-08  8:31                                               ` Peter Zijlstra
  0 siblings, 0 replies; 163+ messages in thread
From: Peter Zijlstra @ 2014-01-08  8:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Alexander Shishkin, Ingo Molnar, Arnaldo Carvalho de Melo,
	Ingo Molnar, linux-kernel, David Ahern, Frederic Weisbecker,
	Jiri Olsa, Mike Galbraith, Namhyung Kim, Paul Mackerras,
	Stephane Eranian

Restoring the list... I really should drop all emails you send off-list
into /dev/null.

On Wed, Jan 08, 2014 at 09:28:40AM +0100, Peter Zijlstra wrote:
> On Tue, Jan 07, 2014 at 10:23:22PM +0100, Andi Kleen wrote:
> > > Yes we very much rely on the FREEZE bits for LBR. PT and LBR being
> > > mutually exclusive wasn't documented (or I missed it) and completely
> > > blows.
> > 
> > Can you describe why it is a problem? I had considered it only a minor
> > inconvenience, for many things you would use LBRs for PT is far better.
> 
> Because if someone writes a GCC tool using perf-LBR support for some
> basic block analysis, and someone else writes another tool for PT, then
> the first tool magically stops working when the PT tool is started.
> 
> We cannot refuse to create perf-LBR events, because at that time there
> might not be a PT user -- and even if there was one, it might go away.
> 
> But as long as there's a PT user around, the LBR events will not be able
> to be scheduled and will simply starve, for no apparent reason.
> 
> Complete and utterly miserable position.
> 
> And it makes sense to write LBR tools because they cover a much greater
> spread of hardware.


* Re: [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units
  2014-01-07 15:48                                       ` Andi Kleen
@ 2014-01-08 11:53                                         ` Alexander Shishkin
  0 siblings, 0 replies; 163+ messages in thread
From: Alexander Shishkin @ 2014-01-08 11:53 UTC (permalink / raw)
  To: Andi Kleen, Peter Zijlstra
  Cc: Andi Kleen, Ingo Molnar, Arnaldo Carvalho de Melo, Ingo Molnar,
	linux-kernel, David Ahern, Frederic Weisbecker, Jiri Olsa,
	Mike Galbraith, Namhyung Kim, Paul Mackerras, Stephane Eranian

Andi Kleen <andi@firstfloor.org> writes:

>> So create two events, one for the PT stuff and one to track the
>> side-band stuff. We have a NOP event for just this purpose.
>
> Ok I guess that could work.
>
> Essentially replace the magic mmap offset with a second fd.
>
> Alex, what do you think?

Yes, that's what I suggested some time ago in [1]. A second buffer
(through another fd or otherwise) is an essential thing from my point of
view.

[1] http://marc.info/?l=linux-kernel&m=138737306725663

Regards,
--
Alex


end of thread

Thread overview: 163+ messages
2013-12-11 12:36 [PATCH v0 00/71] perf: Add support for Intel Processor Trace Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 01/71] perf: Disable all pmus on unthrottling and rescheduling Alexander Shishkin
2013-12-11 20:53   ` Andi Kleen
2013-12-13 18:06   ` Peter Zijlstra
2013-12-16 11:00     ` Alexander Shishkin
2013-12-16 11:07       ` Peter Zijlstra
2013-12-11 12:36 ` [PATCH v0 02/71] x86: Add Intel Processor Trace (INTEL_PT) cpu feature detection Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 03/71] perf: Abstract ring_buffer backing store operations Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 04/71] itrace: Infrastructure for instruction flow tracing units Alexander Shishkin
2013-12-17 16:11   ` Peter Zijlstra
2013-12-18 13:23     ` Alexander Shishkin
2013-12-18 13:34       ` Peter Zijlstra
2013-12-18 14:01         ` Alexander Shishkin
2013-12-18 14:11           ` Peter Zijlstra
2013-12-18 14:22             ` Alexander Shishkin
2013-12-18 15:09               ` Peter Zijlstra
2013-12-19  7:53                 ` Alexander Shishkin
2013-12-19 10:26                   ` Peter Zijlstra
2013-12-19 11:14                     ` Alexander Shishkin
2013-12-19 11:25                       ` Peter Zijlstra
2013-12-19 11:57                         ` Alexander Shishkin
2013-12-19 10:31                   ` Peter Zijlstra
2013-12-19 11:17                     ` Alexander Shishkin
2013-12-19 11:28                       ` Peter Zijlstra
2013-12-19 11:57                         ` Peter Zijlstra
2013-12-19 12:52                           ` Peter Zijlstra
2013-12-19 12:57                           ` Peter Zijlstra
2013-12-19 14:54                             ` Alexander Shishkin
2013-12-19 15:14                               ` Peter Zijlstra
2013-12-19 11:58                         ` Alexander Shishkin
2013-12-19 12:39                         ` Ingo Molnar
2013-12-19 14:30                           ` Alexander Shishkin
2013-12-19 14:49                             ` Frederic Weisbecker
2013-12-19 15:02                               ` Peter Zijlstra
2013-12-19 15:10                             ` Peter Zijlstra
2014-01-06 21:25                               ` Andi Kleen
2014-01-06 22:05                                 ` Peter Zijlstra
2014-01-07  0:52                                   ` Andi Kleen
2014-01-07  1:01                                     ` Andi Kleen
2014-01-07  8:42                                     ` Peter Zijlstra
2014-01-07 15:48                                       ` Andi Kleen
2014-01-08 11:53                                         ` Alexander Shishkin
2014-01-06 22:15                                 ` Peter Zijlstra
2014-01-06 23:10                                   ` Andi Kleen
2014-01-07  8:38                                     ` Peter Zijlstra
2014-01-07 15:42                                       ` Andi Kleen
2014-01-07 20:51                                         ` Peter Zijlstra
2014-01-07 23:34                                           ` Andi Kleen
     [not found]                                           ` <20140107212322.GE20765@two.firstfloor.org>
     [not found]                                             ` <20140108082840.GH2480@laptop.programming.kicks-ass.net>
2014-01-08  8:31                                               ` Peter Zijlstra
2014-01-07  8:41                                     ` Peter Zijlstra
2014-01-07 15:46                                       ` Andi Kleen
2013-12-11 12:36 ` [PATCH v0 05/71] x86: perf: Intel PT PMU driver Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 06/71] perf: Allow set-output for task contexts of different types Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 07/71] perf tools: Record whether a dso is 64-bit Alexander Shishkin
2013-12-11 19:26   ` David Ahern
2013-12-11 19:54     ` Arnaldo Carvalho de Melo
2013-12-12 12:07       ` Adrian Hunter
2013-12-12 12:05     ` Adrian Hunter
2013-12-12 16:45       ` David Ahern
2013-12-12 19:05         ` Arnaldo Carvalho de Melo
2013-12-12 19:16           ` David Ahern
2013-12-12 20:01             ` Arnaldo Carvalho de Melo
2013-12-16  3:16   ` David Ahern
2013-12-16  7:55     ` Adrian Hunter
2013-12-11 12:36 ` [PATCH v0 08/71] perf tools: Let a user specify a PMU event without any config terms Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 09/71] perf tools: Let default config be defined for a PMU Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 10/71] perf tools: Add perf_pmu__scan_file() Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 11/71] perf tools: Add perf_event_paranoid() Alexander Shishkin
2013-12-16 15:26   ` [tip:perf/core] " tip-bot for Adrian Hunter
2013-12-11 12:36 ` [PATCH v0 12/71] perf tools: Add dsos__hit_all() Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 13/71] perf tools: Add machine__get_thread_pid() Alexander Shishkin
2013-12-11 19:28   ` David Ahern
2013-12-11 21:18     ` Andi Kleen
2013-12-11 21:49       ` David Ahern
2013-12-12 13:56     ` Adrian Hunter
2013-12-11 12:36 ` [PATCH v0 14/71] perf tools: Add cpu to struct thread Alexander Shishkin
2013-12-11 14:19   ` Arnaldo Carvalho de Melo
2013-12-12 14:14     ` Adrian Hunter
2013-12-11 19:30   ` David Ahern
2013-12-11 19:55     ` Arnaldo Carvalho de Melo
2013-12-11 19:57       ` David Ahern
2013-12-11 12:36 ` [PATCH v0 15/71] perf tools: Add ability to record the current tid for each cpu Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 16/71] perf tools: Allow header->data_offset to be predetermined Alexander Shishkin
2013-12-16 15:26   ` [tip:perf/core] perf header: Allow header-> data_offset " tip-bot for Adrian Hunter
2013-12-11 12:36 ` [PATCH v0 17/71] perf tools: Add perf_evlist__can_select_event() Alexander Shishkin
2013-12-16 15:27   ` [tip:perf/core] perf evlist: Add can_select_event() method tip-bot for Adrian Hunter
2013-12-11 12:36 ` [PATCH v0 18/71] perf session: Flag if the event stream is entirely in memory Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 19/71] perf evlist: Pass mmap parameters in a struct Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 20/71] perf tools: Move mem_bswap32/64 to util.c Alexander Shishkin
2013-12-16 15:27   ` [tip:perf/core] " tip-bot for Adrian Hunter
2013-12-11 12:36 ` [PATCH v0 21/71] perf tools: Add feature test for __sync_val_compare_and_swap Alexander Shishkin
2013-12-11 19:24   ` Arnaldo Carvalho de Melo
2013-12-11 20:07     ` Andi Kleen
2013-12-12 13:45       ` Adrian Hunter
2013-12-12 13:42     ` Adrian Hunter
2013-12-11 12:36 ` [PATCH v0 22/71] perf tools: Add option macro OPT_CALLBACK_OPTARG Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 23/71] perf evlist: Add perf_evlist__to_front() Alexander Shishkin
2013-12-11 19:38   ` Arnaldo Carvalho de Melo
2013-12-12 14:09     ` Adrian Hunter
2013-12-16 15:27   ` [tip:perf/core] " tip-bot for Adrian Hunter
2013-12-11 12:36 ` [PATCH v0 24/71] perf evlist: Add perf_evlist__set_tracking_event() Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 25/71] perf evsel: Add 'no_aux_samples' option Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 26/71] perf evsel: Add 'immediate' option Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 27/71] perf evlist: Add 'system_wide' option Alexander Shishkin
2013-12-11 19:37   ` David Ahern
2013-12-12 12:22     ` Adrian Hunter
2013-12-11 12:36 ` [PATCH v0 28/71] perf tools: Add id index Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 29/71] perf pmu: Let pmu's with no events show up on perf list Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 30/71] perf session: Add ability to skip 4GiB or more Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 31/71] perf session: Add perf_session__deliver_synth_event() Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 32/71] perf tools: Allow TSC conversion on any arch Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 33/71] perf tools: Move rdtsc() function Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 34/71] perf evlist: Add perf_evlist__enable_event_idx() Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 35/71] perf tools: Add itrace members of struct perf_event_attr Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 36/71] perf tools: Add support for parsing pmu itrace_config Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 37/71] perf tools: Add support for PERF_RECORD_ITRACE_LOST Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 38/71] perf tools: Add itrace sample parsing Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 39/71] perf header: Add Instruction Tracing feature Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 40/71] perf evlist: Add ability to mmap itrace buffers Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 41/71] perf tools: Add user events for Instruction Tracing Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 42/71] perf tools: Add support for Instruction Trace recording Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 43/71] perf record: Add basic Instruction Tracing support Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 44/71] perf record: Extend -m option for Instruction Tracing mmap pages Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 45/71] perf tools: Add a user event for Instruction Tracing errors Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 46/71] perf session: Add Instruction Tracing hooks Alexander Shishkin
2013-12-11 12:36 ` [PATCH v0 47/71] perf session: Add Instruction Tracing options Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 48/71] perf session: Make perf_event__itrace_swap() non-static Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 49/71] perf itrace: Add helpers for Instruction Tracing errors Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 50/71] perf itrace: Add helpers for queuing Instruction Tracing data Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 51/71] perf itrace: Add a heap for sorting Instruction Tracing queues Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 52/71] perf itrace: Add processing for Instruction Tracing events Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 53/71] perf script: Add Instruction Tracing support Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 54/71] perf script: Always allow fields 'addr' and 'cpu' for itrace Alexander Shishkin
2013-12-11 19:41   ` David Ahern
2013-12-12 12:35     ` Adrian Hunter
2013-12-11 12:37 ` [PATCH v0 55/71] perf report: Add Instruction Tracing support Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 56/71] perf tools: Add Instruction Trace sampling support Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 57/71] perf record: " Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 58/71] perf tools: Add Instruction Tracing Snapshot Mode Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 59/71] perf record: Add Instruction Tracing Snapshot Mode support Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 60/71] perf inject: Re-pipe Instruction Tracing events Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 61/71] perf inject: Add Instruction Tracing support Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 62/71] perf inject: Cut Instruction Tracing samples Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 63/71] perf tools: Add Instruction Tracing index Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 64/71] perf tools: Hit all build ids when Instruction Tracing Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 65/71] perf itrace: Add Intel PT as an Instruction Tracing type Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 66/71] perf tools: Add Intel PT packet decoder Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 67/71] perf tools: Add Intel PT instruction decoder Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 68/71] perf tools: Add Intel PT log Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 69/71] perf tools: Add Intel PT decoder Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 70/71] perf tools: Add Intel PT support Alexander Shishkin
2013-12-11 12:37 ` [PATCH v0 71/71] perf tools: Take Intel PT into use Alexander Shishkin
2013-12-11 13:04 ` [PATCH v0 00/71] perf: Add support for Intel Processor Trace Ingo Molnar
2013-12-11 13:14   ` Alexander Shishkin
2013-12-11 13:47     ` Ingo Molnar
2013-12-16 11:08       ` Alexander Shishkin
2013-12-16 14:37         ` Ingo Molnar
2013-12-16 15:18           ` Andi Kleen
2013-12-16 15:30             ` Frederic Weisbecker
2013-12-16 15:45               ` Andi Kleen
2013-12-16 15:57                 ` Frederic Weisbecker
2013-12-18  4:03                 ` Namhyung Kim
2013-12-11 13:52 ` Arnaldo Carvalho de Melo
