linux-kernel.vger.kernel.org archive mirror
* [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT
@ 2019-07-04 16:00 Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 1/7] perf: Allow normal events to be sources of AUX data Alexander Shishkin
From: Alexander Shishkin @ 2019-07-04 16:00 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	Alexander Shishkin

Hi Peter,

This is the second attempt at the PEBS-via-PT feature; the previous one is
here [1]. This version depends on the legitimacy of including exclusive
events (PT, in this case) in groups, as I posted earlier [2]. It will,
however, work without that patch, because this kind of grouping was never
actually disallowed.

The first problem with the previous patchset was bad handling of
conflicting PEBS events (PEBS->PT and PEBS->DS). This version fails to
schedule the event that would introduce the conflict, which allows the
conflicting events to be rotated. This is done in the x86 perf patch, 2/7.

The second problem was the lack of a guarantee that the PT event is
scheduled together with the PEBS->PT event and that it is the correct PT
event. This is addressed via grouping. The requirement is that the PEBS
event be added to a group that contains a PT event; a link between them is
then created. If that group is later broken down, the PEBS event will fail
to schedule. This is done in the perf core patch, 1/7.

The rest of the series (3/7..7/7) contains the remaining tooling patches
needed to enable this.

The feature itself lets PEBS write its output to the Intel PT stream instead
of the DS area. This is theoretically useful in virtualized environments,
where the DS area can't be used. It's also good for those who are interested
in an instruction trace for the context of the PEBS events. PEBS can also
provide LBR context, with all the branch-related information that PT
doesn't provide at the moment.

PEBS records are packetized in the PT stream, so instead of extracting
them in the PMI, we leave that to the perf tool, because real-time PT
decoding in the kernel is not practical.

[1] https://marc.info/?l=linux-kernel&m=155679423430002
[2] https://marc.info/?l=linux-kernel&m=156197929026646

Adrian Hunter (5):
  perf tools: Add aux_source attribute flag
  perf tools: Add itrace option 'o' to synthesize aux-source events
  perf intel-pt: Process options for PEBS event synthesis
  perf tools: Add aux-source config term
  perf intel-pt: Add brief documentation for PEBS via Intel PT

Alexander Shishkin (2):
  perf: Allow normal events to be sources of AUX data
  perf/x86/intel: Support PEBS output to PT

 arch/x86/events/core.c                   | 45 ++++++++++++
 arch/x86/events/intel/core.c             | 20 ++++++
 arch/x86/events/intel/ds.c               | 61 +++++++++++++++-
 arch/x86/events/perf_event.h             | 11 +++
 arch/x86/include/asm/msr-index.h         |  4 ++
 include/linux/perf_event.h               | 14 ++++
 include/uapi/linux/perf_event.h          |  3 +-
 kernel/events/core.c                     | 92 ++++++++++++++++++++++++
 tools/include/uapi/linux/perf_event.h    |  3 +-
 tools/perf/Documentation/intel-pt.txt    | 15 ++++
 tools/perf/Documentation/itrace.txt      |  2 +
 tools/perf/Documentation/perf-record.txt |  2 +
 tools/perf/arch/x86/util/intel-pt.c      | 23 ++++++
 tools/perf/util/auxtrace.c               |  4 ++
 tools/perf/util/auxtrace.h               |  3 +
 tools/perf/util/evsel.c                  |  4 ++
 tools/perf/util/evsel.h                  |  2 +
 tools/perf/util/intel-pt.c               | 18 +++++
 tools/perf/util/parse-events.c           |  8 +++
 tools/perf/util/parse-events.h           |  1 +
 tools/perf/util/parse-events.l           |  1 +
 21 files changed, 333 insertions(+), 3 deletions(-)

-- 
2.20.1



* [PATCH v1 1/7] perf: Allow normal events to be sources of AUX data
  2019-07-04 16:00 [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT Alexander Shishkin
@ 2019-07-04 16:00 ` Alexander Shishkin
  2019-07-29 11:31   ` Peter Zijlstra
  2019-07-04 16:00 ` [PATCH v1 2/7] perf/x86/intel: Support PEBS output to PT Alexander Shishkin
From: Alexander Shishkin @ 2019-07-04 16:00 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	Alexander Shishkin

In some cases, ordinary (non-AUX) events can generate data for AUX events.
For example, PEBS events can come out as records in the Intel PT stream
instead of their usual DS records, if configured to do so.

One requirement for such events is that they consistently schedule together,
to ensure that the data from the "AUX source" events isn't lost while their
corresponding AUX event is not scheduled. We use grouping to provide this
guarantee: an "AUX source" event can be added to a group with an AUX event,
provided that the former supports writing to the latter.
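For illustration only, the lifetime rule described above can be modeled in
user space. This is not kernel code: struct model_event and the model_*
helpers are hypothetical simplifications of struct perf_event, the
perf_event_open() linking step and perf_group_detach(), and a plain int
stands in for the atomic refcount. The model links to the first AUX-capable
group member, whereas the patch also asks the PMU via aux_source_match().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified stand-in for struct perf_event. */
struct model_event {
	bool has_aux;			/* e.g. an Intel PT event */
	bool aux_source;		/* attr.aux_source requested */
	struct model_event *aux_event;	/* link set at open time */
	int refcount;
};

/* Link @ev to the first AUX-capable event among @group[0..n-1]. */
static int model_link_aux(struct model_event *ev,
			  struct model_event **group, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (group[i]->has_aux) {
			group[i]->refcount++;	/* pin the AUX event */
			ev->aux_event = group[i];
			return 0;
		}
	}
	return -1;	/* no AUX event in the group: the open is refused */
}

/* Group teardown drops the link and the reference it held. */
static void model_detach(struct model_event *ev)
{
	if (ev->aux_event) {
		ev->aux_event->refcount--;
		ev->aux_event = NULL;
	}
}

/* An aux_source event without a link can no longer schedule. */
static bool model_can_schedule(const struct model_event *ev)
{
	return !ev->aux_source || ev->aux_event != NULL;
}
```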

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 include/linux/perf_event.h      | 14 +++++
 include/uapi/linux/perf_event.h |  3 +-
 kernel/events/core.c            | 92 +++++++++++++++++++++++++++++++++
 3 files changed, 108 insertions(+), 1 deletion(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 201cc93cec32..7e5920b94290 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -245,6 +245,7 @@ struct perf_event;
 #define PERF_PMU_CAP_ITRACE			0x20
 #define PERF_PMU_CAP_HETEROGENEOUS_CPUS		0x40
 #define PERF_PMU_CAP_NO_EXCLUDE			0x80
+#define PERF_PMU_CAP_AUX_SOURCE			0x100
 
 /**
  * struct pmu - generic performance monitoring unit
@@ -445,6 +446,16 @@ struct pmu {
 	void (*addr_filters_sync)	(struct perf_event *event);
 					/* optional */
 
+	/*
+	 * Check if event can be used for aux_source purposes for
+	 * events of this PMU.
+	 *
+	 * Runs from perf_event_open(). Should return 0 for "no match"
+	 * or non-zero for "match".
+	 */
+	int (*aux_source_match)		(struct perf_event *event);
+					/* optional */
+
 	/*
 	 * Filter events for PMU-specific reasons.
 	 */
@@ -680,6 +691,9 @@ struct perf_event {
 	struct perf_addr_filter_range	*addr_filter_ranges;
 	unsigned long			addr_filters_gen;
 
+	/* for aux_source events */
+	struct perf_event		*aux_event;
+
 	void (*destroy)(struct perf_event *);
 	struct rcu_head			rcu_head;
 
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..213cae95e713 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -374,7 +374,8 @@ struct perf_event_attr {
 				namespaces     :  1, /* include namespaces data */
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
-				__reserved_1   : 33;
+				aux_source     :  1, /* generate AUX records instead of events */
+				__reserved_1   : 32;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8cfb721bb284..fc586da37067 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1887,6 +1887,57 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	ctx->generation++;
 }
 
+static int
+perf_aux_source_match(struct perf_event *event, struct perf_event *aux_event)
+{
+	if (!has_aux(aux_event))
+		return 0;
+
+	if (!event->pmu->aux_source_match)
+		return 0;
+
+	return event->pmu->aux_source_match(aux_event);
+}
+
+static void put_event(struct perf_event *event);
+static void event_sched_out(struct perf_event *event,
+			    struct perf_cpu_context *cpuctx,
+			    struct perf_event_context *ctx);
+
+static void perf_put_aux_event(struct perf_event *event)
+{
+	struct perf_event_context *ctx = event->ctx;
+	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
+	struct perf_event *iter = NULL;
+
+	/*
+	 * If the event uses an aux_event, tear down the link.
+	 */
+	if (event->aux_event) {
+		put_event(event->aux_event);
+		event->aux_event = NULL;
+		return;
+	}
+
+	/*
+	 * If the event is an aux_event, tear down all links to
+	 * it from other events.
+	 */
+	for_each_sibling_event(iter, event->group_leader) {
+		if (iter->aux_event != event)
+			continue;
+
+		iter->aux_event = NULL;
+		put_event(event);
+
+		/*
+		 * If it's ACTIVE, schedule it out. It won't schedule
+		 * again because !aux_event.
+		 */
+		event_sched_out(iter, cpuctx, ctx);
+	}
+}
+
 static void perf_group_detach(struct perf_event *event)
 {
 	struct perf_event *sibling, *tmp;
@@ -1902,6 +1953,8 @@ static void perf_group_detach(struct perf_event *event)
 
 	event->attach_state &= ~PERF_ATTACH_GROUP;
 
+	perf_put_aux_event(event);
+
 	/*
 	 * If this is a sibling, remove it from its group.
 	 */
@@ -10396,6 +10449,12 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		goto err_ns;
 	}
 
+	if (event->attr.aux_source &&
+	    !(pmu->capabilities & PERF_PMU_CAP_AUX_SOURCE)) {
+		err = -EOPNOTSUPP;
+		goto err_pmu;
+	}
+
 	err = exclusive_event_init(event);
 	if (err)
 		goto err_pmu;
@@ -11052,6 +11111,39 @@ SYSCALL_DEFINE5(perf_event_open,
 		}
 	}
 
+	if (event->attr.aux_source) {
+		struct perf_event *aux_event = group_leader;
+
+		/*
+		 * One of the events in the group must be an aux event
+		 * if we want to be an aux_source. This way, the aux event
+		 * will precede its aux_source events in the group, and
+		 * therefore will always schedule first.
+		 */
+		err = -EINVAL;
+		if (!aux_event)
+			goto err_locked;
+
+		if (perf_aux_source_match(event, aux_event))
+			goto found_aux;
+
+		for_each_sibling_event(aux_event, group_leader) {
+			if (perf_aux_source_match(event, aux_event))
+				goto found_aux;
+		}
+
+		goto err_locked;
+
+found_aux:
+		/*
+		 * Link aux_sources to their aux event; this is undone in
+		 * perf_group_detach(). When the group is torn down, the
+		 * aux_source events lose their link to the aux_event and
+		 * can't schedule any more.
+		 */
+		if (atomic_long_inc_not_zero(&aux_event->refcount))
+			event->aux_event = aux_event;
+	}
 
 	/*
 	 * Must be under the same ctx::mutex as perf_install_in_context(),
-- 
2.20.1



* [PATCH v1 2/7] perf/x86/intel: Support PEBS output to PT
  2019-07-04 16:00 [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 1/7] perf: Allow normal events to be sources of AUX data Alexander Shishkin
@ 2019-07-04 16:00 ` Alexander Shishkin
  2019-07-29 13:37   ` Peter Zijlstra
  2019-07-04 16:00 ` [PATCH v1 3/7] perf tools: Add aux_source attribute flag Alexander Shishkin
From: Alexander Shishkin @ 2019-07-04 16:00 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	Alexander Shishkin

If PEBS declares the ability to output its data to the Intel PT stream, use
the aux_source attribute bit to enable PEBS data output to PT. This requires
a PT event to be present and scheduled in the same context. Unlike with the
DS area, the kernel does not extract PEBS records from the PT stream to
generate corresponding records in the perf stream, because that would
require real-time in-kernel PT decoding, which is not feasible. The PMI,
however, can still be used.
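As a rough worked example of the register value involved: the bit positions
below are copied from the PEBS_PMI_AFTER_EACH_RECORD and PEBS_OUTPUT_*
defines this patch adds to arch/x86/events/perf_event.h, while
pebs_enable_via_pt() is a hypothetical helper that mirrors what
intel_pmu_pebs_enable() and intel_pmu_pebs_via_pt_enable() OR into
cpuc->pebs_enabled for one general-purpose counter. Fixed counters and the
load-latency/store bits are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Bit positions taken from the defines added by this patch. */
#define PEBS_PMI_AFTER_EACH_RECORD	(1ULL << 60)
#define PEBS_OUTPUT_OFFSET		61
#define PEBS_OUTPUT_MASK		(3ULL << PEBS_OUTPUT_OFFSET)
#define PEBS_OUTPUT_PT			(1ULL << PEBS_OUTPUT_OFFSET)

/*
 * Hypothetical helper: PEBS_ENABLE value for general-purpose counter @idx
 * routed to PT. @large_pebs mirrors PERF_X86_EVENT_LARGE_PEBS; without it,
 * a PMI is requested after every record.
 */
static uint64_t pebs_enable_via_pt(unsigned int idx, int large_pebs)
{
	uint64_t val = 1ULL << idx;	/* per-counter PEBS enable bit */

	if (!large_pebs)
		val |= PEBS_PMI_AFTER_EACH_RECORD;
	val |= PEBS_OUTPUT_PT;		/* output field = 01b: Intel PT */
	return val;
}
```

For counter 0 without large PEBS this yields 0x3000000000000001; with large
PEBS on counter 2 it yields 0x2000000000000004.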

The output setting is per-CPU, so all PEBS events must write either to PT
or to the DS area. Therefore, in case of a conflict, the conflicting event
will fail to schedule, allowing the rotation logic to alternate between the
PEBS->PT and PEBS->DS events.
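The scheduling rule can be sketched as a standalone check. struct sched_ev
and check_pebs_mix() are hypothetical, user-space simplifications of the
loop this patch adds to x86_schedule_events(); the cpuc->is_fake case used
during group validation is left out. Events [0, n0) are treated as already
scheduled and [n0, n) as being added.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-event summary of what the scheduler inspects. */
struct sched_ev {
	bool precise;		/* attr.precise_ip != 0, i.e. a PEBS event */
	bool via_pt;		/* PERF_X86_EVENT_PEBS_VIA_PT set */
	bool has_aux_link;	/* ->aux_event != NULL */
};

static int check_pebs_mix(const struct sched_ev *ev, int n, int n0)
{
	int n_pebs_ds = 0, n_pebs_pt = 0;

	for (int i = 0; i < n; i++) {
		if (!ev[i].precise)
			continue;
		if (ev[i].via_pt) {
			/* new PEBS->PT event whose group was broken down */
			if (i >= n0 && !ev[i].has_aux_link)
				return -EINVAL;
			n_pebs_pt++;
		} else {
			n_pebs_ds++;
		}
	}
	/* both outputs at once is impossible; rotation retries later */
	return (n_pebs_ds && n_pebs_pt) ? -EINVAL : 0;
}
```

Note that only newly added events are checked for a missing link; an
already-scheduled PEBS->PT event is left alone until it is scheduled again.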

Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 arch/x86/events/core.c           | 45 +++++++++++++++++++++++
 arch/x86/events/intel/core.c     | 20 +++++++++++
 arch/x86/events/intel/ds.c       | 61 +++++++++++++++++++++++++++++++-
 arch/x86/events/perf_event.h     | 11 ++++++
 arch/x86/include/asm/msr-index.h |  4 +++
 5 files changed, 140 insertions(+), 1 deletion(-)

diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
index f0e4804515d8..a11924e20df3 100644
--- a/arch/x86/events/core.c
+++ b/arch/x86/events/core.c
@@ -869,6 +869,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
 	struct perf_event *e;
 	int n0, i, wmin, wmax, unsched = 0;
+	int n_pebs_ds, n_pebs_pt;
 	struct hw_perf_event *hwc;
 
 	bitmap_zero(used_mask, X86_PMC_IDX_MAX);
@@ -884,6 +885,37 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
 	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
 		n0 -= cpuc->n_txn;
 
+	/*
+	 * Check for PEBS->DS and PEBS->PT events.
+	 * 1) They can't be scheduled simultaneously;
+	 * 2) PEBS->PT events depend on a corresponding PT event
+	 */
+	for (i = 0, n_pebs_ds = 0, n_pebs_pt = 0; i < n; i++) {
+		e = cpuc->event_list[i];
+
+		if (e->attr.precise_ip) {
+			if (e->hw.flags & PERF_X86_EVENT_PEBS_VIA_PT) {
+				/*
+				 * New PEBS->PT event, check ->aux_event; if
+				 * it's NULL, the group has been broken down
+				 * and this event can't schedule any more.
+				 */
+				if (!cpuc->is_fake && i >= n0 && !e->aux_event)
+					return -EINVAL;
+				n_pebs_pt++;
+			} else {
+				n_pebs_ds++;
+			}
+		}
+	}
+
+	/*
+	 * Fail to add conflicting PEBS events. If this happens, rotation
+	 * takes care that all events get to run.
+	 */
+	if (n_pebs_ds && n_pebs_pt)
+		return -EINVAL;
+
 	if (x86_pmu.start_scheduling)
 		x86_pmu.start_scheduling(cpuc);
 
@@ -2241,6 +2273,17 @@ static int x86_pmu_check_period(struct perf_event *event, u64 value)
 	return 0;
 }
 
+static int x86_pmu_aux_source_match(struct perf_event *event)
+{
+	if (!(pmu.capabilities & PERF_PMU_CAP_AUX_SOURCE))
+		return 0;
+
+	if (x86_pmu.aux_source_match)
+		return x86_pmu.aux_source_match(event);
+
+	return 0;
+}
+
 static struct pmu pmu = {
 	.pmu_enable		= x86_pmu_enable,
 	.pmu_disable		= x86_pmu_disable,
@@ -2266,6 +2309,8 @@ static struct pmu pmu = {
 	.sched_task		= x86_pmu_sched_task,
 	.task_ctx_size          = sizeof(struct x86_perf_task_context),
 	.check_period		= x86_pmu_check_period,
+
+	.aux_source_match	= x86_pmu_aux_source_match,
 };
 
 void arch_perf_update_userpage(struct perf_event *event,
diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
index bda450ff51ee..6955d4f7e7aa 100644
--- a/arch/x86/events/intel/core.c
+++ b/arch/x86/events/intel/core.c
@@ -3301,6 +3301,13 @@ static int intel_pmu_hw_config(struct perf_event *event)
 		}
 	}
 
+	if (event->attr.aux_source) {
+		if (!event->attr.precise_ip)
+			return -EINVAL;
+
+		event->hw.flags |= PERF_X86_EVENT_PEBS_VIA_PT;
+	}
+
 	if (event->attr.type != PERF_TYPE_RAW)
 		return 0;
 
@@ -3814,6 +3821,17 @@ static int intel_pmu_check_period(struct perf_event *event, u64 value)
 	return intel_pmu_has_bts_period(event, value) ? -EINVAL : 0;
 }
 
+static int intel_pmu_aux_source_match(struct perf_event *event)
+{
+	if (!x86_pmu.intel_cap.pebs_output_pt_available)
+		return 0;
+
+	if (event->pmu->name && !strcmp(event->pmu->name, "intel_pt"))
+		return 1;
+
+	return 0;
+}
+
 PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
 
 PMU_FORMAT_ATTR(ldlat, "config1:0-15");
@@ -3938,6 +3956,8 @@ static __initconst const struct x86_pmu intel_pmu = {
 	.sched_task		= intel_pmu_sched_task,
 
 	.check_period		= intel_pmu_check_period,
+
+	.aux_source_match	= intel_pmu_aux_source_match,
 };
 
 static __init void intel_clovertown_quirk(void)
diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
index 7acc526b4ad2..9c59462f38a3 100644
--- a/arch/x86/events/intel/ds.c
+++ b/arch/x86/events/intel/ds.c
@@ -902,6 +902,9 @@ struct event_constraint *intel_pebs_constraints(struct perf_event *event)
  */
 static inline bool pebs_needs_sched_cb(struct cpu_hw_events *cpuc)
 {
+	if (cpuc->n_pebs == cpuc->n_pebs_via_pt)
+		return false;
+
 	return cpuc->n_pebs && (cpuc->n_pebs == cpuc->n_large_pebs);
 }
 
@@ -919,6 +922,9 @@ static inline void pebs_update_threshold(struct cpu_hw_events *cpuc)
 	u64 threshold;
 	int reserved;
 
+	if (cpuc->n_pebs_via_pt)
+		return;
+
 	if (x86_pmu.flags & PMU_FL_PEBS_ALL)
 		reserved = x86_pmu.max_pebs_events + x86_pmu.num_counters_fixed;
 	else
@@ -1059,10 +1065,50 @@ void intel_pmu_pebs_add(struct perf_event *event)
 	cpuc->n_pebs++;
 	if (hwc->flags & PERF_X86_EVENT_LARGE_PEBS)
 		cpuc->n_large_pebs++;
+	if (hwc->flags & PERF_X86_EVENT_PEBS_VIA_PT)
+		cpuc->n_pebs_via_pt++;
 
 	pebs_update_state(needed_cb, cpuc, event, true);
 }
 
+static void intel_pmu_pebs_via_pt_disable(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+
+	if (!(event->hw.flags & PERF_X86_EVENT_PEBS_VIA_PT))
+		return;
+
+	if (!(cpuc->pebs_enabled & ~PEBS_VIA_PT_MASK))
+		cpuc->pebs_enabled &= ~PEBS_VIA_PT_MASK;
+}
+
+static void intel_pmu_pebs_via_pt_enable(struct perf_event *event)
+{
+	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
+	struct hw_perf_event *hwc = &event->hw;
+	struct debug_store *ds = cpuc->ds;
+
+	if (!(event->hw.flags & PERF_X86_EVENT_PEBS_VIA_PT))
+		return;
+
+	/*
+	 * In case there's a mix of PEBS->PT and PEBS->DS, fall back
+	 * to DS.
+	 */
+	if (cpuc->n_pebs != cpuc->n_pebs_via_pt) {
+		/* PEBS-to-DS events present, fall back to DS */
+		intel_pmu_pebs_via_pt_disable(event);
+		return;
+	}
+
+	if (!(event->hw.flags & PERF_X86_EVENT_LARGE_PEBS))
+		cpuc->pebs_enabled |= PEBS_PMI_AFTER_EACH_RECORD;
+
+	cpuc->pebs_enabled |= PEBS_OUTPUT_PT;
+
+	wrmsrl(MSR_RELOAD_PMC0 + hwc->idx, ds->pebs_event_reset[hwc->idx]);
+}
+
 void intel_pmu_pebs_enable(struct perf_event *event)
 {
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
@@ -1100,6 +1146,8 @@ void intel_pmu_pebs_enable(struct perf_event *event)
 	} else {
 		ds->pebs_event_reset[hwc->idx] = 0;
 	}
+
+	intel_pmu_pebs_via_pt_enable(event);
 }
 
 void intel_pmu_pebs_del(struct perf_event *event)
@@ -1111,6 +1159,8 @@ void intel_pmu_pebs_del(struct perf_event *event)
 	cpuc->n_pebs--;
 	if (hwc->flags & PERF_X86_EVENT_LARGE_PEBS)
 		cpuc->n_large_pebs--;
+	if (hwc->flags & PERF_X86_EVENT_PEBS_VIA_PT)
+		cpuc->n_pebs_via_pt--;
 
 	pebs_update_state(needed_cb, cpuc, event, false);
 }
@@ -1120,7 +1170,8 @@ void intel_pmu_pebs_disable(struct perf_event *event)
 	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
 	struct hw_perf_event *hwc = &event->hw;
 
-	if (cpuc->n_pebs == cpuc->n_large_pebs)
+	if (cpuc->n_pebs == cpuc->n_large_pebs &&
+	    cpuc->n_pebs != cpuc->n_pebs_via_pt)
 		intel_pmu_drain_pebs_buffer();
 
 	cpuc->pebs_enabled &= ~(1ULL << hwc->idx);
@@ -1131,6 +1182,8 @@ void intel_pmu_pebs_disable(struct perf_event *event)
 	else if (event->hw.flags & PERF_X86_EVENT_PEBS_ST)
 		cpuc->pebs_enabled &= ~(1ULL << 63);
 
+	intel_pmu_pebs_via_pt_disable(event);
+
 	if (cpuc->enabled)
 		wrmsrl(MSR_IA32_PEBS_ENABLE, cpuc->pebs_enabled);
 
@@ -2032,6 +2085,12 @@ void __init intel_ds_init(void)
 					  PERF_SAMPLE_REGS_INTR);
 			}
 			pr_cont("PEBS fmt4%c%s, ", pebs_type, pebs_qual);
+
+			if (x86_pmu.intel_cap.pebs_output_pt_available) {
+				pr_cont("PEBS-via-PT, ");
+				x86_get_pmu()->capabilities |= PERF_PMU_CAP_AUX_SOURCE;
+			}
+
 			break;
 
 		default:
diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
index 9bcec3f99e4a..57882c6c67d2 100644
--- a/arch/x86/events/perf_event.h
+++ b/arch/x86/events/perf_event.h
@@ -76,6 +76,7 @@ static inline bool constraint_match(struct event_constraint *c, u64 ecode)
 #define PERF_X86_EVENT_EXCL_ACCT	0x0100 /* accounted EXCL event */
 #define PERF_X86_EVENT_AUTO_RELOAD	0x0200 /* use PEBS auto-reload */
 #define PERF_X86_EVENT_LARGE_PEBS	0x0400 /* use large PEBS */
+#define PERF_X86_EVENT_PEBS_VIA_PT	0x0800 /* use PT buffer for PEBS */
 
 struct amd_nb {
 	int nb_id;  /* NorthBridge id */
@@ -85,6 +86,11 @@ struct amd_nb {
 };
 
 #define PEBS_COUNTER_MASK	((1ULL << MAX_PEBS_EVENTS) - 1)
+#define PEBS_PMI_AFTER_EACH_RECORD BIT_ULL(60)
+#define PEBS_OUTPUT_OFFSET	61
+#define PEBS_OUTPUT_MASK	(3ull << PEBS_OUTPUT_OFFSET)
+#define PEBS_OUTPUT_PT		(1ull << PEBS_OUTPUT_OFFSET)
+#define PEBS_VIA_PT_MASK	(PEBS_OUTPUT_PT | PEBS_PMI_AFTER_EACH_RECORD)
 
 /*
  * Flags PEBS can handle without an PMI.
@@ -229,6 +235,7 @@ struct cpu_hw_events {
 	u64			pebs_enabled;
 	int			n_pebs;
 	int			n_large_pebs;
+	int			n_pebs_via_pt;
 
 	/* Current super set of events hardware configuration */
 	u64			pebs_data_cfg;
@@ -528,6 +535,8 @@ union perf_capabilities {
 		 */
 		u64	full_width_write:1;
 		u64     pebs_baseline:1;
+		u64	pebs_metrics_available:1;
+		u64	pebs_output_pt_available:1;
 	};
 	u64	capabilities;
 };
@@ -711,6 +720,8 @@ struct x86_pmu {
 	 * Check period value for PERF_EVENT_IOC_PERIOD ioctl.
 	 */
 	int (*check_period) (struct perf_event *event, u64 period);
+
+	int (*aux_source_match) (struct perf_event *event);
 };
 
 struct x86_perf_task_context {
diff --git a/arch/x86/include/asm/msr-index.h b/arch/x86/include/asm/msr-index.h
index 6b4fc2788078..03c42f08f063 100644
--- a/arch/x86/include/asm/msr-index.h
+++ b/arch/x86/include/asm/msr-index.h
@@ -375,6 +375,10 @@
 /* Alternative perfctr range with full access. */
 #define MSR_IA32_PMC0			0x000004c1
 
+/* Auto-reload via MSR instead of DS area */
+#define MSR_RELOAD_PMC0			0x000014c1
+#define MSR_RELOAD_FIXED_CTR0		0x00001309
+
 /* AMD64 MSRs. Not complete. See the architecture manual for a more
    complete list. */
 
-- 
2.20.1



* [PATCH v1 3/7] perf tools: Add aux_source attribute flag
  2019-07-04 16:00 [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 1/7] perf: Allow normal events to be sources of AUX data Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 2/7] perf/x86/intel: Support PEBS output to PT Alexander Shishkin
@ 2019-07-04 16:00 ` Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 4/7] perf tools: Add itrace option 'o' to synthesize aux-source events Alexander Shishkin
From: Alexander Shishkin @ 2019-07-04 16:00 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	Adrian Hunter, Alexander Shishkin

From: Adrian Hunter <adrian.hunter@intel.com>

Add aux_source attribute flag to match the kernel's perf_event.h file.
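The new flag is carved out of __reserved_1, so the flags word of struct
perf_event_attr keeps its size: the 31 flag bits that precede it (64 minus
the old 33 reserved bits), one bit for aux_source, and the remaining 32
reserved bits. A minimal mock of just that word, assuming GCC or Clang
(which accept uint64_t bit-fields); struct attr_flags_mock is hypothetical
and not part of any header.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical mock of the perf_event_attr flags word after this patch. */
struct attr_flags_mock {
	uint64_t existing_flags	: 31,	/* disabled ... bpf_event */
		 aux_source	:  1,	/* new in this patch */
		 reserved_1	: 32;	/* was 33 before this patch */
};

_Static_assert(31 + 1 + 32 == 64, "flag bits must fill exactly one u64");
_Static_assert(sizeof(struct attr_flags_mock) == sizeof(uint64_t),
	       "adding aux_source must not grow the flags word");
```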

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 tools/include/uapi/linux/perf_event.h | 3 ++-
 tools/perf/util/evsel.c               | 1 +
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/tools/include/uapi/linux/perf_event.h b/tools/include/uapi/linux/perf_event.h
index 7198ddd0c6b1..213cae95e713 100644
--- a/tools/include/uapi/linux/perf_event.h
+++ b/tools/include/uapi/linux/perf_event.h
@@ -374,7 +374,8 @@ struct perf_event_attr {
 				namespaces     :  1, /* include namespaces data */
 				ksymbol        :  1, /* include ksymbol events */
 				bpf_event      :  1, /* include bpf events */
-				__reserved_1   : 33;
+				aux_source     :  1, /* generate AUX records instead of events */
+				__reserved_1   : 32;
 
 	union {
 		__u32		wakeup_events;	  /* wakeup every n events */
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 5ab31a4a658d..899dc189ff2d 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1683,6 +1683,7 @@ int perf_event_attr__fprintf(FILE *fp, struct perf_event_attr *attr,
 	PRINT_ATTRf(namespaces, p_unsigned);
 	PRINT_ATTRf(ksymbol, p_unsigned);
 	PRINT_ATTRf(bpf_event, p_unsigned);
+	PRINT_ATTRf(aux_source, p_unsigned);
 
 	PRINT_ATTRn("{ wakeup_events, wakeup_watermark }", wakeup_events, p_unsigned);
 	PRINT_ATTRf(bp_type, p_unsigned);
-- 
2.20.1



* [PATCH v1 4/7] perf tools: Add itrace option 'o' to synthesize aux-source events
  2019-07-04 16:00 [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 3/7] perf tools: Add aux_source attribute flag Alexander Shishkin
@ 2019-07-04 16:00 ` Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 5/7] perf intel-pt: Process options for PEBS event synthesis Alexander Shishkin
From: Alexander Shishkin @ 2019-07-04 16:00 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	Adrian Hunter, Alexander Shishkin

From: Adrian Hunter <adrian.hunter@intel.com>

Add itrace option 'o' to synthesize events recorded in the AUX area due to
the use of perf record's aux-source config term.
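A minimal sketch of how the new letter slots into the existing
single-letter parsing, assuming a hypothetical four-field subset of struct
itrace_synth_opts; the real itrace_parse_synth_opts() also handles periods,
call chain and branch stack sizes, and more letters, all omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical subset of struct itrace_synth_opts. */
struct synth_opts_subset {
	bool instructions;	/* 'i' */
	bool pwr_events;	/* 'p' */
	bool other_events;	/* 'o': added by this patch */
	bool errors;		/* 'e' */
};

/* Returns 0 on success, -1 on an unknown letter. */
static int parse_itrace_letters(struct synth_opts_subset *o, const char *s)
{
	for (; *s; s++) {
		switch (*s) {
		case 'i': o->instructions = true; break;
		case 'p': o->pwr_events = true; break;
		case 'o': o->other_events = true; break;
		case 'e': o->errors = true; break;
		default:  return -1;
		}
	}
	return 0;
}
```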

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 tools/perf/Documentation/itrace.txt | 2 ++
 tools/perf/util/auxtrace.c          | 4 ++++
 tools/perf/util/auxtrace.h          | 3 +++
 3 files changed, 9 insertions(+)

diff --git a/tools/perf/Documentation/itrace.txt b/tools/perf/Documentation/itrace.txt
index c2182cbabde3..a9d1ff65e52b 100644
--- a/tools/perf/Documentation/itrace.txt
+++ b/tools/perf/Documentation/itrace.txt
@@ -5,6 +5,8 @@
 		x	synthesize transactions events
 		w	synthesize ptwrite events
 		p	synthesize power events
+		o	synthesize other events recorded due to the use
+			of aux-source (refer to perf record)
 		e	synthesize error events
 		d	create a debug log
 		g	synthesize a call chain (use with i or x)
diff --git a/tools/perf/util/auxtrace.c b/tools/perf/util/auxtrace.c
index bc215fe0b4b4..0262ac393fa5 100644
--- a/tools/perf/util/auxtrace.c
+++ b/tools/perf/util/auxtrace.c
@@ -964,6 +964,7 @@ void itrace_synth_opts__set_default(struct itrace_synth_opts *synth_opts,
 	synth_opts->transactions = true;
 	synth_opts->ptwrites = true;
 	synth_opts->pwr_events = true;
+	synth_opts->other_events = true;
 	synth_opts->errors = true;
 	if (no_sample) {
 		synth_opts->period_type = PERF_ITRACE_PERIOD_INSTRUCTIONS;
@@ -1061,6 +1062,9 @@ int itrace_parse_synth_opts(const struct option *opt, const char *str,
 		case 'p':
 			synth_opts->pwr_events = true;
 			break;
+		case 'o':
+			synth_opts->other_events = true;
+			break;
 		case 'e':
 			synth_opts->errors = true;
 			break;
diff --git a/tools/perf/util/auxtrace.h b/tools/perf/util/auxtrace.h
index e9b4c5edf78b..2690791974b0 100644
--- a/tools/perf/util/auxtrace.h
+++ b/tools/perf/util/auxtrace.h
@@ -60,6 +60,8 @@ enum itrace_period_type {
  * @transactions: whether to synthesize events for transactions
  * @ptwrites: whether to synthesize events for ptwrites
  * @pwr_events: whether to synthesize power events
+ * @other_events: whether to synthesize other events recorded due to the use of
+ *                aux_source
  * @errors: whether to synthesize decoder error events
  * @dont_decode: whether to skip decoding entirely
  * @log: write a decoding log
@@ -86,6 +88,7 @@ struct itrace_synth_opts {
 	bool			transactions;
 	bool			ptwrites;
 	bool			pwr_events;
+	bool			other_events;
 	bool			errors;
 	bool			dont_decode;
 	bool			log;
-- 
2.20.1



* [PATCH v1 5/7] perf intel-pt: Process options for PEBS event synthesis
  2019-07-04 16:00 [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 4/7] perf tools: Add itrace option 'o' to synthesize aux-source events Alexander Shishkin
@ 2019-07-04 16:00 ` Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 6/7] perf tools: Add aux-source config term Alexander Shishkin
From: Alexander Shishkin @ 2019-07-04 16:00 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	Adrian Hunter, Alexander Shishkin

From: Adrian Hunter <adrian.hunter@intel.com>

Process synth_opts.other_events and attr.aux_source to set up the
synthesis of PEBS-via-Intel-PT events.
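The two halves of this patch can be modeled as a record-time check that
rejects more than one aux-source event, and a decode-time lookup of the
first recorded aux-source event when the 'o' itrace option is set.
struct evsel_model and both functions are hypothetical simplifications of
intel_pt_too_many_aux_source() and intel_pt_setup_pebs_events(), operating
on an array instead of a struct perf_evlist.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct perf_evsel. */
struct evsel_model {
	bool aux_source;	/* attr.aux_source */
	bool has_ids;		/* evsel->id != NULL: the event was recorded */
};

/* Record time: at most one aux-source event is allowed. */
static bool too_many_aux_source(const struct evsel_model *ev, size_t n)
{
	size_t cnt = 0;

	for (size_t i = 0; i < n; i++)
		cnt += ev[i].aux_source;
	return cnt > 1;
}

/* Decode time: first recorded aux-source event, if synthesis is wanted. */
static const struct evsel_model *
find_pebs_evsel(const struct evsel_model *ev, size_t n, bool other_events)
{
	if (!other_events)
		return NULL;
	for (size_t i = 0; i < n; i++)
		if (ev[i].aux_source && ev[i].has_ids)
			return &ev[i];
	return NULL;
}
```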

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 tools/perf/arch/x86/util/intel-pt.c | 23 +++++++++++++++++++++++
 tools/perf/util/intel-pt.c          | 18 ++++++++++++++++++
 2 files changed, 41 insertions(+)

diff --git a/tools/perf/arch/x86/util/intel-pt.c b/tools/perf/arch/x86/util/intel-pt.c
index 9804098dcefb..a776cfcc0705 100644
--- a/tools/perf/arch/x86/util/intel-pt.c
+++ b/tools/perf/arch/x86/util/intel-pt.c
@@ -548,6 +548,26 @@ static int intel_pt_validate_config(struct perf_pmu *intel_pt_pmu,
 					evsel->attr.config);
 }
 
+/*
+ * Currently, there is not enough information to disambiguate different PEBS
+ * events, so only allow one.
+ */
+static bool intel_pt_too_many_aux_source(struct perf_evlist *evlist)
+{
+	struct perf_evsel *evsel;
+	int aux_source_cnt = 0;
+
+	evlist__for_each_entry(evlist, evsel)
+		aux_source_cnt += !!evsel->attr.aux_source;
+
+	if (aux_source_cnt > 1) {
+		pr_err(INTEL_PT_PMU_NAME " supports at most one event with aux-source\n");
+		return true;
+	}
+
+	return false;
+}
+
 static int intel_pt_recording_options(struct auxtrace_record *itr,
 				      struct perf_evlist *evlist,
 				      struct record_opts *opts)
@@ -588,6 +608,9 @@ static int intel_pt_recording_options(struct auxtrace_record *itr,
 		return -EINVAL;
 	}
 
+	if (intel_pt_too_many_aux_source(evlist))
+		return -EINVAL;
+
 	if (!opts->full_auxtrace)
 		return 0;
 
diff --git a/tools/perf/util/intel-pt.c b/tools/perf/util/intel-pt.c
index 470aaae9d930..b8771e073efb 100644
--- a/tools/perf/util/intel-pt.c
+++ b/tools/perf/util/intel-pt.c
@@ -2894,6 +2894,22 @@ static int intel_pt_synth_events(struct intel_pt *pt,
 	return 0;
 }
 
+static void intel_pt_setup_pebs_events(struct intel_pt *pt)
+{
+	struct perf_evsel *evsel;
+
+	if (!pt->synth_opts.other_events)
+		return;
+
+	evlist__for_each_entry(pt->session->evlist, evsel) {
+		if (evsel->attr.aux_source && evsel->id) {
+			pt->sample_pebs = true;
+			pt->pebs_evsel = evsel;
+			return;
+		}
+	}
+}
+
 static struct perf_evsel *intel_pt_find_sched_switch(struct perf_evlist *evlist)
 {
 	struct perf_evsel *evsel;
@@ -3266,6 +3282,8 @@ int intel_pt_process_auxtrace_info(union perf_event *event,
 	if (err)
 		goto err_delete_thread;
 
+	intel_pt_setup_pebs_events(pt);
+
 	err = auxtrace_queues__process_index(&pt->queues, session);
 	if (err)
 		goto err_delete_thread;
-- 
2.20.1



* [PATCH v1 6/7] perf tools: Add aux-source config term
  2019-07-04 16:00 [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 5/7] perf intel-pt: Process options for PEBS event synthesis Alexander Shishkin
@ 2019-07-04 16:00 ` Alexander Shishkin
  2019-07-04 16:00 ` [PATCH v1 7/7] perf intel-pt: Add brief documentation for PEBS via Intel PT Alexander Shishkin
  2019-07-29  8:10 ` [PATCH v1 0/7] perf, intel: Add support for PEBS output to " Alexander Shishkin
From: Alexander Shishkin @ 2019-07-04 16:00 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	Adrian Hunter, Alexander Shishkin

From: Adrian Hunter <adrian.hunter@intel.com>

Expose the aux_source attribute flag for users to configure by adding a
config term, 'aux-source'. For events that support it, selection of
'aux-source' causes the generation of AUX records instead of event records.
This requires that an AUX area event is also provided.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 tools/perf/Documentation/perf-record.txt | 2 ++
 tools/perf/util/evsel.c                  | 3 +++
 tools/perf/util/evsel.h                  | 2 ++
 tools/perf/util/parse-events.c           | 8 ++++++++
 tools/perf/util/parse-events.h           | 1 +
 tools/perf/util/parse-events.l           | 1 +
 6 files changed, 17 insertions(+)

diff --git a/tools/perf/Documentation/perf-record.txt b/tools/perf/Documentation/perf-record.txt
index 15e0fa87241b..3077f6373dff 100644
--- a/tools/perf/Documentation/perf-record.txt
+++ b/tools/perf/Documentation/perf-record.txt
@@ -60,6 +60,8 @@ OPTIONS
 	  - 'name' : User defined event name. Single quotes (') may be used to
 		    escape symbols in the name from parsing by shell and tool
 		    like this: name=\'CPU_CLK_UNHALTED.THREAD:cmask=0x1\'.
+	  - 'aux-source': Generate AUX records instead of events. This requires
+			  that an AUX area event is also provided.
 
           See the linkperf:perf-list[1] man page for more parameters.
 
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index 899dc189ff2d..ff5fafcff8df 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -832,6 +832,9 @@ static void apply_config_terms(struct perf_evsel *evsel,
 			break;
 		case PERF_EVSEL__CONFIG_TERM_PERCORE:
 			break;
+		case PERF_EVSEL__CONFIG_TERM_AUX_SOURCE:
+			attr->aux_source = term->val.aux_source ? 1 : 0;
+			break;
 		default:
 			break;
 		}
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index cad54e8ba522..295d32fd42ba 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -51,6 +51,7 @@ enum term_type {
 	PERF_EVSEL__CONFIG_TERM_DRV_CFG,
 	PERF_EVSEL__CONFIG_TERM_BRANCH,
 	PERF_EVSEL__CONFIG_TERM_PERCORE,
+	PERF_EVSEL__CONFIG_TERM_AUX_SOURCE,
 };
 
 struct perf_evsel_config_term {
@@ -69,6 +70,7 @@ struct perf_evsel_config_term {
 		char	*branch;
 		unsigned long max_events;
 		bool	percore;
+		bool	aux_source;
 	} val;
 	bool weak;
 };
diff --git a/tools/perf/util/parse-events.c b/tools/perf/util/parse-events.c
index cf0b9b81c5aa..1048227f3eda 100644
--- a/tools/perf/util/parse-events.c
+++ b/tools/perf/util/parse-events.c
@@ -951,6 +951,7 @@ static const char *config_term_names[__PARSE_EVENTS__TERM_TYPE_NR] = {
 	[PARSE_EVENTS__TERM_TYPE_NOOVERWRITE]		= "no-overwrite",
 	[PARSE_EVENTS__TERM_TYPE_DRV_CFG]		= "driver-config",
 	[PARSE_EVENTS__TERM_TYPE_PERCORE]		= "percore",
+	[PARSE_EVENTS__TERM_TYPE_AUX_SOURCE]		= "aux-source",
 };
 
 static bool config_term_shrinked;
@@ -1071,6 +1072,9 @@ do {									   \
 			return -EINVAL;
 		}
 		break;
+	case PARSE_EVENTS__TERM_TYPE_AUX_SOURCE:
+		CHECK_TYPE_VAL(NUM);
+		break;
 	default:
 		err->str = strdup("unknown term");
 		err->idx = term->err_term;
@@ -1121,6 +1125,7 @@ static int config_term_tracepoint(struct perf_event_attr *attr,
 	case PARSE_EVENTS__TERM_TYPE_MAX_EVENTS:
 	case PARSE_EVENTS__TERM_TYPE_OVERWRITE:
 	case PARSE_EVENTS__TERM_TYPE_NOOVERWRITE:
+	case PARSE_EVENTS__TERM_TYPE_AUX_SOURCE:
 		return config_term_common(attr, term, err);
 	default:
 		if (err) {
@@ -1213,6 +1218,9 @@ do {								\
 			ADD_CONFIG_TERM(PERCORE, percore,
 					term->val.num ? true : false);
 			break;
+		case PARSE_EVENTS__TERM_TYPE_AUX_SOURCE:
+			ADD_CONFIG_TERM(AUX_SOURCE, aux_source, term->val.num ? 1 : 0);
+			break;
 		default:
 			break;
 		}
diff --git a/tools/perf/util/parse-events.h b/tools/perf/util/parse-events.h
index f7139e1a2fd3..782195ce8238 100644
--- a/tools/perf/util/parse-events.h
+++ b/tools/perf/util/parse-events.h
@@ -76,6 +76,7 @@ enum {
 	PARSE_EVENTS__TERM_TYPE_OVERWRITE,
 	PARSE_EVENTS__TERM_TYPE_DRV_CFG,
 	PARSE_EVENTS__TERM_TYPE_PERCORE,
+	PARSE_EVENTS__TERM_TYPE_AUX_SOURCE,
 	__PARSE_EVENTS__TERM_TYPE_NR,
 };
 
diff --git a/tools/perf/util/parse-events.l b/tools/perf/util/parse-events.l
index ca6098874fe2..399605a64c3d 100644
--- a/tools/perf/util/parse-events.l
+++ b/tools/perf/util/parse-events.l
@@ -284,6 +284,7 @@ no-inherit		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOINHERIT); }
 overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_OVERWRITE); }
 no-overwrite		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_NOOVERWRITE); }
 percore			{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_PERCORE); }
+aux-source		{ return term(yyscanner, PARSE_EVENTS__TERM_TYPE_AUX_SOURCE); }
 ,			{ return ','; }
 "/"			{ BEGIN(INITIAL); return '/'; }
 {name_minus}		{ return str(yyscanner, PE_NAME); }
-- 
2.20.1
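
For readers following the parse path in this patch (lexer keyword, NUM type check, config term, then attr->aux_source), here is a minimal userspace sketch of what the term ultimately does: a bare `aux-source` or `aux-source=N` in an event's term list sets a one-bit flag. `struct fake_attr` and `parse_terms()` are hypothetical stand-ins, not perf's parse-events API, and unknown terms are simply skipped here rather than reported as errors.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for struct perf_event_attr. */
struct fake_attr {
	unsigned int aux_source : 1;
};

/*
 * Parse a comma-separated term list such as "aux-source" or
 * "period=1000,aux-source=0". A bare boolean term means 1.
 * Returns 0 on success, -1 on a malformed term or non-numeric value.
 */
static int parse_terms(const char *terms, struct fake_attr *attr)
{
	const char *p = terms;

	assert(terms != NULL && attr != NULL);

	while (*p) {
		size_t len = strcspn(p, ",");	/* length of current term */
		char term[64];
		char *eq;
		long val = 1;			/* bare term means "1" */

		if (len == 0 || len >= sizeof(term))
			return -1;
		memcpy(term, p, len);
		term[len] = '\0';

		eq = strchr(term, '=');
		if (eq) {
			char *end;

			*eq = '\0';
			val = strtol(eq + 1, &end, 0);
			if (end == eq + 1 || *end != '\0')
				return -1;	/* value must be numeric */
		}

		/* Mirrors apply_config_terms(): any nonzero value sets 1. */
		if (strcmp(term, "aux-source") == 0)
			attr->aux_source = val ? 1 : 0;

		p += len;
		if (*p == ',')
			p++;
	}
	return 0;
}
```

For `cycles/aux-source/ppp` the term list is just `aux-source`, so the flag becomes 1; a value like `aux-source=yes` is rejected, in the spirit of the CHECK_TYPE_VAL(NUM) check above.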



* [PATCH v1 7/7] perf intel-pt: Add brief documentation for PEBS via Intel PT
  2019-07-04 16:00 [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT Alexander Shishkin
                   ` (5 preceding siblings ...)
  2019-07-04 16:00 ` [PATCH v1 6/7] perf tools: Add aux-source config term Alexander Shishkin
@ 2019-07-04 16:00 ` Alexander Shishkin
  2019-07-29  8:10 ` [PATCH v1 0/7] perf, intel: Add support for PEBS output to " Alexander Shishkin
  7 siblings, 0 replies; 12+ messages in thread
From: Alexander Shishkin @ 2019-07-04 16:00 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	Adrian Hunter, Alexander Shishkin

From: Adrian Hunter <adrian.hunter@intel.com>

Document how to select PEBS via Intel PT and how to display synthesized
PEBS samples.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
---
 tools/perf/Documentation/intel-pt.txt | 15 +++++++++++++++
 1 file changed, 15 insertions(+)

diff --git a/tools/perf/Documentation/intel-pt.txt b/tools/perf/Documentation/intel-pt.txt
index 50c5b60101bd..8c40f898685b 100644
--- a/tools/perf/Documentation/intel-pt.txt
+++ b/tools/perf/Documentation/intel-pt.txt
@@ -919,3 +919,18 @@ amended to take the number of elements as a parameter.
 
 Note there is currently no advantage to using Intel PT instead of LBR, but
 that may change in the future if greater use is made of the data.
+
+
+PEBS via Intel PT
+=================
+
+Some hardware can redirect PEBS records into the Intel PT trace.
+Recording is selected by using the aux-source config term, e.g.:
+
+	perf record  -c 10000 -e cycles/aux-source/ppp -e intel_pt/branch=0/ uname
+
+Note that currently, software only supports redirecting at most one PEBS event.
+
+To display PEBS events from the Intel PT trace, use the itrace 'o' option, e.g.:
+
+	perf script --itrace=oe
-- 
2.20.1
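
The note that at most one redirected PEBS event is supported follows from how patch 5/7 picks the event: intel_pt_setup_pebs_events() returns at the first evsel that has attr.aux_source set and an id. A minimal model of that selection, using a hypothetical `struct sketch_evsel` in place of `struct perf_evsel`:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for perf's struct perf_evsel. */
struct sketch_evsel {
	const char *name;
	bool aux_source;	/* attr.aux_source was set at record time */
	bool has_id;		/* stands in for the evsel->id check */
};

/*
 * Return the first evsel that can carry PEBS-via-PT samples, or NULL.
 * Only the first match is used, so further aux_source events are ignored.
 */
static const struct sketch_evsel *
find_pebs_evsel(const struct sketch_evsel *list, size_t n, bool other_events)
{
	size_t i;

	assert(list != NULL || n == 0);

	if (!other_events)	/* itrace 'o' was not requested */
		return NULL;

	for (i = 0; i < n; i++) {
		if (list[i].aux_source && list[i].has_id)
			return &list[i];
	}
	return NULL;
}
```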



* Re: [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT
  2019-07-04 16:00 [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT Alexander Shishkin
                   ` (6 preceding siblings ...)
  2019-07-04 16:00 ` [PATCH v1 7/7] perf intel-pt: Add brief documentation for PEBS via Intel PT Alexander Shishkin
@ 2019-07-29  8:10 ` Alexander Shishkin
  7 siblings, 0 replies; 12+ messages in thread
From: Alexander Shishkin @ 2019-07-29  8:10 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	alexander.shishkin

Alexander Shishkin <alexander.shishkin@linux.intel.com> writes:

> Hi Peter,

Ping.

Thanks,
--
Alex


* Re: [PATCH v1 1/7] perf: Allow normal events to be sources of AUX data
  2019-07-04 16:00 ` [PATCH v1 1/7] perf: Allow normal events to be sources of AUX data Alexander Shishkin
@ 2019-07-29 11:31   ` Peter Zijlstra
  0 siblings, 0 replies; 12+ messages in thread
From: Peter Zijlstra @ 2019-07-29 11:31 UTC
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang

On Thu, Jul 04, 2019 at 07:00:18PM +0300, Alexander Shishkin wrote:
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 8cfb721bb284..fc586da37067 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1887,6 +1887,57 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
>  	ctx->generation++;
>  }
>  
> +static int
> +perf_aux_source_match(struct perf_event *event, struct perf_event *aux_event)
> +{
> +	if (!has_aux(aux_event))
> +		return 0;
> +
> +	if (!event->pmu->aux_source_match)
> +		return 0;
> +
> +	return event->pmu->aux_source_match(aux_event);
> +}
> +
> +static void put_event(struct perf_event *event);
> +static void event_sched_out(struct perf_event *event,
> +			    struct perf_cpu_context *cpuctx,
> +			    struct perf_event_context *ctx);
> +
> +static void perf_put_aux_event(struct perf_event *event)
> +{
> +	struct perf_event_context *ctx = event->ctx;
> +	struct perf_cpu_context *cpuctx = __get_cpu_context(ctx);
> +	struct perf_event *iter = NULL;
> +
> +	/*
> +	 * If event uses aux_event tear down the link
> +	 */
> +	if (event->aux_event) {
> +		put_event(event->aux_event);
> +		event->aux_event = NULL;
> +		return;
> +	}
> +
> +	/*
> +	 * If the event is an aux_event, tear down all links to
> +	 * it from other events.
> +	 */
> +	for_each_sibling_event(iter, event->group_leader) {
> +		if (iter->aux_event != event)
> +			continue;
> +
> +		iter->aux_event = NULL;
> +		put_event(event);
> +
> +		/*
> +		 * If it's ACTIVE, schedule it out. It won't schedule
> +		 * again because !aux_event.
> +		 */
> +		event_sched_out(iter, cpuctx, ctx);
> +	}
> +}

I'm thinking we can add a perf_get_aux_event() helper to use below.

> +
>  static void perf_group_detach(struct perf_event *event)
>  {
>  	struct perf_event *sibling, *tmp;
> @@ -1902,6 +1953,8 @@ static void perf_group_detach(struct perf_event *event)
>  
>  	event->attach_state &= ~PERF_ATTACH_GROUP;
>  
> +	perf_put_aux_event(event);
> +
>  	/*
>  	 * If this is a sibling, remove it from its group.
>  	 */
> @@ -10396,6 +10449,12 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
>  		goto err_ns;
>  	}
>  
> +	if (event->attr.aux_source &&
> +	    !(pmu->capabilities & PERF_PMU_CAP_AUX_SOURCE)) {
> +		err = -EOPNOTSUPP;
> +		goto err_pmu;
> +	}
> +
>  	err = exclusive_event_init(event);
>  	if (err)
>  		goto err_pmu;
> @@ -11052,6 +11111,39 @@ SYSCALL_DEFINE5(perf_event_open,
>  		}
>  	}
>  
> +	if (event->attr.aux_source) {
> +		struct perf_event *aux_event = group_leader;
> +
> +		/*
> +		 * One of the events in the group must be an aux event
> +		 * if we want to be an aux_source. This way, the aux event
> +		 * will precede its aux_source events in the group, and
> +		 * therefore will always schedule first.

Can't we mandate that the group leader is the AUX event?

> +		 */
> +		err = -EINVAL;
> +		if (!aux_event)
> +			goto err_locked;
> +
> +		if (perf_aux_source_match(event, aux_event))
> +			goto found_aux;
> +
> +		for_each_sibling_event(aux_event, group_leader) {
> +			if (perf_aux_source_match(event, aux_event))
> +				goto found_aux;
> +		}
> +
> +		goto err_locked;
> +
> +found_aux:
> +		/*
> +		 * Link aux_sources to their aux event; this is undone in
> +		 * perf_group_detach(). When the group is torn down, the
> +		 * aux_source events lose their link to the aux_event and
> +		 * can't schedule any more.
> +		 */
> +		if (atomic_long_inc_not_zero(&aux_event->refcount))
> +			event->aux_event = aux_event;

I'm thinking failing that inc_not_zero() is BAD (tm) ?!

> +	}

With the suggested perf_get_aux_event() this would become something
like:

	if (event->attr.aux_source && !perf_get_aux_event(event, group_leader))
		goto err_locked;

or something.

>  
>  	/*
>  	 * Must be under the same ctx::mutex as perf_install_in_context(),
> -- 
> 2.20.1
> 
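
A minimal userspace sketch of the perf_get_aux_event() helper suggested above, assuming a heavily simplified event model: `struct sk_event`, `sk_get_aux_event()`, `aux_match()` and the NULL-terminated sibling array are hypothetical, and a C11 `<stdatomic.h>` compare-exchange loop stands in for atomic_long_inc_not_zero(). It checks the leader, then its siblings, for a matching AUX event and, unlike the v1 code, treats a failed reference grab as an error rather than silently leaving the link unset.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily simplified model of struct perf_event. */
struct sk_event {
	atomic_long refcount;
	bool is_pt;			/* stands in for has_aux() + aux_source_match() */
	struct sk_event *aux_event;	/* link from an aux_source event to PT */
	struct sk_event **siblings;	/* NULL-terminated sibling array */
};

/* Userspace analogue of atomic_long_inc_not_zero(). */
static bool inc_not_zero(atomic_long *v)
{
	long old = atomic_load(v);

	while (old != 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(v, &old, old + 1))
			return true;
	}
	return false;
}

static bool aux_match(const struct sk_event *candidate)
{
	return candidate->is_pt;
}

/* Find an AUX event in the group and take a reference on it. */
static bool sk_get_aux_event(struct sk_event *event, struct sk_event *leader)
{
	struct sk_event *aux = NULL;
	size_t i;

	assert(event != NULL);

	if (!leader)
		return false;

	if (aux_match(leader)) {
		aux = leader;
	} else {
		for (i = 0; leader->siblings && leader->siblings[i]; i++) {
			if (aux_match(leader->siblings[i])) {
				aux = leader->siblings[i];
				break;
			}
		}
	}

	/* No AUX event, or it is already on its way out: fail the open. */
	if (!aux || !inc_not_zero(&aux->refcount))
		return false;

	event->aux_event = aux;
	return true;
}
```

If the group leader were mandated to be the AUX event, as also asked above, the sibling scan would collapse to the single aux_match(leader) check.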


* Re: [PATCH v1 2/7] perf/x86/intel: Support PEBS output to PT
  2019-07-04 16:00 ` [PATCH v1 2/7] perf/x86/intel: Support PEBS output to PT Alexander Shishkin
@ 2019-07-29 13:37   ` Peter Zijlstra
  2019-07-31 14:02     ` Alexander Shishkin
  0 siblings, 1 reply; 12+ messages in thread
From: Peter Zijlstra @ 2019-07-29 13:37 UTC
  To: Alexander Shishkin
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang

On Thu, Jul 04, 2019 at 07:00:19PM +0300, Alexander Shishkin wrote:

> diff --git a/arch/x86/events/core.c b/arch/x86/events/core.c
> index f0e4804515d8..a11924e20df3 100644
> --- a/arch/x86/events/core.c
> +++ b/arch/x86/events/core.c
> @@ -869,6 +869,7 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
>  	unsigned long used_mask[BITS_TO_LONGS(X86_PMC_IDX_MAX)];
>  	struct perf_event *e;
>  	int n0, i, wmin, wmax, unsched = 0;
> +	int n_pebs_ds, n_pebs_pt;
>  	struct hw_perf_event *hwc;
>  
>  	bitmap_zero(used_mask, X86_PMC_IDX_MAX);
> @@ -884,6 +885,37 @@ int x86_schedule_events(struct cpu_hw_events *cpuc, int n, int *assign)
>  	if (cpuc->txn_flags & PERF_PMU_TXN_ADD)
>  		n0 -= cpuc->n_txn;
>  
> +	/*
> +	 * Check for PEBS->DS and PEBS->PT events.
> +	 * 1) They can't be scheduled simultaneously;
> +	 * 2) PEBS->PT events depend on a corresponding PT event
> +	 */
> +	for (i = 0, n_pebs_ds = 0, n_pebs_pt = 0; i < n; i++) {
> +		e = cpuc->event_list[i];
> +
> +		if (e->attr.precise_ip) {
> +			if (e->hw.flags & PERF_X86_EVENT_PEBS_VIA_PT) {
> +				/*
> +				 * New PEBS->PT event, check ->aux_event; if
> +				 * it's NULL, the group has been broken down
> +				 * and this event can't schedule any more.
> +				 */
> +				if (!cpuc->is_fake && i >= n0 && !e->aux_event)
> +					return -EINVAL;

How can this happen? Is this an artifact of creating a group, and then
destroying the group leader (the PT event) and then getting a bunch of
unschedulable events as remains?

> +				n_pebs_pt++;
> +			} else {
> +				n_pebs_ds++;
> +			}
> +		}
> +	}

This makes for the 3rd i..n iteration in a row: the first is over
cpuc->event_constraint[], this is the second, and the third isn't
guaranteed to terminate but is over both cpuc->event_list[] and
->event_constraint[].

It just feels like we can do better.

> +
> +	/*
> +	 * Fail to add conflicting PEBS events. If this happens, rotation
> +	 * takes care that all events get to run.
> +	 */
> +	if (n_pebs_ds && n_pebs_pt)
> +		return -EINVAL;

This basically means we can rewrite the above like:

	u8 pebs_pt = 0;

	if (e->attr.precise_ip) {
		bool pt = is_pebs_pt(e);

		if (pebs_pt & (1 << !pt))
			return -EINVAL;

		pebs_pt |= 1 << pt;
	}

There's no need to finish the loop or to actually count how many there
are; all we need to know is there's only one type.
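
The bitmask idea can be sketched as a complete loop, assuming a hypothetical `struct sk_pebs_event` that records only whether an event is precise and whether it is marked PEBS_VIA_PT; the real code would inspect `struct perf_event` directly. Bit 0 means "saw PEBS->DS", bit 1 means "saw PEBS->PT", and the loop stops at the first conflict without counting anything.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-event summary used only for this sketch. */
struct sk_pebs_event {
	bool precise;	/* attr.precise_ip != 0 */
	bool via_pt;	/* PERF_X86_EVENT_PEBS_VIA_PT is set */
};

/* Return 0 if all precise events share one output kind, -1 otherwise. */
static int sk_check_pebs_mix(const struct sk_pebs_event *ev, size_t n)
{
	unsigned char seen = 0;
	size_t i;

	assert(ev != NULL || n == 0);

	for (i = 0; i < n; i++) {
		unsigned int pt;

		if (!ev[i].precise)
			continue;

		pt = ev[i].via_pt ? 1 : 0;
		if (seen & (1u << !pt))	/* opposite kind already seen */
			return -1;	/* -EINVAL in the kernel */
		seen |= 1u << pt;
	}
	return 0;
}
```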

Then again, if you put these counters in cpuc, you can make
collect_events() reject the event before we ever get to scheduling and
avoid the whole iteration.

> +
>  	if (x86_pmu.start_scheduling)
>  		x86_pmu.start_scheduling(cpuc);
>  

> diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> index bda450ff51ee..6955d4f7e7aa 100644
> --- a/arch/x86/events/intel/core.c
> +++ b/arch/x86/events/intel/core.c

> @@ -3814,6 +3821,17 @@ static int intel_pmu_check_period(struct perf_event *event, u64 value)
>  	return intel_pmu_has_bts_period(event, value) ? -EINVAL : 0;
>  }
>  
> +static int intel_pmu_aux_source_match(struct perf_event *event)
> +{
> +	if (!x86_pmu.intel_cap.pebs_output_pt_available)
> +		return 0;
> +
> +	if (event->pmu->name && !strcmp(event->pmu->name, "intel_pt"))

Yuck, surely we can do something like:

	if (is_pt_event(event))

which is implemented in intel/pt.c and does something like:

	return event->pmu == &pt_pmu.pmu;
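
Comparing PMU identity instead of names can be sketched as below; `struct sk_pmu`, `struct sk_pmu_event`, `sk_pt_pmu` and `sk_is_pt_event()` are hypothetical stand-ins for struct pmu, struct perf_event, pt_pmu.pmu and the suggested is_pt_event(). Besides avoiding strcmp() on a hot path, a pointer comparison cannot be fooled by another PMU that happens to carry the name "intel_pt".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct pmu and struct perf_event. */
struct sk_pmu {
	const char *name;
};

struct sk_pmu_event {
	struct sk_pmu *pmu;
};

/* Stand-in for the PT driver's own PMU instance (pt_pmu.pmu). */
static struct sk_pmu sk_pt_pmu = { .name = "intel_pt" };

/* True only for events that belong to the PT PMU instance itself. */
static bool sk_is_pt_event(const struct sk_pmu_event *event)
{
	assert(event != NULL);
	return event->pmu == &sk_pt_pmu;
}
```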

> +		return 1;
> +
> +	return 0;
> +}
> +
>  PMU_FORMAT_ATTR(offcore_rsp, "config1:0-63");
>  
>  PMU_FORMAT_ATTR(ldlat, "config1:0-15");

> diff --git a/arch/x86/events/intel/ds.c b/arch/x86/events/intel/ds.c
> index 7acc526b4ad2..9c59462f38a3 100644
> --- a/arch/x86/events/intel/ds.c
> +++ b/arch/x86/events/intel/ds.c

> +static void intel_pmu_pebs_via_pt_enable(struct perf_event *event)
> +{
> +	struct cpu_hw_events *cpuc = this_cpu_ptr(&cpu_hw_events);
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct debug_store *ds = cpuc->ds;
> +
> +	if (!(event->hw.flags & PERF_X86_EVENT_PEBS_VIA_PT))
> +		return;
> +
> +	/*
> +	 * In case there's a mix of PEBS->PT and PEBS->DS, fall back
> +	 * to DS.
> +	 */

I thought we disallowed that from happening !?

> +	if (cpuc->n_pebs != cpuc->n_pebs_via_pt) {
> +		/* PEBS-to-DS events present, fall back to DS */
> +		intel_pmu_pebs_via_pt_disable(event);
> +		return;
> +	}
> +
> +	if (!(event->hw.flags & PERF_X86_EVENT_LARGE_PEBS))
> +		cpuc->pebs_enabled |= PEBS_PMI_AFTER_EACH_RECORD;
> +
> +	cpuc->pebs_enabled |= PEBS_OUTPUT_PT;
> +
> +	wrmsrl(MSR_RELOAD_PMC0 + hwc->idx, ds->pebs_event_reset[hwc->idx]);
> +}
> +



* Re: [PATCH v1 2/7] perf/x86/intel: Support PEBS output to PT
  2019-07-29 13:37   ` Peter Zijlstra
@ 2019-07-31 14:02     ` Alexander Shishkin
  0 siblings, 0 replies; 12+ messages in thread
From: Alexander Shishkin @ 2019-07-31 14:02 UTC
  To: Peter Zijlstra
  Cc: Arnaldo Carvalho de Melo, Ingo Molnar, linux-kernel, kan.liang,
	alexander.shishkin

Peter Zijlstra <peterz@infradead.org> writes:

> On Thu, Jul 04, 2019 at 07:00:19PM +0300, Alexander Shishkin wrote:
>> +	/*
>> +	 * In case there's a mix of PEBS->PT and PEBS->DS, fall back
>> +	 * to DS.
>> +	 */
>
> I thought we disallowed that from happening !?

Yes, that was a weird leftover.

Thanks a bunch for the comments; they should all be addressed in v2:
https://marc.info/?l=linux-kernel&m=156458152126310

Regards,
--
Alex


Thread overview: 12+ messages
2019-07-04 16:00 [PATCH v1 0/7] perf, intel: Add support for PEBS output to Intel PT Alexander Shishkin
2019-07-04 16:00 ` [PATCH v1 1/7] perf: Allow normal events to be sources of AUX data Alexander Shishkin
2019-07-29 11:31   ` Peter Zijlstra
2019-07-04 16:00 ` [PATCH v1 2/7] perf/x86/intel: Support PEBS output to PT Alexander Shishkin
2019-07-29 13:37   ` Peter Zijlstra
2019-07-31 14:02     ` Alexander Shishkin
2019-07-04 16:00 ` [PATCH v1 3/7] perf tools: Add aux_source attribute flag Alexander Shishkin
2019-07-04 16:00 ` [PATCH v1 4/7] perf tools: Add itrace option 'o' to synthesize aux-source events Alexander Shishkin
2019-07-04 16:00 ` [PATCH v1 5/7] perf intel-pt: Process options for PEBS event synthesis Alexander Shishkin
2019-07-04 16:00 ` [PATCH v1 6/7] perf tools: Add aux-source config term Alexander Shishkin
2019-07-04 16:00 ` [PATCH v1 7/7] perf intel-pt: Add brief documentation for PEBS via Intel PT Alexander Shishkin
2019-07-29  8:10 ` [PATCH v1 0/7] perf, intel: Add support for PEBS output to " Alexander Shishkin
