All of lore.kernel.org
 help / color / mirror / Atom feed
From: Matt Fleming <matt@console-pimps.org>
To: Peter Zijlstra <peterz@infradead.org>,
	Ingo Molnar <mingo@kernel.org>, Jiri Olsa <jolsa@redhat.com>,
	Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>,
	linux-kernel@vger.kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Matt Fleming <matt.fleming@intel.com>,
	Arnaldo Carvalho de Melo <acme@redhat.com>
Subject: [PATCH 10/11] perf/x86/intel: Support task events with Intel CQM
Date: Wed, 24 Sep 2014 15:04:14 +0100	[thread overview]
Message-ID: <1411567455-31264-11-git-send-email-matt@console-pimps.org> (raw)
In-Reply-To: <1411567455-31264-1-git-send-email-matt@console-pimps.org>

From: Matt Fleming <matt.fleming@intel.com>

Add support for task events as well as system-wide events. This change
has a big impact on the way that we gather LLC occupancy values in
intel_cqm_event_read().

Currently, for system-wide (per-cpu) events we defer processing to
userspace which knows how to discard all but one cpu result per package.

Things aren't so simple for task events because we need to do the value
aggregation ourselves. To do this, we defer updating the LLC occupancy
value in event->count from intel_cqm_event_read() and do an SMP
cross-call to read values for all packages in intel_cqm_event_count().
We need to ensure that we only do this for one task event per cache
group, otherwise we'll report duplicate values.

If we're a system-wide event we want to fallback to the default
perf_event_count() implementation. Refactor this into a common function
so that we don't duplicate the code.

Also, introduce PERF_TYPE_INTEL_CQM, since we need a way to track an
event's task (if the event isn't per-cpu) inside of the Intel CQM PMU
driver.  This task information is only availble in the upper layers of
the perf infrastructure.

Other perf backends stash the target task in event->hw.*target so we
need to do something similar. The task is used to determine whether
events should share a cache group and an RMID.

All tasks in a thread group are tracked together.

Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Matt Fleming <matt.fleming@intel.com>
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c | 198 +++++++++++++++++++++++++----
 include/linux/perf_event.h                 |   1 +
 include/uapi/linux/perf_event.h            |   1 +
 kernel/events/core.c                       |   2 +
 4 files changed, 180 insertions(+), 22 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index a8c1e40b32b9..d40c29e6eeff 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -181,23 +181,124 @@ fail:
 
 /*
  * Determine if @a and @b measure the same set of tasks.
+ *
+ * If @a and @b measure the same set of tasks then we want to share a
+ * single RMID.
  */
 static bool __match_event(struct perf_event *a, struct perf_event *b)
 {
+	/* Per-cpu and task events don't mix */
 	if ((a->attach_state & PERF_ATTACH_TASK) !=
 	    (b->attach_state & PERF_ATTACH_TASK))
 		return false;
 
-	/* not task */
+#ifdef CONFIG_CGROUP_PERF
+	if (a->cgrp != b->cgrp)
+		return false;
+#endif
+
+	/* If not task event, we're machine wide */
+	if (!(b->attach_state & PERF_ATTACH_TASK))
+		return true;
+
+	/*
+	 * Events that target same task are placed into the same cache group.
+	 */
+	if (a->hw.cqm_target == b->hw.cqm_target)
+		return true;
+
+	/*
+	 * Are we an inherited event?
+	 */
+	if (b->parent == a)
+		return true;
 
-	return true; /* if not task, we're machine wide */
+	return false;
+}
+
+#ifdef CONFIG_CGROUP_PERF
+static inline struct perf_cgroup *event_to_cgroup(struct perf_event *event)
+{
+	if (event->attach_state & PERF_ATTACH_TASK)
+		return perf_cgroup_from_task(event->hw.cqm_target);
+
+	return event->cgrp;
 }
+#endif
 
 /*
  * Determine if @a's tasks intersect with @b's tasks
+ *
+ * There are combinations of events that we explicitly prohibit,
+ *
+ *		   PROHIBITS
+ *     system-wide    -> 	cgroup and task
+ *     cgroup 	      ->	system-wide
+ *     		      ->	task in cgroup
+ *     task 	      -> 	system-wide
+ *     		      ->	task in cgroup
+ *
+ * Call this function before allocating an RMID.
  */
 static bool __conflict_event(struct perf_event *a, struct perf_event *b)
 {
+#ifdef CONFIG_CGROUP_PERF
+	/*
+	 * We can have any number of cgroups but only one system-wide
+	 * event at a time.
+	 */
+	if (a->cgrp && b->cgrp) {
+		struct perf_cgroup *ac = a->cgrp;
+		struct perf_cgroup *bc = b->cgrp;
+
+		/*
+		 * This condition should have been caught in
+		 * __match_event() and we should be sharing an RMID.
+		 */
+		WARN_ON_ONCE(ac == bc);
+
+		if (cgroup_is_descendant(ac->css.cgroup, bc->css.cgroup) ||
+		    cgroup_is_descendant(bc->css.cgroup, ac->css.cgroup))
+			return true;
+
+		return false;
+	}
+
+	if (a->cgrp || b->cgrp) {
+		struct perf_cgroup *ac, *bc;
+
+		/*
+		 * cgroup and system-wide events are mutually exclusive
+		 */
+		if ((a->cgrp && !(b->attach_state & PERF_ATTACH_TASK)) ||
+		    (b->cgrp && !(a->attach_state & PERF_ATTACH_TASK)))
+			return true;
+
+		/*
+		 * Ensure neither event is part of the other's cgroup
+		 */
+		ac = event_to_cgroup(a);
+		bc = event_to_cgroup(b);
+		if (ac == bc)
+			return true;
+
+		/*
+		 * Must have cgroup and non-intersecting task events.
+		 */
+		if (!ac || !bc)
+			return false;
+
+		/*
+		 * We have cgroup and task events, and the task belongs
+		 * to a cgroup. Check for for overlap.
+		 */
+		if (cgroup_is_descendant(ac->css.cgroup, bc->css.cgroup) ||
+		    cgroup_is_descendant(bc->css.cgroup, ac->css.cgroup))
+			return true;
+
+		return false;
+	}
+#endif
 	/*
 	 * If one of them is not a task, same story as above with cgroups.
 	 */
@@ -217,7 +318,7 @@ static bool __conflict_event(struct perf_event *a, struct perf_event *b)
  * If we're part of a group, we use the group's RMID.
  */
 static int intel_cqm_setup_event(struct perf_event *event,
-				 struct perf_event **group)
+				 struct perf_event **group, int cpu)
 {
 	struct perf_event *iter;
 	int rmid;
@@ -244,9 +345,16 @@ static int intel_cqm_setup_event(struct perf_event *event,
 
 static void intel_cqm_event_read(struct perf_event *event)
 {
-	unsigned long rmid = event->hw.cqm_rmid;
+	unsigned long rmid;
 	u64 val;
 
+	/*
+	 * Task events are handled by intel_cqm_event_count().
+	 */
+	if (event->cpu == -1)
+		return;
+
+	rmid = event->hw.cqm_rmid;
 	val = __rmid_read(rmid);
 
 	/*
@@ -257,9 +365,67 @@ static void intel_cqm_event_read(struct perf_event *event)
 
 	val *= cqm_l3_scale; /* cachelines -> bytes */
 
+	/*
+	 * If this event is per-cpu then we don't need to do any
+	 * aggregation in the kernel, it's all done in userland.
+	 */
 	local64_set(&event->count, val);
 }
 
+static void __intel_cqm_event_count(void *info)
+{
+	struct perf_event *event = info;
+	u64 val;
+
+	val = __rmid_read(event->hw.cqm_rmid);
+
+	if (val & (RMID_VAL_ERROR | RMID_VAL_UNAVAIL))
+		return;
+
+	val *= cqm_l3_scale; /* cachelines -> bytes */
+
+	local64_add(val, &event->count);
+}
+
+static inline bool cqm_group_leader(struct perf_event *event)
+{
+	return !list_empty(&event->hw.cqm_groups_entry);
+}
+
+static u64 intel_cqm_event_count(struct perf_event *event)
+{
+	unsigned int cpu;
+
+	/*
+	 * We only need to worry about task events. System-wide events
+	 * are handled like usual, i.e. entirely with
+	 * intel_cqm_event_read().
+	 */
+	if (event->cpu != -1)
+		return __perf_event_count(event);
+
+	/*
+	 * Only the group leader gets to report values. This stops us
+	 * reporting duplicate values to userspace, and gives us a clear
+	 * rule for which task gets to report the values.
+	 *
+	 * Note that it is impossible to attribute these values to
+	 * specific packages - we forfeit that ability when we create
+	 * task events.
+	 */
+	if (!cqm_group_leader(event))
+		return 0;
+
+	local64_set(&event->count, 0);
+
+	for_each_cpu(cpu, &cqm_cpumask) {
+		smp_call_function_single(cpu, __intel_cqm_event_count,
+					 event, 1);
+	}
+
+	return __perf_event_count(event);
+}
+
 static void intel_cqm_event_start(struct perf_event *event, int mode)
 {
 	struct intel_cqm_state *state = &__get_cpu_var(cqm_state);
@@ -345,7 +511,7 @@ static void intel_cqm_event_destroy(struct perf_event *event)
 	/*
 	 * And we're the group leader..
 	 */
-	if (!list_empty(&event->hw.cqm_groups_entry)) {
+	if (cqm_group_leader(event)) {
 		/*
 		 * If there was a group_other, make that leader, otherwise
 		 * destroy the group and return the RMID.
@@ -365,17 +531,6 @@ static void intel_cqm_event_destroy(struct perf_event *event)
 
 static struct pmu intel_cqm_pmu;
 
-/*
- * XXX there's a bit of a problem in that we cannot simply do the one
- * event per node as one would want, since that one event would one get
- * scheduled on the one cpu. But we want to 'schedule' the RMID on all
- * CPUs.
- *
- * This means we want events for each CPU, however, that generates a lot
- * of duplicate values out to userspace -- this is not to be helped
- * unless we want to change the core code in some way. Fore more info,
- * see intel_cqm_event_read().
- */
 static int intel_cqm_event_init(struct perf_event *event)
 {
 	struct perf_event *group = NULL;
@@ -387,9 +542,6 @@ static int intel_cqm_event_init(struct perf_event *event)
 	if (event->attr.config & ~QOS_EVENT_MASK)
 		return -EINVAL;
 
-	if (event->cpu == -1)
-		return -EINVAL;
-
 	/* unsupported modes and filters */
 	if (event->attr.exclude_user   ||
 	    event->attr.exclude_kernel ||
@@ -407,7 +559,8 @@ static int intel_cqm_event_init(struct perf_event *event)
 
 	mutex_lock(&cache_mutex);
 
-	err = intel_cqm_setup_event(event, &group); /* will also set rmid */
+	/* Will also set rmid */
+	err = intel_cqm_setup_event(event, &group, event->cpu);
 	if (err)
 		goto out;
 
@@ -464,6 +617,7 @@ static struct pmu intel_cqm_pmu = {
 	.start		= intel_cqm_event_start,
 	.stop		= intel_cqm_event_stop,
 	.read		= intel_cqm_event_read,
+	.count		= intel_cqm_event_count,
 };
 
 static inline void cqm_pick_event_reader(int cpu)
@@ -574,8 +728,8 @@ static int __init intel_cqm_init(void)
 
 	__perf_cpu_notifier(intel_cqm_cpu_notifier);
 
-	ret = perf_pmu_register(&intel_cqm_pmu, "intel_cqm", -1);
-
+	ret = perf_pmu_register(&intel_cqm_pmu, "intel_cqm",
+				PERF_TYPE_INTEL_CQM);
 	if (ret)
 		pr_err("Intel CQM perf registration failed: %d\n", ret);
 	else
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 50ceb0665cda..4cb6e1e668a1 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -134,6 +134,7 @@ struct hw_perf_event {
 			struct list_head	cqm_events_entry;
 			struct list_head	cqm_groups_entry;
 			struct list_head	cqm_group_entry;
+			struct task_struct	*cqm_target;
 		};
 #ifdef CONFIG_HAVE_HW_BREAKPOINT
 		struct { /* breakpoint */
diff --git a/include/uapi/linux/perf_event.h b/include/uapi/linux/perf_event.h
index 9269de254874..85bd517878f5 100644
--- a/include/uapi/linux/perf_event.h
+++ b/include/uapi/linux/perf_event.h
@@ -32,6 +32,7 @@ enum perf_type_id {
 	PERF_TYPE_HW_CACHE			= 3,
 	PERF_TYPE_RAW				= 4,
 	PERF_TYPE_BREAKPOINT			= 5,
+	PERF_TYPE_INTEL_CQM			= 6,
 
 	PERF_TYPE_MAX,				/* non-ABI */
 };
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8a07f51a23ff..44174b395705 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6924,6 +6924,8 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 		else if (attr->type == PERF_TYPE_BREAKPOINT)
 			event->hw.bp_target = task;
 #endif
+		else if (attr->type == PERF_TYPE_INTEL_CQM)
+			event->hw.cqm_target = task;
 	}
 
 	if (!overflow_handler && parent_event) {
-- 
1.9.3


  parent reply	other threads:[~2014-09-24 14:05 UTC|newest]

Thread overview: 46+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-09-24 14:04 [PATCH 00/11] perf: Intel Cache QoS Monitoring support Matt Fleming
2014-09-24 14:04 ` [PATCH 01/11] perf stat: Fix AGGR_CORE segfault on multi-socket system Matt Fleming
2014-09-24 14:04 ` [PATCH 02/11] perf tools: Refactor unit and scale function parameters Matt Fleming
2014-09-29  9:21   ` Jiri Olsa
2014-10-03  5:25   ` [tip:perf/core] " tip-bot for Matt Fleming
2014-09-24 14:04 ` [PATCH 03/11] perf tools: Parse event per-package info files Matt Fleming
2014-09-24 14:04 ` [PATCH 04/11] perf: Make perf_cgroup_from_task() global Matt Fleming
2014-10-07 18:51   ` Peter Zijlstra
2014-10-08 10:27     ` Matt Fleming
2014-09-24 14:04 ` [PATCH 05/11] perf: Add ->count() function to read per-package counters Matt Fleming
2014-09-24 14:04 ` [PATCH 06/11] perf: Move cgroup init before PMU ->event_init() Matt Fleming
2014-10-07 19:34   ` Peter Zijlstra
2014-10-08 10:32     ` Matt Fleming
2014-10-08 10:49       ` Peter Zijlstra
2014-09-24 14:04 ` [PATCH 07/11] x86: Add support for Intel Cache QoS Monitoring (CQM) detection Matt Fleming
2014-09-24 14:04 ` [PATCH 08/11] perf/x86/intel: Add Intel Cache QoS Monitoring support Matt Fleming
2014-09-24 16:40   ` Andi Kleen
2014-09-24 20:27     ` Matt Fleming
2014-09-24 20:39       ` Andi Kleen
2014-09-29 18:51         ` Matt Fleming
2014-10-07 19:43   ` Peter Zijlstra
2014-10-08 10:36     ` Matt Fleming
2014-10-08 12:15       ` Matt Fleming
2014-10-08 14:47         ` Peter Zijlstra
2014-10-08 20:51           ` Peter Zijlstra
2014-09-24 14:04 ` [PATCH 09/11] perf/x86/intel: Implement LRU monitoring ID allocation for CQM Matt Fleming
2014-10-08  9:51   ` Peter Zijlstra
2014-10-08 10:53     ` Matt Fleming
2014-09-24 14:04 ` Matt Fleming [this message]
2014-09-24 14:45   ` [PATCH 10/11] perf/x86/intel: Support task events with Intel CQM Matt Fleming
2014-10-08 10:01   ` Peter Zijlstra
2014-10-08 11:07   ` Peter Zijlstra
2014-10-08 12:10     ` Matt Fleming
2014-10-08 14:49       ` Peter Zijlstra
2014-10-10 12:02         ` Matt Fleming
2014-10-10 14:02           ` Peter Zijlstra
2014-10-10 10:54     ` Peter Zijlstra
2014-10-10 11:10       ` Matt Fleming
2014-09-24 14:04 ` [PATCH 11/11] perf/x86/intel: Perform rotation on Intel CQM RMIDs Matt Fleming
2014-10-08 11:19   ` Peter Zijlstra
2014-10-08 11:56     ` Matt Fleming
2014-10-08 18:08   ` Peter Zijlstra
2014-10-08 19:02     ` Thomas Gleixner
2014-10-08 19:59       ` Matt Fleming
2014-10-08 18:10   ` Peter Zijlstra
2014-10-08 20:04     ` Matt Fleming

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1411567455-31264-11-git-send-email-matt@console-pimps.org \
    --to=matt@console-pimps.org \
    --cc=acme@kernel.org \
    --cc=acme@redhat.com \
    --cc=hpa@zytor.com \
    --cc=jolsa@redhat.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=matt.fleming@intel.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=tglx@linutronix.de \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.