* [PATCH 0/7] Perf core cleanups for shared perf_event_contexts
@ 2014-02-10 17:44 Mark Rutland
  2014-02-10 17:44 ` [PATCH 1/7] perf: fix prototype of find_pmu_context Mark Rutland
                   ` (7 more replies)
  0 siblings, 8 replies; 26+ messages in thread
From: Mark Rutland @ 2014-02-10 17:44 UTC (permalink / raw)
  To: linux-kernel; +Cc: will.deacon, dave.martin, Mark Rutland

Hi,

I've been working on adding support for heterogeneous pmus (as would be found
in big.LITTLE systems) to the arm perf backend, and in the process of doing so
I've noticed a few issues in the core perf code that would be nice to have
fixed up. The heterogeneous pmu support is not quite ready, but the core
changes are independent, so I'm posting them today.

The first three patches fix up some basic issues which are relevant regardless
of whether you wish to deal with heterogeneous pmus, though patch two also
fixes a potential bug when perf_event_contexts are shared by multiple pmus.

The remaining four patches attempt to fix some inconsistencies in perf core
with regard to multiple pmus sharing perf_event_contexts, taken to the extreme
of removing perf_event_context::pmu. These result in an odd edge-case with a
shared hrtimer_interval_ms being exported per-pmu, and also highlight how odd
it is to have a group of events from multiple pmus (given that this should mean the
group can never be scheduled).

Based on commit 443772776c69 (perf: Disable all pmus on unthrottling and
rescheduling) I believe that pmus sharing a context is the expected behaviour,
but given the edge cases this series highlights I think it would make more
sense to disallow context sharing and allocate a unique context per-pmu instead
(which would also make ctx->pmu sane again). I'm happy to rework the series to
that effect if people agree with the approach.

Patch 4 of this series (be less pessimistic when scheduling events) can be
improved with some standardisation of error codes returned by pmu::add, but for
the moment I've kept myself away from the architecture backends.

The current heterogeneous pmu series based atop of this can be found on my
linux-arm git repo [1].

Patches are based on v3.14-rc1.

Thanks,
Mark.

[1] git://linux-arm.org/linux-mr.git devel/perf/multi-pmu

Mark Rutland (7):
  perf: fix prototype of find_pmu_context
  perf: remove redundant pmu assignment
  perf: kill perf_event_context_type
  perf: be less pessimistic when scheduling events
  perf: kill pmu::hrtimer_interval_ms
  perf: Centralise context pmu disabling
  perf: kill perf_event_context::pmu

 include/linux/perf_event.h |  10 +--
 kernel/events/core.c       | 162 +++++++++++++++++++++++----------------------
 2 files changed, 85 insertions(+), 87 deletions(-)

-- 
1.8.1.1


^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH 1/7] perf: fix prototype of find_pmu_context
  2014-02-10 17:44 [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
@ 2014-02-10 17:44 ` Mark Rutland
  2014-02-27 13:33   ` [tip:perf/core] perf: Fix prototype of find_pmu_context() tip-bot for Mark Rutland
  2014-02-10 17:44 ` [PATCH 2/7] perf: remove redundant pmu assignment Mark Rutland
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2014-02-10 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: will.deacon, dave.martin, Mark Rutland, Peter Zijlstra, Ingo Molnar

For some reason find_pmu_context is defined as returning void* rather
than a __percpu struct perf_cpu_context*. As all the requisite types are
defined in advance there's no reason to keep it that way.

This patch modifies the prototype of find_pmu_context to return a
__percpu struct perf_cpu_context*.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 56003c6..08659d1 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6294,7 +6294,7 @@ static int perf_event_idx_default(struct perf_event *event)
  * Ensures all contexts with the same task_ctx_nr have the same
  * pmu_cpu_context too.
  */
-static void *find_pmu_context(int ctxn)
+static struct perf_cpu_context __percpu *find_pmu_context(int ctxn)
 {
 	struct pmu *pmu;
 
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 2/7] perf: remove redundant pmu assignment
  2014-02-10 17:44 [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
  2014-02-10 17:44 ` [PATCH 1/7] perf: fix prototype of find_pmu_context Mark Rutland
@ 2014-02-10 17:44 ` Mark Rutland
  2014-02-27 13:33   ` [tip:perf/core] perf: Remove redundant PMU assignment tip-bot for Mark Rutland
  2014-02-10 17:44 ` [PATCH 3/7] perf: kill perf_event_context_type Mark Rutland
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2014-02-10 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: will.deacon, dave.martin, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Stephane Eranian

Currently perf_branch_stack_sched_in iterates over the set of pmus,
checks that each pmu has a flush_branch_stack callback, then overwrites
the pmu before calling the callback. This is either redundant or broken.

In systems with a single hw pmu, pmu == cpuctx->ctx.pmu, and thus the
assignment is redundant.

In systems with multiple hw pmus (i.e. multiple pmus with task_ctx_nr ==
perf_hw_context) the pmus share the same perf_cpu_context. Thus the
assignment can cause one of the pmus to flush its branch stack
repeatedly rather than causing each of the pmus to flush their branch
stacks. Worse still, if only some pmus have the callback the assignment
can result in a branch to NULL.

This patch removes the redundant assignment.
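
For context (illustration only, not part of the patch), the surrounding
loop looks roughly like the following; the removed line clobbered the
loop's pmu iterator with the shared cpuctx's pmu:

	/*
	 * Simplified sketch of perf_branch_stack_sched_in(); locking and
	 * irq handling omitted. 'pmu' iterates over all registered pmus,
	 * so overwriting it made every matching iteration flush
	 * cpuctx->ctx.pmu rather than the pmu being visited.
	 */
	list_for_each_entry_rcu(pmu, &pmus, entry) {
		cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);

		if (cpuctx->ctx.nr_branch_stack > 0
		    && pmu->flush_branch_stack) {
			pmu = cpuctx->ctx.pmu;	/* removed by this patch */
			pmu->flush_branch_stack();
		}
	}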

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
---
 kernel/events/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 08659d1..15fe6fc 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2563,8 +2563,6 @@ static void perf_branch_stack_sched_in(struct task_struct *prev,
 		if (cpuctx->ctx.nr_branch_stack > 0
 		    && pmu->flush_branch_stack) {
 
-			pmu = cpuctx->ctx.pmu;
-
 			perf_ctx_lock(cpuctx, cpuctx->task_ctx);
 
 			perf_pmu_disable(pmu);
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 3/7] perf: kill perf_event_context_type
  2014-02-10 17:44 [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
  2014-02-10 17:44 ` [PATCH 1/7] perf: fix prototype of find_pmu_context Mark Rutland
  2014-02-10 17:44 ` [PATCH 2/7] perf: remove redundant pmu assignment Mark Rutland
@ 2014-02-10 17:44 ` Mark Rutland
  2014-02-25 11:38   ` Peter Zijlstra
  2014-02-10 17:44 ` [PATCH 4/7] perf: be less pessimistic when scheduling events Mark Rutland
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2014-02-10 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: will.deacon, dave.martin, Mark Rutland, Peter Zijlstra, Ingo Molnar

Currently perf_event_context::type is used to determine whether a
context is cpu-bound or task-bound. However perf_event_context::task can
be used to determine this just as cheaply, and requires no additional
initialisation.

This patch removes perf_event_context::type, and modifies existing users
to check perf_event_context::task instead. The now unused enum
perf_event_context_type is removed.
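
For illustration only (a hypothetical helper, not added by this patch),
the distinction the remaining users need reduces to:

	/* A CPU-bound context has no associated task. */
	static inline bool ctx_is_task_context(struct perf_event_context *ctx)
	{
		return ctx->task != NULL;
	}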

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
---
 include/linux/perf_event.h | 6 ------
 kernel/events/core.c       | 3 +--
 2 files changed, 1 insertion(+), 8 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index e56b07f..df3d34a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -440,11 +440,6 @@ struct perf_event {
 #endif /* CONFIG_PERF_EVENTS */
 };
 
-enum perf_event_context_type {
-	task_context,
-	cpu_context,
-};
-
 /**
  * struct perf_event_context - event context structure
  *
@@ -452,7 +447,6 @@ enum perf_event_context_type {
  */
 struct perf_event_context {
 	struct pmu			*pmu;
-	enum perf_event_context_type	type;
 	/*
 	 * Protect the states of the events in the list,
 	 * nr_active, and the list:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 15fe6fc..e2fcf1b 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6493,7 +6493,6 @@ skip_type:
 		__perf_event_init_context(&cpuctx->ctx);
 		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
 		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
-		cpuctx->ctx.type = cpu_context;
 		cpuctx->ctx.pmu = pmu;
 
 		__perf_cpu_hrtimer_init(cpuctx, cpu);
@@ -7130,7 +7129,7 @@ SYSCALL_DEFINE5(perf_event_open,
 		 * task or CPU context:
 		 */
 		if (move_group) {
-			if (group_leader->ctx->type != ctx->type)
+			if (group_leader->ctx->task != ctx->task)
 				goto err_context;
 		} else {
 			if (group_leader->ctx != ctx)
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 4/7] perf: be less pessimistic when scheduling events
  2014-02-10 17:44 [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
                   ` (2 preceding siblings ...)
  2014-02-10 17:44 ` [PATCH 3/7] perf: kill perf_event_context_type Mark Rutland
@ 2014-02-10 17:44 ` Mark Rutland
  2014-02-10 17:58   ` Peter Zijlstra
  2014-02-10 17:44 ` [PATCH 5/7] perf: kill pmu::hrtimer_interval_ms Mark Rutland
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2014-02-10 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: will.deacon, dave.martin, Mark Rutland, Peter Zijlstra, Ingo Molnar

Currently ctx_flexible_sched_in assumes that the pmu is full as soon as
a single group fails to schedule. This is not necessarily true, and
leads to sub-optimal event scheduling in a couple of scenarios.

If heterogeneous hw pmus are registered (e.g. in a big.LITTLE system),
they will share the same perf_event_context, though each event will only
be possible to schedule on a subset of cpus, and thus some events may
fail to schedule in ctx_flexible_sched_in. If these events are early in
the flexible_groups list they will prevent viable events from being
scheduled until the list is sufficiently rotated. For short running
tasks its possible that sufficient rotation never occurs and events are
never counted even when all counters in the pmu are free.

Even on a single-pmu system it is possible for a permanently
unschedulable event group to starve a permanently schedulable event
group. Assume the pmu has N counters, and one of these counters is in
active use by a pinned event (e.g. a perf top session). Create two event
groups, one with N events (never schedulable) and one with N-1 (always
schedulable). While the former group is at the head of the
flexible_groups list it will prevent the latter from being scheduled. On
average the always-schedulable group will only be scheduled for half of
the time it's possible to schedule it for.

This patch makes ctx_flexible_sched_in attempt to schedule every group
in the flexible_groups list even when earlier groups failed to schedule,
enabling more groups to be scheduled simultaneously. The events are
still scheduled in list order, so no events scheduled under the old
behaviour will be rejected under the new behaviour. The existing
rotation of the flexible_groups list ensures that each group will be
scheduled given sufficient time, as with the current implementation.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
---
 kernel/events/core.c | 19 +++++++------------
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index e2fcf1b..13ede70 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1781,8 +1781,7 @@ group_error:
  * Work out whether we can put this event group on the CPU now.
  */
 static int group_can_go_on(struct perf_event *event,
-			   struct perf_cpu_context *cpuctx,
-			   int can_add_hw)
+			   struct perf_cpu_context *cpuctx)
 {
 	/*
 	 * Groups consisting entirely of software events can always go on.
@@ -1802,10 +1801,9 @@ static int group_can_go_on(struct perf_event *event,
 	if (event->attr.exclusive && cpuctx->active_oncpu)
 		return 0;
 	/*
-	 * Otherwise, try to add it if all previous groups were able
-	 * to go on.
+	 * Otherwise, try to add it.
 	 */
-	return can_add_hw;
+	return 1;
 }
 
 static void add_event_to_ctx(struct perf_event *event,
@@ -2024,7 +2022,7 @@ static int __perf_event_enable(void *info)
 	if (leader != event && leader->state != PERF_EVENT_STATE_ACTIVE)
 		goto unlock;
 
-	if (!group_can_go_on(event, cpuctx, 1)) {
+	if (!group_can_go_on(event, cpuctx)) {
 		err = -EEXIST;
 	} else {
 		if (event == leader)
@@ -2409,7 +2407,7 @@ ctx_pinned_sched_in(struct perf_event_context *ctx,
 		if (is_cgroup_event(event))
 			perf_cgroup_mark_enabled(event, ctx);
 
-		if (group_can_go_on(event, cpuctx, 1))
+		if (group_can_go_on(event, cpuctx))
 			group_sched_in(event, cpuctx, ctx);
 
 		/*
@@ -2428,7 +2426,6 @@ ctx_flexible_sched_in(struct perf_event_context *ctx,
 		      struct perf_cpu_context *cpuctx)
 {
 	struct perf_event *event;
-	int can_add_hw = 1;
 
 	list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
 		/* Ignore events in OFF or ERROR state */
@@ -2445,10 +2442,8 @@ ctx_flexible_sched_in(struct perf_event_context *ctx,
 		if (is_cgroup_event(event))
 			perf_cgroup_mark_enabled(event, ctx);
 
-		if (group_can_go_on(event, cpuctx, can_add_hw)) {
-			if (group_sched_in(event, cpuctx, ctx))
-				can_add_hw = 0;
-		}
+		if (group_can_go_on(event, cpuctx))
+			group_sched_in(event, cpuctx, ctx);
 	}
 }
 
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 5/7] perf: kill pmu::hrtimer_interval_ms
  2014-02-10 17:44 [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
                   ` (3 preceding siblings ...)
  2014-02-10 17:44 ` [PATCH 4/7] perf: be less pessimistic when scheduling events Mark Rutland
@ 2014-02-10 17:44 ` Mark Rutland
  2014-02-10 17:44 ` [PATCH 6/7] perf: Centralise context pmu disabling Mark Rutland
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 26+ messages in thread
From: Mark Rutland @ 2014-02-10 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: will.deacon, dave.martin, Mark Rutland, Peter Zijlstra, Ingo Molnar

The hrtimers affecting a pmu live in perf_cpu_context, which may be
shared by multiple pmus. As such it does not make sense for the pmu to
track the interval, which is a property of the shared context.

This patch removes pmu::hrtimer_interval_ms, and reworks all existing
users to use perf_cpu_context::hrtimer_interval instead. No pmu
implementations currently override the default hrtimer_interval_ms
statically, so we don't lose any existing implementation-specific
intervals (which could themselves be problematic if two pmus had
different hrtimer_interval_ms values).

While we're in the area we also replace uses of
ns_to_ktime(NSEC_PER_MSEC * $x) with ms_to_ktime($x), which does the
same thing.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
---
 include/linux/perf_event.h |  1 -
 kernel/events/core.c       | 24 +++++++++++++-----------
 2 files changed, 13 insertions(+), 12 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index df3d34a..7794a39 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -180,7 +180,6 @@ struct pmu {
 	int * __percpu			pmu_disable_count;
 	struct perf_cpu_context * __percpu pmu_cpu_context;
 	int				task_ctx_nr;
-	int				hrtimer_interval_ms;
 
 	/*
 	 * Fully disable/enable this PMU, can be used to protect from the PMI
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 13ede70..710c3fe 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -816,14 +816,11 @@ static void __perf_cpu_hrtimer_init(struct perf_cpu_context *cpuctx, int cpu)
 		return;
 
 	/*
-	 * check default is sane, if not set then force to
-	 * default interval (1/tick)
+	 * Set sane default interval (1/tick)
 	 */
-	timer = pmu->hrtimer_interval_ms;
-	if (timer < 1)
-		timer = pmu->hrtimer_interval_ms = PERF_CPU_HRTIMER;
+	timer = PERF_CPU_HRTIMER;
 
-	cpuctx->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
+	cpuctx->hrtimer_interval = ms_to_ktime(timer);
 
 	hrtimer_init(hr, CLOCK_MONOTONIC, HRTIMER_MODE_REL_PINNED);
 	hr->function = perf_cpu_hrtimer_handler;
@@ -6346,14 +6343,21 @@ type_show(struct device *dev, struct device_attribute *attr, char *page)
 }
 static DEVICE_ATTR_RO(type);
 
+static int get_pmu_hrtimer_interval(struct pmu *pmu)
+{
+	struct perf_cpu_context *cpu_ctx = this_cpu_ptr(pmu->pmu_cpu_context);
+	return ktime_to_ms(cpu_ctx->hrtimer_interval);
+}
+
 static ssize_t
 perf_event_mux_interval_ms_show(struct device *dev,
 				struct device_attribute *attr,
 				char *page)
 {
 	struct pmu *pmu = dev_get_drvdata(dev);
+	int interval_ms = get_pmu_hrtimer_interval(pmu);
 
-	return snprintf(page, PAGE_SIZE-1, "%d\n", pmu->hrtimer_interval_ms);
+	return snprintf(page, PAGE_SIZE-1, "%d\n", interval_ms);
 }
 
 static ssize_t
@@ -6372,16 +6376,14 @@ perf_event_mux_interval_ms_store(struct device *dev,
 		return -EINVAL;
 
 	/* same value, noting to do */
-	if (timer == pmu->hrtimer_interval_ms)
+	if (timer == get_pmu_hrtimer_interval(pmu))
 		return count;
 
-	pmu->hrtimer_interval_ms = timer;
-
 	/* update all cpuctx for this PMU */
 	for_each_possible_cpu(cpu) {
 		struct perf_cpu_context *cpuctx;
 		cpuctx = per_cpu_ptr(pmu->pmu_cpu_context, cpu);
-		cpuctx->hrtimer_interval = ns_to_ktime(NSEC_PER_MSEC * timer);
+		cpuctx->hrtimer_interval = ms_to_ktime(timer);
 
 		if (hrtimer_active(&cpuctx->hrtimer))
 			hrtimer_forward_now(&cpuctx->hrtimer, cpuctx->hrtimer_interval);
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 6/7] perf: Centralise context pmu disabling
  2014-02-10 17:44 [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
                   ` (4 preceding siblings ...)
  2014-02-10 17:44 ` [PATCH 5/7] perf: kill pmu::hrtimer_interval_ms Mark Rutland
@ 2014-02-10 17:44 ` Mark Rutland
  2014-02-10 18:08   ` Peter Zijlstra
  2014-02-10 17:44 ` [PATCH 7/7] perf: kill perf_event_context::pmu Mark Rutland
  2014-02-19 13:43 ` [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
  7 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2014-02-10 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: will.deacon, dave.martin, Mark Rutland, Peter Zijlstra,
	Ingo Molnar, Alexander Shishkin, Andi Kleen

Commit 443772776c69 (perf: Disable all pmus on unthrottling and
rescheduling) identified an issue with having multiple PMUs sharing a
perf_event_context, but only partially solved the issue.

While ctx::pmu will be disabled across all of its events being
scheduled, pmus which are not ctx::pmu will be repeatedly enabled and
disabled between events being added, possibly counting in between
pmu::add calls. This could be expensive and could lead to events
counting for differing periods.

Instead, this patch adds new helpers to disable/enable all pmus which
have events in a context. While perf_pmu_{dis,en}able may be called
repeatedly for a particular pmu, disabling is reference counted such
that the real pmu::{dis,en}able callbacks are only called once (were
this not the case, the current code would be broken for ctx::pmu).
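
For reference, the existing refcounting looks roughly like this
(simplified from kernel/events/core.c), which is why calling
perf_pmu_disable() once per event for the same pmu is harmless:

	void perf_pmu_disable(struct pmu *pmu)
	{
		int *count = this_cpu_ptr(pmu->pmu_disable_count);
		/* only the first disable invokes the real callback */
		if (!(*count)++)
			pmu->pmu_disable(pmu);
	}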

Uses of perf_pmu_{disable,enable}(ctx->pmu) are replaced with
perf_ctx_pmus_{disable,enable}(ctx). The now unnecessary calls to
perf_pmu_enable and perf_pmu_disable added by 443772776c69 are removed.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
---
 kernel/events/core.c | 65 ++++++++++++++++++++++++++++------------------------
 1 file changed, 35 insertions(+), 30 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 710c3fe..55c772e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -500,7 +500,7 @@ void perf_cgroup_switch(struct task_struct *task, int mode)
 		 */
 		if (cpuctx->ctx.nr_cgroups > 0) {
 			perf_ctx_lock(cpuctx, cpuctx->task_ctx);
-			perf_pmu_disable(cpuctx->ctx.pmu);
+			perf_ctx_pmus_disable(&cpuctx->ctx);
 
 			if (mode & PERF_CGROUP_SWOUT) {
 				cpu_ctx_sched_out(cpuctx, EVENT_ALL);
@@ -521,7 +521,7 @@ void perf_cgroup_switch(struct task_struct *task, int mode)
 				cpuctx->cgrp = perf_cgroup_from_task(task);
 				cpu_ctx_sched_in(cpuctx, EVENT_ALL, task);
 			}
-			perf_pmu_enable(cpuctx->ctx.pmu);
+			perf_ctx_pmus_enable(&cpuctx->ctx);
 			perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
 		}
 	}
@@ -857,6 +857,26 @@ void perf_pmu_enable(struct pmu *pmu)
 		pmu->pmu_enable(pmu);
 }
 
+/*
+ * Must be called with ctx->lock or ctx->mutex held.
+ */
+void perf_ctx_pmus_disable(struct perf_event_context *ctx)
+{
+	struct perf_event *event;
+	list_for_each_entry(event, &ctx->event_list, event_entry)
+		perf_pmu_disable(event->pmu);
+}
+
+/*
+ * Must be called with ctx->lock or ctx->mutex held.
+ */
+void perf_ctx_pmus_enable(struct perf_event_context *ctx)
+{
+	struct perf_event *event;
+	list_for_each_entry(event, &ctx->event_list, event_entry)
+		perf_pmu_enable(event->pmu);
+}
+
 static DEFINE_PER_CPU(struct list_head, rotation_list);
 
 /*
@@ -1394,8 +1414,6 @@ event_sched_out(struct perf_event *event,
 	if (event->state != PERF_EVENT_STATE_ACTIVE)
 		return;
 
-	perf_pmu_disable(event->pmu);
-
 	event->state = PERF_EVENT_STATE_INACTIVE;
 	if (event->pending_disable) {
 		event->pending_disable = 0;
@@ -1412,8 +1430,6 @@ event_sched_out(struct perf_event *event,
 		ctx->nr_freq--;
 	if (event->attr.exclusive || !cpuctx->active_oncpu)
 		cpuctx->exclusive = 0;
-
-	perf_pmu_enable(event->pmu);
 }
 
 static void
@@ -1654,7 +1670,6 @@ event_sched_in(struct perf_event *event,
 		 struct perf_event_context *ctx)
 {
 	u64 tstamp = perf_event_time(event);
-	int ret = 0;
 
 	if (event->state <= PERF_EVENT_STATE_OFF)
 		return 0;
@@ -1677,13 +1692,10 @@ event_sched_in(struct perf_event *event,
 	 */
 	smp_wmb();
 
-	perf_pmu_disable(event->pmu);
-
 	if (event->pmu->add(event, PERF_EF_START)) {
 		event->state = PERF_EVENT_STATE_INACTIVE;
 		event->oncpu = -1;
-		ret = -EAGAIN;
-		goto out;
+		return -EAGAIN;
 	}
 
 	event->tstamp_running += tstamp - event->tstamp_stopped;
@@ -1699,10 +1711,7 @@ event_sched_in(struct perf_event *event,
 	if (event->attr.exclusive)
 		cpuctx->exclusive = 1;
 
-out:
-	perf_pmu_enable(event->pmu);
-
-	return ret;
+	return 0;
 }
 
 static int
@@ -1848,7 +1857,7 @@ static int  __perf_install_in_context(void *info)
 	struct task_struct *task = current;
 
 	perf_ctx_lock(cpuctx, task_ctx);
-	perf_pmu_disable(cpuctx->ctx.pmu);
+	perf_ctx_pmus_disable(&cpuctx->ctx);
 
 	/*
 	 * If there was an active task_ctx schedule it out.
@@ -1889,7 +1898,7 @@ static int  __perf_install_in_context(void *info)
 	 */
 	perf_event_sched_in(cpuctx, task_ctx, task);
 
-	perf_pmu_enable(cpuctx->ctx.pmu);
+	perf_ctx_pmus_enable(&cpuctx->ctx);
 	perf_ctx_unlock(cpuctx, task_ctx);
 
 	return 0;
@@ -2147,7 +2156,7 @@ static void ctx_sched_out(struct perf_event_context *ctx,
 	if (!ctx->nr_active)
 		return;
 
-	perf_pmu_disable(ctx->pmu);
+	perf_ctx_pmus_disable(ctx);
 	if ((is_active & EVENT_PINNED) && (event_type & EVENT_PINNED)) {
 		list_for_each_entry(event, &ctx->pinned_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
@@ -2157,7 +2166,7 @@ static void ctx_sched_out(struct perf_event_context *ctx,
 		list_for_each_entry(event, &ctx->flexible_groups, group_entry)
 			group_sched_out(event, cpuctx, ctx);
 	}
-	perf_pmu_enable(ctx->pmu);
+	perf_ctx_pmus_enable(ctx);
 }
 
 /*
@@ -2491,7 +2500,7 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 		return;
 
 	perf_ctx_lock(cpuctx, ctx);
-	perf_pmu_disable(ctx->pmu);
+	perf_ctx_pmus_disable(ctx);
 	/*
 	 * We want to keep the following priority order:
 	 * cpu pinned (that don't need to move), task pinned,
@@ -2504,7 +2513,7 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 
 	perf_event_sched_in(cpuctx, cpuctx->task_ctx, task);
 
-	perf_pmu_enable(ctx->pmu);
+	perf_ctx_pmus_enable(ctx);
 	perf_ctx_unlock(cpuctx, ctx);
 
 	/*
@@ -2736,7 +2745,7 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
 		return;
 
 	raw_spin_lock(&ctx->lock);
-	perf_pmu_disable(ctx->pmu);
+	perf_ctx_pmus_disable(ctx);
 
 	list_for_each_entry_rcu(event, &ctx->event_list, event_entry) {
 		if (event->state != PERF_EVENT_STATE_ACTIVE)
@@ -2745,8 +2754,6 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
 		if (!event_filter_match(event))
 			continue;
 
-		perf_pmu_disable(event->pmu);
-
 		hwc = &event->hw;
 
 		if (hwc->interrupts == MAX_INTERRUPTS) {
@@ -2756,7 +2763,7 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
 		}
 
 		if (!event->attr.freq || !event->attr.sample_freq)
-			goto next;
+			continue;
 
 		/*
 		 * stop the event and update event->count
@@ -2778,11 +2785,9 @@ static void perf_adjust_freq_unthr_context(struct perf_event_context *ctx,
 			perf_adjust_period(event, period, delta, false);
 
 		event->pmu->start(event, delta > 0 ? PERF_EF_RELOAD : 0);
-	next:
-		perf_pmu_enable(event->pmu);
 	}
 
-	perf_pmu_enable(ctx->pmu);
+	perf_ctx_pmus_enable(ctx);
 	raw_spin_unlock(&ctx->lock);
 }
 
@@ -2826,7 +2831,7 @@ static int perf_rotate_context(struct perf_cpu_context *cpuctx)
 		goto done;
 
 	perf_ctx_lock(cpuctx, cpuctx->task_ctx);
-	perf_pmu_disable(cpuctx->ctx.pmu);
+	perf_ctx_pmus_disable(&cpuctx->ctx);
 
 	cpu_ctx_sched_out(cpuctx, EVENT_FLEXIBLE);
 	if (ctx)
@@ -2838,7 +2843,7 @@ static int perf_rotate_context(struct perf_cpu_context *cpuctx)
 
 	perf_event_sched_in(cpuctx, ctx, current);
 
-	perf_pmu_enable(cpuctx->ctx.pmu);
+	perf_ctx_pmus_enable(&cpuctx->ctx);
 	perf_ctx_unlock(cpuctx, cpuctx->task_ctx);
 done:
 	if (remove)
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH 7/7] perf: kill perf_event_context::pmu
  2014-02-10 17:44 [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
                   ` (5 preceding siblings ...)
  2014-02-10 17:44 ` [PATCH 6/7] perf: Centralise context pmu disabling Mark Rutland
@ 2014-02-10 17:44 ` Mark Rutland
  2014-02-10 18:10   ` Peter Zijlstra
  2014-02-19 13:43 ` [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
  7 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2014-02-10 17:44 UTC (permalink / raw)
  To: linux-kernel
  Cc: will.deacon, dave.martin, Mark Rutland, Peter Zijlstra, Ingo Molnar

Currently portions of the perf subsystem assume that a
perf_event_context is associated with a single pmu while in reality a
single perf_event_context may be shared by a number of pmus, as commit
443772776c69 (perf: Disable all pmus on unthrottling and rescheduling)
describes.

This patch removes perf_event_context::pmu, replacing it with a direct
pointer to the associated perf_cpu_context and a task_ctx_nr (as all
pmus sharing a context have the same task_ctx_nr). This makes the
relationship between pmus and perf_event_contexts clearer and allows us
to save on some pointer chasing.

This also fixes a potential misuse of ctx->pmu introduced in commit
bad7192b842c (perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the
period), where ctx->pmu is disabled before modifying state on
event->pmu. In this case the two pmus are not guaranteed to be the same.

As perf_pmu_rotate_{start,stop} only really care about the context they
are rotating, they are renamed to perf_ctx_rotate_{start,stop}.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Ingo Molnar <mingo@redhat.com>
---
 include/linux/perf_event.h |  3 ++-
 kernel/events/core.c       | 47 +++++++++++++++++++++++++---------------------
 2 files changed, 28 insertions(+), 22 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 7794a39..2123882 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -445,7 +445,8 @@ struct perf_event {
  * Used as a container for task events and CPU events as well:
  */
 struct perf_event_context {
-	struct pmu			*pmu;
+	struct perf_cpu_context __percpu *cpu_ctx;
+	enum perf_event_task_context	task_ctx_nr;
 	/*
 	 * Protect the states of the events in the list,
 	 * nr_active, and the list:
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 55c772e..541dcb5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -299,7 +299,7 @@ static inline u64 perf_clock(void)
 static inline struct perf_cpu_context *
 __get_cpu_context(struct perf_event_context *ctx)
 {
-	return this_cpu_ptr(ctx->pmu->pmu_cpu_context);
+	return this_cpu_ptr(ctx->cpu_ctx);
 }
 
 static void perf_ctx_lock(struct perf_cpu_context *cpuctx,
@@ -808,11 +808,10 @@ void perf_cpu_hrtimer_cancel(int cpu)
 static void __perf_cpu_hrtimer_init(struct perf_cpu_context *cpuctx, int cpu)
 {
 	struct hrtimer *hr = &cpuctx->hrtimer;
-	struct pmu *pmu = cpuctx->ctx.pmu;
 	int timer;
 
 	/* no multiplexing needed for SW PMU */
-	if (pmu->task_ctx_nr == perf_sw_context)
+	if (cpuctx->ctx.task_ctx_nr == perf_sw_context)
 		return;
 
 	/*
@@ -829,10 +828,9 @@ static void __perf_cpu_hrtimer_init(struct perf_cpu_context *cpuctx, int cpu)
 static void perf_cpu_hrtimer_restart(struct perf_cpu_context *cpuctx)
 {
 	struct hrtimer *hr = &cpuctx->hrtimer;
-	struct pmu *pmu = cpuctx->ctx.pmu;
 
 	/* not for SW PMU */
-	if (pmu->task_ctx_nr == perf_sw_context)
+	if (cpuctx->ctx.task_ctx_nr == perf_sw_context)
 		return;
 
 	if (hrtimer_active(hr))
@@ -880,13 +878,13 @@ void perf_ctx_pmus_enable(struct perf_event_context *ctx)
 static DEFINE_PER_CPU(struct list_head, rotation_list);
 
 /*
- * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
+ * perf_ctx_rotate_start() and perf_rotate_context() are fully serialized
  * because they're strictly cpu affine and rotate_start is called with IRQs
  * disabled, while rotate_context is called from IRQ context.
  */
-static void perf_pmu_rotate_start(struct pmu *pmu)
+static void perf_ctx_rotate_start(struct perf_event_context *ctx)
 {
-	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(ctx->cpu_ctx);
 	struct list_head *head = &__get_cpu_var(rotation_list);
 
 	WARN_ON(!irqs_disabled());
@@ -1151,7 +1149,7 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 
 	list_add_rcu(&event->event_entry, &ctx->event_list);
 	if (!ctx->nr_events)
-		perf_pmu_rotate_start(ctx->pmu);
+		perf_ctx_rotate_start(ctx);
 	ctx->nr_events++;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat++;
@@ -2520,7 +2518,7 @@ static void perf_event_context_sched_in(struct perf_event_context *ctx,
 	 * Since these rotations are per-cpu, we need to ensure the
 	 * cpu-context we got scheduled on is actually rotating.
 	 */
-	perf_pmu_rotate_start(ctx->pmu);
+	perf_ctx_rotate_start(ctx);
 }
 
 /*
@@ -2805,7 +2803,7 @@ static void rotate_ctx(struct perf_event_context *ctx)
 }
 
 /*
- * perf_pmu_rotate_start() and perf_rotate_context() are fully serialized
+ * perf_ctx_rotate_start() and perf_rotate_context() are fully serialized
  * because they're strictly cpu affine and rotate_start is called with IRQs
  * disabled, while rotate_context is called from IRQ context.
  */
@@ -3028,7 +3026,9 @@ static void __perf_event_init_context(struct perf_event_context *ctx)
 }
 
 static struct perf_event_context *
-alloc_perf_context(struct pmu *pmu, struct task_struct *task)
+alloc_perf_context(enum perf_event_task_context task_ctx_nr,
+		   struct perf_cpu_context __percpu *cpuctx,
+		   struct task_struct *task)
 {
 	struct perf_event_context *ctx;
 
@@ -3041,7 +3041,8 @@ alloc_perf_context(struct pmu *pmu, struct task_struct *task)
 		ctx->task = task;
 		get_task_struct(task);
 	}
-	ctx->pmu = pmu;
+	ctx->task_ctx_nr = task_ctx_nr;
+	ctx->cpu_ctx = cpuctx;
 
 	return ctx;
 }
@@ -3120,7 +3121,8 @@ retry:
 		++ctx->pin_count;
 		raw_spin_unlock_irqrestore(&ctx->lock, flags);
 	} else {
-		ctx = alloc_perf_context(pmu, task);
+		ctx = alloc_perf_context(pmu->task_ctx_nr,
+					 pmu->pmu_cpu_context, task);
 		err = -ENOMEM;
 		if (!ctx)
 			goto errout;
@@ -3565,7 +3567,7 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 
 	active = (event->state == PERF_EVENT_STATE_ACTIVE);
 	if (active) {
-		perf_pmu_disable(ctx->pmu);
+		perf_pmu_disable(event->pmu);
 		event->pmu->stop(event, PERF_EF_UPDATE);
 	}
 
@@ -3573,7 +3575,7 @@ static int perf_event_period(struct perf_event *event, u64 __user *arg)
 
 	if (active) {
 		event->pmu->start(event, PERF_EF_RELOAD);
-		perf_pmu_enable(ctx->pmu);
+		perf_pmu_enable(event->pmu);
 	}
 
 unlock:
@@ -6495,7 +6497,9 @@ skip_type:
 		__perf_event_init_context(&cpuctx->ctx);
 		lockdep_set_class(&cpuctx->ctx.mutex, &cpuctx_mutex);
 		lockdep_set_class(&cpuctx->ctx.lock, &cpuctx_lock);
-		cpuctx->ctx.pmu = pmu;
+
+		cpuctx->ctx.task_ctx_nr = pmu->task_ctx_nr;
+		cpuctx->ctx.cpu_ctx = pmu->pmu_cpu_context;
 
 		__perf_cpu_hrtimer_init(cpuctx, cpu);
 
@@ -7684,7 +7688,8 @@ inherit_task_group(struct perf_event *event, struct task_struct *parent,
 		 * child.
 		 */
 
-		child_ctx = alloc_perf_context(parent_ctx->pmu, child);
+		child_ctx = alloc_perf_context(parent_ctx->task_ctx_nr,
+						parent_ctx->cpu_ctx, child);
 		if (!child_ctx)
 			return -ENOMEM;
 
@@ -7843,9 +7848,9 @@ static void perf_event_init_cpu(int cpu)
 }
 
 #if defined CONFIG_HOTPLUG_CPU || defined CONFIG_KEXEC
-static void perf_pmu_rotate_stop(struct pmu *pmu)
+static void perf_ctx_rotate_stop(struct perf_event_context *ctx)
 {
-	struct perf_cpu_context *cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
+	struct perf_cpu_context *cpuctx = this_cpu_ptr(ctx->cpu_ctx);
 
 	WARN_ON(!irqs_disabled());
 
@@ -7857,7 +7862,7 @@ static void __perf_event_exit_context(void *__info)
 	struct perf_event_context *ctx = __info;
 	struct perf_event *event, *tmp;
 
-	perf_pmu_rotate_stop(ctx->pmu);
+	perf_ctx_rotate_stop(ctx);
 
 	list_for_each_entry_safe(event, tmp, &ctx->pinned_groups, group_entry)
 		__perf_remove_from_context(event);
-- 
1.8.1.1


^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/7] perf: be less pessimistic when scheduling events
  2014-02-10 17:44 ` [PATCH 4/7] perf: be less pessimistic when scheduling events Mark Rutland
@ 2014-02-10 17:58   ` Peter Zijlstra
  2014-02-11 17:48     ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-02-10 17:58 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, will.deacon, dave.martin, Ingo Molnar

On Mon, Feb 10, 2014 at 05:44:21PM +0000, Mark Rutland wrote:
> This patch makes ctx_flexible_sched_in attempt to schedule every group
> in the flexible_groups list even when earlier groups failed to schedule,
> enabling more groups to be scheduled simultaneously.

Since you're basically free to create as many events as you want, you can
(as an unprivilidged user) make the context switch to and from your task
arbitrarily expensive.

So no, that's not going to fly.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 6/7] perf: Centralise context pmu disabling
  2014-02-10 17:44 ` [PATCH 6/7] perf: Centralise context pmu disabling Mark Rutland
@ 2014-02-10 18:08   ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2014-02-10 18:08 UTC (permalink / raw)
  To: Mark Rutland
  Cc: linux-kernel, will.deacon, dave.martin, Ingo Molnar,
	Alexander Shishkin, Andi Kleen

On Mon, Feb 10, 2014 at 05:44:23PM +0000, Mark Rutland wrote:
> Commit 443772776c69 (perf: Disable all pmus on unthrottling and
> rescheduling) identified an issue with having multiple PMUs sharing a
> perf_event_context, but only partially solved the issue.
> 
> While ctx::pmu will be disabled across all of its events being
> scheduled, pmus which are not ctx::pmu will be repeatedly enabled and
> disabled between events being added, possibly counting in between
> pmu::add calls. This could be expensive and could lead to events
> counting for differing periods.
> 
> Instead, this patch adds new helpers to disable/enable all pmus which
> have events in a context. While perf_pmu_{dis,en}able may be called
> repeatedly for a particular pmu, disabling is reference counted such
> that the real pmu::{dis,en}able callbacks are only called once (were
> this not the case, the current code would be broken for ctx::pmu).
> 
> Uses of perf_pmu_{disable,enable}(ctx->pmu) are replaced with
> perf_ctx_pmus_{disable,enable}(ctx). The now unnecessary calls to
> perf_pmu_enable and perf_pmu_disable added by 443772776c69 are removed.

Hurmn; instead of adding more for_each_event iterations we should be
reducing them.

Given that we currently schedule first to last and stop on the first
event that fails to schedule, we can terminate the ctx_sched_out() loop
when it finds the first event that wasn't actually scheduled.
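
A rough sketch of that idea (not actual kernel code, and simplified in
that it ignores groups skipped for OFF state or CPU filtering):

	list_for_each_entry(event, &ctx->flexible_groups, group_entry) {
		if (event->state != PERF_EVENT_STATE_ACTIVE)
			break;	/* nothing after this was scheduled in */
		group_sched_out(event, cpuctx, ctx);
	}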



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] perf: kill perf_event_context::pmu
  2014-02-10 17:44 ` [PATCH 7/7] perf: kill perf_event_context::pmu Mark Rutland
@ 2014-02-10 18:10   ` Peter Zijlstra
  2014-02-11 17:56     ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-02-10 18:10 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, will.deacon, dave.martin, Ingo Molnar

On Mon, Feb 10, 2014 at 05:44:24PM +0000, Mark Rutland wrote:
> Currently portions of the perf subsystem assume that a
> perf_event_context is associated with a single pmu while in reality a
> single perf_event_context may be shared by a number of pmus, as commit
> 443772776c69 (perf: Disable all pmus on unthrottling and rescheduling)
> describes.
> 
> This patch removes perf_event_context::pmu, replacing it with a direct
> pointer to the associated perf_cpu_context and a task_ctx_nr (as all
> pmus sharing a context have the same task_ctx_nr). This makes the
> relationship between pmus and perf_event_contexts clearer and allows us
> to save on some pointer chasing.
> 
> This also fixes a potential misuse of ctx->pmu introduced in commit
> bad7192b842c (perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the
> period), where ctx->pmu is disabled before modifying state on
> event->pmu. In this case the two pmus are not guaranteed to be the same.
> 
> As perf_pmu_rotate_{start,stop} only really care about the context they
> are rotating, they are renamed to perf_ctx_rotate_{start,stop}.

This very much relies on the previous patch where you make pmu_disable
iterate all the events.

We could also change this to keep a pmu list for each context and
iterate that instead. Given there is indeed a fair limit on different
PMUs in the system that iteration should be much shorter.



^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/7] perf: be less pessimistic when scheduling events
  2014-02-10 17:58   ` Peter Zijlstra
@ 2014-02-11 17:48     ` Mark Rutland
  2014-02-25 11:29       ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2014-02-11 17:48 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Will Deacon, Dave P Martin, Ingo Molnar

On Mon, Feb 10, 2014 at 05:58:54PM +0000, Peter Zijlstra wrote:
> On Mon, Feb 10, 2014 at 05:44:21PM +0000, Mark Rutland wrote:
> > This patch makes ctx_flexible_sched_in attempt to schedule every group
> > in the flexible_groups list even when earlier groups failed to schedule,
> > enabling more groups to be scheduled simultaneously.
> 
> Since you're basically free to create as many events as you want, you can
> (as an unprivilidged user) make the context switch to and from your task
> arbitrarily expensive.

Not that it makes the approach any better, but surely this is already
the case for software events? I can create as many as I want, and
they'll always try to schedule.

I'll take a look into other ways to stop disparate PMUs' events from
adversely affecting each other. I guess it might not be possible to
solve the general problem without allowing said expensive context
switch.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] perf: kill perf_event_context::pmu
  2014-02-10 18:10   ` Peter Zijlstra
@ 2014-02-11 17:56     ` Mark Rutland
  2014-02-12 15:01       ` Dave Martin
  2014-02-25 11:31       ` Peter Zijlstra
  0 siblings, 2 replies; 26+ messages in thread
From: Mark Rutland @ 2014-02-11 17:56 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Will Deacon, Dave P Martin, Ingo Molnar

On Mon, Feb 10, 2014 at 06:10:26PM +0000, Peter Zijlstra wrote:
> On Mon, Feb 10, 2014 at 05:44:24PM +0000, Mark Rutland wrote:
> > Currently portions of the perf subsystem assume that a
> > perf_event_context is associated with a single pmu while in reality a
> > single perf_event_context may be shared by a number of pmus, as commit
> > 443772776c69 (perf: Disable all pmus on unthrottling and rescheduling)
> > describes.
> > 
> > This patch removes perf_event_context::pmu, replacing it with a direct
> > pointer to the associated perf_cpu_context and a task_ctx_nr (as all
> > pmus sharing a context have the same task_ctx_nr). This makes the
> > relationship between pmus and perf_event_contexts clearer and allows us
> > to save on some pointer chasing.
> > 
> > This also fixes a potential misuse of ctx->pmu introduced in commit
> > bad7192b842c (perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the
> > period), where ctx->pmu is disabled before modifying state on
> > event->pmu. In this case the two pmus are not guaranteed to be the same.
> > 
> > As perf_pmu_rotate_{start,stop} only really care about the context they
> > are rotating, they are renamed to perf_ctx_rotate_{start,stop}.
> 
> This very much relies on the previous patch where you make pmu_disable
> iterate all the events.
> 
> We could also change this to keep a pmu list for each context and
> iterate that instead. Given there is indeed a fair limit on different
> PMUs in the system that iteration should be much shorter.

Another option would be to have a context per-pmu. Each context's pmu
pointer would be valid, and (other than the case of software events) it
doesn't make sense to place events from disparate PMUs into the same
group anyway. Then you don't need a fixed sized pmu list in the context
or some arcane list structs.

Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] perf: kill perf_event_context::pmu
  2014-02-11 17:56     ` Mark Rutland
@ 2014-02-12 15:01       ` Dave Martin
  2014-02-25 11:31       ` Peter Zijlstra
  1 sibling, 0 replies; 26+ messages in thread
From: Dave Martin @ 2014-02-12 15:01 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Peter Zijlstra, linux-kernel, Will Deacon, Ingo Molnar

On Tue, Feb 11, 2014 at 05:56:51PM +0000, Mark Rutland wrote:
> On Mon, Feb 10, 2014 at 06:10:26PM +0000, Peter Zijlstra wrote:
> > On Mon, Feb 10, 2014 at 05:44:24PM +0000, Mark Rutland wrote:
> > > Currently portions of the perf subsystem assume that a
> > > perf_event_context is associated with a single pmu while in reality a
> > > single perf_event_context may be shared by a number of pmus, as commit
> > > 443772776c69 (perf: Disable all pmus on unthrottling and rescheduling)
> > > describes.
> > > 
> > > This patch removes perf_event_context::pmu, replacing it with a direct
> > > pointer to the associated perf_cpu_context and a task_ctx_nr (as all
> > > pmus sharing a context have the same task_ctx_nr). This makes the
> > > relationship between pmus and perf_event_contexts clearer and allows us
> > > to save on some pointer chasing.
> > > 
> > > This also fixes a potential misuse of ctx->pmu introduced in commit
> > > bad7192b842c (perf: Fix PERF_EVENT_IOC_PERIOD to force-reset the
> > > period), where ctx->pmu is disabled before modifying state on
> > > event->pmu. In this case the two pmus are not guaranteed to be the same.
> > > 
> > > As perf_pmu_rotate_{start,stop} only really care about the context they
> > > are rotating, they are renamed to perf_ctx_rotate_{start,stop}.
> > 
> > This very much relies on the previous patch where you make pmu_disable
> > iterate all the events.
> > 
> > We could also change this to keep a pmu list for each context and
> > iterate that instead. Given there is indeed a fair limit on different
> > PMUs in the system that iteration should be much shorter.
> 
> Another option would be to have a context per-pmu. Each context's pmu
> pointer would be valid, and (other than the case of software events) it
> doesn't make sense to place events from disparate PMUs into the same
> group anyway. Then you don't need a fixed sized pmu list in the context
> or some arcane list structs.

Getting event rotation to work in a sensible way when there are mixtures
of events for different PMUs in a single context is the tricky part. I've
not come up with a good enough solution for that to post yet.

Splitting out each PMU as a separate hardware context would make this
work a lot more naturally.

Cheers
---Dave

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Perf core cleanups for shared perf_event_contexts
  2014-02-10 17:44 [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
                   ` (6 preceding siblings ...)
  2014-02-10 17:44 ` [PATCH 7/7] perf: kill perf_event_context::pmu Mark Rutland
@ 2014-02-19 13:43 ` Mark Rutland
  2014-02-25 11:39   ` Peter Zijlstra
  7 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2014-02-19 13:43 UTC (permalink / raw)
  To: linux-kernel, Peter Zijlstra; +Cc: Will Deacon, Dave P Martin

On Mon, Feb 10, 2014 at 05:44:17PM +0000, Mark Rutland wrote:
> Hi,
> 
> I've been working on adding support for heterogeneous pmus (as would be found
> in big.LITTLE systems) to the arm perf backend, and in the process of doing so
> I've noticed a few issues in the core perf code that would be nice to have
> fixed up. The heterogeneous pmu support is not quite ready, but the core
> changes are independent, so I'm posting them today.
> 
> The first three patches fix up some basic issues which are relevant regardless
> of whether you wish to deal with heterogeneous pmus, though patch two also
> fixes a potential bug when perf_event_contexts are shared by multiple pmus.

Peter,

While you have concerns with the later patches in the series, do you
have any concerns with the first three (as below), or would you be happy
to take them?

>   perf: fix prototype of find_pmu_context
>   perf: remove redundant pmu assignment
>   perf: kill perf_event_context_type

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/7] perf: be less pessimistic when scheduling events
  2014-02-11 17:48     ` Mark Rutland
@ 2014-02-25 11:29       ` Peter Zijlstra
  2014-02-27 12:07         ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-02-25 11:29 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, Will Deacon, Dave P Martin, Ingo Molnar

On Tue, Feb 11, 2014 at 05:48:05PM +0000, Mark Rutland wrote:
> On Mon, Feb 10, 2014 at 05:58:54PM +0000, Peter Zijlstra wrote:
> > On Mon, Feb 10, 2014 at 05:44:21PM +0000, Mark Rutland wrote:
> > > This patch makes ctx_flexible_sched_in attempt to schedule every group
> > > in the flexible_groups list even when earlier groups failed to schedule,
> > > enabling more groups to be scheduled simultaneously.
> > 
> > Since you're basically free to create as many events as you want, you can
> > (as an unprivilidged user) make the context switch to and from your task
> > arbitrarily expensive.
> 
> Not that it makes the approach any better, but surely this is already
> the case for software events? 

Yeah, software events are a problem; I realized this the moment I sent
that email. However, software events do not have the O(n!) programming
fail hw events have.

AMD Fam15h has an O(n^4) termination of the O(n*n!) algorithm, the rest
of x86 has a normal O(n^2) bound.

> I can create as many as I want, and they'll always try to schedule.
> 
> I'll take a look into other ways to stop disparate PMUs' events from
> adversely affecting each other. I guess it might not be possible to
> solve the general problem without allowing said expensive context
> switch.

So the problem you're facing is that you'll attach events for each PMU
available on the system to a particular task? And simply want to
continue scheduling even though a particular event will never fit on the
CPU's PMU it's currently running on?

So the trivial work-around would be to create an event per cpu and set
event->cpu. That way you'll get the event_filter_match() exception in
ctx_flexible_sched_in() and we'll skip over all events that cannot be
programmed on this CPU.
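
A minimal userspace sketch of that work-around (hypothetical; assumes
the raw perf_event_open(2) syscall and a type/config pair for the PMU
in question):

	#include <string.h>
	#include <unistd.h>
	#include <sys/types.h>
	#include <sys/syscall.h>
	#include <linux/perf_event.h>

	/* One counter per CPU for the given task, instead of a single
	 * task-wide counter with cpu == -1; events bound to CPUs whose
	 * PMU cannot back them are then skipped by event_filter_match().
	 */
	static void open_per_cpu(pid_t pid, __u32 type, __u64 config,
				 int *fds, int nr_cpus)
	{
		struct perf_event_attr attr;
		int cpu;

		memset(&attr, 0, sizeof(attr));
		attr.size = sizeof(attr);
		attr.type = type;
		attr.config = config;
		attr.disabled = 1;

		for (cpu = 0; cpu < nr_cpus; cpu++)
			fds[cpu] = syscall(__NR_perf_event_open, &attr,
					   pid, cpu, -1, 0);
	}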

The other approach is adding a cpumask to events, which would
drastically cut down on the amount of events you'd need to create; then
again, ARM doesn't have the silly number of CPUs x86 has just yet.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] perf: kill perf_event_context::pmu
  2014-02-11 17:56     ` Mark Rutland
  2014-02-12 15:01       ` Dave Martin
@ 2014-02-25 11:31       ` Peter Zijlstra
  2014-02-27 11:48         ` Mark Rutland
  1 sibling, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-02-25 11:31 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, Will Deacon, Dave P Martin, Ingo Molnar

On Tue, Feb 11, 2014 at 05:56:51PM +0000, Mark Rutland wrote:
> Another option would be to have a context per-pmu. Each context's pmu
> pointer would be valid, and (other than the case of software events) it
> doesn't make sense to place events from disparate PMUs into the same
> group anyway. Then you don't need a fixed sized pmu list in the context
> or some arcane list structs.

No it does make sense; for example on hardware that doesn't have a PMI
you can create a software event + hardware event group and have the
software interrupt read the hardware counter and still get 'some'
sampling.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/7] perf: kill perf_event_context_type
  2014-02-10 17:44 ` [PATCH 3/7] perf: kill perf_event_context_type Mark Rutland
@ 2014-02-25 11:38   ` Peter Zijlstra
  2014-02-27 11:46     ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-02-25 11:38 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, will.deacon, dave.martin, Ingo Molnar

On Mon, Feb 10, 2014 at 05:44:20PM +0000, Mark Rutland wrote:
> Currently perf_event_context::type is used to determine whether a
> context is cpu-bound or task-bound. However perf_event_context::task can
> be used to determine this just as cheaply, and requires no additional
> initialisation.
> 
> This patch removes perf_event_context::type, and modifies existing users
> to check check perf_event_context::task instead. The now unused enum
> perf_event_context_type is removed.

> @@ -7130,7 +7129,7 @@ SYSCALL_DEFINE5(perf_event_open,
>  		 * task or CPU context:
>  		 */
>  		if (move_group) {
> -			if (group_leader->ctx->type != ctx->type)
> +			if (group_leader->ctx->task != ctx->task)
>  				goto err_context;
>  		} else {
>  			if (group_leader->ctx != ctx)

That's not an equivalent statement. ctx->task (t1) != ctx->task (t2)
while they're still both of the same type.

Now I don't think you'll ever end up with different tasks in this case
so it might still work out; but you don't mention this and I'd have to
like think to make sure.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 0/7] Perf core cleanups for shared perf_event_contexts
  2014-02-19 13:43 ` [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
@ 2014-02-25 11:39   ` Peter Zijlstra
  0 siblings, 0 replies; 26+ messages in thread
From: Peter Zijlstra @ 2014-02-25 11:39 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, Will Deacon, Dave P Martin

On Wed, Feb 19, 2014 at 01:43:41PM +0000, Mark Rutland wrote:
> While you have concerns with the later patches in the series, do you
> have any concerns with the first three (as below), or would you be happy
> to take them?

I've taken 1 and 2. I can take 3 if you confirm and update its
Changelog.

Thanks!

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 3/7] perf: kill perf_event_context_type
  2014-02-25 11:38   ` Peter Zijlstra
@ 2014-02-27 11:46     ` Mark Rutland
  0 siblings, 0 replies; 26+ messages in thread
From: Mark Rutland @ 2014-02-27 11:46 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Will Deacon, Dave P Martin, Ingo Molnar

On Tue, Feb 25, 2014 at 11:38:48AM +0000, Peter Zijlstra wrote:
> On Mon, Feb 10, 2014 at 05:44:20PM +0000, Mark Rutland wrote:
> > Currently perf_event_context::type is used to determine whether a
> > context is cpu-bound or task-bound. However perf_event_context::task can
> > be used to determine this just as cheaply, and requires no additional
> > initialisation.
> > 
> > This patch removes perf_event_context::type, and modifies existing users
> > to check perf_event_context::task instead. The now unused enum
> > perf_event_context_type is removed.
> 
> > @@ -7130,7 +7129,7 @@ SYSCALL_DEFINE5(perf_event_open,
> >  		 * task or CPU context:
> >  		 */
> >  		if (move_group) {
> > -			if (group_leader->ctx->type != ctx->type)
> > +			if (group_leader->ctx->task != ctx->task)
> >  				goto err_context;
> >  		} else {
> >  			if (group_leader->ctx != ctx)
> 
> That's not an equivalent statement. ctx->task (t1) != ctx->task (t2)
> while they're still both of the same type.

True. However, that case seems fishy -- if group semantics are respected
a group with events in different tasks should never have any of its
events scheduled and enabled.

For all but the sw leader, hw follower case we reject groups spanning
multiple tasks, and the fact we allow that particular case appears to be
a bug (more on that below).

> Now I don't think you'll ever end up with different tasks in this case
> so it might still work out; but you don't mention this and I'd have to
> like think to make sure.

The short answer is we can, but that itself appears to be a bug. My
patch is not sufficient as it doesn't protect accesses to ctx->task.
Unfortunately even with appropriate locking, we'll race with the context
switch optimisation and may reject sane requests from userspace (as we
currently do in other cases).

Commit b04243ef7006 (perf: Complete software pmu grouping) allowed pure
software groups to be moved into a hardware context by comparing
ctx->type rather than ctx. This dropped the implicit test that ctx->task
is the same across the two events. Thus if a task-following hardware
event is created with a software group leader following another task,
said software event will be pulled across tasks into the hardware
context of the new event's task. This is not the case for any other
hw/sw leader/follower combination as ctx rather than ctx->type is still
compared. Where the ctx comparison fails we return -EINVAL (which we
used to for the sw leader, hw follower case also).

I'll rework my original patch to try to make that behaviour consistent.
However there's another race I'm not sure how to deal with.

Since commit 5a3126d4fe7c (perf: Fix the perf context switch
optimization) creating an event, forking, then creating a follower of
the original event will only succeed if the context switch optimisation
never came into play between the parent and a child, or came into play
multiple times and returned the original context to the parent. This
affects all cases other than the sw leader, hw follower case, where it
is masked by the behaviour above.

Cheers,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] perf: kill perf_event_context::pmu
  2014-02-25 11:31       ` Peter Zijlstra
@ 2014-02-27 11:48         ` Mark Rutland
  2014-02-27 11:51           ` Peter Zijlstra
  0 siblings, 1 reply; 26+ messages in thread
From: Mark Rutland @ 2014-02-27 11:48 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Will Deacon, Dave P Martin, Ingo Molnar

On Tue, Feb 25, 2014 at 11:31:00AM +0000, Peter Zijlstra wrote:
> On Tue, Feb 11, 2014 at 05:56:51PM +0000, Mark Rutland wrote:
> > Another option would be to have a context per-pmu. Each context's pmu
> > pointer would be valid, and (other than the case of software events) it
> > doesn't make sense to place events from disparate PMUs into the same
> > group anyway. Then you don't need a fixed sized pmu list in the context
> > or some arcane list structs.
> 
> No it does make sense; for example on hardware that doesn't have a PMI
> you can create a software event + hardware event group and have the
> software interrupt read the hardware counter and still get 'some'
> sampling.

Sure, I called out software events as an exception above.

Does it ever make sense to group two hardware events for disparate
hardware PMUs?

Cheers,
Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] perf: kill perf_event_context::pmu
  2014-02-27 11:48         ` Mark Rutland
@ 2014-02-27 11:51           ` Peter Zijlstra
  2014-02-27 12:30             ` Mark Rutland
  0 siblings, 1 reply; 26+ messages in thread
From: Peter Zijlstra @ 2014-02-27 11:51 UTC (permalink / raw)
  To: Mark Rutland; +Cc: linux-kernel, Will Deacon, Dave P Martin, Ingo Molnar

On Thu, Feb 27, 2014 at 11:48:05AM +0000, Mark Rutland wrote:
> On Tue, Feb 25, 2014 at 11:31:00AM +0000, Peter Zijlstra wrote:
> > On Tue, Feb 11, 2014 at 05:56:51PM +0000, Mark Rutland wrote:
> > > Another option would be to have a context per-pmu. Each context's pmu
> > > pointer would be valid, and (other than the case of software events) it
> > > doesn't make sense to place events from disparate PMUs into the same
> > > group anyway. Then you don't need a fixed sized pmu list in the context
> > > or some arcane list structs.
> > 
> > No it does make sense; for example on hardware that doesn't have a PMI
> > you can create a software event + hardware event group and have the
> > software interrupt read the hardware counter and still get 'some'
> > sampling.
> 
> Sure, I called out software events as an exception above.

Oh sorry missed that.

> Does it ever make sense to group two hardware events for disparate
> hardware PMUs?

No, and I think we disallow that. We only explicitly allow software
events/groups to move to !software context.
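
For reference, such a software-leader/hardware-follower group can be built
from userspace roughly as below (a minimal sketch against the standard
perf_event_open ABI; error handling is mostly omitted and the event/period
choices are arbitrary):

#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
                           int cpu, int group_fd, unsigned long flags)
{
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

int open_sw_led_group(void)
{
        struct perf_event_attr sw, hw;
        int leader, follower;

        memset(&sw, 0, sizeof(sw));
        memset(&hw, 0, sizeof(hw));

        /* software leader: periodic cpu-clock samples drive the group */
        sw.type = PERF_TYPE_SOFTWARE;
        sw.size = sizeof(sw);
        sw.config = PERF_COUNT_SW_CPU_CLOCK;
        sw.sample_period = 1000000;             /* every ~1ms of cpu-clock */
        sw.sample_type = PERF_SAMPLE_READ;      /* read the group at each sample */
        sw.read_format = PERF_FORMAT_GROUP;
        sw.disabled = 1;

        /* hardware follower: counted only, read via the leader's samples */
        hw.type = PERF_TYPE_HARDWARE;
        hw.size = sizeof(hw);
        hw.config = PERF_COUNT_HW_CPU_CYCLES;

        leader = perf_event_open(&sw, 0, -1, -1, 0);
        if (leader < 0)
                return -1;
        follower = perf_event_open(&hw, 0, -1, leader, 0);
        if (follower < 0)
                return -1;

        /* both fds are intentionally kept open for the caller */
        return leader;
}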

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 4/7] perf: be less pessimistic when scheduling events
  2014-02-25 11:29       ` Peter Zijlstra
@ 2014-02-27 12:07         ` Mark Rutland
  0 siblings, 0 replies; 26+ messages in thread
From: Mark Rutland @ 2014-02-27 12:07 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Will Deacon, Dave P Martin, Ingo Molnar

On Tue, Feb 25, 2014 at 11:29:36AM +0000, Peter Zijlstra wrote:
> On Tue, Feb 11, 2014 at 05:48:05PM +0000, Mark Rutland wrote:
> > On Mon, Feb 10, 2014 at 05:58:54PM +0000, Peter Zijlstra wrote:
> > > On Mon, Feb 10, 2014 at 05:44:21PM +0000, Mark Rutland wrote:
> > > > This patch makes ctx_flexible_sched_in attempt to schedule every group
> > > > in the flexible_groups list even when earlier groups failed to schedule,
> > > > enabling more groups to be scheduled simultaneously.
> > > 
> > > Since you're basically free to create as many events as you want, you can
> > > (as an unprivileged user) make the context switch to and from your task
> > > arbitrarily expensive.
> > 
> > Not that it makes the approach any better, but surely this is already
> > the case for software events? 
> 
> Yeah, software events are a problem; I realized this the moment I sent
> that email. However, software events do not have the O(n!) programming
> fail hw events have.
> 
> AMD Fam15h has an O(n^4) termination of the O(n*n!) algorithm; the rest
> of x86 has a normal O(n^2) bound.

Ok. I hadn't realised how expensive the event constraint solving was for
other architectures. For us it's a simple bitmap test at present.

> > I can create as many as I want, and they'll always try to schedule.
> > 
> > I'll take a look into other ways to stop disparate PMUs' events from
> > adversely affecting each other. I guess it might not be possible to
> > solve the general problem without allowing said expensive context
> > switch.
> 
> So the problem you're facing is that you'll attach events for each PMU
> available on the system to a particular task? And simply want to
> continue scheduling even though a particular event will never fit on the
> CPU's PMU it's currently running on?

Basically, though s/fit on/be sane for/.

> So the trivial work-around would be to create an event per cpu and set
> event->cpu. That way you'll get the event_filter_match() exception in
> ctx_flexible_sched_in() and we'll skip over all events that cannot be
> programmed on this CPU.

While that's trivial for the core, it's painful elsewhere. Either
userspace has to figure out that it's on a heterogeneous system and
create the additional events, or when initialising events we need to
create the set of cpu-specific events transparently, avoiding recursion
in the initialisation code.
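
(Roughly the sketch below, repeated for every counter -- an illustration of
the workaround rather than a proposal; open_per_cpu and nr_cpus are
placeholder names:)

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

static int perf_event_open(struct perf_event_attr *attr, pid_t pid,
                           int cpu, int group_fd, unsigned long flags)
{
        return syscall(__NR_perf_event_open, attr, pid, cpu, group_fd, flags);
}

/*
 * Open the same task-following counter once per CPU; each event has
 * event->cpu fixed, so event_filter_match() skips it on the wrong CPUs.
 */
static int open_per_cpu(struct perf_event_attr *attr, pid_t pid,
                        int *fds, int nr_cpus)
{
        int cpu;

        for (cpu = 0; cpu < nr_cpus; cpu++) {
                fds[cpu] = perf_event_open(attr, pid, cpu, -1, 0);
                if (fds[cpu] < 0)
                        return -1;
        }
        return 0;
}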

> The other approach is adding a cpumask to events, which would
> drastically cut down on the number of events you'd need to create; then
> again, ARM doesn't have the silly number of CPUs x86 has quite yet.

This sounds good, though with a slight alteration it becomes better: with a
pmu-specific filter function we could reuse the cpumask we use in the
arch backend. No changes to events, just an additional (optional)
callback on struct pmu.
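
Something along these lines (a standalone sketch with toy stand-in structs,
just to show the shape of the hook -- filter_match, supported_cpus and
arm_pmu_filter_match are illustrative names, not existing interfaces):

#include <stdbool.h>
#include <stdio.h>

struct cpumask { unsigned long bits; };         /* toy mask: bit n == CPU n */

struct pmu {
        /* optional: may this pmu's events run on the given CPU? */
        bool (*filter_match)(const struct pmu *pmu, int cpu);
        struct cpumask supported_cpus;          /* owned by the arch backend */
};

struct perf_event {
        const struct pmu *pmu;
        int cpu;                                /* -1 == any CPU */
};

static bool arm_pmu_filter_match(const struct pmu *pmu, int cpu)
{
        return pmu->supported_cpus.bits & (1UL << cpu);
}

/* roughly what event_filter_match() would grow into */
static bool event_filter_match(const struct perf_event *event, int this_cpu)
{
        if (event->cpu != -1 && event->cpu != this_cpu)
                return false;
        if (event->pmu->filter_match &&
            !event->pmu->filter_match(event->pmu, this_cpu))
                return false;
        return true;
}

int main(void)
{
        struct pmu little = { arm_pmu_filter_match, { 0x0f } };  /* CPUs 0-3 */
        struct perf_event ev = { &little, -1 };

        /* prints "cpu1=1 cpu5=0": skipped on CPUs this pmu doesn't cover */
        printf("cpu1=%d cpu5=%d\n",
               event_filter_match(&ev, 1), event_filter_match(&ev, 5));
        return 0;
}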

Thanks for the suggestions.

Mark.

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH 7/7] perf: kill perf_event_context::pmu
  2014-02-27 11:51           ` Peter Zijlstra
@ 2014-02-27 12:30             ` Mark Rutland
  0 siblings, 0 replies; 26+ messages in thread
From: Mark Rutland @ 2014-02-27 12:30 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, Will Deacon, Dave P Martin, Ingo Molnar

On Thu, Feb 27, 2014 at 11:51:43AM +0000, Peter Zijlstra wrote:
> On Thu, Feb 27, 2014 at 11:48:05AM +0000, Mark Rutland wrote:
> > On Tue, Feb 25, 2014 at 11:31:00AM +0000, Peter Zijlstra wrote:
> > > On Tue, Feb 11, 2014 at 05:56:51PM +0000, Mark Rutland wrote:
> > > > Another option would be to have a context per-pmu. Each context's pmu
> > > > pointer would be valid, and (other than the case of software events) it
> > > > doesn't make sense to place events from disparate PMUs into the same
> > > > group anyway. Then you don't need a fixed sized pmu list in the context
> > > > or some arcane list structs.
> > > 
> > > No it does make sense; for example on hardware that doesn't have a PMI
> > > you can create a software event + hardware event group and have the
> > > software interrupt read the hardware counter and still get 'some'
> > > sampling.
> > 
> > Sure, I called out software events as an exception above.
> 
> Oh sorry missed that.
> 
> > Does it ever make sense to group two hardware events for disparate
> > hardware PMUs?
> 
> No, and I think we disallow that. We only explicitly allow software
> events/groups to move to !software context.

As you say, other than the sw leader, hw follower case we currently
check that event->ctx is equivalent. This works for homogeneous PMUs, as
we only ever have one PMU in any task's
perf_event_ctxp[perf_hw_context].

In the heterogeneous PMU case multiple CPU PMUs can share the
perf_hw_context in a task, so the check for context equivalence is not
sufficient.

I can hide the check in the architecture backend, or we could try to
make the core code more picky with something like the below.

---->8----
diff --git a/kernel/events/core.c b/kernel/events/core.c
index 56003c6..a4b6f80 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -7135,6 +7135,8 @@ SYSCALL_DEFINE5(perf_event_open,
                        if (group_leader->ctx->type != ctx->type)
                                goto err_context;
                } else {
+                       if (group_leader->pmu != pmu)
+                               goto err_context;
                        if (group_leader->ctx != ctx)
                                goto err_context;
                }
---->8----

Cheers,
Mark.

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [tip:perf/core] perf: Fix prototype of find_pmu_context()
  2014-02-10 17:44 ` [PATCH 1/7] perf: fix prototype of find_pmu_context Mark Rutland
@ 2014-02-27 13:33   ` tip-bot for Mark Rutland
  0 siblings, 0 replies; 26+ messages in thread
From: tip-bot for Mark Rutland @ 2014-02-27 13:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, will.deacon, peterz, mark.rutland,
	Dave.Martin, tglx

Commit-ID:  9e3170411ed171a126f4dca1672012a33efe59e5
Gitweb:     http://git.kernel.org/tip/9e3170411ed171a126f4dca1672012a33efe59e5
Author:     Mark Rutland <mark.rutland@arm.com>
AuthorDate: Mon, 10 Feb 2014 17:44:18 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 27 Feb 2014 12:43:21 +0100

perf: Fix prototype of find_pmu_context()

For some reason find_pmu_context() is defined as returning void * rather
than a __percpu struct perf_cpu_context *. As all the requisite types are
defined in advance there's no reason to keep it that way.

This patch modifies the prototype of find_pmu_context() to return a
__percpu struct perf_cpu_context *.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Reviewed-by: Dave Martin <Dave.Martin@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Link: http://lkml.kernel.org/r/1392054264-23570-2-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index fa99006..4251598 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -6313,7 +6313,7 @@ static int perf_event_idx_default(struct perf_event *event)
  * Ensures all contexts with the same task_ctx_nr have the same
  * pmu_cpu_context too.
  */
-static void *find_pmu_context(int ctxn)
+static struct perf_cpu_context __percpu *find_pmu_context(int ctxn)
 {
 	struct pmu *pmu;
 

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [tip:perf/core] perf: Remove redundant PMU assignment
  2014-02-10 17:44 ` [PATCH 2/7] perf: remove redundant pmu assignment Mark Rutland
@ 2014-02-27 13:33   ` tip-bot for Mark Rutland
  0 siblings, 0 replies; 26+ messages in thread
From: tip-bot for Mark Rutland @ 2014-02-27 13:33 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, hpa, mingo, will.deacon, peterz,
	mark.rutland, tglx

Commit-ID:  fdded676c3ef680bf1abc415d307d7e69a6768d1
Gitweb:     http://git.kernel.org/tip/fdded676c3ef680bf1abc415d307d7e69a6768d1
Author:     Mark Rutland <mark.rutland@arm.com>
AuthorDate: Mon, 10 Feb 2014 17:44:19 +0000
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 27 Feb 2014 12:43:24 +0100

perf: Remove redundant PMU assignment

Currently perf_branch_stack_sched_in iterates over the set of pmus,
checks that each pmu has a flush_branch_stack callback, then overwrites
the pmu before calling the callback. This is either redundant or broken.

In systems with a single hw pmu, pmu == cpuctx->ctx.pmu, and thus the
assignment is redundant.

In systems with multiple hw pmus (i.e. multiple pmus with task_ctx_nr ==
perf_hw_context) the pmus share the same perf_cpu_context. Thus the
assignment can cause one of the pmus to flush its branch stack
repeatedly rather than causing each of the pmus to flush their branch
stacks. Worse still, if only some pmus have the callback the assignment
can result in a branch to NULL.

This patch removes the redundant assignment.

Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1392054264-23570-3-git-send-email-mark.rutland@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 4251598..823a53d 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2582,8 +2582,6 @@ static void perf_branch_stack_sched_in(struct task_struct *prev,
 		if (cpuctx->ctx.nr_branch_stack > 0
 		    && pmu->flush_branch_stack) {
 
-			pmu = cpuctx->ctx.pmu;
-
 			perf_ctx_lock(cpuctx, cpuctx->task_ctx);
 
 			perf_pmu_disable(pmu);

^ permalink raw reply related	[flat|nested] 26+ messages in thread

Thread overview: 26+ messages
2014-02-10 17:44 [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
2014-02-10 17:44 ` [PATCH 1/7] perf: fix prototype of find_pmu_context Mark Rutland
2014-02-27 13:33   ` [tip:perf/core] perf: Fix prototype of find_pmu_context() tip-bot for Mark Rutland
2014-02-10 17:44 ` [PATCH 2/7] perf: remove redundant pmu assignment Mark Rutland
2014-02-27 13:33   ` [tip:perf/core] perf: Remove redundant PMU assignment tip-bot for Mark Rutland
2014-02-10 17:44 ` [PATCH 3/7] perf: kill perf_event_context_type Mark Rutland
2014-02-25 11:38   ` Peter Zijlstra
2014-02-27 11:46     ` Mark Rutland
2014-02-10 17:44 ` [PATCH 4/7] perf: be less pessimistic when scheduling events Mark Rutland
2014-02-10 17:58   ` Peter Zijlstra
2014-02-11 17:48     ` Mark Rutland
2014-02-25 11:29       ` Peter Zijlstra
2014-02-27 12:07         ` Mark Rutland
2014-02-10 17:44 ` [PATCH 5/7] perf: kill pmu::hrtimer_interval_ms Mark Rutland
2014-02-10 17:44 ` [PATCH 6/7] perf: Centralise context pmu disabling Mark Rutland
2014-02-10 18:08   ` Peter Zijlstra
2014-02-10 17:44 ` [PATCH 7/7] perf: kill perf_event_context::pmu Mark Rutland
2014-02-10 18:10   ` Peter Zijlstra
2014-02-11 17:56     ` Mark Rutland
2014-02-12 15:01       ` Dave Martin
2014-02-25 11:31       ` Peter Zijlstra
2014-02-27 11:48         ` Mark Rutland
2014-02-27 11:51           ` Peter Zijlstra
2014-02-27 12:30             ` Mark Rutland
2014-02-19 13:43 ` [PATCH 0/7] Perf core cleanups for shared perf_event_contexts Mark Rutland
2014-02-25 11:39   ` Peter Zijlstra
