All of lore.kernel.org
 help / color / mirror / Atom feed
* PERF_EVENT_IOC_SET_OUTPUT
@ 2013-10-01 19:11 Adrian Hunter
  2013-10-02 10:03 ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Adrian Hunter @ 2013-10-01 19:11 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Ingo Molnar, Frederic Weisbecker, linux-kernel, Kleen, Andi,
	Shishkin, Alexander

Hi

It does not seem possible to use set-output between
task contexts of different types (e.g. a software event
to a hardware event)

If you look at perf_event_set_output():

           /*
            * If its not a per-cpu rb, it must be the same task.
            */
           if (output_event->cpu == -1 && output_event->ctx != event->ctx)
                   goto out;

ctx (perf_event_context) won't be the same for events
of different types.  Is this restriction necessary?

Regards
Adrian

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PERF_EVENT_IOC_SET_OUTPUT
  2013-10-01 19:11 PERF_EVENT_IOC_SET_OUTPUT Adrian Hunter
@ 2013-10-02 10:03 ` Peter Zijlstra
  2013-10-02 10:29   ` PERF_EVENT_IOC_SET_OUTPUT Frederic Weisbecker
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2013-10-02 10:03 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Ingo Molnar, Frederic Weisbecker, linux-kernel, Kleen, Andi,
	Shishkin, Alexander

On Tue, Oct 01, 2013 at 10:11:56PM +0300, Adrian Hunter wrote:
> Hi
> 
> It does not seem possible to use set-output between
> task contexts of different types (e.g. a software event
> to a hardware event)
> 
> If you look at perf_event_set_output():
> 
>           /*
>            * If its not a per-cpu rb, it must be the same task.
>            */
>           if (output_event->cpu == -1 && output_event->ctx != event->ctx)
>                   goto out;
> 
> ctx (perf_event_context) won't be the same for events
> of different types.  Is this restriction necessary?

Hmm.. so last night I wrote me a big reply saying we couldn't do it;
then this morning I reconsidered and thing that something like:

  output_event->ctx->task != event->ctx->task

should actually work.

The reason it should be OK I think is because perf_mmap() will refuse to
create a buffer for inherited events that have ->cpu == -1.

My initial response was going to say that it wouldn't be possible
because __perf_event_task_sched_out() could 'break' one ctx while still
swapping the other, at which point the buffer would have to service two
different tasks, potentially from different CPUs and with the buffers
not actually being SMP safe that's a problem.

But like stated, perf_mmap() seems to avoid that issue entirely by not
allowing inherited events that aren't cpu bound.

Someone please double check this..  :-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PERF_EVENT_IOC_SET_OUTPUT
  2013-10-02 10:03 ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
@ 2013-10-02 10:29   ` Frederic Weisbecker
  2013-10-02 11:27     ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Frederic Weisbecker @ 2013-10-02 10:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Ingo Molnar, linux-kernel, Kleen, Andi, Shishkin,
	Alexander

On Wed, Oct 02, 2013 at 12:03:50PM +0200, Peter Zijlstra wrote:
> On Tue, Oct 01, 2013 at 10:11:56PM +0300, Adrian Hunter wrote:
> > Hi
> > 
> > It does not seem possible to use set-output between
> > task contexts of different types (e.g. a software event
> > to a hardware event)
> > 
> > If you look at perf_event_set_output():
> > 
> >           /*
> >            * If its not a per-cpu rb, it must be the same task.
> >            */
> >           if (output_event->cpu == -1 && output_event->ctx != event->ctx)
> >                   goto out;
> > 
> > ctx (perf_event_context) won't be the same for events
> > of different types.  Is this restriction necessary?
> 
> Hmm.. so last night I wrote me a big reply saying we couldn't do it;
> then this morning I reconsidered and thing that something like:
> 
>   output_event->ctx->task != event->ctx->task
> 
> should actually work.
> 
> The reason it should be OK I think is because perf_mmap() will refuse to
> create a buffer for inherited events that have ->cpu == -1.
> 
> My initial response was going to say that it wouldn't be possible
> because __perf_event_task_sched_out() could 'break' one ctx while still
> swapping the other, at which point the buffer would have to service two
> different tasks, potentially from different CPUs and with the buffers
> not actually being SMP safe that's a problem.

I don't get what you mean with breaking or swapping a ctx.
But I can confirm that perf_mmap() won't allow a buffer to be remotely
accessed from another CPU. Now there may be other issues than locality which
I'm missing :)

> 
> But like stated, perf_mmap() seems to avoid that issue entirely by not
> allowing inherited events that aren't cpu bound.
> 
> Someone please double check this..  :-)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PERF_EVENT_IOC_SET_OUTPUT
  2013-10-02 10:29   ` PERF_EVENT_IOC_SET_OUTPUT Frederic Weisbecker
@ 2013-10-02 11:27     ` Peter Zijlstra
  2013-10-02 11:43       ` PERF_EVENT_IOC_SET_OUTPUT Frederic Weisbecker
  2013-10-02 12:29       ` PERF_EVENT_IOC_SET_OUTPUT Ingo Molnar
  0 siblings, 2 replies; 10+ messages in thread
From: Peter Zijlstra @ 2013-10-02 11:27 UTC (permalink / raw)
  To: Frederic Weisbecker
  Cc: Adrian Hunter, Ingo Molnar, linux-kernel, Kleen, Andi, Shishkin,
	Alexander

On Wed, Oct 02, 2013 at 12:29:56PM +0200, Frederic Weisbecker wrote:
> On Wed, Oct 02, 2013 at 12:03:50PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 01, 2013 at 10:11:56PM +0300, Adrian Hunter wrote:
> > > Hi
> > > 
> > > It does not seem possible to use set-output between
> > > task contexts of different types (e.g. a software event
> > > to a hardware event)
> > > 
> > > If you look at perf_event_set_output():
> > > 
> > >           /*
> > >            * If its not a per-cpu rb, it must be the same task.
> > >            */
> > >           if (output_event->cpu == -1 && output_event->ctx != event->ctx)
> > >                   goto out;
> > > 
> > > ctx (perf_event_context) won't be the same for events
> > > of different types.  Is this restriction necessary?
> > 
> > Hmm.. so last night I wrote me a big reply saying we couldn't do it;
> > then this morning I reconsidered and thing that something like:
> > 
> >   output_event->ctx->task != event->ctx->task
> > 
> > should actually work.
> > 
> > The reason it should be OK I think is because perf_mmap() will refuse to
> > create a buffer for inherited events that have ->cpu == -1.
> > 
> > My initial response was going to say that it wouldn't be possible
> > because __perf_event_task_sched_out() could 'break' one ctx while still
> > swapping the other, at which point the buffer would have to service two
> > different tasks, potentially from different CPUs and with the buffers
> > not actually being SMP safe that's a problem.
> 
> I don't get what you mean with breaking or swapping a ctx.
> But I can confirm that perf_mmap() won't allow a buffer to be remotely
> accessed from another CPU. Now there may be other issues than locality which
> I'm missing :)

The way we 'optimize' context switches between tasks with identical
contexts is to simply swap the context and leave the hardware alone.

So counters belonging to prev will then belong to next and vice versa.
This avoids having to read hardware counters, update stats, removes
counters from hardware, and re-program hardware with possible the exact
same set.

When a child context changes its context (eg, inserts or removes a
counter) we break this swapping because now the contexts don't match
anymore and we have to take the slow and painful way of prodding
hardware.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PERF_EVENT_IOC_SET_OUTPUT
  2013-10-02 11:27     ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
@ 2013-10-02 11:43       ` Frederic Weisbecker
  2013-10-02 12:29       ` PERF_EVENT_IOC_SET_OUTPUT Ingo Molnar
  1 sibling, 0 replies; 10+ messages in thread
From: Frederic Weisbecker @ 2013-10-02 11:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Adrian Hunter, Ingo Molnar, linux-kernel, Kleen, Andi, Shishkin,
	Alexander

On Wed, Oct 02, 2013 at 01:27:30PM +0200, Peter Zijlstra wrote:
> On Wed, Oct 02, 2013 at 12:29:56PM +0200, Frederic Weisbecker wrote:
> > On Wed, Oct 02, 2013 at 12:03:50PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 01, 2013 at 10:11:56PM +0300, Adrian Hunter wrote:
> > > > Hi
> > > > 
> > > > It does not seem possible to use set-output between
> > > > task contexts of different types (e.g. a software event
> > > > to a hardware event)
> > > > 
> > > > If you look at perf_event_set_output():
> > > > 
> > > >           /*
> > > >            * If its not a per-cpu rb, it must be the same task.
> > > >            */
> > > >           if (output_event->cpu == -1 && output_event->ctx != event->ctx)
> > > >                   goto out;
> > > > 
> > > > ctx (perf_event_context) won't be the same for events
> > > > of different types.  Is this restriction necessary?
> > > 
> > > Hmm.. so last night I wrote me a big reply saying we couldn't do it;
> > > then this morning I reconsidered and thing that something like:
> > > 
> > >   output_event->ctx->task != event->ctx->task
> > > 
> > > should actually work.
> > > 
> > > The reason it should be OK I think is because perf_mmap() will refuse to
> > > create a buffer for inherited events that have ->cpu == -1.
> > > 
> > > My initial response was going to say that it wouldn't be possible
> > > because __perf_event_task_sched_out() could 'break' one ctx while still
> > > swapping the other, at which point the buffer would have to service two
> > > different tasks, potentially from different CPUs and with the buffers
> > > not actually being SMP safe that's a problem.
> > 
> > I don't get what you mean with breaking or swapping a ctx.
> > But I can confirm that perf_mmap() won't allow a buffer to be remotely
> > accessed from another CPU. Now there may be other issues than locality which
> > I'm missing :)
> 
> The way we 'optimize' context switches between tasks with identical
> contexts is to simply swap the context and leave the hardware alone.
> 
> So counters belonging to prev will then belong to next and vice versa.
> This avoids having to read hardware counters, update stats, removes
> counters from hardware, and re-program hardware with possible the exact
> same set.
> 
> When a child context changes its context (eg, inserts or removes a
> counter) we break this swapping because now the contexts don't match
> anymore and we have to take the slow and painful way of prodding
> hardware.

Ah right, I remember that now. This caused me quite some headaches
a few years ago :)

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PERF_EVENT_IOC_SET_OUTPUT
  2013-10-02 11:27     ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
  2013-10-02 11:43       ` PERF_EVENT_IOC_SET_OUTPUT Frederic Weisbecker
@ 2013-10-02 12:29       ` Ingo Molnar
  2013-10-02 12:40         ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
  1 sibling, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2013-10-02 12:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Adrian Hunter, linux-kernel, Kleen, Andi,
	Shishkin, Alexander


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Oct 02, 2013 at 12:29:56PM +0200, Frederic Weisbecker wrote:
> > On Wed, Oct 02, 2013 at 12:03:50PM +0200, Peter Zijlstra wrote:
> > > On Tue, Oct 01, 2013 at 10:11:56PM +0300, Adrian Hunter wrote:
> > > > Hi
> > > > 
> > > > It does not seem possible to use set-output between
> > > > task contexts of different types (e.g. a software event
> > > > to a hardware event)
> > > > 
> > > > If you look at perf_event_set_output():
> > > > 
> > > >           /*
> > > >            * If its not a per-cpu rb, it must be the same task.
> > > >            */
> > > >           if (output_event->cpu == -1 && output_event->ctx != event->ctx)
> > > >                   goto out;
> > > > 
> > > > ctx (perf_event_context) won't be the same for events
> > > > of different types.  Is this restriction necessary?
> > > 
> > > Hmm.. so last night I wrote me a big reply saying we couldn't do it;
> > > then this morning I reconsidered and thing that something like:
> > > 
> > >   output_event->ctx->task != event->ctx->task
> > > 
> > > should actually work.
> > > 
> > > The reason it should be OK I think is because perf_mmap() will refuse to
> > > create a buffer for inherited events that have ->cpu == -1.
> > > 
> > > My initial response was going to say that it wouldn't be possible
> > > because __perf_event_task_sched_out() could 'break' one ctx while still
> > > swapping the other, at which point the buffer would have to service two
> > > different tasks, potentially from different CPUs and with the buffers
> > > not actually being SMP safe that's a problem.
> > 
> > I don't get what you mean with breaking or swapping a ctx. But I can 
> > confirm that perf_mmap() won't allow a buffer to be remotely accessed 
> > from another CPU. Now there may be other issues than locality which 
> > I'm missing :)
> 
> The way we 'optimize' context switches between tasks with identical 
> contexts is to simply swap the context and leave the hardware alone.

Btw., this does not seem to be working very well when the perf context is 
inherited:

Baseline kernel with no perf context:

  aldebaran:~> taskset 1 perf stat --null perf bench sched pipe
  # Running sched/pipe benchmark...
  # Executed 1000000 pipe operations between two tasks

     Total time: 5.024 [sec]

       5.024951 usecs/op
         199006 ops/sec

with inherited perf contexts:

  aldebaran:~> taskset 1 perf stat perf bench sched pipe
  # Running sched/pipe benchmark...
  # Executed 1000000 pipe operations between two tasks

     Total time: 17.869 [sec]

      17.869061 usecs/op
          55962 ops/sec

+12.8 usecs of perf switching fat per context switch, on a non-debug 
kernel on a 2.8GHz CPU :-/

We should declare a hard feature stop until that is improved.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PERF_EVENT_IOC_SET_OUTPUT
  2013-10-02 12:29       ` PERF_EVENT_IOC_SET_OUTPUT Ingo Molnar
@ 2013-10-02 12:40         ` Peter Zijlstra
  2013-10-03  6:43           ` PERF_EVENT_IOC_SET_OUTPUT Ingo Molnar
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2013-10-02 12:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Adrian Hunter, linux-kernel, Kleen, Andi,
	Shishkin, Alexander

On Wed, Oct 02, 2013 at 02:29:01PM +0200, Ingo Molnar wrote:
> 
> Btw., this does not seem to be working very well when the perf context is 
> inherited:

https://lkml.org/lkml/2011/7/21/124

Never figured that one out :-(

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PERF_EVENT_IOC_SET_OUTPUT
  2013-10-02 12:40         ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
@ 2013-10-03  6:43           ` Ingo Molnar
  2013-10-07 16:42             ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Ingo Molnar @ 2013-10-03  6:43 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Frederic Weisbecker, Adrian Hunter, linux-kernel, Kleen, Andi,
	Shishkin, Alexander


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, Oct 02, 2013 at 02:29:01PM +0200, Ingo Molnar wrote:
> > 
> > Btw., this does not seem to be working very well when the perf context is 
> > inherited:
> 
> https://lkml.org/lkml/2011/7/21/124
> 
> Never figured that one out :-(

Hm, maybe the various fixes over the past 2.5 years would make it work 
today?

If you find time to send an updated version that boots then I can try to 
trace it to figure out where it fails, if it still fails.

Thanks,

	Ingo

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: PERF_EVENT_IOC_SET_OUTPUT
  2013-10-03  6:43           ` PERF_EVENT_IOC_SET_OUTPUT Ingo Molnar
@ 2013-10-07 16:42             ` Peter Zijlstra
  2013-10-29 14:08               ` [tip:perf/core] perf: Fix the perf context switch optimization tip-bot for Peter Zijlstra
  0 siblings, 1 reply; 10+ messages in thread
From: Peter Zijlstra @ 2013-10-07 16:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Frederic Weisbecker, Adrian Hunter, linux-kernel, Kleen, Andi,
	Shishkin, Alexander

On Thu, Oct 03, 2013 at 08:43:51AM +0200, Ingo Molnar wrote:
> If you find time to send an updated version that boots then I can try to 
> trace it to figure out where it fails, if it still fails.

Can you give the below a spin?

---
Subject: perf: Fix the perf context switch optimization
From: Peter Zijlstra <peterz@infradead.org>
Date: Mon Oct 7 17:12:48 CEST 2013

Currently we only optimize the context switch between two contexts that
have the same parent; this forgoes the optimization between parent and
child context, even though these contexts could be equivalent too.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
---
 kernel/events/core.c |   64 ++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 46 insertions(+), 18 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -899,6 +899,7 @@ static void unclone_ctx(struct perf_even
 		put_ctx(ctx->parent_ctx);
 		ctx->parent_ctx = NULL;
 	}
+	ctx->generation++;
 }
 
 static u32 perf_event_pid(struct perf_event *event, struct task_struct *p)
@@ -1136,6 +1137,8 @@ list_add_event(struct perf_event *event,
 	ctx->nr_events++;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat++;
+
+	ctx->generation++;
 }
 
 /*
@@ -1313,6 +1316,8 @@ list_del_event(struct perf_event *event,
 	 */
 	if (event->state > PERF_EVENT_STATE_OFF)
 		event->state = PERF_EVENT_STATE_OFF;
+
+	ctx->generation++;
 }
 
 static void perf_group_detach(struct perf_event *event)
@@ -2149,22 +2154,38 @@ static void ctx_sched_out(struct perf_ev
 }
 
 /*
- * Test whether two contexts are equivalent, i.e. whether they
- * have both been cloned from the same version of the same context
- * and they both have the same number of enabled events.
- * If the number of enabled events is the same, then the set
- * of enabled events should be the same, because these are both
- * inherited contexts, therefore we can't access individual events
- * in them directly with an fd; we can only enable/disable all
- * events via prctl, or enable/disable all events in a family
- * via ioctl, which will have the same effect on both contexts.
+ * Test whether two contexts are equivalent, i.e. whether they have both been
+ * cloned from the same version of the same context.
+ *
+ * Equivalence is measured using a generation number in the context that is
+ * incremented on each modification to it; see unclone_ctx(), list_add_event()
+ * and list_del_event().
  */
 static int context_equiv(struct perf_event_context *ctx1,
 			 struct perf_event_context *ctx2)
 {
-	return ctx1->parent_ctx && ctx1->parent_ctx == ctx2->parent_ctx
-		&& ctx1->parent_gen == ctx2->parent_gen
-		&& !ctx1->pin_count && !ctx2->pin_count;
+	/* Pinning disables the swap optimization */
+	if (ctx1->pin_count || ctx2->pin_count)
+		return 0;
+
+	/* If ctx1 is the parent of ctx2 */
+	if (ctx1 == ctx2->parent_ctx && ctx1->generation == ctx2->parent_gen)
+		return 1;
+
+	/* If ctx2 is the parent of ctx1 */
+	if (ctx1->parent_ctx == ctx2 && ctx1->parent_gen == ctx2->generation)
+		return 1;
+
+	/*
+	 * If ctx1 and ctx2 have the same parent; we flatten the parent
+	 * hierarchy, see perf_event_init_context().
+	 */
+	if (ctx1->parent_ctx && ctx1->parent_ctx == ctx2->parent_ctx &&
+			ctx1->parent_gen == ctx2->parent_gen)
+		return 1;
+
+	/* Unmatched */
+	return 0;
 }
 
 static void __perf_event_sync_stat(struct perf_event *event,
@@ -2247,7 +2268,7 @@ static void perf_event_context_sched_out
 {
 	struct perf_event_context *ctx = task->perf_event_ctxp[ctxn];
 	struct perf_event_context *next_ctx;
-	struct perf_event_context *parent;
+	struct perf_event_context *parent, *next_parent;
 	struct perf_cpu_context *cpuctx;
 	int do_switch = 1;
 
@@ -2259,10 +2280,18 @@ static void perf_event_context_sched_out
 		return;
 
 	rcu_read_lock();
-	parent = rcu_dereference(ctx->parent_ctx);
 	next_ctx = next->perf_event_ctxp[ctxn];
-	if (parent && next_ctx &&
-	    rcu_dereference(next_ctx->parent_ctx) == parent) {
+	if (!next_ctx)
+		goto unlock;
+
+	parent = rcu_dereference(ctx->parent_ctx);
+	next_parent = rcu_dereference(next_ctx->parent_ctx);
+
+	/* If neither context have a parent context; they cannot be clones. */
+	if (!parent && !next_parent)
+		goto unlock;
+
+	if (next_parent == ctx || next_ctx == parent || next_parent == parent) {
 		/*
 		 * Looks like the two contexts are clones, so we might be
 		 * able to optimize the context switch.  We lock both
@@ -2290,6 +2319,7 @@ static void perf_event_context_sched_out
 		raw_spin_unlock(&next_ctx->lock);
 		raw_spin_unlock(&ctx->lock);
 	}
+unlock:
 	rcu_read_unlock();
 
 	if (do_switch) {
@@ -7128,7 +7158,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	perf_install_in_context(ctx, event, event->cpu);
-	++ctx->generation;
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
 
@@ -7211,7 +7240,6 @@ perf_event_create_kernel_counter(struct
 	WARN_ON_ONCE(ctx->parent_ctx);
 	mutex_lock(&ctx->mutex);
 	perf_install_in_context(ctx, event, cpu);
-	++ctx->generation;
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [tip:perf/core] perf: Fix the perf context switch optimization
  2013-10-07 16:42             ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
@ 2013-10-29 14:08               ` tip-bot for Peter Zijlstra
  0 siblings, 0 replies; 10+ messages in thread
From: tip-bot for Peter Zijlstra @ 2013-10-29 14:08 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, hpa, mingo, alexander.shishkin, peterz, fweisbec,
	adrian.hunter, tglx

Commit-ID:  5a3126d4fe7c311fe12f98fef0470f6cb582d1ef
Gitweb:     http://git.kernel.org/tip/5a3126d4fe7c311fe12f98fef0470f6cb582d1ef
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Mon, 7 Oct 2013 17:12:48 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 29 Oct 2013 14:13:01 +0100

perf: Fix the perf context switch optimization

Currently we only optimize the context switch between two
contexts that have the same parent; this forgoes the
optimization between parent and child context, even though these
contexts could be equivalent too.

Signed-off-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Shishkin, Alexander <alexander.shishkin@intel.com>
Link: http://lkml.kernel.org/r/20131007164257.GH3081@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 64 +++++++++++++++++++++++++++++++++++++---------------
 1 file changed, 46 insertions(+), 18 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 85a8bbd..17b3c6c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -899,6 +899,7 @@ static void unclone_ctx(struct perf_event_context *ctx)
 		put_ctx(ctx->parent_ctx);
 		ctx->parent_ctx = NULL;
 	}
+	ctx->generation++;
 }
 
 static u32 perf_event_pid(struct perf_event *event, struct task_struct *p)
@@ -1136,6 +1137,8 @@ list_add_event(struct perf_event *event, struct perf_event_context *ctx)
 	ctx->nr_events++;
 	if (event->attr.inherit_stat)
 		ctx->nr_stat++;
+
+	ctx->generation++;
 }
 
 /*
@@ -1313,6 +1316,8 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	 */
 	if (event->state > PERF_EVENT_STATE_OFF)
 		event->state = PERF_EVENT_STATE_OFF;
+
+	ctx->generation++;
 }
 
 static void perf_group_detach(struct perf_event *event)
@@ -2149,22 +2154,38 @@ static void ctx_sched_out(struct perf_event_context *ctx,
 }
 
 /*
- * Test whether two contexts are equivalent, i.e. whether they
- * have both been cloned from the same version of the same context
- * and they both have the same number of enabled events.
- * If the number of enabled events is the same, then the set
- * of enabled events should be the same, because these are both
- * inherited contexts, therefore we can't access individual events
- * in them directly with an fd; we can only enable/disable all
- * events via prctl, or enable/disable all events in a family
- * via ioctl, which will have the same effect on both contexts.
+ * Test whether two contexts are equivalent, i.e. whether they have both been
+ * cloned from the same version of the same context.
+ *
+ * Equivalence is measured using a generation number in the context that is
+ * incremented on each modification to it; see unclone_ctx(), list_add_event()
+ * and list_del_event().
  */
 static int context_equiv(struct perf_event_context *ctx1,
 			 struct perf_event_context *ctx2)
 {
-	return ctx1->parent_ctx && ctx1->parent_ctx == ctx2->parent_ctx
-		&& ctx1->parent_gen == ctx2->parent_gen
-		&& !ctx1->pin_count && !ctx2->pin_count;
+	/* Pinning disables the swap optimization */
+	if (ctx1->pin_count || ctx2->pin_count)
+		return 0;
+
+	/* If ctx1 is the parent of ctx2 */
+	if (ctx1 == ctx2->parent_ctx && ctx1->generation == ctx2->parent_gen)
+		return 1;
+
+	/* If ctx2 is the parent of ctx1 */
+	if (ctx1->parent_ctx == ctx2 && ctx1->parent_gen == ctx2->generation)
+		return 1;
+
+	/*
+	 * If ctx1 and ctx2 have the same parent; we flatten the parent
+	 * hierarchy, see perf_event_init_context().
+	 */
+	if (ctx1->parent_ctx && ctx1->parent_ctx == ctx2->parent_ctx &&
+			ctx1->parent_gen == ctx2->parent_gen)
+		return 1;
+
+	/* Unmatched */
+	return 0;
 }
 
 static void __perf_event_sync_stat(struct perf_event *event,
@@ -2247,7 +2268,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 {
 	struct perf_event_context *ctx = task->perf_event_ctxp[ctxn];
 	struct perf_event_context *next_ctx;
-	struct perf_event_context *parent;
+	struct perf_event_context *parent, *next_parent;
 	struct perf_cpu_context *cpuctx;
 	int do_switch = 1;
 
@@ -2259,10 +2280,18 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 		return;
 
 	rcu_read_lock();
-	parent = rcu_dereference(ctx->parent_ctx);
 	next_ctx = next->perf_event_ctxp[ctxn];
-	if (parent && next_ctx &&
-	    rcu_dereference(next_ctx->parent_ctx) == parent) {
+	if (!next_ctx)
+		goto unlock;
+
+	parent = rcu_dereference(ctx->parent_ctx);
+	next_parent = rcu_dereference(next_ctx->parent_ctx);
+
+	/* If neither context have a parent context; they cannot be clones. */
+	if (!parent && !next_parent)
+		goto unlock;
+
+	if (next_parent == ctx || next_ctx == parent || next_parent == parent) {
 		/*
 		 * Looks like the two contexts are clones, so we might be
 		 * able to optimize the context switch.  We lock both
@@ -2290,6 +2319,7 @@ static void perf_event_context_sched_out(struct task_struct *task, int ctxn,
 		raw_spin_unlock(&next_ctx->lock);
 		raw_spin_unlock(&ctx->lock);
 	}
+unlock:
 	rcu_read_unlock();
 
 	if (do_switch) {
@@ -7136,7 +7166,6 @@ SYSCALL_DEFINE5(perf_event_open,
 	}
 
 	perf_install_in_context(ctx, event, event->cpu);
-	++ctx->generation;
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
 
@@ -7219,7 +7248,6 @@ perf_event_create_kernel_counter(struct perf_event_attr *attr, int cpu,
 	WARN_ON_ONCE(ctx->parent_ctx);
 	mutex_lock(&ctx->mutex);
 	perf_install_in_context(ctx, event, cpu);
-	++ctx->generation;
 	perf_unpin_context(ctx);
 	mutex_unlock(&ctx->mutex);
 

^ permalink raw reply related	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2013-10-29 14:09 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-10-01 19:11 PERF_EVENT_IOC_SET_OUTPUT Adrian Hunter
2013-10-02 10:03 ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
2013-10-02 10:29   ` PERF_EVENT_IOC_SET_OUTPUT Frederic Weisbecker
2013-10-02 11:27     ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
2013-10-02 11:43       ` PERF_EVENT_IOC_SET_OUTPUT Frederic Weisbecker
2013-10-02 12:29       ` PERF_EVENT_IOC_SET_OUTPUT Ingo Molnar
2013-10-02 12:40         ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
2013-10-03  6:43           ` PERF_EVENT_IOC_SET_OUTPUT Ingo Molnar
2013-10-07 16:42             ` PERF_EVENT_IOC_SET_OUTPUT Peter Zijlstra
2013-10-29 14:08               ` [tip:perf/core] perf: Fix the perf context switch optimization tip-bot for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.