All of lore.kernel.org
 help / color / mirror / Atom feed
* [patch] perf wrong enabled time after attach/enable/enable
@ 2017-07-26  1:47 Vince Weaver
  2017-08-02 17:10 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Vince Weaver @ 2017-07-26  1:47 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

Hello

It took a long time to track down this bug.  This causes a failure in the 
PAPI regression tests.

The problem happens with the following:
	fork a child, put it to sleep with SIGSTOP
	perf_event_open(), attaching to the pid of the child
	ENABLE/DISABLE/ENABLE/DISABLE/ENABLE
	read the results with the enabled/running multiplexing fields

Your "enabled" time will be a huge negative number, which will confuse any
code depening on it (especially the rdpmc scaling code).

>From what I can tell, when you DISABLE an event that's in a 
PERF_EVENT_STATE_INACTIVE state, currently it exits early without
updating event->tstamp_stopped.  So later when calculating the
enabled time it is wrong and you get the negative number.

The following quick hack fixes things, it just abuses the existing code
that was there to handle a similar case when a filter failed.  The proper
fix might be to just put the event->tstamp_stopped update before the
early out code, but the commit log for the change that introduced that
makes it sound a bit scary.

Anyway a reproducible test case for this is in my pref_event_tests tree,
the "tests/rdpmc/rdpmc_attach_multi_enable" test.

Vince

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ffba16d..18eadc3b37d5 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -1812,8 +1812,7 @@ event_sched_out(struct perf_event *event,
 	 * maintained, otherwise bogus information is return
 	 * via read() for time_enabled, time_running:
 	 */
-	if (event->state == PERF_EVENT_STATE_INACTIVE &&
-	    !event_filter_match(event)) {
+	if (event->state == PERF_EVENT_STATE_INACTIVE) {
 		delta = tstamp - event->tstamp_stopped;
 		event->tstamp_running += delta;
 		event->tstamp_stopped = tstamp;

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [patch] perf wrong enabled time after attach/enable/enable
  2017-07-26  1:47 [patch] perf wrong enabled time after attach/enable/enable Vince Weaver
@ 2017-08-02 17:10 ` Peter Zijlstra
  2017-08-03 18:07   ` Vince Weaver
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2017-08-02 17:10 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

On Tue, Jul 25, 2017 at 09:47:52PM -0400, Vince Weaver wrote:

> The problem happens with the following:
> 	fork a child, put it to sleep with SIGSTOP
> 	perf_event_open(), attaching to the pid of the child
> 	ENABLE/DISABLE/ENABLE/DISABLE/ENABLE
> 	read the results with the enabled/running multiplexing fields
> 
> Your "enabled" time will be a huge negative number, which will confuse any
> code depening on it (especially the rdpmc scaling code).
> 
> From what I can tell, when you DISABLE an event that's in a 
> PERF_EVENT_STATE_INACTIVE state, currently it exits early without
> updating event->tstamp_stopped.  So later when calculating the
> enabled time it is wrong and you get the negative number.
> The following quick hack fixes things, it just abuses the existing code
> that was there to handle a similar case when a filter failed.  The proper
> fix might be to just put the event->tstamp_stopped update before the
> early out code, but the commit log for the change that introduced that
> makes it sound a bit scary.

fa66f07aa1f0 ("perf_events: Fix time tracking for events with pid != -1 and cpu != -1")

is the commit that introduced it (the scary one just moved the && to the
right place).

> Anyway a reproducible test case for this is in my pref_event_tests tree,
> the "tests/rdpmc/rdpmc_attach_multi_enable" test.
> 
> Vince
> 
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index 426c2ffba16d..18eadc3b37d5 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -1812,8 +1812,7 @@ event_sched_out(struct perf_event *event,
>  	 * maintained, otherwise bogus information is return
>  	 * via read() for time_enabled, time_running:
>  	 */
> -	if (event->state == PERF_EVENT_STATE_INACTIVE &&
> -	    !event_filter_match(event)) {
> +	if (event->state == PERF_EVENT_STATE_INACTIVE) {
>  		delta = tstamp - event->tstamp_stopped;
>  		event->tstamp_running += delta;
>  		event->tstamp_stopped = tstamp;

It doesn't look entirely crazy.. just trying to understand why Stephane
added that filter clause.

Argh, that time accounting always hurts my brain, its so weird. I never
dared rewrite it for risk of breaking it all again, but maybe, just
maybe we should have.. :/


init(now):
 enabled = stopped = running = now;

in(now):
 running += now - stopped;

out(now):
 stopped = now;

read:
 *enabled = stopped - enabled;
 *running = stopped - running;


Simple example:

init(10):
 e = s = r = 10

in(11):
 r += 11 - 10; r=11

out(12):
 s = 12

read:
 *e = 12-10 = 2
 *r = 12-11 = 1

in(14):
 r += 14 - 12; r=13

out(16):
 s = 16

read:
 *e = 16 - 10 = 6
 *r = 16 - 13 = 3


which confusingly works :-)

So lets break that by out not updating stopped, then we get:


init(10):
 e = s = r = 10

in(11):
 r += 11 - 10; r=11

read:
 *e = 10-10 = 2
 *r = 10-11 = -1

in(14):
 r += 14 - 10; r=15

read:
 *e = 10 - 10 = 0
 *r = 10 - 15 = -5


That is negative for 'running', not enabled. But your test really does
report a negative 'enabled'.

Playing with that test it really is the IOC_DISABLE while STOP'ed that
messes things up.

Ah.. magic.. the below seems to fix things, hopefully it doesn't break
anything else.

---
 kernel/events/core.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 426c2ffba16d..5dbbd1f90b4f 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2458,6 +2458,20 @@ perf_install_in_context(struct perf_event_context *ctx,
 }
 
 /*
+ * reverse update_event_times()
+ */
+void __perf_event_enable_time(struct perf_event *event, u64 tstamp)
+{
+	/*
+	 * total_time_enabled = tstamp_stopped - tstamp_enabled
+	 * total_time_running = tstamp_stopped - tstamp_running;
+	 */
+	event->tstamp_stopped = tstamp;
+	event->tstamp_enabled = tstamp - event->total_time_enabled;
+	event->tstamp_running = tstamp - event->total_time_running;
+}
+
+/*
  * Put a event into inactive state and update time fields.
  * Enabling the leader of a group effectively enables all
  * the group members that aren't explicitly disabled, so we
@@ -2471,10 +2485,11 @@ static void __perf_event_mark_enabled(struct perf_event *event)
 	u64 tstamp = perf_event_time(event);
 
 	event->state = PERF_EVENT_STATE_INACTIVE;
-	event->tstamp_enabled = tstamp - event->total_time_enabled;
+	__perf_event_enable_time(event, tstamp);
+
 	list_for_each_entry(sub, &event->sibling_list, group_entry) {
 		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
-			sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+			__perf_event_enable_time(sub, tstamp);
 	}
 }
 

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [patch] perf wrong enabled time after attach/enable/enable
  2017-08-02 17:10 ` Peter Zijlstra
@ 2017-08-03 18:07   ` Vince Weaver
  2017-08-04 11:22     ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Vince Weaver @ 2017-08-03 18:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

On Wed, 2 Aug 2017, Peter Zijlstra wrote:

> Playing with that test it really is the IOC_DISABLE while STOP'ed that
> messes things up.
> 
> Ah.. magic.. the below seems to fix things, hopefully it doesn't break
> anything else.

yes, I've tested this and it seems to fix things.

With both this and the other patch applied PAPI can finally pass all of 
its validation tests when running with RDPMC support enabled.

Thanks,

Vince

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] perf wrong enabled time after attach/enable/enable
  2017-08-03 18:07   ` Vince Weaver
@ 2017-08-04 11:22     ` Peter Zijlstra
  2017-08-04 13:35       ` Vince Weaver
  2017-08-09 16:30       ` Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Peter Zijlstra @ 2017-08-04 11:22 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

On Thu, Aug 03, 2017 at 02:07:53PM -0400, Vince Weaver wrote:
> On Wed, 2 Aug 2017, Peter Zijlstra wrote:
> 
> > Playing with that test it really is the IOC_DISABLE while STOP'ed that
> > messes things up.
> > 
> > Ah.. magic.. the below seems to fix things, hopefully it doesn't break
> > anything else.
> 
> yes, I've tested this and it seems to fix things.

New version below.. is a bit different and should handle another case.
I'll try and run more tests on it, because while it _SHOULD_ not trigger
that WARN_ON_ONCE() you know how that goes..

> With both this and the other patch applied PAPI can finally pass all of 
> its validation tests when running with RDPMC support enabled.

I get:

  rdpmc# (for i in *.c ; do ./${i%.c}; done) | grep "^Testing"
  Testing if rdpmc attach works...                           PASSED
  Testing if minimized rdpmc papi-multi-attach works...      PASSED
  Testing if we can rdpmc in execed process...               PASSED
  Testing if we can rdpmc in execed process (PAPI) ...       PASSED
  Testing if rdpmc works with event groups...                PASSED
  -bash: ./rdpmc_lib: No such file or directory
  Testing if rdpmc multi-attach works...                     PASSED
  Testing if rdpmc papi-multi-attach works...                PASSED
  Testing if userspace rdpmc multiplexing works...           PASSED
  Testing if rdpmc with pthreads works...                    PASSED
  Testing if rdpmc with pthreads works...                    PASSED
  Testing if resetting while using rdpmc works...            PASSED
  Testing if userspace rdpmc reads are supported...          NEW BEHAVIOR
  Testing if rdpmc fallback works on sw events...            PASSED
  Testing if userspace rdpmc reads give expected results...  PASSED

is that 'NEW BEHAVIOR' thing something I should worry about?

Full output for that test is:

  rdpmc# ./rdpmc_support
  This test checks if userspace rdpmc() style reads work.

  total start/read/stop latency: 52668 cycles
	  Event 0 -- count: 63 running: 0
  Testing if userspace rdpmc reads are supported...          NEW BEHAVIOR

Unpatched kernel also does that.

---
Subject: perf: Fix time on IOC_ENABLE
From: Peter Zijlstra <peterz@infradead.org>
Date: Thu Aug 3 15:42:09 CEST 2017

Vince reported that when we do IOC_ENABLE/IOC_DISABLE while the task
is SIGSTOP'ed state the timestamps go wobbly.

It turns out we indeed fail to correctly account time while in 'OFF'
state and doing IOC_ENABLE without getting scheduled in exposes the
problem.

Further thinking about this problem, it occurred to me that we can
suffer a similar fate when we migrate an uncore event between CPUs.
The perf_event_install() on the 'new' CPU will do add_event_to_ctx()
which will reset all the time stamp, resulting in a subsequent
update_event_times() to overwrite the total_time_* fields with smaller
values.

Reported-by: Vince Weaver <vincent.weaver@maine.edu>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 kernel/events/core.c |   36 +++++++++++++++++++++++++++++++-----
 1 file changed, 31 insertions(+), 5 deletions(-)

--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -2217,6 +2217,33 @@ static int group_can_go_on(struct perf_e
 	return can_add_hw;
 }
 
+/*
+ * Complement to update_event_times(). This computes the tstamp_* values to
+ * continue 'enabled' state from @now. And effectively discards the time
+ * between the prior tstamp_stopped and now (as we were in the OFF state, or
+ * just switched (context) time base).
+ *
+ * This further assumes '@event->state == INACTIVE' (we just came from OFF) and
+ * cannot have been scheduled in yet. And going into INACTIVE state means
+ * '@event->tstamp_stopped = @now'.
+ *
+ * Thus given the rules of update_event_times():
+ *
+ *   total_time_enabled = tstamp_stopped - tstamp_enabled
+ *   total_time_running = tstamp_stopped - tstamp_running
+ *
+ * We can insert 'tstamp_stopped == now' and reverse them to compute new
+ * tstamp_* values.
+ */
+static void __perf_event_enable_time(struct perf_event *event, u64 now)
+{
+	WARN_ON_ONCE(event->state != PERF_EVENT_STATE_INACTIVE);
+
+	event->tstamp_stopped = now;
+	event->tstamp_enabled = now - event->total_time_enabled;
+	event->tstamp_running = now - event->total_time_running;
+}
+
 static void add_event_to_ctx(struct perf_event *event,
 			       struct perf_event_context *ctx)
 {
@@ -2224,9 +2251,7 @@ static void add_event_to_ctx(struct perf
 
 	list_add_event(event, ctx);
 	perf_group_attach(event);
-	event->tstamp_enabled = tstamp;
-	event->tstamp_running = tstamp;
-	event->tstamp_stopped = tstamp;
+	__perf_event_enable_time(event, tstamp);
 }
 
 static void ctx_sched_out(struct perf_event_context *ctx,
@@ -2471,10 +2496,11 @@ static void __perf_event_mark_enabled(st
 	u64 tstamp = perf_event_time(event);
 
 	event->state = PERF_EVENT_STATE_INACTIVE;
-	event->tstamp_enabled = tstamp - event->total_time_enabled;
+	__perf_event_enable_time(event, tstamp);
 	list_for_each_entry(sub, &event->sibling_list, group_entry) {
+		/* XXX should not be > INACTIVE if event isn't */
 		if (sub->state >= PERF_EVENT_STATE_INACTIVE)
-			sub->tstamp_enabled = tstamp - sub->total_time_enabled;
+			__perf_event_enable_time(sub, tstamp);
 	}
 }
 

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] perf wrong enabled time after attach/enable/enable
  2017-08-04 11:22     ` Peter Zijlstra
@ 2017-08-04 13:35       ` Vince Weaver
  2017-08-09 16:30       ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Vince Weaver @ 2017-08-04 13:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

On Fri, 4 Aug 2017, Peter Zijlstra wrote:

>   Testing if userspace rdpmc reads are supported...          NEW BEHAVIOR
>   Testing if rdpmc fallback works on sw events...            PASSED
>   Testing if userspace rdpmc reads give expected results...  PASSED
> 
> is that 'NEW BEHAVIOR' thing something I should worry about?

no, sorry, I've been working on cleaning up the outputs of the tests to be 
less confusing.

NEW BEHAVIOR here means it has the "new" (since Linux 3.12) cap_usr_rdpmc 
bit layout.

Vince

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [patch] perf wrong enabled time after attach/enable/enable
  2017-08-04 11:22     ` Peter Zijlstra
  2017-08-04 13:35       ` Vince Weaver
@ 2017-08-09 16:30       ` Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: Peter Zijlstra @ 2017-08-09 16:30 UTC (permalink / raw)
  To: Vince Weaver
  Cc: linux-kernel, Ingo Molnar, Arnaldo Carvalho de Melo, Stephane Eranian

On Fri, Aug 04, 2017 at 01:22:53PM +0200, Peter Zijlstra wrote:
> On Thu, Aug 03, 2017 at 02:07:53PM -0400, Vince Weaver wrote:
> > On Wed, 2 Aug 2017, Peter Zijlstra wrote:
> > 
> > > Playing with that test it really is the IOC_DISABLE while STOP'ed that
> > > messes things up.
> > > 
> > > Ah.. magic.. the below seems to fix things, hopefully it doesn't break
> > > anything else.
> > 
> > yes, I've tested this and it seems to fix things.
> 
> New version below.. is a bit different and should handle another case.
> I'll try and run more tests on it, because while it _SHOULD_ not trigger
> that WARN_ON_ONCE() you know how that goes..

And splat it goes.. turns out we can attach to a ctx when OFF (which
makes sense, they need to live someplace too).

In that case the IOC_ENABLE will again go through
__perf_event_enable_time() and recompute things.

I think I'll just remove the WARN_ON_ONCE(), and put in a comment.

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2017-08-09 16:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-07-26  1:47 [patch] perf wrong enabled time after attach/enable/enable Vince Weaver
2017-08-02 17:10 ` Peter Zijlstra
2017-08-03 18:07   ` Vince Weaver
2017-08-04 11:22     ` Peter Zijlstra
2017-08-04 13:35       ` Vince Weaver
2017-08-09 16:30       ` Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.