All of lore.kernel.org
 help / color / mirror / Atom feed
* [BUG] perf_events: PERF_FORMAT_GROUP not working correctly when monitoring another task
@ 2010-05-04  2:06 Corey Ashford
  2010-05-06 15:42 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Corey Ashford @ 2010-05-04  2:06 UTC (permalink / raw)
  To: LKML, Peter Zijlstra, Paul Mackerras, Frederic Weisbecker,
	Arnaldo Carvalho de Melo
  Cc: Carl Love, Maynard Johnson

[-- Attachment #1: Type: text/plain, Size: 2059 bytes --]

In the last couple of days, I've run across what appears to be a kernel bug in 2.6.33.3 (haven't tested later kernels yet) having to do with using the PERF_FORMAT_GROUP feature in combination with enable_on_exec and reading counts from a remote task.

What happens is that when we go to read the entire group up, only the first counter in can be accessed; the read() call returns too few bytes.  This problem doesn't occur if measuring the from the same task.

I have attached a "cut down", though it's not terribly small.  It is a cut down from the "task" example program in libpfm4/perf_examples.  In addition to trimming the program down a lot, I've removed the dependency on libpfm4 and made modifications so that it will compile in the tools/perf subdirectory.  If you copy the attachment into your tools/perf subdir, you should be able to compile it with just:

gcc -o show_fg_bug show_fg_bug.c

Then invoke it by passing it an executable that will give it something to chew on a little, e.g.:

./show_fg_bug md5sum `which gdb`

The test cases creates two counters and places them in the same group, and sets the PERF_FORMAT_GROUP option on the first counter.  It fork/execs the child and when the child is done executing, it reads back the counter values.

When I run it, I see this output:

% ./show_fg_bug md5sum `which gdb`
825b15d7279ef21d6c9d018d775758ae  /usr/bin/gdb
Error! tried to read 40 bytes, but got 32
            58684138 PERF_COUNT_HW_CPU_CYCLES (35469840 : 35469840)
                   0 PERF_COUNT_HW_INSTRUCTIONS (35469840 : 35469840)

Oddly enough, if you look at the "nr" (number of counters) value that gets read up as part of the group, it is two, but it can only read the first of the two counters.  Another data point is that it doesn't matter how many counters you add to the group, only the first can be read up.

Please let me know if you have any questions about this.

Thanks for your consideration,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR 
cjashfor@us.ibm.com

[-- Attachment #2: show_fg_bug.c --]
[-- Type: text/plain, Size: 6030 bytes --]

/*
 * task_inherit.c - example of a task counting event in a tree of child processes
 *
 * Copyright (c) 2009 Google, Inc
 * Contributed by Stephane Eranian <eranian@gmail.com>
 *
 * Permission is hereby granted, free of charge, to any person obtaining a copy
 * of this software and associated documentation files (the "Software"), to deal
 * in the Software without restriction, including without limitation the rights
 * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies
 * of the Software, and to permit persons to whom the Software is furnished to do so,
 * subject to the following conditions:
 *
 * The above copyright notice and this permission notice shall be included in all
 * copies or substantial portions of the Software.
 *
 * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED,
 * INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A
 * PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT
 * HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF
 * CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE
 * OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
 *
 * This test case is a cut down of the program perf_examples/task from libpfm4.
 */
#include <sys/types.h>
#include <inttypes.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <stdarg.h>

#include <sys/wait.h>
#include "perf.h"

typedef struct {
        struct perf_event_attr hw;
        uint64_t value, prev_value;
        uint64_t enabled, running;
        char *name;
        int fd;
} perf_event_desc_t;

int
child(char **arg)
{
	/*
	 * execute the requested command
	 */
	execvp(arg[0], arg);
	fprintf(stderr, "cannot exec: %s\n", arg[0]);
	exit(1);
	/* not reached */
}

static inline uint64_t
perf_scale(uint64_t *values)
{
	uint64_t res = 0;

	if (values[2])
		res = (uint64_t)((double)values[0] * values[1]/values[2]);
	return res;
}


static void
read_group(perf_event_desc_t *fds, int num)
{
	uint64_t *values;
	size_t sz;
	int i, ret;

	/*
	 * 	{ u64		nr;
	 * 	  { u64		time_enabled; } && PERF_FORMAT_ENABLED
	 * 	  { u64		time_running; } && PERF_FORMAT_RUNNING
	 * 	  { u64		value;
	 * 	    { u64	id;           } && PERF_FORMAT_ID
	 * 	  }		cntr[nr];
	 * 	} && PERF_FORMAT_GROUP
	 *
	 * we do not use FORMAT_ID in this program
	 */
	sz = sizeof(uint64_t) * (3 + num);
	values = malloc(sz);
	if (!values)
		fprintf(stderr, "cannot allocate memory for values\n");

	ret = read(fds[0].fd, values, sz);
	if (ret != sz) { /* unsigned */
		if (ret == -1)
			fprintf(stderr, "cannot read values\n");
		else	/* likely pinned and could not be loaded */
			fprintf(stderr, "Error! tried to read %d bytes, but got %d\n", (int)sz, ret);
	}

	/*
	 * propagate to save area
	 */
	for(i = 0; i < num; i++) {
		values[0] = values[3+i];
		/*
		 * scaling because we may be sharing the PMU and
		 * thus may be multiplexed
		 */
		fds[i].prev_value = fds[i].value;
		fds[i].value = perf_scale(values);
		fds[i].enabled = values[1];
		fds[i].running = values[2];
	}
	free(values);
}


static void
print_counts(perf_event_desc_t *fds, int num)
{
	int i;


	read_group(fds, num);
	for(i = 0; i < num; i++) {
		double ratio;
		uint64_t val;

		val = fds[i].value - fds[i].prev_value;

		ratio = 0.0;
		if (fds[i].enabled)
			ratio = 1.0 * fds[i].running / fds[i].enabled;

		if (ratio == 1.0)
			printf("%'20"PRIu64" %s (%'"PRIu64" : %'"PRIu64")\n", val, fds[i].name, fds[i].enabled, fds[i].running);
		else
			if (ratio == 0.0)
				printf("%'20"PRIu64" %s (did not run: incompatible events, too many events in a group, competing session)\n", val, fds[i].name);
			else
				printf("%'20"PRIu64" %s (scaled from %.2f%% of time)\n", val, fds[i].name, ratio*100.0);

	}
}

int
parent(char **arg)
{
	perf_event_desc_t *fds;
	int status, ret, i, num;
	int ready[2], go[2];
	char buf;
	pid_t pid;

	num = 2;
	fds = calloc(num, sizeof(perf_event_desc_t));

	fds[0].name = "PERF_COUNT_HW_CPU_CYCLES";
	fds[0].hw.type = PERF_TYPE_HARDWARE;
	fds[0].hw.config = PERF_COUNT_HW_CPU_CYCLES;
	fds[1].name = "PERF_COUNT_HW_INSTRUCTIONS";
	fds[1].hw.type = PERF_TYPE_HARDWARE;
	fds[1].hw.type = PERF_COUNT_HW_INSTRUCTIONS;

	ret = pipe(ready);
	if (ret)
		fprintf(stderr, "cannot create pipe ready");

	ret = pipe(go);
	if (ret)
		fprintf(stderr, "cannot create pipe go");


	/*
	 * Create the child task
	 */
	if ((pid = fork()) == -1)
		fprintf(stderr, "Cannot fork process");

	/*
	 * and launch the child code
	 *
	 * The pipe is used to avoid a race condition
	 * between for() and exec(). We need the pid
	 * of the new tak but we want to start measuring
	 * at the first user level instruction. Thus we
	 * need to prevent exec until we have attached
	 * the events.
	 */
	if (pid == 0) {
		close(ready[0]);
		close(go[1]);

		/*
		 * let the parent know we exist
		 */
		close(ready[1]);
		if (read(go[0], &buf, 1) == -1)
			fprintf(stderr, "unable to read go_pipe");


		exit(child(arg));
	}

	close(ready[1]);
	close(go[0]);

	if (read(ready[0], &buf, 1) == -1)
		fprintf(stderr, "unable to read child_ready_pipe");

	close(ready[0]);

	fds[0].fd = -1;
	for(i = 0; i < num; i++) {
		/*
		 * create leader disabled with enable_on-exec
		 */
		fds[i].hw.disabled = (i == 0);
		fds[i].hw.enable_on_exec = (i == 0);
		fds[i].hw.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
			 PERF_FORMAT_TOTAL_TIME_RUNNING | PERF_FORMAT_GROUP;

		fds[i].fd = sys_perf_event_open(&fds[i].hw, pid, -1, fds[0].fd, 0);
		if (fds[i].fd == -1) {
			warn("cannot attach event %d %s", i, fds[i].name);
			goto error;
		}
	}	

	close(go[1]);

	waitpid(pid, &status, 0);

	print_counts(fds, num);

	for(i = 0; i < num; i++)
		close(fds[i].fd);

	free(fds);
	return 0;
error:
	free(fds);
	kill(SIGKILL, pid);
	return -1;
}


int
main(int argc, char **argv)
{
	int c;

	if (argc < 2) {
		fprintf(stderr, "you must specify a command to execute\n");
	} 

	return parent(&argv[1]);
}

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] perf_events: PERF_FORMAT_GROUP not working correctly when monitoring another task
  2010-05-04  2:06 [BUG] perf_events: PERF_FORMAT_GROUP not working correctly when monitoring another task Corey Ashford
@ 2010-05-06 15:42 ` Peter Zijlstra
  2010-05-06 20:09   ` Corey Ashford
  2010-05-07 18:41   ` [tip:perf/core] perf: Fix exit() vs PERF_FORMAT_GROUP tip-bot for Peter Zijlstra
  0 siblings, 2 replies; 6+ messages in thread
From: Peter Zijlstra @ 2010-05-06 15:42 UTC (permalink / raw)
  To: Corey Ashford
  Cc: LKML, Paul Mackerras, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Carl Love, Maynard Johnson,
	stephane eranian

On Mon, 2010-05-03 at 19:06 -0700, Corey Ashford wrote:
> In the last couple of days, I've run across what appears to be a
> kernel bug in 2.6.33.3 (haven't tested later kernels yet) having to do
> with using the PERF_FORMAT_GROUP feature in combination with
> enable_on_exec and reading counts from a remote task.
> 
> What happens is that when we go to read the entire group up, only the
> first counter in can be accessed; the read() call returns too few
> bytes.  This problem doesn't occur if measuring the from the same
> task.
> 
> I have attached a "cut down", though it's not terribly small.  It is a
> cut down from the "task" example program in libpfm4/perf_examples.  In
> addition to trimming the program down a lot, I've removed the
> dependency on libpfm4 and made modifications so that it will compile
> in the tools/perf subdirectory.  If you copy the attachment into your
> tools/perf subdir, you should be able to compile it with just:
> 
> gcc -o show_fg_bug show_fg_bug.c
> 
> Then invoke it by passing it an executable that will give it something
> to chew on a little, e.g.:
> 
> ./show_fg_bug md5sum `which gdb`
> 
> The test cases creates two counters and places them in the same group,
> and sets the PERF_FORMAT_GROUP option on the first counter.  It
> fork/execs the child and when the child is done executing, it reads
> back the counter values.
> 
> When I run it, I see this output:
> 
> % ./show_fg_bug md5sum `which gdb`
> 825b15d7279ef21d6c9d018d775758ae  /usr/bin/gdb
> Error! tried to read 40 bytes, but got 32
>             58684138 PERF_COUNT_HW_CPU_CYCLES (35469840 : 35469840)
>                    0 PERF_COUNT_HW_INSTRUCTIONS (35469840 : 35469840)
> 
> Oddly enough, if you look at the "nr" (number of counters) value that
> gets read up as part of the group, it is two, but it can only read the
> first of the two counters.  Another data point is that it doesn't
> matter how many counters you add to the group, only the first can be
> read up.
> 
> Please let me know if you have any questions about this.

Looks like you hit the exact same bug Stephane did:
  http://lkml.org/lkml/2010/4/9/142

The below patch seems to cure it for me.

# gcc -o show_fg_bug show_fg_bug.c
# ./show_fg_bug md5sum `which gdb`
bc88d978c10446689326245529e6c4c1  /usr/bin/gdb
            29721534 PERF_COUNT_HW_CPU_CYCLES (18657335 : 18657335)
            18681747 PERF_COUNT_HW_INSTRUCTIONS (18657335 : 18657335)


---
Subject: perf: Fix exit() vs PERF_FORMAT_GROUP
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Thu May 06 17:31:38 CEST 2010

Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work as
expected if the task the counters were attached to quit before the
read() call.

The cause is that we unconditionally destroy the grouping when we
remove counters from their context. Fix this by only doing this when
we free the counter itself.

Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
Index: linux-2.6/include/linux/perf_event.h
===================================================================
--- linux-2.6.orig/include/linux/perf_event.h
+++ linux-2.6/include/linux/perf_event.h
@@ -548,6 +548,7 @@ struct pmu {
  * enum perf_event_active_state - the states of a event
  */
 enum perf_event_active_state {
+	PERF_EVENT_STATE_FREE		= -3,
 	PERF_EVENT_STATE_ERROR		= -2,
 	PERF_EVENT_STATE_OFF		= -1,
 	PERF_EVENT_STATE_INACTIVE	=  0,
Index: linux-2.6/kernel/perf_event.c
===================================================================
--- linux-2.6.orig/kernel/perf_event.c
+++ linux-2.6/kernel/perf_event.c
@@ -342,6 +342,9 @@ list_del_event(struct perf_event *event,
 	if (event->state > PERF_EVENT_STATE_OFF)
 		event->state = PERF_EVENT_STATE_OFF;
 
+	if (event->state > PERF_EVENT_STATE_FREE)
+		return;
+
 	/*
 	 * If this was a group event with sibling events then
 	 * upgrade the siblings to singleton events by adding them
@@ -1861,6 +1864,8 @@ int perf_event_release_kernel(struct per
 {
 	struct perf_event_context *ctx = event->ctx;
 
+	event->state = PERF_EVENT_STATE_FREE;
+
 	WARN_ON_ONCE(ctx->parent_ctx);
 	mutex_lock(&ctx->mutex);
 	perf_event_remove_from_context(event);


^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] perf_events: PERF_FORMAT_GROUP not working correctly when monitoring another task
  2010-05-06 15:42 ` Peter Zijlstra
@ 2010-05-06 20:09   ` Corey Ashford
  2010-05-06 20:22     ` Peter Zijlstra
  2010-05-07 18:41   ` [tip:perf/core] perf: Fix exit() vs PERF_FORMAT_GROUP tip-bot for Peter Zijlstra
  1 sibling, 1 reply; 6+ messages in thread
From: Corey Ashford @ 2010-05-06 20:09 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Paul Mackerras, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Carl Love, Maynard Johnson,
	stephane eranian

On 05/06/2010 08:42 AM, Peter Zijlstra wrote:
> On Mon, 2010-05-03 at 19:06 -0700, Corey Ashford wrote:
>> In the last couple of days, I've run across what appears to be a
>> kernel bug in 2.6.33.3 (haven't tested later kernels yet) having to do
>> with using the PERF_FORMAT_GROUP feature in combination with
>> enable_on_exec and reading counts from a remote task.
>>
>> What happens is that when we go to read the entire group up, only the
>> first counter in can be accessed; the read() call returns too few
>> bytes.  This problem doesn't occur if measuring the from the same
>> task.
>>
>> I have attached a "cut down", though it's not terribly small.  It is a
>> cut down from the "task" example program in libpfm4/perf_examples.  In
>> addition to trimming the program down a lot, I've removed the
>> dependency on libpfm4 and made modifications so that it will compile
>> in the tools/perf subdirectory.  If you copy the attachment into your
>> tools/perf subdir, you should be able to compile it with just:
>>
>> gcc -o show_fg_bug show_fg_bug.c
>>
>> Then invoke it by passing it an executable that will give it something
>> to chew on a little, e.g.:
>>
>> ./show_fg_bug md5sum `which gdb`
>>
>> The test cases creates two counters and places them in the same group,
>> and sets the PERF_FORMAT_GROUP option on the first counter.  It
>> fork/execs the child and when the child is done executing, it reads
>> back the counter values.
>>
>> When I run it, I see this output:
>>
>> % ./show_fg_bug md5sum `which gdb`
>> 825b15d7279ef21d6c9d018d775758ae  /usr/bin/gdb
>> Error! tried to read 40 bytes, but got 32
>>              58684138 PERF_COUNT_HW_CPU_CYCLES (35469840 : 35469840)
>>                     0 PERF_COUNT_HW_INSTRUCTIONS (35469840 : 35469840)
>>
>> Oddly enough, if you look at the "nr" (number of counters) value that
>> gets read up as part of the group, it is two, but it can only read the
>> first of the two counters.  Another data point is that it doesn't
>> matter how many counters you add to the group, only the first can be
>> read up.
>>
>> Please let me know if you have any questions about this.
>
> Looks like you hit the exact same bug Stephane did:
>    http://lkml.org/lkml/2010/4/9/142
>
> The below patch seems to cure it for me.
>
> # gcc -o show_fg_bug show_fg_bug.c
> # ./show_fg_bug md5sum `which gdb`
> bc88d978c10446689326245529e6c4c1  /usr/bin/gdb
>              29721534 PERF_COUNT_HW_CPU_CYCLES (18657335 : 18657335)
>              18681747 PERF_COUNT_HW_INSTRUCTIONS (18657335 : 18657335)
>
>
> ---
> Subject: perf: Fix exit() vs PERF_FORMAT_GROUP
> From: Peter Zijlstra<a.p.zijlstra@chello.nl>
> Date: Thu May 06 17:31:38 CEST 2010
>
> Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work as
> expected if the task the counters were attached to quit before the
> read() call.
>
> The cause is that we unconditionally destroy the grouping when we
> remove counters from their context. Fix this by only doing this when
> we free the counter itself.
>
> Reported-by: Corey Ashford<cjashfor@linux.vnet.ibm.com>
> Reported-by: Stephane Eranian<eranian@google.com>
> Signed-off-by: Peter Zijlstra<a.p.zijlstra@chello.nl>
> ---
> Index: linux-2.6/include/linux/perf_event.h
> ===================================================================
> --- linux-2.6.orig/include/linux/perf_event.h
> +++ linux-2.6/include/linux/perf_event.h
> @@ -548,6 +548,7 @@ struct pmu {
>    * enum perf_event_active_state - the states of a event
>    */
>   enum perf_event_active_state {
> +	PERF_EVENT_STATE_FREE		= -3,
>   	PERF_EVENT_STATE_ERROR		= -2,
>   	PERF_EVENT_STATE_OFF		= -1,
>   	PERF_EVENT_STATE_INACTIVE	=  0,
> Index: linux-2.6/kernel/perf_event.c
> ===================================================================
> --- linux-2.6.orig/kernel/perf_event.c
> +++ linux-2.6/kernel/perf_event.c
> @@ -342,6 +342,9 @@ list_del_event(struct perf_event *event,
>   	if (event->state>  PERF_EVENT_STATE_OFF)
>   		event->state = PERF_EVENT_STATE_OFF;
>
> +	if (event->state>  PERF_EVENT_STATE_FREE)
> +		return;
> +
>   	/*
>   	 * If this was a group event with sibling events then
>   	 * upgrade the siblings to singleton events by adding them
> @@ -1861,6 +1864,8 @@ int perf_event_release_kernel(struct per
>   {
>   	struct perf_event_context *ctx = event->ctx;
>
> +	event->state = PERF_EVENT_STATE_FREE;
> +
>   	WARN_ON_ONCE(ctx->parent_ctx);
>   	mutex_lock(&ctx->mutex);
>   	perf_event_remove_from_context(event);
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/

I applied your patch to 2.6.33.3.  It applied with a few offsets, but no 
errors:
% patch -p1 < ~/pz_patch.diff
patching file include/linux/perf_event.h
Hunk #1 succeeded at 517 (offset -31 lines).
patching file kernel/perf_event.c
Hunk #1 succeeded at 349 (offset 7 lines).
Hunk #2 succeeded at 1777 (offset -87 lines).

The patch works with the test case I provided, but for some reason it 
breaks the normal operation of the libpfm4 "task" utility.  If I put 
more than one event in a group, I get zero counts on all but the first 
event.  That's even if I don't use the PERF_FORMAT_GROUP option.

So something appears to be messed up.  I will see if I can construct an 
arch-independent test case which demonstrates the problem.

-- 
Regards,

- Corey

Corey Ashford
Software Engineer
IBM Linux Technology Center, Linux Toolchain
Beaverton, OR
cjashfor@us.ibm.com

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] perf_events: PERF_FORMAT_GROUP not working correctly when monitoring another task
  2010-05-06 20:09   ` Corey Ashford
@ 2010-05-06 20:22     ` Peter Zijlstra
  2010-05-07  2:40       ` Corey Ashford
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2010-05-06 20:22 UTC (permalink / raw)
  To: Corey Ashford
  Cc: LKML, Paul Mackerras, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Carl Love, Maynard Johnson,
	stephane eranian

On Thu, 2010-05-06 at 13:09 -0700, Corey Ashford wrote:
> The patch works with the test case I provided, but for some reason it 
> breaks the normal operation of the libpfm4 "task" utility.  If I put 
> more than one event in a group, I get zero counts on all but the first 
> event.  That's even if I don't use the PERF_FORMAT_GROUP option.

Bugger,.. weird though, I can't see how it would affect the state before
the counters get detached from their context (exit() / close()).

> So something appears to be messed up.  I will see if I can construct an 
> arch-independent test case which demonstrates the problem.

That would be awesome, thanks!



^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [BUG] perf_events: PERF_FORMAT_GROUP not working correctly when monitoring another task
  2010-05-06 20:22     ` Peter Zijlstra
@ 2010-05-07  2:40       ` Corey Ashford
  0 siblings, 0 replies; 6+ messages in thread
From: Corey Ashford @ 2010-05-07  2:40 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: LKML, Paul Mackerras, Frederic Weisbecker,
	Arnaldo Carvalho de Melo, Carl Love, Maynard Johnson,
	stephane eranian

On 05/06/2010 01:22 PM, Peter Zijlstra wrote:
> On Thu, 2010-05-06 at 13:09 -0700, Corey Ashford wrote:
>> The patch works with the test case I provided, but for some reason it
>> breaks the normal operation of the libpfm4 "task" utility.  If I put
>> more than one event in a group, I get zero counts on all but the first
>> event.  That's even if I don't use the PERF_FORMAT_GROUP option.
>
> Bugger,.. weird though, I can't see how it would affect the state before
> the counters get detached from their context (exit() / close()).
>
>> So something appears to be messed up.  I will see if I can construct an
>> arch-independent test case which demonstrates the problem.
>
> That would be awesome, thanks!
>

Ok, it was pilot error.  We had made a few changes to the "task" program 
to expand its capabilities and introduced a couple of bugs at the same time.

Unfortunately, I have one remaining mystery.  The default behavior of 
task is to create groups, without the PERF_FORMAT_GROUP flag set, and 
then read up the counts and time enabled/running values for each counter 
in the group after the measurement has occurred.

Well, for some reason I haven't been able to figure out, all but first 
event in the group have their time running and time enable values set to 
zero, but their counts are non-zero.

If I replicate this setup with the show_fg_bug test case I posted, the 
time running and enabled values are correct for each counter.

The enabled and running flag bits are set for every event which is 
opened (in both show_fg_bug and task)... so I don't know what the 
difference would be.

If anything springs to mind there as to how I might be getting zero 
enabled/running values passed back from the kernel, I'd be very 
interested to know.

Otherwise, I will keep poking at it till I find the bug in task, or 
something I can point at in the kernel.

Thanks,

- Corey

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip:perf/core] perf: Fix exit() vs PERF_FORMAT_GROUP
  2010-05-06 15:42 ` Peter Zijlstra
  2010-05-06 20:09   ` Corey Ashford
@ 2010-05-07 18:41   ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Peter Zijlstra @ 2010-05-07 18:41 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: linux-kernel, eranian, hpa, mingo, a.p.zijlstra, tglx, cjashfor, mingo

Commit-ID:  4fd38e4595e2f6c9d27732c042a0e16b2753049c
Gitweb:     http://git.kernel.org/tip/4fd38e4595e2f6c9d27732c042a0e16b2753049c
Author:     Peter Zijlstra <a.p.zijlstra@chello.nl>
AuthorDate: Thu, 6 May 2010 17:31:38 +0200
Committer:  Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 7 May 2010 11:30:17 +0200

perf: Fix exit() vs PERF_FORMAT_GROUP

Both Stephane and Corey reported that PERF_FORMAT_GROUP didn't work
as expected if the task the counters were attached to quit before
the read() call.

The cause is that we unconditionally destroy the grouping when we
remove counters from their context. Fix this by only doing this when
we free the counter itself.

Reported-by: Corey Ashford <cjashfor@linux.vnet.ibm.com>
Reported-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <1273160566.5605.404.camel@twins>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 include/linux/perf_event.h |    1 +
 kernel/perf_event.c        |    5 +++++
 2 files changed, 6 insertions(+), 0 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index c8e3754..bf8f3c0 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -522,6 +522,7 @@ struct pmu {
  * enum perf_event_active_state - the states of a event
  */
 enum perf_event_active_state {
+	PERF_EVENT_STATE_FREE		= -3,
 	PERF_EVENT_STATE_ERROR		= -2,
 	PERF_EVENT_STATE_OFF		= -1,
 	PERF_EVENT_STATE_INACTIVE	=  0,
diff --git a/kernel/perf_event.c b/kernel/perf_event.c
index 3d1552d..f13c3db 100644
--- a/kernel/perf_event.c
+++ b/kernel/perf_event.c
@@ -341,6 +341,9 @@ list_del_event(struct perf_event *event, struct perf_event_context *ctx)
 	if (event->state > PERF_EVENT_STATE_OFF)
 		event->state = PERF_EVENT_STATE_OFF;
 
+	if (event->state > PERF_EVENT_STATE_FREE)
+		return;
+
 	/*
 	 * If this was a group event with sibling events then
 	 * upgrade the siblings to singleton events by adding them
@@ -1856,6 +1859,8 @@ int perf_event_release_kernel(struct perf_event *event)
 {
 	struct perf_event_context *ctx = event->ctx;
 
+	event->state = PERF_EVENT_STATE_FREE;
+
 	WARN_ON_ONCE(ctx->parent_ctx);
 	mutex_lock(&ctx->mutex);
 	perf_event_remove_from_context(event);

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2010-05-07 18:42 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2010-05-04  2:06 [BUG] perf_events: PERF_FORMAT_GROUP not working correctly when monitoring another task Corey Ashford
2010-05-06 15:42 ` Peter Zijlstra
2010-05-06 20:09   ` Corey Ashford
2010-05-06 20:22     ` Peter Zijlstra
2010-05-07  2:40       ` Corey Ashford
2010-05-07 18:41   ` [tip:perf/core] perf: Fix exit() vs PERF_FORMAT_GROUP tip-bot for Peter Zijlstra

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.