LKML Archive on lore.kernel.org
 help / Atom feed
* Re: perf group read for inherited events
       [not found] <20170526205601.GA5271@tassilo.jf.intel.com>
@ 2017-05-30  9:45 ` Peter Zijlstra
  2017-05-30 13:57   ` Andi Kleen
  2017-06-08  9:21   ` [tip:perf/core] perf/core: Correct event creation with PERF_FORMAT_GROUP tip-bot for Peter Zijlstra
  0 siblings, 2 replies; 7+ messages in thread
From: Peter Zijlstra @ 2017-05-30  9:45 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Vince Weaver, Thomas Gleixner, Ingo Molnar,
	Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Jiri Olsa



On Fri, May 26, 2017 at 01:56:01PM -0700, Andi Kleen wrote:

> I have a need for read group reads for inherited events.
> 
> It looks like the perf group read code already has all the code
> to handle inheritance, __perf_read_group_add walks
> the children list and adds them all up.
> 
>    4409 
>    4410         list_for_each_entry(sub, &leader->sibling_list, group_entry) {
>    4411                 values[n++] += perf_event_count(sub);
>    4412                 if (read_format & PERF_FORMAT_ID)
>    4413                         values[n++] = primary_event_id(sub);
>    4414         }
> 
> 
> I disabled the check that forbids this and it seems to work
> from some simple testing.
> 
> Again do I miss something why this was disabled?

The thing that seems difficult is PERF_SAMPLE_READ vs inherited,
irrespective of PERF_FORMAT_GROUP.

The error seems to be in that patch you fingered:

  3dab77fb1bf8 ("perf: Rework/fix the whole read vs group stuff")


-       PERF_SAMPLE_GROUP                       = 1U << 4,
+       PERF_SAMPLE_READ                        = 1U << 4,

-       if (attr->inherit && (attr->sample_type & PERF_SAMPLE_GROUP))
+       if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))

is a clear fail :/

> Perhaps some locking issue? There are some comments on it,
> but I'm not sure I understand all the subtleties:
> 
>         /*
>          * By locking the child_mutex of the leader we effectively
>          * lock the child list of all siblings.. XXX explain how.
>          */
>         mutex_lock(&leader->child_mutex);

This is more recent. Here I failed to find a coherent text to explain
the locking. It is correct through. I think its something like:

@@ -4426,8 +4426,9 @@ static int perf_read_group(struct perf_event *event,
 	values[0] = 1 + leader->nr_siblings;
 
 	/*
-	 * By locking the child_mutex of the leader we effectively
-	 * lock the child list of all siblings.. XXX explain how.
+	 * By locking the child_mutex of the leader we effectively lock the
+	 * child list of all siblings. Since inherit_group() will first clone
+	 * the leader and will this be blocked on us holding its child_mutex.
 	 */
 	mutex_lock(&leader->child_mutex);
 

> Or is the simple patch below good enough?

The below seems to be the correct thing. It is rather unfortunate that
this would break/significantly change existing semantics :/

---
 kernel/events/core.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 8d6acaeeea17..2d9de6fb9a5a 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5722,9 +5722,6 @@ static void perf_output_read_one(struct perf_output_handle *handle,
 	__output_copy(handle, values, n * sizeof(u64));
 }
 
-/*
- * XXX PERF_FORMAT_GROUP vs inherited events seems difficult.
- */
 static void perf_output_read_group(struct perf_output_handle *handle,
 			    struct perf_event *event,
 			    u64 enabled, u64 running)
@@ -5769,6 +5766,12 @@ static void perf_output_read_group(struct perf_output_handle *handle,
 #define PERF_FORMAT_TOTAL_TIMES (PERF_FORMAT_TOTAL_TIME_ENABLED|\
 				 PERF_FORMAT_TOTAL_TIME_RUNNING)
 
+/*
+ * XXX PERF_SAMPLE_READ vs inherited events seems difficult.
+ *
+ * The problem is that its both hard and excessively expensive to iterate the
+ * child list from interrupt/NMI context.
+ */
 static void perf_output_read(struct perf_output_handle *handle,
 			     struct perf_event *event)
 {
@@ -9434,9 +9437,10 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	local64_set(&hwc->period_left, hwc->sample_period);
 
 	/*
-	 * we currently do not support PERF_FORMAT_GROUP on inherited events
+	 * We currently do not support PERF_SAMPLE_READ on inherited events.
+	 * See perf_output_read().
 	 */
-	if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
+	if (attr->inherit && (attr->sample_type & PERF_SAMPLE_READ))
 		goto err_ns;
 
 	if (!has_branch_stack(event))

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: perf group read for inherited events
  2017-05-30  9:45 ` perf group read for inherited events Peter Zijlstra
@ 2017-05-30 13:57   ` Andi Kleen
  2017-05-30 17:01     ` Peter Zijlstra
  2017-06-08  9:21   ` [tip:perf/core] perf/core: Correct event creation with PERF_FORMAT_GROUP tip-bot for Peter Zijlstra
  1 sibling, 1 reply; 7+ messages in thread
From: Andi Kleen @ 2017-05-30 13:57 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Vince Weaver, Thomas Gleixner, Ingo Molnar,
	Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Jiri Olsa

On Tue, May 30, 2017 at 11:45:12AM +0200, Peter Zijlstra wrote:
> > Or is the simple patch below good enough?
> 
> The below seems to be the correct thing. It is rather unfortunate that
> this would break/significantly change existing semantics :/

The "existing semantics" as in ignoring the PERF_SAMPLE_READ in sample_type,
even though it wasn't implemented? It seems reasonable to me.

If you really worry about it could drop a printk_once in and see if it
triggers anywhere (and if yes drop the if completely)

The patch looks good to me. Please consider adding it.
I have some patches to use this in perf stat, will submit later.

-Andi

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: perf group read for inherited events
  2017-05-30 13:57   ` Andi Kleen
@ 2017-05-30 17:01     ` Peter Zijlstra
  2017-05-30 17:55       ` Vince Weaver
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2017-05-30 17:01 UTC (permalink / raw)
  To: Andi Kleen
  Cc: linux-kernel, Vince Weaver, Thomas Gleixner, Ingo Molnar,
	Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Jiri Olsa

On Tue, May 30, 2017 at 06:57:14AM -0700, Andi Kleen wrote:
> On Tue, May 30, 2017 at 11:45:12AM +0200, Peter Zijlstra wrote:
> > > Or is the simple patch below good enough?
> > 
> > The below seems to be the correct thing. It is rather unfortunate that
> > this would break/significantly change existing semantics :/
> 
> The "existing semantics" as in ignoring the PERF_SAMPLE_READ in sample_type,
> even though it wasn't implemented? It seems reasonable to me.

Right, so where we used to accept PERF_SAMPLE_READ on inherited events,
we now no longer will.

Note that it currently doesn't work right, even if it doesn't fail like
with the proposed patch.

Typically Vince will (rightly) complain when I change things like this.
But seeing how even if we accept it, it is fairly terminally buggered in
any case, we could change it.

Vince, do you know of anybody that would be immediately affected by
this?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: perf group read for inherited events
  2017-05-30 17:01     ` Peter Zijlstra
@ 2017-05-30 17:55       ` Vince Weaver
  2017-05-30 18:53         ` Peter Zijlstra
  0 siblings, 1 reply; 7+ messages in thread
From: Vince Weaver @ 2017-05-30 17:55 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Jiri Olsa

On Tue, 30 May 2017, Peter Zijlstra wrote:

> On Tue, May 30, 2017 at 06:57:14AM -0700, Andi Kleen wrote:
> > On Tue, May 30, 2017 at 11:45:12AM +0200, Peter Zijlstra wrote:
> > > > Or is the simple patch below good enough?
> > > 
> > > The below seems to be the correct thing. It is rather unfortunate that
> > > this would break/significantly change existing semantics :/
> > 
> > The "existing semantics" as in ignoring the PERF_SAMPLE_READ in sample_type,
> > even though it wasn't implemented? It seems reasonable to me.
> 
> Right, so where we used to accept PERF_SAMPLE_READ on inherited events,
> we now no longer will.
> 
> Note that it currently doesn't work right, even if it doesn't fail like
> with the proposed patch.
> 
> Typically Vince will (rightly) complain when I change things like this.
> But seeing how even if we accept it, it is fairly terminally buggered in
> any case, we could change it.
> 
> Vince, do you know of anybody that would be immediately affected by
> this?

I often only hear about breakage months later after it's too late.

So the issue is currently if you were sampling, and you were sampling on
an event group, and you had set PERF_SAMPLE_READ to get all counts for a 
group, and the event was also inherited.... perf_event_open() would let 
you do this even though the results would probably be wrong?

I'm not aware of anyone trying to do this, or at least I should say PAPI 
isn't trying to do this (PAPI currently has pretty simplistic sampling 
support, though that's getting a major overhaul this summer).

Vince

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: perf group read for inherited events
  2017-05-30 17:55       ` Vince Weaver
@ 2017-05-30 18:53         ` Peter Zijlstra
  2017-05-31 21:08           ` Vince Weaver
  0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2017-05-30 18:53 UTC (permalink / raw)
  To: Vince Weaver
  Cc: Andi Kleen, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Jiri Olsa

On Tue, May 30, 2017 at 01:55:33PM -0400, Vince Weaver wrote:
> So the issue is currently if you were sampling, and you were sampling on
> an event group, and you had set PERF_SAMPLE_READ to get all counts for a 
> group, and the event was also inherited....

No, anything PERF_SAMPLE_READ (group or not) on inherited events is
wrong.

It would only report the event count of the current task count + all
dead child counts (if you hit the parent event). It would not include
the current count of any other live tasks in the hierarchy.

And the problem is that fixing this is rather tricky and iterating the
hierarchy can be excessively expensive in any case (imagine having to
iterate several hundred tasks tasks from that NMI/interrupt).

And since its from NMI/interrupt context it is impossible to get the
current count of any other live tasks that are running on another CPU.

> perf_event_open() would let 
> you do this even though the results would probably be wrong?

Right, currently we have the filter on PERF_FORMAT_GROUP, but it should
be PERF_SAMPLE_READ.

So a !group SAMPLE_READ on inherited is currently allowed but returns
'interesting' values.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: perf group read for inherited events
  2017-05-30 18:53         ` Peter Zijlstra
@ 2017-05-31 21:08           ` Vince Weaver
  0 siblings, 0 replies; 7+ messages in thread
From: Vince Weaver @ 2017-05-31 21:08 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Andi Kleen, linux-kernel, Thomas Gleixner, Ingo Molnar,
	Stephane Eranian, Alexander Shishkin, Arnaldo Carvalho de Melo,
	Jiri Olsa

On Tue, 30 May 2017, Peter Zijlstra wrote:

> On Tue, May 30, 2017 at 01:55:33PM -0400, Vince Weaver wrote:
> > So the issue is currently if you were sampling, and you were sampling on
> > an event group, and you had set PERF_SAMPLE_READ to get all counts for a 
> > group, and the event was also inherited....
> 
> No, anything PERF_SAMPLE_READ (group or not) on inherited events is
> wrong.

ah, yes.  I should fix the wording in the manpage as it's a bit confusing.

> > perf_event_open() would let 
> > you do this even though the results would probably be wrong?
> 
> Right, currently we have the filter on PERF_FORMAT_GROUP, but it should
> be PERF_SAMPLE_READ.
> 
> So a !group SAMPLE_READ on inherited is currently allowed but returns
> 'interesting' values.

I'm not aware of anything that will break on fixing this.

Especially if we are always returning bad results then it would be better 
to break things so we find people (if any) who are depending on the wrong 
results.

Vince

^ permalink raw reply	[flat|nested] 7+ messages in thread

* [tip:perf/core] perf/core: Correct event creation with PERF_FORMAT_GROUP
  2017-05-30  9:45 ` perf group read for inherited events Peter Zijlstra
  2017-05-30 13:57   ` Andi Kleen
@ 2017-06-08  9:21   ` tip-bot for Peter Zijlstra
  1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Peter Zijlstra @ 2017-06-08  9:21 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: eranian, peterz, torvalds, jolsa, acme, tglx, alexander.shishkin,
	mingo, hpa, ak, vince, linux-kernel

Commit-ID:  ba5213ae6b88fb170c4771fef6553f759c7d8cdd
Gitweb:     http://git.kernel.org/tip/ba5213ae6b88fb170c4771fef6553f759c7d8cdd
Author:     Peter Zijlstra <peterz@infradead.org>
AuthorDate: Tue, 30 May 2017 11:45:12 +0200
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Thu, 8 Jun 2017 10:12:38 +0200

perf/core: Correct event creation with PERF_FORMAT_GROUP

Andi was asking about PERF_FORMAT_GROUP vs inherited events, which led
to the discovery of a bug from commit:

  3dab77fb1bf8 ("perf: Rework/fix the whole read vs group stuff")

 -       PERF_SAMPLE_GROUP                       = 1U << 4,
 +       PERF_SAMPLE_READ                        = 1U << 4,

 -       if (attr->inherit && (attr->sample_type & PERF_SAMPLE_GROUP))
 +       if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))

is a clear fail :/

While this changes user visible behaviour; it was previously possible
to create an inherited event with PERF_SAMPLE_READ; this is deemed
acceptible because its results were always incorrect.

Reported-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vince@deater.net>
Fixes:  3dab77fb1bf8 ("perf: Rework/fix the whole read vs group stuff")
Link: http://lkml.kernel.org/r/20170530094512.dy2nljns2uq7qa3j@hirez.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/events/core.c | 15 ++++++++++-----
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3de0b98..407dad6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5729,9 +5729,6 @@ static void perf_output_read_one(struct perf_output_handle *handle,
 	__output_copy(handle, values, n * sizeof(u64));
 }
 
-/*
- * XXX PERF_FORMAT_GROUP vs inherited events seems difficult.
- */
 static void perf_output_read_group(struct perf_output_handle *handle,
 			    struct perf_event *event,
 			    u64 enabled, u64 running)
@@ -5776,6 +5773,13 @@ static void perf_output_read_group(struct perf_output_handle *handle,
 #define PERF_FORMAT_TOTAL_TIMES (PERF_FORMAT_TOTAL_TIME_ENABLED|\
 				 PERF_FORMAT_TOTAL_TIME_RUNNING)
 
+/*
+ * XXX PERF_SAMPLE_READ vs inherited events seems difficult.
+ *
+ * The problem is that its both hard and excessively expensive to iterate the
+ * child list, not to mention that its impossible to IPI the children running
+ * on another CPU, from interrupt/NMI context.
+ */
 static void perf_output_read(struct perf_output_handle *handle,
 			     struct perf_event *event)
 {
@@ -9462,9 +9466,10 @@ perf_event_alloc(struct perf_event_attr *attr, int cpu,
 	local64_set(&hwc->period_left, hwc->sample_period);
 
 	/*
-	 * we currently do not support PERF_FORMAT_GROUP on inherited events
+	 * We currently do not support PERF_SAMPLE_READ on inherited events.
+	 * See perf_output_read().
 	 */
-	if (attr->inherit && (attr->read_format & PERF_FORMAT_GROUP))
+	if (attr->inherit && (attr->sample_type & PERF_SAMPLE_READ))
 		goto err_ns;
 
 	if (!has_branch_stack(event))

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, back to index

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <20170526205601.GA5271@tassilo.jf.intel.com>
2017-05-30  9:45 ` perf group read for inherited events Peter Zijlstra
2017-05-30 13:57   ` Andi Kleen
2017-05-30 17:01     ` Peter Zijlstra
2017-05-30 17:55       ` Vince Weaver
2017-05-30 18:53         ` Peter Zijlstra
2017-05-31 21:08           ` Vince Weaver
2017-06-08  9:21   ` [tip:perf/core] perf/core: Correct event creation with PERF_FORMAT_GROUP tip-bot for Peter Zijlstra

LKML Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/lkml/0 lkml/git/0.git
	git clone --mirror https://lore.kernel.org/lkml/1 lkml/git/1.git
	git clone --mirror https://lore.kernel.org/lkml/2 lkml/git/2.git
	git clone --mirror https://lore.kernel.org/lkml/3 lkml/git/3.git
	git clone --mirror https://lore.kernel.org/lkml/4 lkml/git/4.git
	git clone --mirror https://lore.kernel.org/lkml/5 lkml/git/5.git
	git clone --mirror https://lore.kernel.org/lkml/6 lkml/git/6.git
	git clone --mirror https://lore.kernel.org/lkml/7 lkml/git/7.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 lkml lkml/ https://lore.kernel.org/lkml \
		linux-kernel@vger.kernel.org linux-kernel@archiver.kernel.org
	public-inbox-index lkml


Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-kernel


AGPL code for this site: git clone https://public-inbox.org/ public-inbox