* [PATCH v2] perf/core: Set event shadow time for inactive events too
@ 2021-12-01  4:58 Namhyung Kim
From: Namhyung Kim @ 2021-12-01  4:58 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers, Namhyung Kim, Song Liu

From: Namhyung Kim <namhyung@google.com>

While f79256532682 ("perf/core: fix userpage->time_enabled of inactive
events") fixed this problem for userspace rdpmc usage, bperf (perf stat
with BPF) still has the same problem when accessing inactive perf
events from BPF via bpf_perf_event_read_value().

You can reproduce this problem easily.  As it is about a small window
with multiplexing, we need a large number of events and a short
duration, like below:

  # perf stat -a -v --bpf-counters -e instructions,branches,branch-misses \
    -e cache-references,cache-misses,bus-cycles,ref-cycles,cycles sleep 0.1

  Control descriptor is not initialized
  instructions: 19616489 431324015 360374366
  branches: 3685346 417640114 344175443
  branch-misses: 75714 404089360 336145421
  cache-references: 438667 390474289 327444074
  cache-misses: 49279 349333164 272835067
  bus-cycles: 631887 283423953 165164214
  ref-cycles: 2578771111104847872 18446744069443110306 182116355
  cycles: 1785221016051271680 18446744071682768912 115821694

   Performance counter stats for 'system wide':

          19,616,489      instructions              #    0.00  insn per cycle           ( 83.55%)
           3,685,346      branches                                                      ( 82.41%)
              75,714      branch-misses             #    2.05% of all branches          ( 83.19%)
             438,667      cache-references                                              ( 83.86%)
              49,279      cache-misses              #   11.234 % of all cache refs      ( 78.10%)
             631,887      bus-cycles                                                    ( 58.27%)
  2,578,771,111,104,847,872      ref-cycles                                                     (0.00%)
  1,785,221,016,051,271,680      cycles                                                         (0.00%)

       0.010824702 seconds time elapsed

As you can see, it shows invalid values for the last two events.
The -v output shows that the enabled time is way bigger than the
running time, so the counter values were scaled by the ratio of the
two, which produced those bogus numbers.  The problem can get worse
if users want no-aggregation or cgroup aggregation with a small
interval.
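
For reference, perf scales a multiplexed count roughly as

  count * time_enabled / time_running

(cf. perf_counts_values__scale()), so an enabled time close to 2^64
against a running time of a couple hundred milliseconds inflates the
last two counts by about 11 orders of magnitude.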

Actually 18446744069443110306 is 0xffffffff01b345a2, so it looks like
a negative enabled time.  In fact, bperf keeps the values returned by
bpf_perf_event_read_value(), which calls perf_event_read_local(), and
accumulates the delta between two calls.  When event->shadow_ctx_time
is not set, perf_event_read_local() returns an invalid enabled time
that is larger than normal.  Later, once the shadow time is set, it
starts to return valid times.  At that point the new value is smaller
than the earlier bogus one, so the delta accumulated in bperf goes
negative.
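
For illustration, this is roughly the shape of the delta accumulation
in the bperf leader BPF program (map and program names below are
simplified for the example, not the exact skeleton code under
tools/perf/util/bpf_skel/):

  // Sketch of bperf-style per-cpu delta accumulation (illustrative).
  #include "vmlinux.h"
  #include <bpf/bpf_helpers.h>
  #include <bpf/bpf_tracing.h>

  struct {
  	__uint(type, BPF_MAP_TYPE_PERF_EVENT_ARRAY);
  	__uint(key_size, sizeof(__u32));
  	__uint(value_size, sizeof(int));
  } events SEC(".maps");

  struct {
  	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
  	__uint(key_size, sizeof(__u32));
  	__uint(value_size, sizeof(struct bpf_perf_event_value));
  	__uint(max_entries, 1);
  } prev_readings SEC(".maps");

  struct {
  	__uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
  	__uint(key_size, sizeof(__u32));
  	__uint(value_size, sizeof(struct bpf_perf_event_value));
  	__uint(max_entries, 1);
  } diff_readings SEC(".maps");

  SEC("raw_tp/sched_switch")
  int BPF_PROG(on_switch)
  {
  	struct bpf_perf_event_value val, *prev, *diff;
  	__u32 zero = 0;

  	prev = bpf_map_lookup_elem(&prev_readings, &zero);
  	diff = bpf_map_lookup_elem(&diff_readings, &zero);
  	if (!prev || !diff)
  		return 0;

  	/* counter/enabled/running come from perf_event_read_local() */
  	if (bpf_perf_event_read_value(&events, BPF_F_CURRENT_CPU,
  				      &val, sizeof(val)))
  		return 0;

  	/*
  	 * If the previous read saw a bogus (too large) enabled time
  	 * because event->shadow_ctx_time was not set yet, this delta
  	 * goes negative and, being u64, wraps to 0xffffffffxxxxxxxx.
  	 */
  	diff->counter += val.counter - prev->counter;
  	diff->enabled += val.enabled - prev->enabled;
  	diff->running += val.running - prev->running;
  	*prev = val;

  	return 0;
  }

  char LICENSE[] SEC("license") = "GPL";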

I think we need to set the shadow time even when the events are
inactive, so that BPF programs (or other potential users) can see
valid time values at any time.

Cc: Song Liu <songliubraving@fb.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
v2) rebased to tip/perf/core

 kernel/events/core.c | 29 ++++++++++++++---------------
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/kernel/events/core.c b/kernel/events/core.c
index 3b3297a57228..be37f830f51c 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -3707,27 +3707,26 @@ static noinline int visit_groups_merge(struct perf_cpu_context *cpuctx,
 	return 0;
 }
 
-static inline bool event_update_userpage(struct perf_event *event)
+static inline void group_update_event_time(struct perf_event *group_event)
 {
-	if (likely(!atomic_read(&event->mmap_count)))
-		return false;
-
-	perf_event_update_time(event);
-	perf_set_shadow_time(event, event->ctx);
-	perf_event_update_userpage(event);
+	struct perf_event *event;
+	struct perf_event_context *ctx = group_event->ctx;
 
-	return true;
-}
+	perf_event_update_time(group_event);
+	perf_set_shadow_time(group_event, ctx);
 
-static inline void group_update_userpage(struct perf_event *group_event)
-{
-	struct perf_event *event;
+	for_each_sibling_event(event, group_event) {
+		perf_event_update_time(event);
+		perf_set_shadow_time(event, ctx);
+	}
 
-	if (!event_update_userpage(group_event))
+	if (likely(!atomic_read(&group_event->mmap_count)))
 		return;
 
+	perf_event_update_userpage(group_event);
+
 	for_each_sibling_event(event, group_event)
-		event_update_userpage(event);
+		perf_event_update_userpage(event);
 }
 
 static int merge_sched_in(struct perf_event *event, void *data)
@@ -3755,7 +3754,7 @@ static int merge_sched_in(struct perf_event *event, void *data)
 		} else {
 			ctx->rotate_necessary = 1;
 			perf_mux_hrtimer_restart(cpuctx);
-			group_update_userpage(event);
+			group_update_event_time(event);
 		}
 	}
 
-- 
2.34.0.rc2.393.gf8c9666880-goog



* Re: [PATCH v2] perf/core: Set event shadow time for inactive events too
  2021-12-01  4:58 [PATCH v2] perf/core: Set event shadow time for inactive events too Namhyung Kim
@ 2021-12-05 14:15 ` Peter Zijlstra
From: Peter Zijlstra @ 2021-12-05 14:15 UTC
  To: Namhyung Kim
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers, Namhyung Kim, Song Liu

On Tue, Nov 30, 2021 at 08:58:07PM -0800, Namhyung Kim wrote:
> From: Namhyung Kim <namhyung@google.com>
> 
> While f79256532682 ("perf/core: fix userpage->time_enabled of inactive
> events") fixed this problem for userspace rdpmc usage, bperf (perf stat
> with BPF) still has the same problem when accessing inactive perf
> events from BPF via bpf_perf_event_read_value().
> 
> +static inline void group_update_event_time(struct perf_event *group_event)
>  {
> +	struct perf_event *event;
> +	struct perf_event_context *ctx = group_event->ctx;

:-( surely you're aware of the reverse xmas tree thing by now?

>  
> +	perf_event_update_time(group_event);
> +	perf_set_shadow_time(group_event, ctx);
>  
> +	for_each_sibling_event(event, group_event) {
> +		perf_event_update_time(event);
> +		perf_set_shadow_time(event, ctx);
> +	}
>  
> +	if (likely(!atomic_read(&group_event->mmap_count)))
>  		return;
>  
> +	perf_event_update_userpage(group_event);
> +
>  	for_each_sibling_event(event, group_event)
> +		perf_event_update_userpage(event);

How does it make sense to chase those pointers twice?

>  }


* Re: [PATCH v2] perf/core: Set event shadow time for inactive events too
  2021-12-05 14:15 ` Peter Zijlstra
@ 2021-12-05 20:56   ` Namhyung Kim
From: Namhyung Kim @ 2021-12-05 20:56 UTC
  To: Peter Zijlstra
  Cc: Ingo Molnar, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Alexander Shishkin, LKML, Stephane Eranian, Andi Kleen,
	Ian Rogers, Namhyung Kim, Song Liu

Hi Peter,

On Sun, Dec 5, 2021 at 6:16 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Tue, Nov 30, 2021 at 08:58:07PM -0800, Namhyung Kim wrote:
> > From: Namhyung Kim <namhyung@google.com>
> >
> > While f79256532682 ("perf/core: fix userpage->time_enabled of inactive
> > events") fixed this problem for userspace rdpmc usage, bperf (perf stat
> > with BPF) still has the same problem when accessing inactive perf
> > events from BPF via bpf_perf_event_read_value().
> >
> > +static inline void group_update_event_time(struct perf_event *group_event)
> >  {
> > +     struct perf_event *event;
> > +     struct perf_event_context *ctx = group_event->ctx;
>
> :-( surely you're aware of the reverse xmas tree thing by now?

Will change the order.
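
I.e. with the longer declaration line first:

  	struct perf_event_context *ctx = group_event->ctx;
  	struct perf_event *event;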

>
> >
> > +     perf_event_update_time(group_event);
> > +     perf_set_shadow_time(group_event, ctx);
> >
> > +     for_each_sibling_event(event, group_event) {
> > +             perf_event_update_time(event);
> > +             perf_set_shadow_time(event, ctx);
> > +     }
> >
> > +     if (likely(!atomic_read(&group_event->mmap_count)))
> >               return;
> >
> > +     perf_event_update_userpage(group_event);
> > +
> >       for_each_sibling_event(event, group_event)
> > +             perf_event_update_userpage(event);
>
> How does it make sense to chase those pointers twice?

OK, then I think I can use the existing code to update the event
time before checking the mmap count.
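
Something along these lines, just to illustrate the idea (an
untested sketch, not the actual next version):

  static inline void event_update_userpage(struct perf_event *event)
  {
  	/* keep times and shadow time valid even for inactive events */
  	perf_event_update_time(event);
  	perf_set_shadow_time(event, event->ctx);

  	/* only touch the userpage if somebody mmap'ed the event */
  	if (likely(!atomic_read(&event->mmap_count)))
  		return;

  	perf_event_update_userpage(event);
  }

  static inline void group_update_userpage(struct perf_event *group_event)
  {
  	struct perf_event *event;

  	/* single pass over the group: times always, userpage if mapped */
  	event_update_userpage(group_event);
  	for_each_sibling_event(event, group_event)
  		event_update_userpage(event);
  }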

Thanks,
Namhyung

>
> >  }
