All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf/stat: Fix perf stat for forked applications
@ 2022-03-17 15:53 Thomas Richter
  2022-03-23 11:07 ` Jiri Olsa
  0 siblings, 1 reply; 4+ messages in thread
From: Thomas Richter @ 2022-03-17 15:53 UTC (permalink / raw)
  To: linux-kernel, linux-perf-users, acme, namhyung, gor, acme
  Cc: svens, sumanthk, hca, Thomas Richter

I have run into the following issue:

 # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7

 Performance counter stats for 'system wide':

                 0      new_pmu/INSTRUCTION_7/

       0.000366428 seconds time elapsed
 #

The new PMU for s390 counts the execution of certain CPU instructions.
The root cause is the extremely small run time of the
mytest program. It just executes some assembly instructions
and then exits. In above invocation the instruction is executed
exactly one time (-c1 option). The PMU is expected to report this one
time execution by a counter value of one, but fails to do so
in some cases, not all.

Debugging reveals the invocation of the child process is done
*before* the counter events are installed and enabled. Tracing
reveals that sometimes the child process starts and exits before
the event is installed on all CPUs. The more CPUs the machine has,
the more often this miscount happens.

Fix this by reversing the start of the work load after the events
have been installed on the specified CPUs. Now the comment also
matches the code.

Output after:
 # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7

 Performance counter stats for 'system wide':

                 1      new_pmu/INSTRUCTION_7/

       0.000366428 seconds time elapsed
 #

Now the correct result is reported rock solid all the time regardless
how many CPUs are online.

Fixes:  acf2892270dc ("perf stat: Use perf_evlist__prepare/start_workload())

Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/builtin-stat.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 3f98689dd687..60baa3dadc4b 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -955,10 +955,10 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	 * Enable counters and exec the command:
 	 */
 	if (forks) {
-		evlist__start_workload(evsel_list);
 		err = enable_counters();
 		if (err)
 			return -1;
+		evlist__start_workload(evsel_list);
 
 		t0 = rdclock();
 		clock_gettime(CLOCK_MONOTONIC, &ref_time);
-- 
2.35.1


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf/stat: Fix perf stat for forked applications
  2022-03-17 15:53 [PATCH] perf/stat: Fix perf stat for forked applications Thomas Richter
@ 2022-03-23 11:07 ` Jiri Olsa
  2022-03-23 20:54   ` Namhyung Kim
  0 siblings, 1 reply; 4+ messages in thread
From: Jiri Olsa @ 2022-03-23 11:07 UTC (permalink / raw)
  To: Thomas Richter
  Cc: linux-kernel, linux-perf-users, acme, namhyung, gor, acme, svens,
	sumanthk, hca

On Thu, Mar 17, 2022 at 04:53:46PM +0100, Thomas Richter wrote:
> I have run into the following issue:
> 
>  # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7
> 
>  Performance counter stats for 'system wide':
> 
>                  0      new_pmu/INSTRUCTION_7/
> 
>        0.000366428 seconds time elapsed
>  #
> 
> The new PMU for s390 counts the execution of certain CPU instructions.
> The root cause is the extremely small run time of the
> mytest program. It just executes some assembly instructions
> and then exits. In above invocation the instruction is executed
> exactly one time (-c1 option). The PMU is expected to report this one
> time execution by a counter value of one, but fails to do so
> in some cases, not all.
> 
> Debugging reveals the invocation of the child process is done
> *before* the counter events are installed and enabled. Tracing
> reveals that sometimes the child process starts and exits before
> the event is installed on all CPUs. The more CPUs the machine has,
> the more often this miscount happens.
> 
> Fix this by reversing the start of the work load after the events
> have been installed on the specified CPUs. Now the comment also
> matches the code.
> 
> Output after:
>  # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7
> 
>  Performance counter stats for 'system wide':
> 
>                  1      new_pmu/INSTRUCTION_7/
> 
>        0.000366428 seconds time elapsed
>  #
> 
> Now the correct result is reported rock solid all the time regardless
> how many CPUs are online.
> 
> Fixes:  acf2892270dc ("perf stat: Use perf_evlist__prepare/start_workload())
> 
> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
> Cc: Namhyung Kim <namhyung@kernel.org>
> ---
>  tools/perf/builtin-stat.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index 3f98689dd687..60baa3dadc4b 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -955,10 +955,10 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	 * Enable counters and exec the command:
>  	 */
>  	if (forks) {
> -		evlist__start_workload(evsel_list);
>  		err = enable_counters();
>  		if (err)
>  			return -1;
> +		evlist__start_workload(evsel_list);

right, without -a the event has enable_on_exec so the race does not
matter, but it's a problem for system wide with fork

Acked-by: Jiri Olsa <jolsa@kernel.org>

thanks,
jirka

>  
>  		t0 = rdclock();
>  		clock_gettime(CLOCK_MONOTONIC, &ref_time);
> -- 
> 2.35.1
> 

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf/stat: Fix perf stat for forked applications
  2022-03-23 11:07 ` Jiri Olsa
@ 2022-03-23 20:54   ` Namhyung Kim
  2022-03-24 20:36     ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 4+ messages in thread
From: Namhyung Kim @ 2022-03-23 20:54 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Thomas Richter, linux-kernel, linux-perf-users,
	Arnaldo Carvalho de Melo, gor, Arnaldo Carvalho de Melo,
	Sven Schnelle, sumanthk, Heiko Carstens

On Wed, Mar 23, 2022 at 4:07 AM Jiri Olsa <olsajiri@gmail.com> wrote:
>
> On Thu, Mar 17, 2022 at 04:53:46PM +0100, Thomas Richter wrote:
> > I have run into the following issue:
> >
> >  # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7
> >
> >  Performance counter stats for 'system wide':
> >
> >                  0      new_pmu/INSTRUCTION_7/
> >
> >        0.000366428 seconds time elapsed
> >  #
> >
> > The new PMU for s390 counts the execution of certain CPU instructions.
> > The root cause is the extremely small run time of the
> > mytest program. It just executes some assembly instructions
> > and then exits. In above invocation the instruction is executed
> > exactly one time (-c1 option). The PMU is expected to report this one
> > time execution by a counter value of one, but fails to do so
> > in some cases, not all.
> >
> > Debugging reveals the invocation of the child process is done
> > *before* the counter events are installed and enabled. Tracing
> > reveals that sometimes the child process starts and exits before
> > the event is installed on all CPUs. The more CPUs the machine has,
> > the more often this miscount happens.
> >
> > Fix this by reversing the start of the work load after the events
> > have been installed on the specified CPUs. Now the comment also
> > matches the code.
> >
> > Output after:
> >  # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7
> >
> >  Performance counter stats for 'system wide':
> >
> >                  1      new_pmu/INSTRUCTION_7/
> >
> >        0.000366428 seconds time elapsed
> >  #
> >
> > Now the correct result is reported rock solid all the time regardless
> > how many CPUs are online.
> >
> > Fixes:  acf2892270dc ("perf stat: Use perf_evlist__prepare/start_workload())
> >
> > Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> > Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
> > Cc: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  tools/perf/builtin-stat.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > index 3f98689dd687..60baa3dadc4b 100644
> > --- a/tools/perf/builtin-stat.c
> > +++ b/tools/perf/builtin-stat.c
> > @@ -955,10 +955,10 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> >        * Enable counters and exec the command:
> >        */
> >       if (forks) {
> > -             evlist__start_workload(evsel_list);
> >               err = enable_counters();
> >               if (err)
> >                       return -1;
> > +             evlist__start_workload(evsel_list);
>
> right, without -a the event has enable_on_exec so the race does not
> matter, but it's a problem for system wide with fork

Agreed.  Also we may move the enable_counters() and the
clock code out of the if block to be shared with the else block.

Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks,
Namhyung

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [PATCH] perf/stat: Fix perf stat for forked applications
  2022-03-23 20:54   ` Namhyung Kim
@ 2022-03-24 20:36     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2022-03-24 20:36 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Jiri Olsa, Thomas Richter, linux-kernel, linux-perf-users, gor,
	Arnaldo Carvalho de Melo, Sven Schnelle, sumanthk,
	Heiko Carstens

Em Wed, Mar 23, 2022 at 01:54:23PM -0700, Namhyung Kim escreveu:
> On Wed, Mar 23, 2022 at 4:07 AM Jiri Olsa <olsajiri@gmail.com> wrote:
> >
> > On Thu, Mar 17, 2022 at 04:53:46PM +0100, Thomas Richter wrote:
> > > I have run into the following issue:
> > >
> > >  # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7
> > >
> > >  Performance counter stats for 'system wide':
> > >
> > >                  0      new_pmu/INSTRUCTION_7/
> > >
> > >        0.000366428 seconds time elapsed
> > >  #
> > >
> > > The new PMU for s390 counts the execution of certain CPU instructions.
> > > The root cause is the extremely small run time of the
> > > mytest program. It just executes some assembly instructions
> > > and then exits. In above invocation the instruction is executed
> > > exactly one time (-c1 option). The PMU is expected to report this one
> > > time execution by a counter value of one, but fails to do so
> > > in some cases, not all.
> > >
> > > Debugging reveals the invocation of the child process is done
> > > *before* the counter events are installed and enabled. Tracing
> > > reveals that sometimes the child process starts and exits before
> > > the event is installed on all CPUs. The more CPUs the machine has,
> > > the more often this miscount happens.
> > >
> > > Fix this by reversing the start of the work load after the events
> > > have been installed on the specified CPUs. Now the comment also
> > > matches the code.
> > >
> > > Output after:
> > >  # perf stat -a -e new_pmu/INSTRUCTION_7/ --  mytest -c1 7
> > >
> > >  Performance counter stats for 'system wide':
> > >
> > >                  1      new_pmu/INSTRUCTION_7/
> > >
> > >        0.000366428 seconds time elapsed
> > >  #
> > >
> > > Now the correct result is reported rock solid all the time regardless
> > > how many CPUs are online.
> > >
> > > Fixes:  acf2892270dc ("perf stat: Use perf_evlist__prepare/start_workload())
> > >
> > > Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
> > > Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
> > > Cc: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  tools/perf/builtin-stat.c | 2 +-
> > >  1 file changed, 1 insertion(+), 1 deletion(-)
> > >
> > > diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> > > index 3f98689dd687..60baa3dadc4b 100644
> > > --- a/tools/perf/builtin-stat.c
> > > +++ b/tools/perf/builtin-stat.c
> > > @@ -955,10 +955,10 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
> > >        * Enable counters and exec the command:
> > >        */
> > >       if (forks) {
> > > -             evlist__start_workload(evsel_list);
> > >               err = enable_counters();
> > >               if (err)
> > >                       return -1;
> > > +             evlist__start_workload(evsel_list);
> >
> > right, without -a the event has enable_on_exec so the race does not
> > matter, but it's a problem for system wide with fork
> 
> Agreed.  Also we may move the enable_counters() and the
> clock code out of the if block to be shared with the else block.
> 
> Acked-by: Namhyung Kim <namhyung@kernel.org>

Thanks, applied and added your comments as "Reviewer notes", and please
consider using the Reviewed-by tag instead of the Acked-by ones.

Thanks anyways, as always!

- Arnaldo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2022-03-24 20:36 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-17 15:53 [PATCH] perf/stat: Fix perf stat for forked applications Thomas Richter
2022-03-23 11:07 ` Jiri Olsa
2022-03-23 20:54   ` Namhyung Kim
2022-03-24 20:36     ` Arnaldo Carvalho de Melo

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.