linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 0/2] libperf: Add support for scaling counters obtained from the read() system call during multiplexing
@ 2021-09-22 10:16 Shunsuke Nakamura
  2021-09-22 10:16 ` [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing Shunsuke Nakamura
  2021-09-22 10:16 ` [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test Shunsuke Nakamura
  0 siblings, 2 replies; 12+ messages in thread
From: Shunsuke Nakamura @ 2021-09-22 10:16 UTC (permalink / raw)
  To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung
  Cc: linux-kernel, linux-perf-users

This patch series supports counter scaling when perf_evsel__read() obtains a counter
using the read() system call during multiplexing.

The first patch adds scaling of counters obtained from the read() system call
during multiplexing.

The second patch adds a test for the first patch.
This patch is based on Vince's rdpmc_multiplexing.c [1]

---
Changes in v2:
 - Fix not to divide by zero when counter scaling
 - Add test to verify that no division by zero occurs


[1] https://github.com/deater/perf_event_tests/blob/master/tests/rdpmc/rdpmc_multiplexing.c


nakamura shunsuke (2):
  libperf: Add processing to scale the counters obtained during the
    read() system call when multiplexing
  libperf tests: Add test_stat_multiplexing test

 tools/lib/perf/evsel.c             |   6 +
 tools/lib/perf/tests/test-evlist.c | 183 +++++++++++++++++++++++++++++
 2 files changed, 189 insertions(+)

-- 
2.27.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
  2021-09-22 10:16 [PATCH v2 0/2] libperf: Add support for scaling counters obtained from the read() system call during multiplexing Shunsuke Nakamura
@ 2021-09-22 10:16 ` Shunsuke Nakamura
  2021-09-22 21:34   ` Jiri Olsa
  2021-09-22 10:16 ` [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test Shunsuke Nakamura
  1 sibling, 1 reply; 12+ messages in thread
From: Shunsuke Nakamura @ 2021-09-22 10:16 UTC (permalink / raw)
  To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung
  Cc: linux-kernel, linux-perf-users

From: nakamura shunsuke <nakamura.shun@fujitsu.com>

perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
does not scale counters obtained by read() system call.

Add processing to perf_evsel__read() to scale the counters obtained during the
read() system call when multiplexing.


Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
---
 tools/lib/perf/evsel.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index 8441e3e1aaac..0ebd1d34436f 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -18,6 +18,7 @@
 #include <sys/ioctl.h>
 #include <sys/mman.h>
 #include <asm/bug.h>
+#include <linux/math64.h>
 
 void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
 		      int idx)
@@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
 	if (readn(*fd, count->values, size) <= 0)
 		return -errno;
 
+	if (count->ena != count->run) {
+		if (count->run != 0)
+			count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
+	}
+
 	return 0;
 }
 
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test
  2021-09-22 10:16 [PATCH v2 0/2] libperf: Add support for scaling counters obtained from the read() system call during multiplexing Shunsuke Nakamura
  2021-09-22 10:16 ` [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing Shunsuke Nakamura
@ 2021-09-22 10:16 ` Shunsuke Nakamura
  2021-10-08 19:11   ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 12+ messages in thread
From: Shunsuke Nakamura @ 2021-09-22 10:16 UTC (permalink / raw)
  To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung
  Cc: linux-kernel, linux-perf-users

From: nakamura shunsuke <nakamura.shun@fujitsu.com>

Adds a test for a counter obtained using read() system call during multiplexing

Committer testing:

  $ sudo make tests -C tools/lib/perf/ V=1
  make: Entering directory '/home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/perf'
  make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=. obj=libperf
  make -C /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/api/ O= libapi.a
  make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./fd obj=libapi
  make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./fs obj=libapi
  make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=. obj=tests
  make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./tests obj=tests
  running static:
  - running tests/test-cpumap.c...OK
  - running tests/test-threadmap.c...OK
  - running tests/test-evlist.c...
  	count = 502273264, run = 264105562, enable = 444092845
  	count = 501850699, run = 264100505, enable = 444088400
  	count = 499613780, run = 264096660, enable = 444085073
  	count = 499569398, run = 264501528, enable = 444081268
  	count = 500717228, run = 265495387, enable = 444077029
  	count = 501260539, run = 266488226, enable = 444071569
  	count = 501023910, run = 267469613, enable = 444065303
  	count = 501188267, run = 268461294, enable = 444058413
  	count = 503096683, run = 269450511, enable = 444050567
  	count = 503141904, run = 269587790, enable = 444041735
  	count = 503454490, run = 268590361, enable = 444030544
  	count = 504838207, run = 267592624, enable = 444021005
  	count = 505458594, run = 266594648, enable = 444001280
  	count = 504269433, run = 265595795, enable = 443991377
  	count = 502625858, run = 264599374, enable = 443982141
     Expected: 502065774
     High: 505458594   Low:  499569398   Average:  502292150
     Average Error = 0.05%
  	count = 26129, run = 134550, enable = 134550
  	count = 34035, run = 140540, enable = 140540
  	count = 40189, run = 141126, enable = 141126
  	count = 45263, run = 138290, enable = 138290
  	count = 51381, run = 136966, enable = 136966
  	count = 57499, run = 134993, enable = 134993
  	count = 63617, run = 132320, enable = 132320
  	count = 71714, run = 132047, enable = 132047
  	count = 77832, run = 128157, enable = 128157
  	count = 0, run = 0, enable = 123662
  	count = 0, run = 0, enable = 116753
  	count = 0, run = 0, enable = 110304
  	count = 0, run = 0, enable = 103834
  	count = 0, run = 0, enable = 97465
  	count = 0, run = 0, enable = 91110
  OK
  - running tests/test-evsel.c...
  	loop = 65536, count = 334303
  	loop = 131072, count = 656954
  	loop = 262144, count = 1321090
  	loop = 524288, count = 2652616
  	loop = 1048576, count = 5262140
  	loop = 65536, count = 393726
  	loop = 131072, count = 989345
  	loop = 262144, count = 1986206
  	loop = 524288, count = 3831652
  	loop = 1048576, count = 7747957
  OK
  running dynamic:
  - running tests/test-cpumap.c...OK
  - running tests/test-threadmap.c...OK
  - running tests/test-evlist.c...
  	count = 501975838, run = 247092568, enable = 415093723
  	count = 502491858, run = 247087145, enable = 415088830
  	count = 502571026, run = 247083574, enable = 415085139
  	count = 502743767, run = 247080183, enable = 415081197
  	count = 503259008, run = 247779561, enable = 415076990
  	count = 503169622, run = 248773141, enable = 415072091
  	count = 503430373, run = 249816445, enable = 415066529
  	count = 502597951, run = 250808797, enable = 415059863
  	count = 502211776, run = 251799199, enable = 415051831
  	count = 501921896, run = 252000721, enable = 415042717
  	count = 501441904, run = 251309399, enable = 415031152
  	count = 501224474, run = 250311107, enable = 415021479
  	count = 500502827, run = 249260579, enable = 415012168
  	count = 500775622, run = 248259621, enable = 415002830
  	count = 501377375, run = 247261134, enable = 414993890
     Expected: 501953755
     High: 503430373   Low:  500502827   Average:  502113021
     Average Error = 0.03%
  	count = 26098, run = 131826, enable = 131826
  	count = 33689, run = 138284, enable = 138284
  	count = 39808, run = 139152, enable = 139152
  	count = 45927, run = 138880, enable = 138880
  	count = 51049, run = 135405, enable = 135405
  	count = 57168, run = 133406, enable = 133406
  	count = 63287, run = 130736, enable = 130736
  	count = 71392, run = 130474, enable = 130474
  	count = 77794, run = 127203, enable = 127203
  	count = 0, run = 0, enable = 123172
  	count = 0, run = 0, enable = 116856
  	count = 0, run = 0, enable = 110557
  	count = 0, run = 0, enable = 104282
  	count = 0, run = 0, enable = 97929
  	count = 0, run = 0, enable = 91792
  OK
  - running tests/test-evsel.c...
  	loop = 65536, count = 334400
  	loop = 131072, count = 656908
  	loop = 262144, count = 1315441
  	loop = 524288, count = 2657481
  	loop = 1048576, count = 5258193
  	loop = 65536, count = 491938
  	loop = 131072, count = 965755
  	loop = 262144, count = 1666277
  	loop = 524288, count = 3516123
  	loop = 1048576, count = 6759204
  OK
  make: Leaving directory '/home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/perf'


Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
---
 tools/lib/perf/tests/test-evlist.c | 183 +++++++++++++++++++++++++++++
 1 file changed, 183 insertions(+)

diff --git a/tools/lib/perf/tests/test-evlist.c b/tools/lib/perf/tests/test-evlist.c
index c67c83399170..ca7a9542f40e 100644
--- a/tools/lib/perf/tests/test-evlist.c
+++ b/tools/lib/perf/tests/test-evlist.c
@@ -21,6 +21,10 @@
 #include "tests.h"
 #include <internal/evsel.h>
 
+#define EVENT_NUM 15
+#define WAIT_COUNT 100000000UL
+
+
 static int libperf_print(enum libperf_print_level level,
 			 const char *fmt, va_list ap)
 {
@@ -413,6 +417,184 @@ static int test_mmap_cpus(void)
 	return 0;
 }
 
+static double display_error(long long average,
+		     long long high,
+		     long long low,
+		     long long expected) {
+
+	double error;
+
+	error = (((double)average - expected) / expected) * 100.0;
+
+	__T_VERBOSE("   Expected: %lld\n", expected);
+	__T_VERBOSE("   High: %lld   Low:  %lld   Average:  %lld\n",
+			high, low, average);
+
+	__T_VERBOSE("   Average Error = %.2f%%\n",error);
+
+	return error;
+
+}
+
+static int test_stat_multiplexing(void)
+{
+	struct perf_counts_values expected_counts = { .val = 0 };
+	struct perf_counts_values multiplexing_counts[EVENT_NUM] = {{ .val = 0 },};
+	struct perf_thread_map *threads;
+	struct perf_evlist *evlist;
+	struct perf_evsel *evsel;
+	struct perf_event_attr attr = {
+		.type	     = PERF_TYPE_HARDWARE,
+		.config	     = PERF_COUNT_HW_INSTRUCTIONS,
+		.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
+			       PERF_FORMAT_TOTAL_TIME_RUNNING,
+		.disabled    = 1,
+	};
+	int err, i, nonzero=0;
+	unsigned long count;
+	long long max = 0, min = 0, avg = 0;
+	double error = 0.0;
+
+
+	/* read for non-multiplexing event count */
+	threads = perf_thread_map__new_dummy();
+	__T("failed to create threads", threads);
+
+	perf_thread_map__set_pid(threads, 0, 0);
+
+	evsel = perf_evsel__new(&attr);
+	__T("failed to create evsel", evsel);
+
+	err = perf_evsel__open(evsel, NULL, threads);
+	__T("failed to open evsel", err == 0);
+
+	err = perf_evsel__enable(evsel);
+	__T("failed to enable evsel", err == 0);
+
+	/* wait loop */
+	count = WAIT_COUNT;
+	while(count--);
+
+	perf_evsel__read(evsel, 0, 0, &expected_counts);
+	__T("failed to read value for evsel", expected_counts.val != 0);
+	__T("failed to read non-multiplexing event count",
+	    expected_counts.ena == expected_counts.run);
+
+	err = perf_evsel__disable(evsel);
+	__T("failed to enable evsel", err == 0);
+
+	perf_evsel__close(evsel);
+	perf_evsel__delete(evsel);
+
+	perf_thread_map__put(threads);
+
+
+	/* read for multiplexing event count */
+	threads = perf_thread_map__new_dummy();
+	__T("failed to create threads", threads);
+
+	perf_thread_map__set_pid(threads, 0, 0);
+
+	evlist = perf_evlist__new();
+	__T("failed to create evlist", evlist);
+
+	for (i = 0; i < EVENT_NUM; i++) {
+		evsel = perf_evsel__new(&attr);
+		__T("failed to create evsel1", evsel);
+
+		perf_evlist__add(evlist, evsel);
+	}
+	perf_evlist__set_maps(evlist, NULL, threads);
+
+	err = perf_evlist__open(evlist);
+	__T("failed to open evsel", err == 0);
+
+	perf_evlist__enable(evlist);
+
+	/* wait loop */
+	count = WAIT_COUNT;
+	while(count--);
+
+	i = 0;
+	perf_evlist__for_each_evsel(evlist, evsel) {
+		perf_evsel__read(evsel, 0, 0, &multiplexing_counts[i]);
+		__T("failed to read value for evsel", multiplexing_counts[i].val != 0);
+		i++;
+	}
+
+	perf_evlist__disable(evlist);
+
+
+	min = multiplexing_counts[0].val;
+	for (i = 0; i < EVENT_NUM; i++) {
+		__T_VERBOSE("\tcount = %lu, run = %lu, enable = %lu\n",
+			    multiplexing_counts[i].val, multiplexing_counts[i].run,
+			    multiplexing_counts[i].ena);
+
+		if (multiplexing_counts[i].val > max)
+			max = multiplexing_counts[i].val;
+
+		if (multiplexing_counts[i].val < min)
+			min = multiplexing_counts[i].val;
+
+		avg += multiplexing_counts[i].val;
+
+		if (multiplexing_counts[i].val != 0)
+			nonzero++;
+	}
+
+	avg = avg / nonzero;
+
+	error = display_error(avg, max, min, expected_counts.val);
+
+	__T("Error out of range!", ((error <= 1.0) && (error >= -1.0)));
+
+	perf_evlist__close(evlist);
+	perf_evlist__delete(evlist);
+
+	perf_thread_map__put(threads);
+
+
+	/* Verify that no division by zero occurs */
+	threads = perf_thread_map__new_dummy();
+	__T("failed to create threads", threads);
+
+	perf_thread_map__set_pid(threads, 0, 0);
+
+	evlist = perf_evlist__new();
+	__T("failed to create evlist", evlist);
+
+	for (i = 0; i < EVENT_NUM; i++) {
+		evsel = perf_evsel__new(&attr);
+		__T("failed to create evsel1", evsel);
+
+		perf_evlist__add(evlist, evsel);
+	}
+	perf_evlist__set_maps(evlist, NULL, threads);
+
+	err = perf_evlist__open(evlist);
+	__T("failed to open evsel", err == 0);
+
+	perf_evlist__enable(evlist);
+
+	/* Reproduce run=0 by not executing wait loop. */
+
+	i = 0;
+	perf_evlist__for_each_evsel(evlist, evsel) {
+		perf_evsel__read(evsel, 0, 0, &multiplexing_counts[i]);
+		__T_VERBOSE("\tcount = %lu, run = %lu, enable = %lu\n",
+			    multiplexing_counts[i].val, multiplexing_counts[i].run,
+			    multiplexing_counts[i].ena);
+		i++;
+	}
+
+	perf_evlist__close(evlist);
+	perf_evlist__delete(evlist);
+
+	perf_thread_map__put(threads);
+	return 0;
+}
+
 int test_evlist(int argc, char **argv)
 {
 	__T_START;
@@ -424,6 +606,7 @@ int test_evlist(int argc, char **argv)
 	test_stat_thread_enable();
 	test_mmap_thread();
 	test_mmap_cpus();
+	test_stat_multiplexing();
 
 	__T_END;
 	return tests_failed == 0 ? 0 : -1;
-- 
2.27.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
  2021-09-22 10:16 ` [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing Shunsuke Nakamura
@ 2021-09-22 21:34   ` Jiri Olsa
  2021-09-28  9:53     ` nakamura.shun
  0 siblings, 1 reply; 12+ messages in thread
From: Jiri Olsa @ 2021-09-22 21:34 UTC (permalink / raw)
  To: Shunsuke Nakamura
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, namhyung,
	linux-kernel, linux-perf-users

On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> 
> perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> does not scale counters obtained by read() system call.
> 
> Add processing to perf_evsel__read() to scale the counters obtained during the
> read() system call when multiplexing.
> 
> 
> Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> ---
>  tools/lib/perf/evsel.c | 6 ++++++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> index 8441e3e1aaac..0ebd1d34436f 100644
> --- a/tools/lib/perf/evsel.c
> +++ b/tools/lib/perf/evsel.c
> @@ -18,6 +18,7 @@
>  #include <sys/ioctl.h>
>  #include <sys/mman.h>
>  #include <asm/bug.h>
> +#include <linux/math64.h>
>  
>  void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
>  		      int idx)
> @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
>  	if (readn(*fd, count->values, size) <= 0)
>  		return -errno;
>  
> +	if (count->ena != count->run) {
> +		if (count->run != 0)
> +			count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> +	}

so I think perf stat expect raw values in there and does the
scaling by itself, please check following code:

read_counters
  read_affinity_counters
    read_counter_cpu
      read_single_counter
        evsel__read_counter

  perf_stat_process_counter
    process_counter_maps
      process_counter_values
        perf_counts_values__scale


perhaps we could export perf_counts_values__scale if it'd be any help

jirka


> +
>  	return 0;
>  }
>  
> -- 
> 2.27.0
> 


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
  2021-09-22 21:34   ` Jiri Olsa
@ 2021-09-28  9:53     ` nakamura.shun
  2021-10-05 16:36       ` Rob Herring
  2021-10-07 17:17       ` Jiri Olsa
  0 siblings, 2 replies; 12+ messages in thread
From: nakamura.shun @ 2021-09-28  9:53 UTC (permalink / raw)
  To: Jiri Olsa, Rob Herring
  Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, namhyung,
	irogers, linux-perf-users, linux-kernel

Hi Jirka

> On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > 
> > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > does not scale counters obtained by read() system call.
> > 
> > Add processing to perf_evsel__read() to scale the counters obtained during the
> > read() system call when multiplexing.
> > 
> > 
> > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > ---
> >  tools/lib/perf/evsel.c | 6 ++++++
> >  1 file changed, 6 insertions(+)
> > 
> > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > index 8441e3e1aaac..0ebd1d34436f 100644
> > --- a/tools/lib/perf/evsel.c
> +++ b/tools/lib/perf/evsel.c
> > @@ -18,6 +18,7 @@
> >  #include <sys/ioctl.h>
> >  #include <sys/mman.h>
> >  #include <asm/bug.h>
> > +#include <linux/math64.h>
> >  
> >  void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> >                      int idx)
> > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> >        if (readn(*fd, count->values, size) <= 0)
> >                return -errno;
> >  
> > +     if (count->ena != count->run) {
> > +             if (count->run != 0)
> > +                     count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > +     }
> 
> so I think perf stat expect raw values in there and does the
> scaling by itself, please check following code:
> 
> read_counters
>   read_affinity_counters
>     read_counter_cpu
>       read_single_counter
>         evsel__read_counter
> 
>   perf_stat_process_counter
>     process_counter_maps
>       process_counter_values
>         perf_counts_values__scale
> 
> 
> perhaps we could export perf_counts_values__scale if it'd be any help

Thank you for your comment.

The purpose of this patch is to unify the counters obtained with 
perf_evsel__read() to scaled or unscaled values.

perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is 
available, else gets by readn(). In current implementation, caller 
gets scaled counter if goes through RDPMC path, otherwise gets unscaled 
counter via readn() path.

However caller cannnot know which path were taken.

If caller expects a raw value, I think the RDPMC path should also 
return an unscaled counter.

diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
index c89dfa5..aaa4579 100644
--- a/tools/lib/perf/mmap.c
+++ b/tools/lib/perf/mmap.c
@@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
                count->ena += delta;
                if (idx)
                        count->run += delta;
-
-               cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
        }

        count->val = cnt;

Rob, do you have any comments?

Best Regards
Shunsuke

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
  2021-09-28  9:53     ` nakamura.shun
@ 2021-10-05 16:36       ` Rob Herring
  2021-10-19  4:59         ` nakamura.shun
  2021-10-07 17:17       ` Jiri Olsa
  1 sibling, 1 reply; 12+ messages in thread
From: Rob Herring @ 2021-10-05 16:36 UTC (permalink / raw)
  To: nakamura.shun
  Cc: Jiri Olsa, peterz, mingo, acme, mark.rutland, alexander.shishkin,
	namhyung, irogers, linux-perf-users, linux-kernel

On Tue, Sep 28, 2021 at 7:41 AM nakamura.shun@fujitsu.com
<nakamura.shun@fujitsu.com> wrote:
>
> Hi Jirka
>
> > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > >
> > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > does not scale counters obtained by read() system call.
> > >
> > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > read() system call when multiplexing.
> > >
> > >
> > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > ---
> > >  tools/lib/perf/evsel.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > >
> > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > --- a/tools/lib/perf/evsel.c
> > +++ b/tools/lib/perf/evsel.c
> > > @@ -18,6 +18,7 @@
> > >  #include <sys/ioctl.h>
> > >  #include <sys/mman.h>
> > >  #include <asm/bug.h>
> > > +#include <linux/math64.h>
> > >
> > >  void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > >                      int idx)
> > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > >        if (readn(*fd, count->values, size) <= 0)
> > >                return -errno;
> > >
> > > +     if (count->ena != count->run) {
> > > +             if (count->run != 0)
> > > +                     count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > +     }
> >
> > so I think perf stat expect raw values in there and does the
> > scaling by itself, please check following code:
> >
> > read_counters
> >   read_affinity_counters
> >     read_counter_cpu
> >       read_single_counter
> >         evsel__read_counter
> >
> >   perf_stat_process_counter
> >     process_counter_maps
> >       process_counter_values
> >         perf_counts_values__scale
> >
> >
> > perhaps we could export perf_counts_values__scale if it'd be any help
>
> Thank you for your comment.
>
> The purpose of this patch is to unify the counters obtained with
> perf_evsel__read() to scaled or unscaled values.
>
> perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is
> available, else gets by readn(). In current implementation, caller
> gets scaled counter if goes through RDPMC path, otherwise gets unscaled
> counter via readn() path.
>
> However caller cannnot know which path were taken.
>
> If caller expects a raw value, I think the RDPMC path should also
> return an unscaled counter.
>
> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> index c89dfa5..aaa4579 100644
> --- a/tools/lib/perf/mmap.c
> +++ b/tools/lib/perf/mmap.c
> @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
>                 count->ena += delta;
>                 if (idx)
>                         count->run += delta;
> -
> -               cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
>         }
>
>         count->val = cnt;
>
> Rob, do you have any comments?

Submit a proper patch with the above.

Rob

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
  2021-09-28  9:53     ` nakamura.shun
  2021-10-05 16:36       ` Rob Herring
@ 2021-10-07 17:17       ` Jiri Olsa
  2021-10-19  5:00         ` nakamura.shun
  1 sibling, 1 reply; 12+ messages in thread
From: Jiri Olsa @ 2021-10-07 17:17 UTC (permalink / raw)
  To: nakamura.shun
  Cc: Rob Herring, peterz, mingo, acme, mark.rutland,
	alexander.shishkin, namhyung, irogers, linux-perf-users,
	linux-kernel

On Tue, Sep 28, 2021 at 09:53:24AM +0000, nakamura.shun@fujitsu.com wrote:
> Hi Jirka
> 
> > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > > 
> > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > does not scale counters obtained by read() system call.
> > > 
> > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > read() system call when multiplexing.
> > > 
> > > 
> > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > ---
> > >  tools/lib/perf/evsel.c | 6 ++++++
> > >  1 file changed, 6 insertions(+)
> > > 
> > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > --- a/tools/lib/perf/evsel.c
> > +++ b/tools/lib/perf/evsel.c
> > > @@ -18,6 +18,7 @@
> > >  #include <sys/ioctl.h>
> > >  #include <sys/mman.h>
> > >  #include <asm/bug.h>
> > > +#include <linux/math64.h>
> > >  
> > >  void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > >                      int idx)
> > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > >        if (readn(*fd, count->values, size) <= 0)
> > >                return -errno;
> > >  
> > > +     if (count->ena != count->run) {
> > > +             if (count->run != 0)
> > > +                     count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > +     }
> > 
> > so I think perf stat expect raw values in there and does the
> > scaling by itself, please check following code:
> > 
> > read_counters
> >   read_affinity_counters
> >     read_counter_cpu
> >       read_single_counter
> >         evsel__read_counter
> > 
> >   perf_stat_process_counter
> >     process_counter_maps
> >       process_counter_values
> >         perf_counts_values__scale
> > 
> > 
> > perhaps we could export perf_counts_values__scale if it'd be any help
> 
> Thank you for your comment.
> 
> The purpose of this patch is to unify the counters obtained with 
> perf_evsel__read() to scaled or unscaled values.
> 
> perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is 
> available, else gets by readn(). In current implementation, caller 
> gets scaled counter if goes through RDPMC path, otherwise gets unscaled 
> counter via readn() path.
> 
> However caller cannnot know which path were taken.
> 
> If caller expects a raw value, I think the RDPMC path should also 
> return an unscaled counter.
> 
> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> index c89dfa5..aaa4579 100644
> --- a/tools/lib/perf/mmap.c
> +++ b/tools/lib/perf/mmap.c
> @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
>                 count->ena += delta;
>                 if (idx)
>                         count->run += delta;
> -
> -               cnt = mul_u64_u64_div64(cnt, count->ena, count->run);

perf stat does not mmap counters so this should not be invoked
within perf stat.. but we should be consistent and scale after
calling perf_evsel__read.. and give user the possibility to get
un-scaled counts

that perhaps brings new feature.. mmap perf stat counters to invoke
the fast reading path for counters.. IIRC it should be matter just
to mmap the first 'user' page

thanks,
jirka

>         }
> 
>         count->val = cnt;
> 
> Rob, do you have any comments?
> 
> Best Regards
> Shunsuke


^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test
  2021-09-22 10:16 ` [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test Shunsuke Nakamura
@ 2021-10-08 19:11   ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-10-08 19:11 UTC (permalink / raw)
  To: Shunsuke Nakamura
  Cc: Rob Herring, peterz, mingo, mark.rutland, alexander.shishkin,
	jolsa, namhyung, linux-kernel, linux-perf-users

Em Wed, Sep 22, 2021 at 07:16:27PM +0900, Shunsuke Nakamura escreveu:
> From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> 
> Adds a test for a counter obtained using read() system call during multiplexing

You forgot to add Rob to the CC list, can I have Acked-by or
Reviewed-by?

Thanks,

- Arnaldo
 
> Committer testing:
> 
>   $ sudo make tests -C tools/lib/perf/ V=1
>   make: Entering directory '/home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/perf'
>   make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=. obj=libperf
>   make -C /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/api/ O= libapi.a
>   make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./fd obj=libapi
>   make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./fs obj=libapi
>   make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=. obj=tests
>   make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./tests obj=tests
>   running static:
>   - running tests/test-cpumap.c...OK
>   - running tests/test-threadmap.c...OK
>   - running tests/test-evlist.c...
>   	count = 502273264, run = 264105562, enable = 444092845
>   	count = 501850699, run = 264100505, enable = 444088400
>   	count = 499613780, run = 264096660, enable = 444085073
>   	count = 499569398, run = 264501528, enable = 444081268
>   	count = 500717228, run = 265495387, enable = 444077029
>   	count = 501260539, run = 266488226, enable = 444071569
>   	count = 501023910, run = 267469613, enable = 444065303
>   	count = 501188267, run = 268461294, enable = 444058413
>   	count = 503096683, run = 269450511, enable = 444050567
>   	count = 503141904, run = 269587790, enable = 444041735
>   	count = 503454490, run = 268590361, enable = 444030544
>   	count = 504838207, run = 267592624, enable = 444021005
>   	count = 505458594, run = 266594648, enable = 444001280
>   	count = 504269433, run = 265595795, enable = 443991377
>   	count = 502625858, run = 264599374, enable = 443982141
>      Expected: 502065774
>      High: 505458594   Low:  499569398   Average:  502292150
>      Average Error = 0.05%
>   	count = 26129, run = 134550, enable = 134550
>   	count = 34035, run = 140540, enable = 140540
>   	count = 40189, run = 141126, enable = 141126
>   	count = 45263, run = 138290, enable = 138290
>   	count = 51381, run = 136966, enable = 136966
>   	count = 57499, run = 134993, enable = 134993
>   	count = 63617, run = 132320, enable = 132320
>   	count = 71714, run = 132047, enable = 132047
>   	count = 77832, run = 128157, enable = 128157
>   	count = 0, run = 0, enable = 123662
>   	count = 0, run = 0, enable = 116753
>   	count = 0, run = 0, enable = 110304
>   	count = 0, run = 0, enable = 103834
>   	count = 0, run = 0, enable = 97465
>   	count = 0, run = 0, enable = 91110
>   OK
>   - running tests/test-evsel.c...
>   	loop = 65536, count = 334303
>   	loop = 131072, count = 656954
>   	loop = 262144, count = 1321090
>   	loop = 524288, count = 2652616
>   	loop = 1048576, count = 5262140
>   	loop = 65536, count = 393726
>   	loop = 131072, count = 989345
>   	loop = 262144, count = 1986206
>   	loop = 524288, count = 3831652
>   	loop = 1048576, count = 7747957
>   OK
>   running dynamic:
>   - running tests/test-cpumap.c...OK
>   - running tests/test-threadmap.c...OK
>   - running tests/test-evlist.c...
>   	count = 501975838, run = 247092568, enable = 415093723
>   	count = 502491858, run = 247087145, enable = 415088830
>   	count = 502571026, run = 247083574, enable = 415085139
>   	count = 502743767, run = 247080183, enable = 415081197
>   	count = 503259008, run = 247779561, enable = 415076990
>   	count = 503169622, run = 248773141, enable = 415072091
>   	count = 503430373, run = 249816445, enable = 415066529
>   	count = 502597951, run = 250808797, enable = 415059863
>   	count = 502211776, run = 251799199, enable = 415051831
>   	count = 501921896, run = 252000721, enable = 415042717
>   	count = 501441904, run = 251309399, enable = 415031152
>   	count = 501224474, run = 250311107, enable = 415021479
>   	count = 500502827, run = 249260579, enable = 415012168
>   	count = 500775622, run = 248259621, enable = 415002830
>   	count = 501377375, run = 247261134, enable = 414993890
>      Expected: 501953755
>      High: 503430373   Low:  500502827   Average:  502113021
>      Average Error = 0.03%
>   	count = 26098, run = 131826, enable = 131826
>   	count = 33689, run = 138284, enable = 138284
>   	count = 39808, run = 139152, enable = 139152
>   	count = 45927, run = 138880, enable = 138880
>   	count = 51049, run = 135405, enable = 135405
>   	count = 57168, run = 133406, enable = 133406
>   	count = 63287, run = 130736, enable = 130736
>   	count = 71392, run = 130474, enable = 130474
>   	count = 77794, run = 127203, enable = 127203
>   	count = 0, run = 0, enable = 123172
>   	count = 0, run = 0, enable = 116856
>   	count = 0, run = 0, enable = 110557
>   	count = 0, run = 0, enable = 104282
>   	count = 0, run = 0, enable = 97929
>   	count = 0, run = 0, enable = 91792
>   OK
>   - running tests/test-evsel.c...
>   	loop = 65536, count = 334400
>   	loop = 131072, count = 656908
>   	loop = 262144, count = 1315441
>   	loop = 524288, count = 2657481
>   	loop = 1048576, count = 5258193
>   	loop = 65536, count = 491938
>   	loop = 131072, count = 965755
>   	loop = 262144, count = 1666277
>   	loop = 524288, count = 3516123
>   	loop = 1048576, count = 6759204
>   OK
>   make: Leaving directory '/home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/perf'
> 
> 
> Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> ---
>  tools/lib/perf/tests/test-evlist.c | 183 +++++++++++++++++++++++++++++
>  1 file changed, 183 insertions(+)
> 
> diff --git a/tools/lib/perf/tests/test-evlist.c b/tools/lib/perf/tests/test-evlist.c
> index c67c83399170..ca7a9542f40e 100644
> --- a/tools/lib/perf/tests/test-evlist.c
> +++ b/tools/lib/perf/tests/test-evlist.c
> @@ -21,6 +21,10 @@
>  #include "tests.h"
>  #include <internal/evsel.h>
>  
> +#define EVENT_NUM 15
> +#define WAIT_COUNT 100000000UL
> +
> +
>  static int libperf_print(enum libperf_print_level level,
>  			 const char *fmt, va_list ap)
>  {
> @@ -413,6 +417,184 @@ static int test_mmap_cpus(void)
>  	return 0;
>  }
>  
> +static double display_error(long long average,
> +		     long long high,
> +		     long long low,
> +		     long long expected) {
> +
> +	double error;
> +
> +	error = (((double)average - expected) / expected) * 100.0;
> +
> +	__T_VERBOSE("   Expected: %lld\n", expected);
> +	__T_VERBOSE("   High: %lld   Low:  %lld   Average:  %lld\n",
> +			high, low, average);
> +
> +	__T_VERBOSE("   Average Error = %.2f%%\n",error);
> +
> +	return error;
> +
> +}
> +
> +static int test_stat_multiplexing(void)
> +{
> +	struct perf_counts_values expected_counts = { .val = 0 };
> +	struct perf_counts_values multiplexing_counts[EVENT_NUM] = {{ .val = 0 },};
> +	struct perf_thread_map *threads;
> +	struct perf_evlist *evlist;
> +	struct perf_evsel *evsel;
> +	struct perf_event_attr attr = {
> +		.type	     = PERF_TYPE_HARDWARE,
> +		.config	     = PERF_COUNT_HW_INSTRUCTIONS,
> +		.read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
> +			       PERF_FORMAT_TOTAL_TIME_RUNNING,
> +		.disabled    = 1,
> +	};
> +	int err, i, nonzero=0;
> +	unsigned long count;
> +	long long max = 0, min = 0, avg = 0;
> +	double error = 0.0;
> +
> +
> +	/* read for non-multiplexing event count */
> +	threads = perf_thread_map__new_dummy();
> +	__T("failed to create threads", threads);
> +
> +	perf_thread_map__set_pid(threads, 0, 0);
> +
> +	evsel = perf_evsel__new(&attr);
> +	__T("failed to create evsel", evsel);
> +
> +	err = perf_evsel__open(evsel, NULL, threads);
> +	__T("failed to open evsel", err == 0);
> +
> +	err = perf_evsel__enable(evsel);
> +	__T("failed to enable evsel", err == 0);
> +
> +	/* wait loop */
> +	count = WAIT_COUNT;
> +	while(count--);
> +
> +	perf_evsel__read(evsel, 0, 0, &expected_counts);
> +	__T("failed to read value for evsel", expected_counts.val != 0);
> +	__T("failed to read non-multiplexing event count",
> +	    expected_counts.ena == expected_counts.run);
> +
> +	err = perf_evsel__disable(evsel);
> +	__T("failed to enable evsel", err == 0);
> +
> +	perf_evsel__close(evsel);
> +	perf_evsel__delete(evsel);
> +
> +	perf_thread_map__put(threads);
> +
> +
> +	/* read for multiplexing event count */
> +	threads = perf_thread_map__new_dummy();
> +	__T("failed to create threads", threads);
> +
> +	perf_thread_map__set_pid(threads, 0, 0);
> +
> +	evlist = perf_evlist__new();
> +	__T("failed to create evlist", evlist);
> +
> +	for (i = 0; i < EVENT_NUM; i++) {
> +		evsel = perf_evsel__new(&attr);
> +		__T("failed to create evsel1", evsel);
> +
> +		perf_evlist__add(evlist, evsel);
> +	}
> +	perf_evlist__set_maps(evlist, NULL, threads);
> +
> +	err = perf_evlist__open(evlist);
> +	__T("failed to open evsel", err == 0);
> +
> +	perf_evlist__enable(evlist);
> +
> +	/* wait loop */
> +	count = WAIT_COUNT;
> +	while(count--);
> +
> +	i = 0;
> +	perf_evlist__for_each_evsel(evlist, evsel) {
> +		perf_evsel__read(evsel, 0, 0, &multiplexing_counts[i]);
> +		__T("failed to read value for evsel", multiplexing_counts[i].val != 0);
> +		i++;
> +	}
> +
> +	perf_evlist__disable(evlist);
> +
> +
> +	min = multiplexing_counts[0].val;
> +	for (i = 0; i < EVENT_NUM; i++) {
> +		__T_VERBOSE("\tcount = %lu, run = %lu, enable = %lu\n",
> +			    multiplexing_counts[i].val, multiplexing_counts[i].run,
> +			    multiplexing_counts[i].ena);
> +
> +		if (multiplexing_counts[i].val > max)
> +			max = multiplexing_counts[i].val;
> +
> +		if (multiplexing_counts[i].val < min)
> +			min = multiplexing_counts[i].val;
> +
> +		avg += multiplexing_counts[i].val;
> +
> +		if (multiplexing_counts[i].val != 0)
> +			nonzero++;
> +	}
> +
> +	avg = avg / nonzero;
> +
> +	error = display_error(avg, max, min, expected_counts.val);
> +
> +	__T("Error out of range!", ((error <= 1.0) && (error >= -1.0)));
> +
> +	perf_evlist__close(evlist);
> +	perf_evlist__delete(evlist);
> +
> +	perf_thread_map__put(threads);
> +
> +
> +	/* Verify that no division by zero occurs */
> +	threads = perf_thread_map__new_dummy();
> +	__T("failed to create threads", threads);
> +
> +	perf_thread_map__set_pid(threads, 0, 0);
> +
> +	evlist = perf_evlist__new();
> +	__T("failed to create evlist", evlist);
> +
> +	for (i = 0; i < EVENT_NUM; i++) {
> +		evsel = perf_evsel__new(&attr);
> +		__T("failed to create evsel1", evsel);
> +
> +		perf_evlist__add(evlist, evsel);
> +	}
> +	perf_evlist__set_maps(evlist, NULL, threads);
> +
> +	err = perf_evlist__open(evlist);
> +	__T("failed to open evsel", err == 0);
> +
> +	perf_evlist__enable(evlist);
> +
> +	/* Reproduce run=0 by not executing wait loop. */
> +
> +	i = 0;
> +	perf_evlist__for_each_evsel(evlist, evsel) {
> +		perf_evsel__read(evsel, 0, 0, &multiplexing_counts[i]);
> +		__T_VERBOSE("\tcount = %lu, run = %lu, enable = %lu\n",
> +			    multiplexing_counts[i].val, multiplexing_counts[i].run,
> +			    multiplexing_counts[i].ena);
> +		i++;
> +	}
> +
> +	perf_evlist__close(evlist);
> +	perf_evlist__delete(evlist);
> +
> +	perf_thread_map__put(threads);
> +	return 0;
> +}
> +
>  int test_evlist(int argc, char **argv)
>  {
>  	__T_START;
> @@ -424,6 +606,7 @@ int test_evlist(int argc, char **argv)
>  	test_stat_thread_enable();
>  	test_mmap_thread();
>  	test_mmap_cpus();
> +	test_stat_multiplexing();
>  
>  	__T_END;
>  	return tests_failed == 0 ? 0 : -1;
> -- 
> 2.27.0

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
  2021-10-05 16:36       ` Rob Herring
@ 2021-10-19  4:59         ` nakamura.shun
  0 siblings, 0 replies; 12+ messages in thread
From: nakamura.shun @ 2021-10-19  4:59 UTC (permalink / raw)
  To: Rob Herring
  Cc: Jiri Olsa, peterz, mingo, acme, mark.rutland, alexander.shishkin,
	namhyung, irogers, linux-perf-users, linux-kernel

Hi Rob
Sorry for the late reply.

> > > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > > >
> > > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > > does not scale counters obtained by read() system call.
> > > >
> > > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > > read() system call when multiplexing.
> > > >
> > > >
> > > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > > ---
> > > >  tools/lib/perf/evsel.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > > --- a/tools/lib/perf/evsel.c
> > > +++ b/tools/lib/perf/evsel.c
> > > > @@ -18,6 +18,7 @@
> > > >  #include <sys/ioctl.h>
> > > >  #include <sys/mman.h>
> > > >  #include <asm/bug.h>
> > > > +#include <linux/math64.h>
> > > >
> > > >  void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > >                      int idx)
> > > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > >        if (readn(*fd, count->values, size) <= 0)
> > > >                return -errno;
> > > >
> > > > +     if (count->ena != count->run) {
> > > > +             if (count->run != 0)
> > > > +                     count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > > +     }
> > >
> > > so I think perf stat expect raw values in there and does the
> > > scaling by itself, please check following code:
> > >
> > > read_counters
> > >   read_affinity_counters
> > >     read_counter_cpu
> > >       read_single_counter
> > >         evsel__read_counter
> > >
> > >   perf_stat_process_counter
> > >     process_counter_maps
> > >       process_counter_values
> > >         perf_counts_values__scale
> > >
> > >
> > > perhaps we could export perf_counts_values__scale if it'd be any help
> >
> > Thank you for your comment.
> >
> > The purpose of this patch is to unify the counters obtained with
> > perf_evsel__read() to scaled or unscaled values.
> >
> > perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is
> > available, else gets by readn(). In current implementation, caller
> > gets scaled counter if goes through RDPMC path, otherwise gets unscaled
> > counter via readn() path.
> >
> > However caller cannnot know which path were taken.
> >
> > If caller expects a raw value, I think the RDPMC path should also
> > return an unscaled counter.
> >
> > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > index c89dfa5..aaa4579 100644
> > --- a/tools/lib/perf/mmap.c
> > +++ b/tools/lib/perf/mmap.c
> > @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> >                 count->ena += delta;
> >                 if (idx)
> >                         count->run += delta;
> > -
> > -               cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
> >         }
> >
> >         count->val = cnt;
> >
> > Rob, do you have any comments?
> 
> Submit a proper patch with the above.

Thank you for your comment.
I will send the v3 patch.

Best Regards
Shunsuke

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
  2021-10-07 17:17       ` Jiri Olsa
@ 2021-10-19  5:00         ` nakamura.shun
  2021-11-08  0:49           ` nakamura.shun
  0 siblings, 1 reply; 12+ messages in thread
From: nakamura.shun @ 2021-10-19  5:00 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Rob Herring, peterz, mingo, acme, mark.rutland,
	alexander.shishkin, namhyung, irogers, linux-perf-users,
	linux-kernel

Hi Jirka
Sorry for the late reply.

> > > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > > > 
> > > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > > does not scale counters obtained by read() system call.
> > > > 
> > > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > > read() system call when multiplexing.
> > > > 
> > > > 
> > > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > > ---
> > > >  tools/lib/perf/evsel.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > > 
> > > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > > --- a/tools/lib/perf/evsel.c
> > > +++ b/tools/lib/perf/evsel.c
> > > > @@ -18,6 +18,7 @@
> > > >  #include <sys/ioctl.h>
> > > >  #include <sys/mman.h>
> > > >  #include <asm/bug.h>
> > > > +#include <linux/math64.h>
> > > >  
> > > >  void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > >                      int idx)
> > > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > >        if (readn(*fd, count->values, size) <= 0)
> > > >                return -errno;
> > > >  
> > > > +     if (count->ena != count->run) {
> > > > +             if (count->run != 0)
> > > > +                     count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > > +     }
> > > 
> > > so I think perf stat expect raw values in there and does the
> > > scaling by itself, please check following code:
> > > 
> > > read_counters
> > >   read_affinity_counters
> > >     read_counter_cpu
> > >       read_single_counter
> > >         evsel__read_counter
> > > 
> > >   perf_stat_process_counter
> > >     process_counter_maps
> > >       process_counter_values
> > >         perf_counts_values__scale
> > > 
> > > 
> > > perhaps we could export perf_counts_values__scale if it'd be any help
> > 
> > Thank you for your comment.
> > 
> > The purpose of this patch is to unify the counters obtained with 
> > perf_evsel__read() to scaled or unscaled values.
> > 
> > perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is 
> > available, else gets by readn(). In current implementation, caller 
> > gets scaled counter if goes through RDPMC path, otherwise gets unscaled 
> > counter via readn() path.
> > 
> > However caller cannnot know which path were taken.
> > 
> > If caller expects a raw value, I think the RDPMC path should also 
> > return an unscaled counter.
> > 
> > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > index c89dfa5..aaa4579 100644
> > --- a/tools/lib/perf/mmap.c
> > +++ b/tools/lib/perf/mmap.c
> > @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> >                 count->ena += delta;
> >                 if (idx)
> >                         count->run += delta;
> > -
> > -               cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
> 
> perf stat does not mmap counters so this should not be invoked
> within perf stat.. but we should be consistent and scale after
> calling perf_evsel__read.. and give user the possibility to get
> un-scaled counts
> 
> that perhaps brings new feature.. mmap perf stat counters to invoke
> the fast reading path for counters.. IIRC it should be matter just
> to mmap the first 'user' page

Thank you for your comment.
I think it will be good that perf stat supports rdpmc.

I will modify the patch. 

Best Regards
Shunsuke

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
  2021-10-19  5:00         ` nakamura.shun
@ 2021-11-08  0:49           ` nakamura.shun
  2021-11-14 16:16             ` Jiri Olsa
  0 siblings, 1 reply; 12+ messages in thread
From: nakamura.shun @ 2021-11-08  0:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Rob Herring, peterz, mingo, acme, mark.rutland,
	alexander.shishkin, namhyung, irogers, linux-perf-users,
	linux-kernel

Hi Jirka

> > > > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > > > > 
> > > > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > > > does not scale counters obtained by read() system call.
> > > > > 
> > > > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > > > read() system call when multiplexing.
> > > > > 
> > > > > 
> > > > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > > > ---
> > > > >  tools/lib/perf/evsel.c | 6 ++++++
> > > > >  1 file changed, 6 insertions(+)
> > > > > 
> > > > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > > > --- a/tools/lib/perf/evsel.c
> > > > +++ b/tools/lib/perf/evsel.c
> > > > > @@ -18,6 +18,7 @@
> > > > >  #include <sys/ioctl.h>
> > > > >  #include <sys/mman.h>
> > > > >  #include <asm/bug.h>
> > > > > +#include <linux/math64.h>
> > > > >  
> > > > >  void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > > >                      int idx)
> > > > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > > >        if (readn(*fd, count->values, size) <= 0)
> > > > >                return -errno;
> > > > >  
> > > > > +     if (count->ena != count->run) {
> > > > > +             if (count->run != 0)
> > > > > +                     count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > > > +     }
> > > > 
> > > > so I think perf stat expect raw values in there and does the
> > > > scaling by itself, please check following code:
> > > > 
> > > > read_counters
> > > >   read_affinity_counters
> > > >     read_counter_cpu
> > > >       read_single_counter
> > > >         evsel__read_counter
> > > > 
> > > >   perf_stat_process_counter
> > > >     process_counter_maps
> > > >       process_counter_values
> > > >         perf_counts_values__scale
> > > > 
> > > > 
> > > > perhaps we could export perf_counts_values__scale if it'd be any help
> > > 
> > > Thank you for your comment.
> > > 
> > > The purpose of this patch is to unify the counters obtained with 
> > > perf_evsel__read() to scaled or unscaled values.
> > > 
> > > perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is 
> > > available, else gets by readn(). In current implementation, caller 
> > > gets scaled counter if goes through RDPMC path, otherwise gets unscaled 
> > > counter via readn() path.
> > > 
> > > However caller cannnot know which path were taken.
> > > 
> > > If caller expects a raw value, I think the RDPMC path should also 
> > > return an unscaled counter.
> > > 
> > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > index c89dfa5..aaa4579 100644
> > > --- a/tools/lib/perf/mmap.c
> > > +++ b/tools/lib/perf/mmap.c
> > > @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> > >                 count->ena += delta;
> > >                 if (idx)
> > >                         count->run += delta;
> > > -
> > > -               cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
> > 
> > perf stat does not mmap counters so this should not be invoked
> > within perf stat.. but we should be consistent and scale after
> > calling perf_evsel__read.. and give user the possibility to get
> > un-scaled counts
> > 
> > that perhaps brings new feature.. mmap perf stat counters to invoke
> > the fast reading path for counters.. IIRC it should be matter just
> > to mmap the first 'user' page
> 
> Thank you for your comment.
> I think it will be good that perf stat supports rdpmc.
> 
> I will modify the patch. 

I think rdpmc cannot measure the command/program specified in perf stat 
because it measures the calling thread of perf_event_open.
If my understanding is wrong, please point it out to me.

Best Regards
Shunsuke

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
  2021-11-08  0:49           ` nakamura.shun
@ 2021-11-14 16:16             ` Jiri Olsa
  0 siblings, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2021-11-14 16:16 UTC (permalink / raw)
  To: nakamura.shun
  Cc: Rob Herring, peterz, mingo, acme, mark.rutland,
	alexander.shishkin, namhyung, irogers, linux-perf-users,
	linux-kernel

On Mon, Nov 08, 2021 at 12:49:24AM +0000, nakamura.shun@fujitsu.com wrote:
> Hi Jirka
> 
> > > > > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > > > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > > > > > 
> > > > > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > > > > does not scale counters obtained by read() system call.
> > > > > > 
> > > > > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > > > > read() system call when multiplexing.
> > > > > > 
> > > > > > 
> > > > > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > > > > ---
> > > > > >  tools/lib/perf/evsel.c | 6 ++++++
> > > > > >  1 file changed, 6 insertions(+)
> > > > > > 
> > > > > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > > > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > > > > --- a/tools/lib/perf/evsel.c
> > > > > +++ b/tools/lib/perf/evsel.c
> > > > > > @@ -18,6 +18,7 @@
> > > > > >  #include <sys/ioctl.h>
> > > > > >  #include <sys/mman.h>
> > > > > >  #include <asm/bug.h>
> > > > > > +#include <linux/math64.h>
> > > > > >  
> > > > > >  void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > > > >                      int idx)
> > > > > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > > > >        if (readn(*fd, count->values, size) <= 0)
> > > > > >                return -errno;
> > > > > >  
> > > > > > +     if (count->ena != count->run) {
> > > > > > +             if (count->run != 0)
> > > > > > +                     count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > > > > +     }
> > > > > 
> > > > > so I think perf stat expect raw values in there and does the
> > > > > scaling by itself, please check following code:
> > > > > 
> > > > > read_counters
> > > > >   read_affinity_counters
> > > > >     read_counter_cpu
> > > > >       read_single_counter
> > > > >         evsel__read_counter
> > > > > 
> > > > >   perf_stat_process_counter
> > > > >     process_counter_maps
> > > > >       process_counter_values
> > > > >         perf_counts_values__scale
> > > > > 
> > > > > 
> > > > > perhaps we could export perf_counts_values__scale if it'd be any help
> > > > 
> > > > Thank you for your comment.
> > > > 
> > > > The purpose of this patch is to unify the counters obtained with 
> > > > perf_evsel__read() to scaled or unscaled values.
> > > > 
> > > > perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is 
> > > > available, else gets by readn(). In current implementation, caller 
> > > > gets scaled counter if goes through RDPMC path, otherwise gets unscaled 
> > > > counter via readn() path.
> > > > 
> > > > However caller cannnot know which path were taken.
> > > > 
> > > > If caller expects a raw value, I think the RDPMC path should also 
> > > > return an unscaled counter.
> > > > 
> > > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > > index c89dfa5..aaa4579 100644
> > > > --- a/tools/lib/perf/mmap.c
> > > > +++ b/tools/lib/perf/mmap.c
> > > > @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> > > >                 count->ena += delta;
> > > >                 if (idx)
> > > >                         count->run += delta;
> > > > -
> > > > -               cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
> > > 
> > > perf stat does not mmap counters so this should not be invoked
> > > within perf stat.. but we should be consistent and scale after
> > > calling perf_evsel__read.. and give user the possibility to get
> > > un-scaled counts
> > > 
> > > that perhaps brings new feature.. mmap perf stat counters to invoke
> > > the fast reading path for counters.. IIRC it should be matter just
> > > to mmap the first 'user' page
> > 
> > Thank you for your comment.
> > I think it will be good that perf stat supports rdpmc.
> > 
> > I will modify the patch. 
> 
> I think rdpmc cannot measure the command/program specified in perf stat 
> because it measures the calling thread of perf_event_open.
> If my understanding is wrong, please point it out to me.

right, I guess we could use that just for system wide monitoring,
where we open counter for each cpu

jirka


^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2021-11-14 16:16 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-22 10:16 [PATCH v2 0/2] libperf: Add support for scaling counters obtained from the read() system call during multiplexing Shunsuke Nakamura
2021-09-22 10:16 ` [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing Shunsuke Nakamura
2021-09-22 21:34   ` Jiri Olsa
2021-09-28  9:53     ` nakamura.shun
2021-10-05 16:36       ` Rob Herring
2021-10-19  4:59         ` nakamura.shun
2021-10-07 17:17       ` Jiri Olsa
2021-10-19  5:00         ` nakamura.shun
2021-11-08  0:49           ` nakamura.shun
2021-11-14 16:16             ` Jiri Olsa
2021-09-22 10:16 ` [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test Shunsuke Nakamura
2021-10-08 19:11   ` Arnaldo Carvalho de Melo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).