* [PATCH v2 0/2] libperf: Add support for scaling counters obtained from the read() system call during multiplexing
@ 2021-09-22 10:16 Shunsuke Nakamura
2021-09-22 10:16 ` [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing Shunsuke Nakamura
2021-09-22 10:16 ` [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test Shunsuke Nakamura
0 siblings, 2 replies; 12+ messages in thread
From: Shunsuke Nakamura @ 2021-09-22 10:16 UTC (permalink / raw)
To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung
Cc: linux-kernel, linux-perf-users
This patch series supports counter scaling when perf_evsel__read() obtains a counter
using the read() system call during multiplexing.
The first patch adds scaling of counters obtained from the read() system call
during multiplexing.
The second patch adds a test for the first patch.
This patch is based on Vince's rdpmc_multiplexing.c [1]
---
Changes in v2:
- Fix not to divide by zero when counter scaling
- Add test to verify that no division by zero occurs
[1] https://github.com/deater/perf_event_tests/blob/master/tests/rdpmc/rdpmc_multiplexing.c
nakamura shunsuke (2):
libperf: Add processing to scale the counters obtained during the
read() system call when multiplexing
libperf tests: Add test_stat_multiplexing test
tools/lib/perf/evsel.c | 6 +
tools/lib/perf/tests/test-evlist.c | 183 +++++++++++++++++++++++++++++
2 files changed, 189 insertions(+)
--
2.27.0
^ permalink raw reply [flat|nested] 12+ messages in thread
* [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
2021-09-22 10:16 [PATCH v2 0/2] libperf: Add support for scaling counters obtained from the read() system call during multiplexing Shunsuke Nakamura
@ 2021-09-22 10:16 ` Shunsuke Nakamura
2021-09-22 21:34 ` Jiri Olsa
2021-09-22 10:16 ` [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test Shunsuke Nakamura
1 sibling, 1 reply; 12+ messages in thread
From: Shunsuke Nakamura @ 2021-09-22 10:16 UTC (permalink / raw)
To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung
Cc: linux-kernel, linux-perf-users
From: nakamura shunsuke <nakamura.shun@fujitsu.com>
perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
does not scale counters obtained by read() system call.
Add processing to perf_evsel__read() to scale the counters obtained during the
read() system call when multiplexing.
Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
---
tools/lib/perf/evsel.c | 6 ++++++
1 file changed, 6 insertions(+)
diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
index 8441e3e1aaac..0ebd1d34436f 100644
--- a/tools/lib/perf/evsel.c
+++ b/tools/lib/perf/evsel.c
@@ -18,6 +18,7 @@
#include <sys/ioctl.h>
#include <sys/mman.h>
#include <asm/bug.h>
+#include <linux/math64.h>
void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
int idx)
@@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
if (readn(*fd, count->values, size) <= 0)
return -errno;
+ if (count->ena != count->run) {
+ if (count->run != 0)
+ count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
+ }
+
return 0;
}
--
2.27.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test
2021-09-22 10:16 [PATCH v2 0/2] libperf: Add support for scaling counters obtained from the read() system call during multiplexing Shunsuke Nakamura
2021-09-22 10:16 ` [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing Shunsuke Nakamura
@ 2021-09-22 10:16 ` Shunsuke Nakamura
2021-10-08 19:11 ` Arnaldo Carvalho de Melo
1 sibling, 1 reply; 12+ messages in thread
From: Shunsuke Nakamura @ 2021-09-22 10:16 UTC (permalink / raw)
To: peterz, mingo, acme, mark.rutland, alexander.shishkin, jolsa, namhyung
Cc: linux-kernel, linux-perf-users
From: nakamura shunsuke <nakamura.shun@fujitsu.com>
Adds a test for a counter obtained using read() system call during multiplexing
Committer testing:
$ sudo make tests -C tools/lib/perf/ V=1
make: Entering directory '/home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/perf'
make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=. obj=libperf
make -C /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/api/ O= libapi.a
make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./fd obj=libapi
make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./fs obj=libapi
make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=. obj=tests
make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./tests obj=tests
running static:
- running tests/test-cpumap.c...OK
- running tests/test-threadmap.c...OK
- running tests/test-evlist.c...
count = 502273264, run = 264105562, enable = 444092845
count = 501850699, run = 264100505, enable = 444088400
count = 499613780, run = 264096660, enable = 444085073
count = 499569398, run = 264501528, enable = 444081268
count = 500717228, run = 265495387, enable = 444077029
count = 501260539, run = 266488226, enable = 444071569
count = 501023910, run = 267469613, enable = 444065303
count = 501188267, run = 268461294, enable = 444058413
count = 503096683, run = 269450511, enable = 444050567
count = 503141904, run = 269587790, enable = 444041735
count = 503454490, run = 268590361, enable = 444030544
count = 504838207, run = 267592624, enable = 444021005
count = 505458594, run = 266594648, enable = 444001280
count = 504269433, run = 265595795, enable = 443991377
count = 502625858, run = 264599374, enable = 443982141
Expected: 502065774
High: 505458594 Low: 499569398 Average: 502292150
Average Error = 0.05%
count = 26129, run = 134550, enable = 134550
count = 34035, run = 140540, enable = 140540
count = 40189, run = 141126, enable = 141126
count = 45263, run = 138290, enable = 138290
count = 51381, run = 136966, enable = 136966
count = 57499, run = 134993, enable = 134993
count = 63617, run = 132320, enable = 132320
count = 71714, run = 132047, enable = 132047
count = 77832, run = 128157, enable = 128157
count = 0, run = 0, enable = 123662
count = 0, run = 0, enable = 116753
count = 0, run = 0, enable = 110304
count = 0, run = 0, enable = 103834
count = 0, run = 0, enable = 97465
count = 0, run = 0, enable = 91110
OK
- running tests/test-evsel.c...
loop = 65536, count = 334303
loop = 131072, count = 656954
loop = 262144, count = 1321090
loop = 524288, count = 2652616
loop = 1048576, count = 5262140
loop = 65536, count = 393726
loop = 131072, count = 989345
loop = 262144, count = 1986206
loop = 524288, count = 3831652
loop = 1048576, count = 7747957
OK
running dynamic:
- running tests/test-cpumap.c...OK
- running tests/test-threadmap.c...OK
- running tests/test-evlist.c...
count = 501975838, run = 247092568, enable = 415093723
count = 502491858, run = 247087145, enable = 415088830
count = 502571026, run = 247083574, enable = 415085139
count = 502743767, run = 247080183, enable = 415081197
count = 503259008, run = 247779561, enable = 415076990
count = 503169622, run = 248773141, enable = 415072091
count = 503430373, run = 249816445, enable = 415066529
count = 502597951, run = 250808797, enable = 415059863
count = 502211776, run = 251799199, enable = 415051831
count = 501921896, run = 252000721, enable = 415042717
count = 501441904, run = 251309399, enable = 415031152
count = 501224474, run = 250311107, enable = 415021479
count = 500502827, run = 249260579, enable = 415012168
count = 500775622, run = 248259621, enable = 415002830
count = 501377375, run = 247261134, enable = 414993890
Expected: 501953755
High: 503430373 Low: 500502827 Average: 502113021
Average Error = 0.03%
count = 26098, run = 131826, enable = 131826
count = 33689, run = 138284, enable = 138284
count = 39808, run = 139152, enable = 139152
count = 45927, run = 138880, enable = 138880
count = 51049, run = 135405, enable = 135405
count = 57168, run = 133406, enable = 133406
count = 63287, run = 130736, enable = 130736
count = 71392, run = 130474, enable = 130474
count = 77794, run = 127203, enable = 127203
count = 0, run = 0, enable = 123172
count = 0, run = 0, enable = 116856
count = 0, run = 0, enable = 110557
count = 0, run = 0, enable = 104282
count = 0, run = 0, enable = 97929
count = 0, run = 0, enable = 91792
OK
- running tests/test-evsel.c...
loop = 65536, count = 334400
loop = 131072, count = 656908
loop = 262144, count = 1315441
loop = 524288, count = 2657481
loop = 1048576, count = 5258193
loop = 65536, count = 491938
loop = 131072, count = 965755
loop = 262144, count = 1666277
loop = 524288, count = 3516123
loop = 1048576, count = 6759204
OK
make: Leaving directory '/home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/perf'
Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
---
tools/lib/perf/tests/test-evlist.c | 183 +++++++++++++++++++++++++++++
1 file changed, 183 insertions(+)
diff --git a/tools/lib/perf/tests/test-evlist.c b/tools/lib/perf/tests/test-evlist.c
index c67c83399170..ca7a9542f40e 100644
--- a/tools/lib/perf/tests/test-evlist.c
+++ b/tools/lib/perf/tests/test-evlist.c
@@ -21,6 +21,10 @@
#include "tests.h"
#include <internal/evsel.h>
+#define EVENT_NUM 15
+#define WAIT_COUNT 100000000UL
+
+
static int libperf_print(enum libperf_print_level level,
const char *fmt, va_list ap)
{
@@ -413,6 +417,184 @@ static int test_mmap_cpus(void)
return 0;
}
+static double display_error(long long average,
+ long long high,
+ long long low,
+ long long expected) {
+
+ double error;
+
+ error = (((double)average - expected) / expected) * 100.0;
+
+ __T_VERBOSE(" Expected: %lld\n", expected);
+ __T_VERBOSE(" High: %lld Low: %lld Average: %lld\n",
+ high, low, average);
+
+ __T_VERBOSE(" Average Error = %.2f%%\n",error);
+
+ return error;
+
+}
+
+static int test_stat_multiplexing(void)
+{
+ struct perf_counts_values expected_counts = { .val = 0 };
+ struct perf_counts_values multiplexing_counts[EVENT_NUM] = {{ .val = 0 },};
+ struct perf_thread_map *threads;
+ struct perf_evlist *evlist;
+ struct perf_evsel *evsel;
+ struct perf_event_attr attr = {
+ .type = PERF_TYPE_HARDWARE,
+ .config = PERF_COUNT_HW_INSTRUCTIONS,
+ .read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
+ PERF_FORMAT_TOTAL_TIME_RUNNING,
+ .disabled = 1,
+ };
+ int err, i, nonzero=0;
+ unsigned long count;
+ long long max = 0, min = 0, avg = 0;
+ double error = 0.0;
+
+
+ /* read for non-multiplexing event count */
+ threads = perf_thread_map__new_dummy();
+ __T("failed to create threads", threads);
+
+ perf_thread_map__set_pid(threads, 0, 0);
+
+ evsel = perf_evsel__new(&attr);
+ __T("failed to create evsel", evsel);
+
+ err = perf_evsel__open(evsel, NULL, threads);
+ __T("failed to open evsel", err == 0);
+
+ err = perf_evsel__enable(evsel);
+ __T("failed to enable evsel", err == 0);
+
+ /* wait loop */
+ count = WAIT_COUNT;
+ while(count--);
+
+ perf_evsel__read(evsel, 0, 0, &expected_counts);
+ __T("failed to read value for evsel", expected_counts.val != 0);
+ __T("failed to read non-multiplexing event count",
+ expected_counts.ena == expected_counts.run);
+
+ err = perf_evsel__disable(evsel);
+ __T("failed to enable evsel", err == 0);
+
+ perf_evsel__close(evsel);
+ perf_evsel__delete(evsel);
+
+ perf_thread_map__put(threads);
+
+
+ /* read for multiplexing event count */
+ threads = perf_thread_map__new_dummy();
+ __T("failed to create threads", threads);
+
+ perf_thread_map__set_pid(threads, 0, 0);
+
+ evlist = perf_evlist__new();
+ __T("failed to create evlist", evlist);
+
+ for (i = 0; i < EVENT_NUM; i++) {
+ evsel = perf_evsel__new(&attr);
+ __T("failed to create evsel1", evsel);
+
+ perf_evlist__add(evlist, evsel);
+ }
+ perf_evlist__set_maps(evlist, NULL, threads);
+
+ err = perf_evlist__open(evlist);
+ __T("failed to open evsel", err == 0);
+
+ perf_evlist__enable(evlist);
+
+ /* wait loop */
+ count = WAIT_COUNT;
+ while(count--);
+
+ i = 0;
+ perf_evlist__for_each_evsel(evlist, evsel) {
+ perf_evsel__read(evsel, 0, 0, &multiplexing_counts[i]);
+ __T("failed to read value for evsel", multiplexing_counts[i].val != 0);
+ i++;
+ }
+
+ perf_evlist__disable(evlist);
+
+
+ min = multiplexing_counts[0].val;
+ for (i = 0; i < EVENT_NUM; i++) {
+ __T_VERBOSE("\tcount = %lu, run = %lu, enable = %lu\n",
+ multiplexing_counts[i].val, multiplexing_counts[i].run,
+ multiplexing_counts[i].ena);
+
+ if (multiplexing_counts[i].val > max)
+ max = multiplexing_counts[i].val;
+
+ if (multiplexing_counts[i].val < min)
+ min = multiplexing_counts[i].val;
+
+ avg += multiplexing_counts[i].val;
+
+ if (multiplexing_counts[i].val != 0)
+ nonzero++;
+ }
+
+ avg = avg / nonzero;
+
+ error = display_error(avg, max, min, expected_counts.val);
+
+ __T("Error out of range!", ((error <= 1.0) && (error >= -1.0)));
+
+ perf_evlist__close(evlist);
+ perf_evlist__delete(evlist);
+
+ perf_thread_map__put(threads);
+
+
+ /* Verify that no division by zero occurs */
+ threads = perf_thread_map__new_dummy();
+ __T("failed to create threads", threads);
+
+ perf_thread_map__set_pid(threads, 0, 0);
+
+ evlist = perf_evlist__new();
+ __T("failed to create evlist", evlist);
+
+ for (i = 0; i < EVENT_NUM; i++) {
+ evsel = perf_evsel__new(&attr);
+ __T("failed to create evsel1", evsel);
+
+ perf_evlist__add(evlist, evsel);
+ }
+ perf_evlist__set_maps(evlist, NULL, threads);
+
+ err = perf_evlist__open(evlist);
+ __T("failed to open evsel", err == 0);
+
+ perf_evlist__enable(evlist);
+
+ /* Reproduce run=0 by not executing wait loop. */
+
+ i = 0;
+ perf_evlist__for_each_evsel(evlist, evsel) {
+ perf_evsel__read(evsel, 0, 0, &multiplexing_counts[i]);
+ __T_VERBOSE("\tcount = %lu, run = %lu, enable = %lu\n",
+ multiplexing_counts[i].val, multiplexing_counts[i].run,
+ multiplexing_counts[i].ena);
+ i++;
+ }
+
+ perf_evlist__close(evlist);
+ perf_evlist__delete(evlist);
+
+ perf_thread_map__put(threads);
+ return 0;
+}
+
int test_evlist(int argc, char **argv)
{
__T_START;
@@ -424,6 +606,7 @@ int test_evlist(int argc, char **argv)
test_stat_thread_enable();
test_mmap_thread();
test_mmap_cpus();
+ test_stat_multiplexing();
__T_END;
return tests_failed == 0 ? 0 : -1;
--
2.27.0
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
2021-09-22 10:16 ` [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing Shunsuke Nakamura
@ 2021-09-22 21:34 ` Jiri Olsa
2021-09-28 9:53 ` nakamura.shun
0 siblings, 1 reply; 12+ messages in thread
From: Jiri Olsa @ 2021-09-22 21:34 UTC (permalink / raw)
To: Shunsuke Nakamura
Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, namhyung,
linux-kernel, linux-perf-users
On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> From: nakamura shunsuke <nakamura.shun@fujitsu.com>
>
> perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> does not scale counters obtained by read() system call.
>
> Add processing to perf_evsel__read() to scale the counters obtained during the
> read() system call when multiplexing.
>
>
> Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> ---
> tools/lib/perf/evsel.c | 6 ++++++
> 1 file changed, 6 insertions(+)
>
> diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> index 8441e3e1aaac..0ebd1d34436f 100644
> --- a/tools/lib/perf/evsel.c
> +++ b/tools/lib/perf/evsel.c
> @@ -18,6 +18,7 @@
> #include <sys/ioctl.h>
> #include <sys/mman.h>
> #include <asm/bug.h>
> +#include <linux/math64.h>
>
> void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> int idx)
> @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> if (readn(*fd, count->values, size) <= 0)
> return -errno;
>
> + if (count->ena != count->run) {
> + if (count->run != 0)
> + count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> + }
so I think perf stat expect raw values in there and does the
scaling by itself, please check following code:
read_counters
read_affinity_counters
read_counter_cpu
read_single_counter
evsel__read_counter
perf_stat_process_counter
process_counter_maps
process_counter_values
perf_counts_values__scale
perhaps we could export perf_counts_values__scale if it'd be any help
jirka
> +
> return 0;
> }
>
> --
> 2.27.0
>
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
2021-09-22 21:34 ` Jiri Olsa
@ 2021-09-28 9:53 ` nakamura.shun
2021-10-05 16:36 ` Rob Herring
2021-10-07 17:17 ` Jiri Olsa
0 siblings, 2 replies; 12+ messages in thread
From: nakamura.shun @ 2021-09-28 9:53 UTC (permalink / raw)
To: Jiri Olsa, Rob Herring
Cc: peterz, mingo, acme, mark.rutland, alexander.shishkin, namhyung,
irogers, linux-perf-users, linux-kernel
Hi Jirka
> On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> >
> > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > does not scale counters obtained by read() system call.
> >
> > Add processing to perf_evsel__read() to scale the counters obtained during the
> > read() system call when multiplexing.
> >
> >
> > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > ---
> > tools/lib/perf/evsel.c | 6 ++++++
> > 1 file changed, 6 insertions(+)
> >
> > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > index 8441e3e1aaac..0ebd1d34436f 100644
> > --- a/tools/lib/perf/evsel.c
> +++ b/tools/lib/perf/evsel.c
> > @@ -18,6 +18,7 @@
> > #include <sys/ioctl.h>
> > #include <sys/mman.h>
> > #include <asm/bug.h>
> > +#include <linux/math64.h>
> >
> > void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > int idx)
> > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > if (readn(*fd, count->values, size) <= 0)
> > return -errno;
> >
> > + if (count->ena != count->run) {
> > + if (count->run != 0)
> > + count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > + }
>
> so I think perf stat expect raw values in there and does the
> scaling by itself, please check following code:
>
> read_counters
> read_affinity_counters
> read_counter_cpu
> read_single_counter
> evsel__read_counter
>
> perf_stat_process_counter
> process_counter_maps
> process_counter_values
> perf_counts_values__scale
>
>
> perhaps we could export perf_counts_values__scale if it'd be any help
Thank you for your comment.
The purpose of this patch is to unify the counters obtained with
perf_evsel__read() to scaled or unscaled values.
perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is
available, else gets by readn(). In current implementation, caller
gets scaled counter if goes through RDPMC path, otherwise gets unscaled
counter via readn() path.
However caller cannnot know which path were taken.
If caller expects a raw value, I think the RDPMC path should also
return an unscaled counter.
diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
index c89dfa5..aaa4579 100644
--- a/tools/lib/perf/mmap.c
+++ b/tools/lib/perf/mmap.c
@@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
count->ena += delta;
if (idx)
count->run += delta;
-
- cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
}
count->val = cnt;
Rob, do you have any comments?
Best Regards
Shunsuke
^ permalink raw reply related [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
2021-09-28 9:53 ` nakamura.shun
@ 2021-10-05 16:36 ` Rob Herring
2021-10-19 4:59 ` nakamura.shun
2021-10-07 17:17 ` Jiri Olsa
1 sibling, 1 reply; 12+ messages in thread
From: Rob Herring @ 2021-10-05 16:36 UTC (permalink / raw)
To: nakamura.shun
Cc: Jiri Olsa, peterz, mingo, acme, mark.rutland, alexander.shishkin,
namhyung, irogers, linux-perf-users, linux-kernel
On Tue, Sep 28, 2021 at 7:41 AM nakamura.shun@fujitsu.com
<nakamura.shun@fujitsu.com> wrote:
>
> Hi Jirka
>
> > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > >
> > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > does not scale counters obtained by read() system call.
> > >
> > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > read() system call when multiplexing.
> > >
> > >
> > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > ---
> > > tools/lib/perf/evsel.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > --- a/tools/lib/perf/evsel.c
> > +++ b/tools/lib/perf/evsel.c
> > > @@ -18,6 +18,7 @@
> > > #include <sys/ioctl.h>
> > > #include <sys/mman.h>
> > > #include <asm/bug.h>
> > > +#include <linux/math64.h>
> > >
> > > void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > int idx)
> > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > if (readn(*fd, count->values, size) <= 0)
> > > return -errno;
> > >
> > > + if (count->ena != count->run) {
> > > + if (count->run != 0)
> > > + count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > + }
> >
> > so I think perf stat expect raw values in there and does the
> > scaling by itself, please check following code:
> >
> > read_counters
> > read_affinity_counters
> > read_counter_cpu
> > read_single_counter
> > evsel__read_counter
> >
> > perf_stat_process_counter
> > process_counter_maps
> > process_counter_values
> > perf_counts_values__scale
> >
> >
> > perhaps we could export perf_counts_values__scale if it'd be any help
>
> Thank you for your comment.
>
> The purpose of this patch is to unify the counters obtained with
> perf_evsel__read() to scaled or unscaled values.
>
> perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is
> available, else gets by readn(). In current implementation, caller
> gets scaled counter if goes through RDPMC path, otherwise gets unscaled
> counter via readn() path.
>
> However caller cannnot know which path were taken.
>
> If caller expects a raw value, I think the RDPMC path should also
> return an unscaled counter.
>
> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> index c89dfa5..aaa4579 100644
> --- a/tools/lib/perf/mmap.c
> +++ b/tools/lib/perf/mmap.c
> @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> count->ena += delta;
> if (idx)
> count->run += delta;
> -
> - cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
> }
>
> count->val = cnt;
>
> Rob, do you have any comments?
Submit a proper patch with the above.
Rob
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
2021-09-28 9:53 ` nakamura.shun
2021-10-05 16:36 ` Rob Herring
@ 2021-10-07 17:17 ` Jiri Olsa
2021-10-19 5:00 ` nakamura.shun
1 sibling, 1 reply; 12+ messages in thread
From: Jiri Olsa @ 2021-10-07 17:17 UTC (permalink / raw)
To: nakamura.shun
Cc: Rob Herring, peterz, mingo, acme, mark.rutland,
alexander.shishkin, namhyung, irogers, linux-perf-users,
linux-kernel
On Tue, Sep 28, 2021 at 09:53:24AM +0000, nakamura.shun@fujitsu.com wrote:
> Hi Jirka
>
> > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > >
> > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > does not scale counters obtained by read() system call.
> > >
> > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > read() system call when multiplexing.
> > >
> > >
> > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > ---
> > > tools/lib/perf/evsel.c | 6 ++++++
> > > 1 file changed, 6 insertions(+)
> > >
> > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > --- a/tools/lib/perf/evsel.c
> > +++ b/tools/lib/perf/evsel.c
> > > @@ -18,6 +18,7 @@
> > > #include <sys/ioctl.h>
> > > #include <sys/mman.h>
> > > #include <asm/bug.h>
> > > +#include <linux/math64.h>
> > >
> > > void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > int idx)
> > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > if (readn(*fd, count->values, size) <= 0)
> > > return -errno;
> > >
> > > + if (count->ena != count->run) {
> > > + if (count->run != 0)
> > > + count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > + }
> >
> > so I think perf stat expect raw values in there and does the
> > scaling by itself, please check following code:
> >
> > read_counters
> > read_affinity_counters
> > read_counter_cpu
> > read_single_counter
> > evsel__read_counter
> >
> > perf_stat_process_counter
> > process_counter_maps
> > process_counter_values
> > perf_counts_values__scale
> >
> >
> > perhaps we could export perf_counts_values__scale if it'd be any help
>
> Thank you for your comment.
>
> The purpose of this patch is to unify the counters obtained with
> perf_evsel__read() to scaled or unscaled values.
>
> perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is
> available, else gets by readn(). In current implementation, caller
> gets scaled counter if goes through RDPMC path, otherwise gets unscaled
> counter via readn() path.
>
> However caller cannnot know which path were taken.
>
> If caller expects a raw value, I think the RDPMC path should also
> return an unscaled counter.
>
> diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> index c89dfa5..aaa4579 100644
> --- a/tools/lib/perf/mmap.c
> +++ b/tools/lib/perf/mmap.c
> @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> count->ena += delta;
> if (idx)
> count->run += delta;
> -
> - cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
perf stat does not mmap counters so this should not be invoked
within perf stat.. but we should be consistent and scale after
calling perf_evsel__read.. and give user the possibility to get
un-scaled counts
that perhaps brings new feature.. mmap perf stat counters to invoke
the fast reading path for counters.. IIRC it should be matter just
to mmap the first 'user' page
thanks,
jirka
> }
>
> count->val = cnt;
>
> Rob, do you have any comments?
>
> Best Regards
> Shunsuke
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test
2021-09-22 10:16 ` [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test Shunsuke Nakamura
@ 2021-10-08 19:11 ` Arnaldo Carvalho de Melo
0 siblings, 0 replies; 12+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-10-08 19:11 UTC (permalink / raw)
To: Shunsuke Nakamura
Cc: Rob Herring, peterz, mingo, mark.rutland, alexander.shishkin,
jolsa, namhyung, linux-kernel, linux-perf-users
Em Wed, Sep 22, 2021 at 07:16:27PM +0900, Shunsuke Nakamura escreveu:
> From: nakamura shunsuke <nakamura.shun@fujitsu.com>
>
> Adds a test for a counter obtained using read() system call during multiplexing
You forgot to add Rob to the CC list, can I have Acked-by or
Reviewed-by?
Thanks,
- Arnaldo
> Committer testing:
>
> $ sudo make tests -C tools/lib/perf/ V=1
> make: Entering directory '/home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/perf'
> make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=. obj=libperf
> make -C /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/api/ O= libapi.a
> make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./fd obj=libapi
> make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./fs obj=libapi
> make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=. obj=tests
> make -f /home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/build/Makefile.build dir=./tests obj=tests
> running static:
> - running tests/test-cpumap.c...OK
> - running tests/test-threadmap.c...OK
> - running tests/test-evlist.c...
> count = 502273264, run = 264105562, enable = 444092845
> count = 501850699, run = 264100505, enable = 444088400
> count = 499613780, run = 264096660, enable = 444085073
> count = 499569398, run = 264501528, enable = 444081268
> count = 500717228, run = 265495387, enable = 444077029
> count = 501260539, run = 266488226, enable = 444071569
> count = 501023910, run = 267469613, enable = 444065303
> count = 501188267, run = 268461294, enable = 444058413
> count = 503096683, run = 269450511, enable = 444050567
> count = 503141904, run = 269587790, enable = 444041735
> count = 503454490, run = 268590361, enable = 444030544
> count = 504838207, run = 267592624, enable = 444021005
> count = 505458594, run = 266594648, enable = 444001280
> count = 504269433, run = 265595795, enable = 443991377
> count = 502625858, run = 264599374, enable = 443982141
> Expected: 502065774
> High: 505458594 Low: 499569398 Average: 502292150
> Average Error = 0.05%
> count = 26129, run = 134550, enable = 134550
> count = 34035, run = 140540, enable = 140540
> count = 40189, run = 141126, enable = 141126
> count = 45263, run = 138290, enable = 138290
> count = 51381, run = 136966, enable = 136966
> count = 57499, run = 134993, enable = 134993
> count = 63617, run = 132320, enable = 132320
> count = 71714, run = 132047, enable = 132047
> count = 77832, run = 128157, enable = 128157
> count = 0, run = 0, enable = 123662
> count = 0, run = 0, enable = 116753
> count = 0, run = 0, enable = 110304
> count = 0, run = 0, enable = 103834
> count = 0, run = 0, enable = 97465
> count = 0, run = 0, enable = 91110
> OK
> - running tests/test-evsel.c...
> loop = 65536, count = 334303
> loop = 131072, count = 656954
> loop = 262144, count = 1321090
> loop = 524288, count = 2652616
> loop = 1048576, count = 5262140
> loop = 65536, count = 393726
> loop = 131072, count = 989345
> loop = 262144, count = 1986206
> loop = 524288, count = 3831652
> loop = 1048576, count = 7747957
> OK
> running dynamic:
> - running tests/test-cpumap.c...OK
> - running tests/test-threadmap.c...OK
> - running tests/test-evlist.c...
> count = 501975838, run = 247092568, enable = 415093723
> count = 502491858, run = 247087145, enable = 415088830
> count = 502571026, run = 247083574, enable = 415085139
> count = 502743767, run = 247080183, enable = 415081197
> count = 503259008, run = 247779561, enable = 415076990
> count = 503169622, run = 248773141, enable = 415072091
> count = 503430373, run = 249816445, enable = 415066529
> count = 502597951, run = 250808797, enable = 415059863
> count = 502211776, run = 251799199, enable = 415051831
> count = 501921896, run = 252000721, enable = 415042717
> count = 501441904, run = 251309399, enable = 415031152
> count = 501224474, run = 250311107, enable = 415021479
> count = 500502827, run = 249260579, enable = 415012168
> count = 500775622, run = 248259621, enable = 415002830
> count = 501377375, run = 247261134, enable = 414993890
> Expected: 501953755
> High: 503430373 Low: 500502827 Average: 502113021
> Average Error = 0.03%
> count = 26098, run = 131826, enable = 131826
> count = 33689, run = 138284, enable = 138284
> count = 39808, run = 139152, enable = 139152
> count = 45927, run = 138880, enable = 138880
> count = 51049, run = 135405, enable = 135405
> count = 57168, run = 133406, enable = 133406
> count = 63287, run = 130736, enable = 130736
> count = 71392, run = 130474, enable = 130474
> count = 77794, run = 127203, enable = 127203
> count = 0, run = 0, enable = 123172
> count = 0, run = 0, enable = 116856
> count = 0, run = 0, enable = 110557
> count = 0, run = 0, enable = 104282
> count = 0, run = 0, enable = 97929
> count = 0, run = 0, enable = 91792
> OK
> - running tests/test-evsel.c...
> loop = 65536, count = 334400
> loop = 131072, count = 656908
> loop = 262144, count = 1315441
> loop = 524288, count = 2657481
> loop = 1048576, count = 5258193
> loop = 65536, count = 491938
> loop = 131072, count = 965755
> loop = 262144, count = 1666277
> loop = 524288, count = 3516123
> loop = 1048576, count = 6759204
> OK
> make: Leaving directory '/home/nakamura/build_work/build_kernel/linux_kernel/linux/tools/lib/perf'
>
>
> Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> ---
> tools/lib/perf/tests/test-evlist.c | 183 +++++++++++++++++++++++++++++
> 1 file changed, 183 insertions(+)
>
> diff --git a/tools/lib/perf/tests/test-evlist.c b/tools/lib/perf/tests/test-evlist.c
> index c67c83399170..ca7a9542f40e 100644
> --- a/tools/lib/perf/tests/test-evlist.c
> +++ b/tools/lib/perf/tests/test-evlist.c
> @@ -21,6 +21,10 @@
> #include "tests.h"
> #include <internal/evsel.h>
>
> +#define EVENT_NUM 15
> +#define WAIT_COUNT 100000000UL
> +
> +
> static int libperf_print(enum libperf_print_level level,
> const char *fmt, va_list ap)
> {
> @@ -413,6 +417,184 @@ static int test_mmap_cpus(void)
> return 0;
> }
>
> +static double display_error(long long average,
> + long long high,
> + long long low,
> + long long expected) {
> +
> + double error;
> +
> + error = (((double)average - expected) / expected) * 100.0;
> +
> + __T_VERBOSE(" Expected: %lld\n", expected);
> + __T_VERBOSE(" High: %lld Low: %lld Average: %lld\n",
> + high, low, average);
> +
> + __T_VERBOSE(" Average Error = %.2f%%\n",error);
> +
> + return error;
> +
> +}
> +
> +static int test_stat_multiplexing(void)
> +{
> + struct perf_counts_values expected_counts = { .val = 0 };
> + struct perf_counts_values multiplexing_counts[EVENT_NUM] = {{ .val = 0 },};
> + struct perf_thread_map *threads;
> + struct perf_evlist *evlist;
> + struct perf_evsel *evsel;
> + struct perf_event_attr attr = {
> + .type = PERF_TYPE_HARDWARE,
> + .config = PERF_COUNT_HW_INSTRUCTIONS,
> + .read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
> + PERF_FORMAT_TOTAL_TIME_RUNNING,
> + .disabled = 1,
> + };
> + int err, i, nonzero=0;
> + unsigned long count;
> + long long max = 0, min = 0, avg = 0;
> + double error = 0.0;
> +
> +
> + /* read for non-multiplexing event count */
> + threads = perf_thread_map__new_dummy();
> + __T("failed to create threads", threads);
> +
> + perf_thread_map__set_pid(threads, 0, 0);
> +
> + evsel = perf_evsel__new(&attr);
> + __T("failed to create evsel", evsel);
> +
> + err = perf_evsel__open(evsel, NULL, threads);
> + __T("failed to open evsel", err == 0);
> +
> + err = perf_evsel__enable(evsel);
> + __T("failed to enable evsel", err == 0);
> +
> + /* wait loop */
> + count = WAIT_COUNT;
> + while(count--);
> +
> + perf_evsel__read(evsel, 0, 0, &expected_counts);
> + __T("failed to read value for evsel", expected_counts.val != 0);
> + __T("failed to read non-multiplexing event count",
> + expected_counts.ena == expected_counts.run);
> +
> + err = perf_evsel__disable(evsel);
> + __T("failed to enable evsel", err == 0);
> +
> + perf_evsel__close(evsel);
> + perf_evsel__delete(evsel);
> +
> + perf_thread_map__put(threads);
> +
> +
> + /* read for multiplexing event count */
> + threads = perf_thread_map__new_dummy();
> + __T("failed to create threads", threads);
> +
> + perf_thread_map__set_pid(threads, 0, 0);
> +
> + evlist = perf_evlist__new();
> + __T("failed to create evlist", evlist);
> +
> + for (i = 0; i < EVENT_NUM; i++) {
> + evsel = perf_evsel__new(&attr);
> + __T("failed to create evsel1", evsel);
> +
> + perf_evlist__add(evlist, evsel);
> + }
> + perf_evlist__set_maps(evlist, NULL, threads);
> +
> + err = perf_evlist__open(evlist);
> + __T("failed to open evsel", err == 0);
> +
> + perf_evlist__enable(evlist);
> +
> + /* wait loop */
> + count = WAIT_COUNT;
> + while(count--);
> +
> + i = 0;
> + perf_evlist__for_each_evsel(evlist, evsel) {
> + perf_evsel__read(evsel, 0, 0, &multiplexing_counts[i]);
> + __T("failed to read value for evsel", multiplexing_counts[i].val != 0);
> + i++;
> + }
> +
> + perf_evlist__disable(evlist);
> +
> +
> + min = multiplexing_counts[0].val;
> + for (i = 0; i < EVENT_NUM; i++) {
> + __T_VERBOSE("\tcount = %lu, run = %lu, enable = %lu\n",
> + multiplexing_counts[i].val, multiplexing_counts[i].run,
> + multiplexing_counts[i].ena);
> +
> + if (multiplexing_counts[i].val > max)
> + max = multiplexing_counts[i].val;
> +
> + if (multiplexing_counts[i].val < min)
> + min = multiplexing_counts[i].val;
> +
> + avg += multiplexing_counts[i].val;
> +
> + if (multiplexing_counts[i].val != 0)
> + nonzero++;
> + }
> +
> + avg = avg / nonzero;
> +
> + error = display_error(avg, max, min, expected_counts.val);
> +
> + __T("Error out of range!", ((error <= 1.0) && (error >= -1.0)));
> +
> + perf_evlist__close(evlist);
> + perf_evlist__delete(evlist);
> +
> + perf_thread_map__put(threads);
> +
> +
> + /* Verify that no division by zero occurs */
> + threads = perf_thread_map__new_dummy();
> + __T("failed to create threads", threads);
> +
> + perf_thread_map__set_pid(threads, 0, 0);
> +
> + evlist = perf_evlist__new();
> + __T("failed to create evlist", evlist);
> +
> + for (i = 0; i < EVENT_NUM; i++) {
> + evsel = perf_evsel__new(&attr);
> + __T("failed to create evsel1", evsel);
> +
> + perf_evlist__add(evlist, evsel);
> + }
> + perf_evlist__set_maps(evlist, NULL, threads);
> +
> + err = perf_evlist__open(evlist);
> + __T("failed to open evsel", err == 0);
> +
> + perf_evlist__enable(evlist);
> +
> + /* Reproduce run=0 by not executing wait loop. */
> +
> + i = 0;
> + perf_evlist__for_each_evsel(evlist, evsel) {
> + perf_evsel__read(evsel, 0, 0, &multiplexing_counts[i]);
> + __T_VERBOSE("\tcount = %lu, run = %lu, enable = %lu\n",
> + multiplexing_counts[i].val, multiplexing_counts[i].run,
> + multiplexing_counts[i].ena);
> + i++;
> + }
> +
> + perf_evlist__close(evlist);
> + perf_evlist__delete(evlist);
> +
> + perf_thread_map__put(threads);
> + return 0;
> +}
> +
> int test_evlist(int argc, char **argv)
> {
> __T_START;
> @@ -424,6 +606,7 @@ int test_evlist(int argc, char **argv)
> test_stat_thread_enable();
> test_mmap_thread();
> test_mmap_cpus();
> + test_stat_multiplexing();
>
> __T_END;
> return tests_failed == 0 ? 0 : -1;
> --
> 2.27.0
--
- Arnaldo
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
2021-10-05 16:36 ` Rob Herring
@ 2021-10-19 4:59 ` nakamura.shun
0 siblings, 0 replies; 12+ messages in thread
From: nakamura.shun @ 2021-10-19 4:59 UTC (permalink / raw)
To: Rob Herring
Cc: Jiri Olsa, peterz, mingo, acme, mark.rutland, alexander.shishkin,
namhyung, irogers, linux-perf-users, linux-kernel
Hi Rob
Sorry for the late reply.
> > > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > > >
> > > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > > does not scale counters obtained by read() system call.
> > > >
> > > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > > read() system call when multiplexing.
> > > >
> > > >
> > > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > > ---
> > > > tools/lib/perf/evsel.c | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > > --- a/tools/lib/perf/evsel.c
> > > +++ b/tools/lib/perf/evsel.c
> > > > @@ -18,6 +18,7 @@
> > > > #include <sys/ioctl.h>
> > > > #include <sys/mman.h>
> > > > #include <asm/bug.h>
> > > > +#include <linux/math64.h>
> > > >
> > > > void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > > int idx)
> > > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > > if (readn(*fd, count->values, size) <= 0)
> > > > return -errno;
> > > >
> > > > + if (count->ena != count->run) {
> > > > + if (count->run != 0)
> > > > + count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > > + }
> > >
> > > so I think perf stat expect raw values in there and does the
> > > scaling by itself, please check following code:
> > >
> > > read_counters
> > > read_affinity_counters
> > > read_counter_cpu
> > > read_single_counter
> > > evsel__read_counter
> > >
> > > perf_stat_process_counter
> > > process_counter_maps
> > > process_counter_values
> > > perf_counts_values__scale
> > >
> > >
> > > perhaps we could export perf_counts_values__scale if it'd be any help
> >
> > Thank you for your comment.
> >
> > The purpose of this patch is to unify the counters obtained with
> > perf_evsel__read() to scaled or unscaled values.
> >
> > perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is
> > available, else gets by readn(). In current implementation, caller
> > gets scaled counter if goes through RDPMC path, otherwise gets unscaled
> > counter via readn() path.
> >
> > However caller cannnot know which path were taken.
> >
> > If caller expects a raw value, I think the RDPMC path should also
> > return an unscaled counter.
> >
> > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > index c89dfa5..aaa4579 100644
> > --- a/tools/lib/perf/mmap.c
> > +++ b/tools/lib/perf/mmap.c
> > @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> > count->ena += delta;
> > if (idx)
> > count->run += delta;
> > -
> > - cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
> > }
> >
> > count->val = cnt;
> >
> > Rob, do you have any comments?
>
> Submit a proper patch with the above.
Thank you for your comment.
I will send the v3 patch.
Best Regards
Shunsuke
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
2021-10-07 17:17 ` Jiri Olsa
@ 2021-10-19 5:00 ` nakamura.shun
2021-11-08 0:49 ` nakamura.shun
0 siblings, 1 reply; 12+ messages in thread
From: nakamura.shun @ 2021-10-19 5:00 UTC (permalink / raw)
To: Jiri Olsa
Cc: Rob Herring, peterz, mingo, acme, mark.rutland,
alexander.shishkin, namhyung, irogers, linux-perf-users,
linux-kernel
Hi Jirka
Sorry for the late reply.
> > > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > > >
> > > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > > does not scale counters obtained by read() system call.
> > > >
> > > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > > read() system call when multiplexing.
> > > >
> > > >
> > > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > > ---
> > > > tools/lib/perf/evsel.c | 6 ++++++
> > > > 1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > > --- a/tools/lib/perf/evsel.c
> > > +++ b/tools/lib/perf/evsel.c
> > > > @@ -18,6 +18,7 @@
> > > > #include <sys/ioctl.h>
> > > > #include <sys/mman.h>
> > > > #include <asm/bug.h>
> > > > +#include <linux/math64.h>
> > > >
> > > > void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > > int idx)
> > > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > > if (readn(*fd, count->values, size) <= 0)
> > > > return -errno;
> > > >
> > > > + if (count->ena != count->run) {
> > > > + if (count->run != 0)
> > > > + count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > > + }
> > >
> > > so I think perf stat expect raw values in there and does the
> > > scaling by itself, please check following code:
> > >
> > > read_counters
> > > read_affinity_counters
> > > read_counter_cpu
> > > read_single_counter
> > > evsel__read_counter
> > >
> > > perf_stat_process_counter
> > > process_counter_maps
> > > process_counter_values
> > > perf_counts_values__scale
> > >
> > >
> > > perhaps we could export perf_counts_values__scale if it'd be any help
> >
> > Thank you for your comment.
> >
> > The purpose of this patch is to unify the counters obtained with
> > perf_evsel__read() to scaled or unscaled values.
> >
> > perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is
> > available, else gets by readn(). In current implementation, caller
> > gets scaled counter if goes through RDPMC path, otherwise gets unscaled
> > counter via readn() path.
> >
> > However caller cannnot know which path were taken.
> >
> > If caller expects a raw value, I think the RDPMC path should also
> > return an unscaled counter.
> >
> > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > index c89dfa5..aaa4579 100644
> > --- a/tools/lib/perf/mmap.c
> > +++ b/tools/lib/perf/mmap.c
> > @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> > count->ena += delta;
> > if (idx)
> > count->run += delta;
> > -
> > - cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
>
> perf stat does not mmap counters so this should not be invoked
> within perf stat.. but we should be consistent and scale after
> calling perf_evsel__read.. and give user the possibility to get
> un-scaled counts
>
> that perhaps brings new feature.. mmap perf stat counters to invoke
> the fast reading path for counters.. IIRC it should be matter just
> to mmap the first 'user' page
Thank you for your comment.
I think it will be good that perf stat supports rdpmc.
I will modify the patch.
Best Regards
Shunsuke
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
2021-10-19 5:00 ` nakamura.shun
@ 2021-11-08 0:49 ` nakamura.shun
2021-11-14 16:16 ` Jiri Olsa
0 siblings, 1 reply; 12+ messages in thread
From: nakamura.shun @ 2021-11-08 0:49 UTC (permalink / raw)
To: Jiri Olsa
Cc: Rob Herring, peterz, mingo, acme, mark.rutland,
alexander.shishkin, namhyung, irogers, linux-perf-users,
linux-kernel
Hi Jirka
> > > > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > > > >
> > > > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > > > does not scale counters obtained by read() system call.
> > > > >
> > > > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > > > read() system call when multiplexing.
> > > > >
> > > > >
> > > > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > > > ---
> > > > > tools/lib/perf/evsel.c | 6 ++++++
> > > > > 1 file changed, 6 insertions(+)
> > > > >
> > > > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > > > --- a/tools/lib/perf/evsel.c
> > > > +++ b/tools/lib/perf/evsel.c
> > > > > @@ -18,6 +18,7 @@
> > > > > #include <sys/ioctl.h>
> > > > > #include <sys/mman.h>
> > > > > #include <asm/bug.h>
> > > > > +#include <linux/math64.h>
> > > > >
> > > > > void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > > > int idx)
> > > > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > > > if (readn(*fd, count->values, size) <= 0)
> > > > > return -errno;
> > > > >
> > > > > + if (count->ena != count->run) {
> > > > > + if (count->run != 0)
> > > > > + count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > > > + }
> > > >
> > > > so I think perf stat expect raw values in there and does the
> > > > scaling by itself, please check following code:
> > > >
> > > > read_counters
> > > > read_affinity_counters
> > > > read_counter_cpu
> > > > read_single_counter
> > > > evsel__read_counter
> > > >
> > > > perf_stat_process_counter
> > > > process_counter_maps
> > > > process_counter_values
> > > > perf_counts_values__scale
> > > >
> > > >
> > > > perhaps we could export perf_counts_values__scale if it'd be any help
> > >
> > > Thank you for your comment.
> > >
> > > The purpose of this patch is to unify the counters obtained with
> > > perf_evsel__read() to scaled or unscaled values.
> > >
> > > perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is
> > > available, else gets by readn(). In current implementation, caller
> > > gets scaled counter if goes through RDPMC path, otherwise gets unscaled
> > > counter via readn() path.
> > >
> > > However caller cannnot know which path were taken.
> > >
> > > If caller expects a raw value, I think the RDPMC path should also
> > > return an unscaled counter.
> > >
> > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > index c89dfa5..aaa4579 100644
> > > --- a/tools/lib/perf/mmap.c
> > > +++ b/tools/lib/perf/mmap.c
> > > @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> > > count->ena += delta;
> > > if (idx)
> > > count->run += delta;
> > > -
> > > - cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
> >
> > perf stat does not mmap counters so this should not be invoked
> > within perf stat.. but we should be consistent and scale after
> > calling perf_evsel__read.. and give user the possibility to get
> > un-scaled counts
> >
> > that perhaps brings new feature.. mmap perf stat counters to invoke
> > the fast reading path for counters.. IIRC it should be matter just
> > to mmap the first 'user' page
>
> Thank you for your comment.
> I think it will be good that perf stat supports rdpmc.
>
> I will modify the patch.
I think rdpmc cannot measure the command/program specified in perf stat
because it measures the calling thread of perf_event_open.
If my understanding is wrong, please point it out to me.
Best Regards
Shunsuke
^ permalink raw reply [flat|nested] 12+ messages in thread
* Re: [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing
2021-11-08 0:49 ` nakamura.shun
@ 2021-11-14 16:16 ` Jiri Olsa
0 siblings, 0 replies; 12+ messages in thread
From: Jiri Olsa @ 2021-11-14 16:16 UTC (permalink / raw)
To: nakamura.shun
Cc: Rob Herring, peterz, mingo, acme, mark.rutland,
alexander.shishkin, namhyung, irogers, linux-perf-users,
linux-kernel
On Mon, Nov 08, 2021 at 12:49:24AM +0000, nakamura.shun@fujitsu.com wrote:
> Hi Jirka
>
> > > > > On Wed, Sep 22, 2021 at 07:16:26PM +0900, Shunsuke Nakamura wrote:
> > > > > > From: nakamura shunsuke <nakamura.shun@fujitsu.com>
> > > > > >
> > > > > > perf_evsel__read() scales counters obtained by RDPMC during multiplexing, but
> > > > > > does not scale counters obtained by read() system call.
> > > > > >
> > > > > > Add processing to perf_evsel__read() to scale the counters obtained during the
> > > > > > read() system call when multiplexing.
> > > > > >
> > > > > >
> > > > > > Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
> > > > > > ---
> > > > > > tools/lib/perf/evsel.c | 6 ++++++
> > > > > > 1 file changed, 6 insertions(+)
> > > > > >
> > > > > > diff --git a/tools/lib/perf/evsel.c b/tools/lib/perf/evsel.c
> > > > > > index 8441e3e1aaac..0ebd1d34436f 100644
> > > > > > --- a/tools/lib/perf/evsel.c
> > > > > +++ b/tools/lib/perf/evsel.c
> > > > > > @@ -18,6 +18,7 @@
> > > > > > #include <sys/ioctl.h>
> > > > > > #include <sys/mman.h>
> > > > > > #include <asm/bug.h>
> > > > > > +#include <linux/math64.h>
> > > > > >
> > > > > > void perf_evsel__init(struct perf_evsel *evsel, struct perf_event_attr *attr,
> > > > > > int idx)
> > > > > > @@ -321,6 +322,11 @@ int perf_evsel__read(struct perf_evsel *evsel, int cpu, int thread,
> > > > > > if (readn(*fd, count->values, size) <= 0)
> > > > > > return -errno;
> > > > > >
> > > > > > + if (count->ena != count->run) {
> > > > > > + if (count->run != 0)
> > > > > > + count->val = mul_u64_u64_div64(count->val, count->ena, count->run);
> > > > > > + }
> > > > >
> > > > > so I think perf stat expect raw values in there and does the
> > > > > scaling by itself, please check following code:
> > > > >
> > > > > read_counters
> > > > > read_affinity_counters
> > > > > read_counter_cpu
> > > > > read_single_counter
> > > > > evsel__read_counter
> > > > >
> > > > > perf_stat_process_counter
> > > > > process_counter_maps
> > > > > process_counter_values
> > > > > perf_counts_values__scale
> > > > >
> > > > >
> > > > > perhaps we could export perf_counts_values__scale if it'd be any help
> > > >
> > > > Thank you for your comment.
> > > >
> > > > The purpose of this patch is to unify the counters obtained with
> > > > perf_evsel__read() to scaled or unscaled values.
> > > >
> > > > perf_evsel__read() gets counter by perf_mmap__read_self() if RDPMC is
> > > > available, else gets by readn(). In current implementation, caller
> > > > gets scaled counter if goes through RDPMC path, otherwise gets unscaled
> > > > counter via readn() path.
> > > >
> > > > However caller cannnot know which path were taken.
> > > >
> > > > If caller expects a raw value, I think the RDPMC path should also
> > > > return an unscaled counter.
> > > >
> > > > diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
> > > > index c89dfa5..aaa4579 100644
> > > > --- a/tools/lib/perf/mmap.c
> > > > +++ b/tools/lib/perf/mmap.c
> > > > @@ -353,8 +353,6 @@ int perf_mmap__read_self(struct perf_mmap *map, struct perf_counts_values *count
> > > > count->ena += delta;
> > > > if (idx)
> > > > count->run += delta;
> > > > -
> > > > - cnt = mul_u64_u64_div64(cnt, count->ena, count->run);
> > >
> > > perf stat does not mmap counters so this should not be invoked
> > > within perf stat.. but we should be consistent and scale after
> > > calling perf_evsel__read.. and give user the possibility to get
> > > un-scaled counts
> > >
> > > that perhaps brings new feature.. mmap perf stat counters to invoke
> > > the fast reading path for counters.. IIRC it should be matter just
> > > to mmap the first 'user' page
> >
> > Thank you for your comment.
> > I think it will be good that perf stat supports rdpmc.
> >
> > I will modify the patch.
>
> I think rdpmc cannot measure the command/program specified in perf stat
> because it measures the calling thread of perf_event_open.
> If my understanding is wrong, please point it out to me.
right, I guess we could use that just for system wide monitoring,
where we open counter for each cpu
jirka
^ permalink raw reply [flat|nested] 12+ messages in thread
end of thread, other threads:[~2021-11-14 16:16 UTC | newest]
Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-22 10:16 [PATCH v2 0/2] libperf: Add support for scaling counters obtained from the read() system call during multiplexing Shunsuke Nakamura
2021-09-22 10:16 ` [PATCH v2 1/2] libperf: Add processing to scale the counters obtained during the read() system call when multiplexing Shunsuke Nakamura
2021-09-22 21:34 ` Jiri Olsa
2021-09-28 9:53 ` nakamura.shun
2021-10-05 16:36 ` Rob Herring
2021-10-19 4:59 ` nakamura.shun
2021-10-07 17:17 ` Jiri Olsa
2021-10-19 5:00 ` nakamura.shun
2021-11-08 0:49 ` nakamura.shun
2021-11-14 16:16 ` Jiri Olsa
2021-09-22 10:16 ` [PATCH v2 2/2] libperf tests: Add test_stat_multiplexing test Shunsuke Nakamura
2021-10-08 19:11 ` Arnaldo Carvalho de Melo
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.