linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] perf stat: Unbreak perf stat with ARMv8 PMU events
@ 2020-09-22  3:13 Wei Li
  2020-09-22  3:13 ` [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events Wei Li
  2020-09-22  3:13 ` [PATCH 2/2] perf stat: Unbreak perf stat with " Wei Li
  0 siblings, 2 replies; 15+ messages in thread
From: Wei Li @ 2020-09-22  3:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andi Kleen, Alexey Budankov,
	Adrian Hunter
  Cc: Peter Zijlstra, huawei.libin, Ingo Molnar, linux-kernel,
	linux-arm-kernel

Currently, perf-stat with armv8_pmu events with a workload is broken.
This patch set just fixes that.

Before the patch set:
[root@localhost hulk]# tools/perf/perf stat  -e armv8_pmuv3_0/ll_cache_rd/,armv8_pmuv3_0/ll_cache_miss_rd/ ls > /dev/null
Segmentation fault

After the patch set:
[root@localhost hulk]# tools/perf/perf stat  -e armv8_pmuv3_0/ll_cache_rd/,armv8_pmuv3_0/ll_cache_miss_rd/ ls > /dev/null

 Performance counter stats for 'ls':

            39,882      armv8_pmuv3_0/ll_cache_rd/                                   
             9,639      armv8_pmuv3_0/ll_cache_miss_rd/                                   

       0.001416690 seconds time elapsed

       0.001469000 seconds user
       0.000000000 seconds sys

Wei Li (2):
  perf stat: Fix segfault when counting armv8 PMU events
  perf stat: Unbreak perf stat with armv8 PMU events

 tools/lib/perf/include/internal/evlist.h |  1 +
 tools/perf/builtin-stat.c                | 37 ++++++++++++++++--------
 tools/perf/util/evlist.c                 | 23 ++++++++++++++-
 3 files changed, 48 insertions(+), 13 deletions(-)

-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-22  3:13 [PATCH 0/2] perf stat: Unbreak perf stat with ARMv8 PMU events Wei Li
@ 2020-09-22  3:13 ` Wei Li
  2020-09-22 19:23   ` Andi Kleen
  2020-09-23  5:44   ` Jiri Olsa
  2020-09-22  3:13 ` [PATCH 2/2] perf stat: Unbreak perf stat with " Wei Li
  1 sibling, 2 replies; 15+ messages in thread
From: Wei Li @ 2020-09-22  3:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andi Kleen, Alexey Budankov,
	Adrian Hunter
  Cc: Peter Zijlstra, huawei.libin, Ingo Molnar, linux-kernel,
	linux-arm-kernel

When executing perf stat with armv8_pmu events with a workload, it will
report a segfault as result.

(gdb) bt
#0  0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=<optimized out>,
    cpu=<optimized out>) at evsel.c:122
#1  perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
#2  0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
#3  0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
    argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
    at builtin-stat.c:929
#4  0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
    argc=1) at builtin-stat.c:947
#5  cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
#6  0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 <commands+288>,
    argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
#7  0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
    argv=argv@entry=0xfffffaea2f90) at perf.c:364
#8  0x0000000000435378 in run_argv (argcp=<synthetic pointer>,
    argv=<synthetic pointer>) at perf.c:408
#9  main (argc=4, argv=0xfffffaea2f90) at perf.c:538

After debugging, i found the root reason is that the xyarray fd is created
by evsel__open_per_thread() ignoring the cpu passed in
create_perf_stat_counter(), while the evsel' cpumap is assigned as the
corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
perf_evsel__close_fd_cpu().

To address this, add a flag to mark this situation and avoid using the
affinity technique when closing/enabling/disabling events.

Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
Fixes: 704e2f5b700d ("perf stat: Use affinity for enabling/disabling events")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 tools/lib/perf/include/internal/evlist.h |  1 +
 tools/perf/builtin-stat.c                |  3 +++
 tools/perf/util/evlist.c                 | 23 ++++++++++++++++++++++-
 3 files changed, 26 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
index 2d0fa02b036f..c02d7e583846 100644
--- a/tools/lib/perf/include/internal/evlist.h
+++ b/tools/lib/perf/include/internal/evlist.h
@@ -17,6 +17,7 @@ struct perf_evlist {
 	struct list_head	 entries;
 	int			 nr_entries;
 	bool			 has_user_cpus;
+	bool			 open_per_thread;
 	struct perf_cpu_map	*cpus;
 	struct perf_cpu_map	*all_cpus;
 	struct perf_thread_map	*threads;
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index fddc97cac984..6e6ceacce634 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -725,6 +725,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	if (group)
 		perf_evlist__set_leader(evsel_list);
 
+	if (!(target__has_cpu(&target) && !target__has_per_thread(&target)))
+		evsel_list->core.open_per_thread = true;
+
 	if (affinity__setup(&affinity) < 0)
 		return -1;
 
diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
index e3fa3bf7498a..bf8a3ccc599f 100644
--- a/tools/perf/util/evlist.c
+++ b/tools/perf/util/evlist.c
@@ -383,6 +383,15 @@ void evlist__disable(struct evlist *evlist)
 	int cpu, i, imm = 0;
 	bool has_imm = false;
 
+	if (evlist->core.open_per_thread) {
+		evlist__for_each_entry(evlist, pos) {
+			if (pos->disabled || !evsel__is_group_leader(pos) || !pos->core.fd)
+				continue;
+			evsel__disable(pos);
+		}
+		goto out;
+	}
+
 	if (affinity__setup(&affinity) < 0)
 		return;
 
@@ -414,6 +423,7 @@ void evlist__disable(struct evlist *evlist)
 		pos->disabled = true;
 	}
 
+out:
 	evlist->enabled = false;
 }
 
@@ -423,6 +433,15 @@ void evlist__enable(struct evlist *evlist)
 	struct affinity affinity;
 	int cpu, i;
 
+	if (evlist->core.open_per_thread) {
+		evlist__for_each_entry(evlist, pos) {
+			if (!evsel__is_group_leader(pos) || !pos->core.fd)
+				continue;
+			evsel__enable(pos);
+		}
+		goto out;
+	}
+
 	if (affinity__setup(&affinity) < 0)
 		return;
 
@@ -444,6 +463,7 @@ void evlist__enable(struct evlist *evlist)
 		pos->disabled = false;
 	}
 
+out:
 	evlist->enabled = true;
 }
 
@@ -1223,9 +1243,10 @@ void evlist__close(struct evlist *evlist)
 
 	/*
 	 * With perf record core.cpus is usually NULL.
+	 * Or perf stat may open events per-thread.
 	 * Use the old method to handle this for now.
 	 */
-	if (!evlist->core.cpus) {
+	if (evlist->core.open_per_thread || !evlist->core.cpus) {
 		evlist__for_each_entry_reverse(evlist, evsel)
 			evsel__close(evsel);
 		return;
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 2/2] perf stat: Unbreak perf stat with armv8_pmu events
  2020-09-22  3:13 [PATCH 0/2] perf stat: Unbreak perf stat with ARMv8 PMU events Wei Li
  2020-09-22  3:13 ` [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events Wei Li
@ 2020-09-22  3:13 ` Wei Li
  1 sibling, 0 replies; 15+ messages in thread
From: Wei Li @ 2020-09-22  3:13 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, Andi Kleen, Alexey Budankov,
	Adrian Hunter
  Cc: Peter Zijlstra, huawei.libin, Ingo Molnar, linux-kernel,
	linux-arm-kernel

After the segfault is fixed, perf-stat with armv8_pmu events with a
workload is still broken:

[root@localhost hulk]# tools/perf/perf stat -e armv8_pmuv3_0/ll_cache_rd/,armv8_pmuv3_0/ll_cache_miss_rd/ ls > /dev/null

 Performance counter stats for 'ls':

     <not counted>      armv8_pmuv3_0/ll_cache_rd/                                     (0.00%)
     <not counted>      armv8_pmuv3_0/ll_cache_miss_rd/                                     (0.00%)

       0.002052670 seconds time elapsed

       0.000000000 seconds user
       0.002086000 seconds sys

In fact, while the event will be opened per-thread,
create_perf_stat_counter() is called as many times as the count of cpu
in the evlist's cpumap, and lost all the file descriptors except the
last one. If this counter is not scheduled during the period of time,
it will be "not counted".

Add the process to don't open the needless events in such situation.

Fixes: 4804e0111662 ("perf stat: Use affinity for opening events")
Signed-off-by: Wei Li <liwei391@huawei.com>
---
 tools/perf/builtin-stat.c | 36 +++++++++++++++++++++++-------------
 1 file changed, 23 insertions(+), 13 deletions(-)

diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 6e6ceacce634..9a43b3de26d1 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -712,6 +712,7 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 	struct affinity affinity;
 	int i, cpu;
 	bool second_pass = false;
+	bool open_per_thread = false;
 
 	if (forks) {
 		if (perf_evlist__prepare_workload(evsel_list, &target, argv, is_pipe,
@@ -726,16 +727,17 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		perf_evlist__set_leader(evsel_list);
 
 	if (!(target__has_cpu(&target) && !target__has_per_thread(&target)))
-		evsel_list->core.open_per_thread = true;
+		evsel_list->core.open_per_thread = open_per_thread = true;
 
 	if (affinity__setup(&affinity) < 0)
 		return -1;
 
 	evlist__for_each_cpu (evsel_list, i, cpu) {
-		affinity__set(&affinity, cpu);
+		if (!open_per_thread)
+			affinity__set(&affinity, cpu);
 
 		evlist__for_each_entry(evsel_list, counter) {
-			if (evsel__cpu_iter_skip(counter, cpu))
+			if (!open_per_thread && evsel__cpu_iter_skip(counter, cpu))
 				continue;
 			if (counter->reset_group || counter->errored)
 				continue;
@@ -753,7 +755,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 				if ((errno == EINVAL || errno == EBADF) &&
 				    counter->leader != counter &&
 				    counter->weak_group) {
-					perf_evlist__reset_weak_group(evsel_list, counter, false);
+					perf_evlist__reset_weak_group(evsel_list, counter,
+							open_per_thread);
 					assert(counter->reset_group);
 					second_pass = true;
 					continue;
@@ -773,6 +776,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 			}
 			counter->supported = true;
 		}
+
+		if (open_per_thread)
+			break;
 	}
 
 	if (second_pass) {
@@ -782,20 +788,22 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 		 */
 
 		evlist__for_each_cpu(evsel_list, i, cpu) {
-			affinity__set(&affinity, cpu);
-			/* First close errored or weak retry */
-			evlist__for_each_entry(evsel_list, counter) {
-				if (!counter->reset_group && !counter->errored)
-					continue;
-				if (evsel__cpu_iter_skip_no_inc(counter, cpu))
-					continue;
-				perf_evsel__close_cpu(&counter->core, counter->cpu_iter);
+			if (!open_per_thread) {
+				affinity__set(&affinity, cpu);
+				/* First close errored or weak retry */
+				evlist__for_each_entry(evsel_list, counter) {
+					if (!counter->reset_group && !counter->errored)
+						continue;
+					if (evsel__cpu_iter_skip_no_inc(counter, cpu))
+						continue;
+					perf_evsel__close_cpu(&counter->core, counter->cpu_iter);
+				}
 			}
 			/* Now reopen weak */
 			evlist__for_each_entry(evsel_list, counter) {
 				if (!counter->reset_group && !counter->errored)
 					continue;
-				if (evsel__cpu_iter_skip(counter, cpu))
+				if (!open_per_thread && evsel__cpu_iter_skip(counter, cpu))
 					continue;
 				if (!counter->reset_group)
 					continue;
@@ -817,6 +825,8 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
 				}
 				counter->supported = true;
 			}
+			if (open_per_thread)
+				break;
 		}
 	}
 	affinity__cleanup(&affinity);
-- 
2.17.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-22  3:13 ` [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events Wei Li
@ 2020-09-22 19:23   ` Andi Kleen
  2020-09-22 19:50     ` Andi Kleen
  2020-09-23  5:44   ` Jiri Olsa
  1 sibling, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2020-09-22 19:23 UTC (permalink / raw)
  To: Wei Li
  Cc: Mark Rutland, Alexander Shishkin, Alexey Budankov, Adrian Hunter,
	Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra,
	Ingo Molnar, huawei.libin, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel

> After debugging, i found the root reason is that the xyarray fd is created
> by evsel__open_per_thread() ignoring the cpu passed in
> create_perf_stat_counter(), while the evsel' cpumap is assigned as the
> corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
> with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
> perf_evsel__close_fd_cpu().
> 
> To address this, add a flag to mark this situation and avoid using the
> affinity technique when closing/enabling/disabling events.

The flag seems like a hack. How about figuring out the correct number of 
CPUs and using that?

-Andi

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-22 19:23   ` Andi Kleen
@ 2020-09-22 19:50     ` Andi Kleen
  2020-09-24 14:14       ` liwei (GF)
  0 siblings, 1 reply; 15+ messages in thread
From: Andi Kleen @ 2020-09-22 19:50 UTC (permalink / raw)
  To: Wei Li
  Cc: Mark Rutland, Alexander Shishkin, Alexey Budankov, Adrian Hunter,
	Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra,
	Ingo Molnar, huawei.libin, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel

On Tue, Sep 22, 2020 at 12:23:21PM -0700, Andi Kleen wrote:
> > After debugging, i found the root reason is that the xyarray fd is created
> > by evsel__open_per_thread() ignoring the cpu passed in
> > create_perf_stat_counter(), while the evsel' cpumap is assigned as the
> > corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
> > with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
> > perf_evsel__close_fd_cpu().
> > 
> > To address this, add a flag to mark this situation and avoid using the
> > affinity technique when closing/enabling/disabling events.
> 
> The flag seems like a hack. How about figuring out the correct number of 
> CPUs and using that?

Also would like to understand what's different on ARM64 than other architectures.
Or could this happen on x86 too?

-Andi

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-22  3:13 ` [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events Wei Li
  2020-09-22 19:23   ` Andi Kleen
@ 2020-09-23  5:44   ` Jiri Olsa
  2020-09-23 13:49     ` Namhyung Kim
  1 sibling, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2020-09-23  5:44 UTC (permalink / raw)
  To: Wei Li
  Cc: Mark Rutland, Andi Kleen, Alexander Shishkin, Alexey Budankov,
	Adrian Hunter, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Ingo Molnar, huawei.libin, Namhyung Kim,
	linux-arm-kernel

On Tue, Sep 22, 2020 at 11:13:45AM +0800, Wei Li wrote:
> When executing perf stat with armv8_pmu events with a workload, it will
> report a segfault as result.

please share the perf stat command line you see that segfault for

thanks,
jirka

> 
> (gdb) bt
> #0  0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=<optimized out>,
>     cpu=<optimized out>) at evsel.c:122
> #1  perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
> #2  0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
> #3  0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
>     argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
>     at builtin-stat.c:929
> #4  0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
>     argc=1) at builtin-stat.c:947
> #5  cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
> #6  0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 <commands+288>,
>     argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
> #7  0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
>     argv=argv@entry=0xfffffaea2f90) at perf.c:364
> #8  0x0000000000435378 in run_argv (argcp=<synthetic pointer>,
>     argv=<synthetic pointer>) at perf.c:408
> #9  main (argc=4, argv=0xfffffaea2f90) at perf.c:538
> 
> After debugging, i found the root reason is that the xyarray fd is created
> by evsel__open_per_thread() ignoring the cpu passed in
> create_perf_stat_counter(), while the evsel' cpumap is assigned as the
> corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
> with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
> perf_evsel__close_fd_cpu().
> 
> To address this, add a flag to mark this situation and avoid using the
> affinity technique when closing/enabling/disabling events.
> 
> Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
> Fixes: 704e2f5b700d ("perf stat: Use affinity for enabling/disabling events")
> Signed-off-by: Wei Li <liwei391@huawei.com>
> ---
>  tools/lib/perf/include/internal/evlist.h |  1 +
>  tools/perf/builtin-stat.c                |  3 +++
>  tools/perf/util/evlist.c                 | 23 ++++++++++++++++++++++-
>  3 files changed, 26 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/perf/include/internal/evlist.h b/tools/lib/perf/include/internal/evlist.h
> index 2d0fa02b036f..c02d7e583846 100644
> --- a/tools/lib/perf/include/internal/evlist.h
> +++ b/tools/lib/perf/include/internal/evlist.h
> @@ -17,6 +17,7 @@ struct perf_evlist {
>  	struct list_head	 entries;
>  	int			 nr_entries;
>  	bool			 has_user_cpus;
> +	bool			 open_per_thread;
>  	struct perf_cpu_map	*cpus;
>  	struct perf_cpu_map	*all_cpus;
>  	struct perf_thread_map	*threads;
> diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
> index fddc97cac984..6e6ceacce634 100644
> --- a/tools/perf/builtin-stat.c
> +++ b/tools/perf/builtin-stat.c
> @@ -725,6 +725,9 @@ static int __run_perf_stat(int argc, const char **argv, int run_idx)
>  	if (group)
>  		perf_evlist__set_leader(evsel_list);
>  
> +	if (!(target__has_cpu(&target) && !target__has_per_thread(&target)))
> +		evsel_list->core.open_per_thread = true;
> +
>  	if (affinity__setup(&affinity) < 0)
>  		return -1;
>  
> diff --git a/tools/perf/util/evlist.c b/tools/perf/util/evlist.c
> index e3fa3bf7498a..bf8a3ccc599f 100644
> --- a/tools/perf/util/evlist.c
> +++ b/tools/perf/util/evlist.c
> @@ -383,6 +383,15 @@ void evlist__disable(struct evlist *evlist)
>  	int cpu, i, imm = 0;
>  	bool has_imm = false;
>  
> +	if (evlist->core.open_per_thread) {
> +		evlist__for_each_entry(evlist, pos) {
> +			if (pos->disabled || !evsel__is_group_leader(pos) || !pos->core.fd)
> +				continue;
> +			evsel__disable(pos);
> +		}
> +		goto out;
> +	}
> +
>  	if (affinity__setup(&affinity) < 0)
>  		return;
>  
> @@ -414,6 +423,7 @@ void evlist__disable(struct evlist *evlist)
>  		pos->disabled = true;
>  	}
>  
> +out:
>  	evlist->enabled = false;
>  }
>  
> @@ -423,6 +433,15 @@ void evlist__enable(struct evlist *evlist)
>  	struct affinity affinity;
>  	int cpu, i;
>  
> +	if (evlist->core.open_per_thread) {
> +		evlist__for_each_entry(evlist, pos) {
> +			if (!evsel__is_group_leader(pos) || !pos->core.fd)
> +				continue;
> +			evsel__enable(pos);
> +		}
> +		goto out;
> +	}
> +
>  	if (affinity__setup(&affinity) < 0)
>  		return;
>  
> @@ -444,6 +463,7 @@ void evlist__enable(struct evlist *evlist)
>  		pos->disabled = false;
>  	}
>  
> +out:
>  	evlist->enabled = true;
>  }
>  
> @@ -1223,9 +1243,10 @@ void evlist__close(struct evlist *evlist)
>  
>  	/*
>  	 * With perf record core.cpus is usually NULL.
> +	 * Or perf stat may open events per-thread.
>  	 * Use the old method to handle this for now.
>  	 */
> -	if (!evlist->core.cpus) {
> +	if (evlist->core.open_per_thread || !evlist->core.cpus) {
>  		evlist__for_each_entry_reverse(evlist, evsel)
>  			evsel__close(evsel);
>  		return;
> -- 
> 2.17.1
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-23  5:44   ` Jiri Olsa
@ 2020-09-23 13:49     ` Namhyung Kim
  2020-09-23 14:07       ` Jiri Olsa
  0 siblings, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2020-09-23 13:49 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Mark Rutland, Andi Kleen, Alexander Shishkin, Alexey Budankov,
	Adrian Hunter, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Li Bin, Wei Li, linux-arm-kernel

On Wed, Sep 23, 2020 at 2:44 PM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Tue, Sep 22, 2020 at 11:13:45AM +0800, Wei Li wrote:
> > When executing perf stat with armv8_pmu events with a workload, it will
> > report a segfault as result.
>
> please share the perf stat command line you see that segfault for

It seems the description in the patch 0/2 already has it:

  [root@localhost hulk]# tools/perf/perf stat  -e
armv8_pmuv3_0/ll_cache_rd/,armv8_pmuv3_0/ll_cache_miss_rd/ ls >
/dev/null
  Segmentation fault

Thanks
Namhyun


>
> thanks,
> jirka
>
> >
> > (gdb) bt
> > #0  0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=<optimized out>,
> >     cpu=<optimized out>) at evsel.c:122
> > #1  perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
> > #2  0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
> > #3  0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
> >     argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
> >     at builtin-stat.c:929
> > #4  0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
> >     argc=1) at builtin-stat.c:947
> > #5  cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
> > #6  0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 <commands+288>,
> >     argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
> > #7  0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
> >     argv=argv@entry=0xfffffaea2f90) at perf.c:364
> > #8  0x0000000000435378 in run_argv (argcp=<synthetic pointer>,
> >     argv=<synthetic pointer>) at perf.c:408
> > #9  main (argc=4, argv=0xfffffaea2f90) at perf.c:538
> >
> > After debugging, i found the root reason is that the xyarray fd is created
> > by evsel__open_per_thread() ignoring the cpu passed in
> > create_perf_stat_counter(), while the evsel' cpumap is assigned as the
> > corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
> > with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
> > perf_evsel__close_fd_cpu().
> >
> > To address this, add a flag to mark this situation and avoid using the
> > affinity technique when closing/enabling/disabling events.
> >
> > Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
> > Fixes: 704e2f5b700d ("perf stat: Use affinity for enabling/disabling events")
> > Signed-off-by: Wei Li <liwei391@huawei.com>
> > ---

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-23 13:49     ` Namhyung Kim
@ 2020-09-23 14:07       ` Jiri Olsa
  2020-09-23 14:15         ` Namhyung Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2020-09-23 14:07 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Mark Rutland, Andi Kleen, Alexander Shishkin, Alexey Budankov,
	Adrian Hunter, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Li Bin, Wei Li, linux-arm-kernel

On Wed, Sep 23, 2020 at 10:49:52PM +0900, Namhyung Kim wrote:
> On Wed, Sep 23, 2020 at 2:44 PM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Tue, Sep 22, 2020 at 11:13:45AM +0800, Wei Li wrote:
> > > When executing perf stat with armv8_pmu events with a workload, it will
> > > report a segfault as result.
> >
> > please share the perf stat command line you see that segfault for
> 
> It seems the description in the patch 0/2 already has it:
> 
>   [root@localhost hulk]# tools/perf/perf stat  -e
> armv8_pmuv3_0/ll_cache_rd/,armv8_pmuv3_0/ll_cache_miss_rd/ ls >
> /dev/null
>   Segmentation fault

yea I found it, but can't reproduce it.. I see the issue from
patch 2, but not sure what's the problem so far

jirka

> 
> Thanks
> Namhyun
> 
> 
> >
> > thanks,
> > jirka
> >
> > >
> > > (gdb) bt
> > > #0  0x0000000000603fc8 in perf_evsel__close_fd_cpu (evsel=<optimized out>,
> > >     cpu=<optimized out>) at evsel.c:122
> > > #1  perf_evsel__close_cpu (evsel=evsel@entry=0x716e950, cpu=7) at evsel.c:156
> > > #2  0x00000000004d4718 in evlist__close (evlist=0x70a7cb0) at util/evlist.c:1242
> > > #3  0x0000000000453404 in __run_perf_stat (argc=3, argc@entry=1, argv=0x30,
> > >     argv@entry=0xfffffaea2f90, run_idx=119, run_idx@entry=1701998435)
> > >     at builtin-stat.c:929
> > > #4  0x0000000000455058 in run_perf_stat (run_idx=1701998435, argv=0xfffffaea2f90,
> > >     argc=1) at builtin-stat.c:947
> > > #5  cmd_stat (argc=1, argv=0xfffffaea2f90) at builtin-stat.c:2357
> > > #6  0x00000000004bb888 in run_builtin (p=p@entry=0x9764b8 <commands+288>,
> > >     argc=argc@entry=4, argv=argv@entry=0xfffffaea2f90) at perf.c:312
> > > #7  0x00000000004bbb54 in handle_internal_command (argc=argc@entry=4,
> > >     argv=argv@entry=0xfffffaea2f90) at perf.c:364
> > > #8  0x0000000000435378 in run_argv (argcp=<synthetic pointer>,
> > >     argv=<synthetic pointer>) at perf.c:408
> > > #9  main (argc=4, argv=0xfffffaea2f90) at perf.c:538
> > >
> > > After debugging, i found the root reason is that the xyarray fd is created
> > > by evsel__open_per_thread() ignoring the cpu passed in
> > > create_perf_stat_counter(), while the evsel' cpumap is assigned as the
> > > corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
> > > with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
> > > perf_evsel__close_fd_cpu().
> > >
> > > To address this, add a flag to mark this situation and avoid using the
> > > affinity technique when closing/enabling/disabling events.
> > >
> > > Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")
> > > Fixes: 704e2f5b700d ("perf stat: Use affinity for enabling/disabling events")
> > > Signed-off-by: Wei Li <liwei391@huawei.com>
> > > ---
> 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-23 14:07       ` Jiri Olsa
@ 2020-09-23 14:15         ` Namhyung Kim
  2020-09-23 20:19           ` Jiri Olsa
  0 siblings, 1 reply; 15+ messages in thread
From: Namhyung Kim @ 2020-09-23 14:15 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Mark Rutland, Andi Kleen, Alexander Shishkin, Alexey Budankov,
	Adrian Hunter, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Ingo Molnar, Li Bin, Wei Li, linux-arm-kernel

On Wed, Sep 23, 2020 at 11:08 PM Jiri Olsa <jolsa@redhat.com> wrote:
>
> On Wed, Sep 23, 2020 at 10:49:52PM +0900, Namhyung Kim wrote:
> > On Wed, Sep 23, 2020 at 2:44 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > >
> > > On Tue, Sep 22, 2020 at 11:13:45AM +0800, Wei Li wrote:
> > > > When executing perf stat with armv8_pmu events with a workload, it will
> > > > report a segfault as result.
> > >
> > > please share the perf stat command line you see that segfault for
> >
> > It seems the description in the patch 0/2 already has it:
> >
> >   [root@localhost hulk]# tools/perf/perf stat  -e
> > armv8_pmuv3_0/ll_cache_rd/,armv8_pmuv3_0/ll_cache_miss_rd/ ls >
> > /dev/null
> >   Segmentation fault
>
> yea I found it, but can't reproduce it.. I see the issue from
> patch 2, but not sure what's the problem so far

I think the problem is that armv8_pmu has a cpumask,
and the user requested per-task events.

The code tried to open the event with a dummy cpu map
since it's not a cpu event, but the pmu has cpu map and
it's passed to evsel.  So there's confusion somewhere
whether it should use evsel->cpus or a dummy map.

Thanks
Namhyung

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-23 14:15         ` Namhyung Kim
@ 2020-09-23 20:19           ` Jiri Olsa
  2020-09-24 14:36             ` Namhyung Kim
  0 siblings, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2020-09-23 20:19 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Mark Rutland, Andi Kleen, Alexander Shishkin, Alexey Budankov,
	Adrian Hunter, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Andi Kleen, Li Bin, Wei Li, Ingo Molnar,
	linux-arm-kernel

On Wed, Sep 23, 2020 at 11:15:06PM +0900, Namhyung Kim wrote:
> On Wed, Sep 23, 2020 at 11:08 PM Jiri Olsa <jolsa@redhat.com> wrote:
> >
> > On Wed, Sep 23, 2020 at 10:49:52PM +0900, Namhyung Kim wrote:
> > > On Wed, Sep 23, 2020 at 2:44 PM Jiri Olsa <jolsa@redhat.com> wrote:
> > > >
> > > > On Tue, Sep 22, 2020 at 11:13:45AM +0800, Wei Li wrote:
> > > > > When executing perf stat with armv8_pmu events with a workload, it will
> > > > > report a segfault as result.
> > > >
> > > > please share the perf stat command line you see that segfault for
> > >
> > > It seems the description in the patch 0/2 already has it:
> > >
> > >   [root@localhost hulk]# tools/perf/perf stat  -e
> > > armv8_pmuv3_0/ll_cache_rd/,armv8_pmuv3_0/ll_cache_miss_rd/ ls >
> > > /dev/null
> > >   Segmentation fault
> >
> > yea I found it, but can't reproduce it.. I see the issue from
> > patch 2, but not sure what's the problem so far
> 
> I think the problem is that armv8_pmu has a cpumask,
> and the user requested per-task events.
> 
> The code tried to open the event with a dummy cpu map
> since it's not a cpu event, but the pmu has cpu map and
> it's passed to evsel.  So there's confusion somewhere
> whether it should use evsel->cpus or a dummy map.

you're right, I have following cpus file in pmu:

  # cat /sys/devices/armv8_pmuv3_0/cpus 
  0-3

covering all the cpus.. and once you have cpumask/cpus file,
you're system wide by default in current code, but we should
not crash ;-)

I tried to cover this case in patch below and I probably broke
some other use cases, but perhaps we could allow to open counters
per cpus for given workload

I'll try to look at this more tomorrow

jirka


---
diff --git a/tools/perf/builtin-stat.c b/tools/perf/builtin-stat.c
index 7f8d756d9408..0c7f16a673c2 100644
--- a/tools/perf/builtin-stat.c
+++ b/tools/perf/builtin-stat.c
@@ -379,12 +379,7 @@ static int read_affinity_counters(struct timespec *rs)
 	if (affinity__setup(&affinity) < 0)
 		return -1;
 
-	ncpus = perf_cpu_map__nr(evsel_list->core.all_cpus);
-	if (!target__has_cpu(&target) || target__has_per_thread(&target))
-		ncpus = 1;
 	evlist__for_each_cpu(evsel_list, i, cpu) {
-		if (i >= ncpus)
-			break;
 		affinity__set(&affinity, cpu);
 
 		evlist__for_each_entry(evsel_list, counter) {
diff --git a/tools/perf/util/evsel.c b/tools/perf/util/evsel.c
index fd865002cbbd..ef525eb2f619 100644
--- a/tools/perf/util/evsel.c
+++ b/tools/perf/util/evsel.c
@@ -1861,6 +1861,16 @@ void evsel__close(struct evsel *evsel)
 	perf_evsel__free_id(&evsel->core);
 }
 
+int evsel__open_threads_per_cpu(struct evsel *evsel, struct perf_thread_map *threads,
+				struct perf_cpu_map *cpus, int cpu)
+{
+	if (cpu == -1)
+		return evsel__open_cpu(evsel, cpus, threads, 0,
+					cpus ? cpus->nr : 1);
+
+	return evsel__open_cpu(evsel, cpus, threads, cpu, cpu + 1);
+}
+
 int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu)
 {
 	if (cpu == -1)
diff --git a/tools/perf/util/evsel.h b/tools/perf/util/evsel.h
index 35e3f6d66085..1d055699bd1f 100644
--- a/tools/perf/util/evsel.h
+++ b/tools/perf/util/evsel.h
@@ -231,6 +231,8 @@ int evsel__enable(struct evsel *evsel);
 int evsel__disable(struct evsel *evsel);
 int evsel__disable_cpu(struct evsel *evsel, int cpu);
 
+int evsel__open_threads_per_cpu(struct evsel *evsel, struct perf_thread_map *threads,
+				struct perf_cpu_map *cpus, int cpu);
 int evsel__open_per_cpu(struct evsel *evsel, struct perf_cpu_map *cpus, int cpu);
 int evsel__open_per_thread(struct evsel *evsel, struct perf_thread_map *threads);
 int evsel__open(struct evsel *evsel, struct perf_cpu_map *cpus,
diff --git a/tools/perf/util/stat.c b/tools/perf/util/stat.c
index cdb154381a87..2b17f1315cfb 100644
--- a/tools/perf/util/stat.c
+++ b/tools/perf/util/stat.c
@@ -560,6 +560,11 @@ int create_perf_stat_counter(struct evsel *evsel,
 			attr->enable_on_exec = 1;
 	}
 
+	if (evsel->core.own_cpus && evsel->core.threads) {
+		return evsel__open_threads_per_cpu(evsel, evsel->core.threads,
+						   evsel__cpus(evsel), cpu);
+	}
+
 	if (target__has_cpu(target) && !target__has_per_thread(target))
 		return evsel__open_per_cpu(evsel, evsel__cpus(evsel), cpu);
 


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-22 19:50     ` Andi Kleen
@ 2020-09-24 14:14       ` liwei (GF)
  0 siblings, 0 replies; 15+ messages in thread
From: liwei (GF) @ 2020-09-24 14:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Mark Rutland, Alexander Shishkin, Alexey Budankov, Adrian Hunter,
	Arnaldo Carvalho de Melo, linux-kernel, Peter Zijlstra,
	Ingo Molnar, huawei.libin, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel

Hi Andi,

On 2020/9/23 3:50, Andi Kleen wrote:
> On Tue, Sep 22, 2020 at 12:23:21PM -0700, Andi Kleen wrote:
>>> After debugging, i found the root reason is that the xyarray fd is created
>>> by evsel__open_per_thread() ignoring the cpu passed in
>>> create_perf_stat_counter(), while the evsel' cpumap is assigned as the
>>> corresponding PMU's cpumap in __add_event(). Thus, the xyarray fd is created
>>> with ncpus of dummy cpumap and an out of bounds 'cpu' index will be used in
>>> perf_evsel__close_fd_cpu().
>>>
>>> To address this, add a flag to mark this situation and avoid using the
>>> affinity technique when closing/enabling/disabling events.
>>
>> The flag seems like a hack. How about figuring out the correct number of 
>> CPUs and using that?
> 
> Also would like to understand what's different on ARM64 than other architectures.
> Or could this happen on x86 too?
> 

The problem is that when the user requests per-task events, the cpumask is expected
as NULL(dummy), while the armv8_pmu do has a cpumask which inherited by evsel.
The armv8_pmu's cpumask was added for heterogeneous systems. So this issue can not
happen on x86.

In fact, the cpumask is correct indeed, but it should't be used when we requesting
per-task events. As these events should be install on all cores, i doubt how much we
can benefit from the affinity technique, so i choosed to add a flag.

I also did a test on hisilicon arm64 d06 board, with 2 sockets 128 cores.
Testing the following command 3 times, with/without the affinity technique:

time tools/perf/perf stat -ddd -C 0-127 --per-core --timeout=5000 2> /dev/null

* (HEAD detached at 7074674e7338) perf cpumap: Maintain cpumaps ordered and without dups
real	0m8.039s
user	0m0.402s
sys	0m2.582s

real	0m7.939s
user	0m0.360s
sys	0m2.560s

real	0m7.997s
user	0m0.358s
sys	0m2.586s

* (HEAD detached at 704e2f5b700d) perf stat: Use affinity for enabling/disabling events
real	0m7.954s
user	0m0.308s
sys	0m2.590s

real	0m12.959s
user	0m0.332s
sys	0m2.582s

real	0m18.009s
user	0m0.346s
sys	0m2.562s

The offcpu time is much longer when using affinity, i think that's what migration costs,
could you please share me your test case?

Thanks,
Wei

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-23 20:19           ` Jiri Olsa
@ 2020-09-24 14:36             ` Namhyung Kim
  2020-09-25 21:01               ` Jiri Olsa
  2020-10-02  8:59               ` Jiri Olsa
  0 siblings, 2 replies; 15+ messages in thread
From: Namhyung Kim @ 2020-09-24 14:36 UTC (permalink / raw)
  To: Jiri Olsa
  Cc: Mark Rutland, Andi Kleen, Alexander Shishkin, Alexey Budankov,
	Adrian Hunter, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Andi Kleen, Li Bin, Wei Li, Ingo Molnar,
	linux-arm-kernel

On Wed, Sep 23, 2020 at 10:19:00PM +0200, Jiri Olsa wrote:
> On Wed, Sep 23, 2020 at 11:15:06PM +0900, Namhyung Kim wrote:
> > I think the problem is that armv8_pmu has a cpumask,
> > and the user requested per-task events.
> > 
> > The code tried to open the event with a dummy cpu map
> > since it's not a cpu event, but the pmu has cpu map and
> > it's passed to evsel.  So there's confusion somewhere
> > whether it should use evsel->cpus or a dummy map.
> 
> you're right, I have following cpus file in pmu:
> 
>   # cat /sys/devices/armv8_pmuv3_0/cpus 
>   0-3
> 
> covering all the cpus.. and once you have cpumask/cpus file,
> you're system wide by default in current code, but we should
> not crash ;-)
> 
> I tried to cover this case in patch below and I probably broke
> some other use cases, but perhaps we could allow to open counters
> per cpus for given workload
> 
> I'll try to look at this more tomorrow

I'm thinking about a different approach, we can ignore cpu map
for the ARM cpu PMU and use the dummy, not tested ;-)

Thanks
Namhyung


diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
index 2208444ecb44..cfcdbd7be066 100644
--- a/tools/lib/perf/evlist.c
+++ b/tools/lib/perf/evlist.c
@@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
        if (!evsel->own_cpus || evlist->has_user_cpus) {
                perf_cpu_map__put(evsel->cpus);
                evsel->cpus = perf_cpu_map__get(evlist->cpus);
+       } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
+               perf_cpu_map__put(evsel->cpus);
+               evsel->cpus = perf_cpu_map__get(evlist->cpus);
        } else if (evsel->cpus != evsel->own_cpus) {
                perf_cpu_map__put(evsel->cpus);
                evsel->cpus = perf_cpu_map__get(evsel->own_cpus);

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-24 14:36             ` Namhyung Kim
@ 2020-09-25 21:01               ` Jiri Olsa
  2020-10-02  8:59               ` Jiri Olsa
  1 sibling, 0 replies; 15+ messages in thread
From: Jiri Olsa @ 2020-09-25 21:01 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Mark Rutland, Andi Kleen, Alexander Shishkin, Will Deacon,
	Alexey Budankov, Adrian Hunter, Arnaldo Carvalho de Melo,
	linux-kernel, Peter Zijlstra, Andi Kleen, Li Bin, Wei Li,
	Ingo Molnar, linux-arm-kernel

On Thu, Sep 24, 2020 at 11:36:23PM +0900, Namhyung Kim wrote:
> On Wed, Sep 23, 2020 at 10:19:00PM +0200, Jiri Olsa wrote:
> > On Wed, Sep 23, 2020 at 11:15:06PM +0900, Namhyung Kim wrote:
> > > I think the problem is that armv8_pmu has a cpumask,
> > > and the user requested per-task events.
> > > 
> > > The code tried to open the event with a dummy cpu map
> > > since it's not a cpu event, but the pmu has cpu map and
> > > it's passed to evsel.  So there's confusion somewhere
> > > whether it should use evsel->cpus or a dummy map.
> > 
> > you're right, I have following cpus file in pmu:
> > 
> >   # cat /sys/devices/armv8_pmuv3_0/cpus 
> >   0-3
> > 
> > covering all the cpus.. and once you have cpumask/cpus file,
> > you're system wide by default in current code, but we should
> > not crash ;-)
> > 
> > I tried to cover this case in patch below and I probably broke
> > some other use cases, but perhaps we could allow to open counters
> > per cpus for given workload
> > 
> > I'll try to look at this more tomorrow
> 
> I'm thinking about a different approach, we can ignore cpu map
> for the ARM cpu PMU and use the dummy, not tested ;-)
> 
> Thanks
> Namhyung
> 
> 
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index 2208444ecb44..cfcdbd7be066 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
>         if (!evsel->own_cpus || evlist->has_user_cpus) {
>                 perf_cpu_map__put(evsel->cpus);
>                 evsel->cpus = perf_cpu_map__get(evlist->cpus);
> +       } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> +               perf_cpu_map__put(evsel->cpus);
> +               evsel->cpus = perf_cpu_map__get(evlist->cpus);

but I like this fix, because mine messes up scaling ;-)

>         } else if (evsel->cpus != evsel->own_cpus) {
>                 perf_cpu_map__put(evsel->cpus);
>                 evsel->cpus = perf_cpu_map__get(evsel->own_cpus);

it looks like that cpus (/sys/devices/armv8_pmuv3_0/cpus) file
can have some cpus switched off.. from brief scan of:

  drivers/perf/arm_pmu_platform.c (search for supported_cpus)

but it looks like it's just check for interrupt, so counting
might still work..?

thanks,
jirka


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-09-24 14:36             ` Namhyung Kim
  2020-09-25 21:01               ` Jiri Olsa
@ 2020-10-02  8:59               ` Jiri Olsa
  2020-10-06  6:51                 ` Song Bao Hua (Barry Song)
  1 sibling, 1 reply; 15+ messages in thread
From: Jiri Olsa @ 2020-10-02  8:59 UTC (permalink / raw)
  To: Namhyung Kim, Wei Li
  Cc: Mark Rutland, Andi Kleen, Alexander Shishkin, Alexey Budankov,
	Adrian Hunter, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Andi Kleen, Li Bin, Ingo Molnar,
	linux-arm-kernel

On Thu, Sep 24, 2020 at 11:36:23PM +0900, Namhyung Kim wrote:
> On Wed, Sep 23, 2020 at 10:19:00PM +0200, Jiri Olsa wrote:
> > On Wed, Sep 23, 2020 at 11:15:06PM +0900, Namhyung Kim wrote:
> > > I think the problem is that armv8_pmu has a cpumask,
> > > and the user requested per-task events.
> > > 
> > > The code tried to open the event with a dummy cpu map
> > > since it's not a cpu event, but the pmu has cpu map and
> > > it's passed to evsel.  So there's confusion somewhere
> > > whether it should use evsel->cpus or a dummy map.
> > 
> > you're right, I have following cpus file in pmu:
> > 
> >   # cat /sys/devices/armv8_pmuv3_0/cpus 
> >   0-3
> > 
> > covering all the cpus.. and once you have cpumask/cpus file,
> > you're system wide by default in current code, but we should
> > not crash ;-)
> > 
> > I tried to cover this case in patch below and I probably broke
> > some other use cases, but perhaps we could allow to open counters
> > per cpus for given workload
> > 
> > I'll try to look at this more tomorrow
> 
> I'm thinking about a different approach, we can ignore cpu map
> for the ARM cpu PMU and use the dummy, not tested ;-)
> 
> Thanks
> Namhyung
> 
> 
> diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> index 2208444ecb44..cfcdbd7be066 100644
> --- a/tools/lib/perf/evlist.c
> +++ b/tools/lib/perf/evlist.c
> @@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct perf_evlist *evlist,
>         if (!evsel->own_cpus || evlist->has_user_cpus) {
>                 perf_cpu_map__put(evsel->cpus);
>                 evsel->cpus = perf_cpu_map__get(evlist->cpus);
> +       } else if (!evsel->system_wide && perf_cpu_map__empty(evlist->cpus)) {
> +               perf_cpu_map__put(evsel->cpus);
> +               evsel->cpus = perf_cpu_map__get(evlist->cpus);
>         } else if (evsel->cpus != evsel->own_cpus) {
>                 perf_cpu_map__put(evsel->cpus);
>                 evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> 

Wei Li,
is this fixing your problem?

thanks,
jirka


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

* RE: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events
  2020-10-02  8:59               ` Jiri Olsa
@ 2020-10-06  6:51                 ` Song Bao Hua (Barry Song)
  0 siblings, 0 replies; 15+ messages in thread
From: Song Bao Hua (Barry Song) @ 2020-10-06  6:51 UTC (permalink / raw)
  To: Jiri Olsa, Namhyung Kim, liwei (GF)
  Cc: Mark Rutland, Andi Kleen, Alexander Shishkin, Alexey Budankov,
	Adrian Hunter, Arnaldo Carvalho de Melo, linux-kernel,
	Peter Zijlstra, Andi Kleen, Libin (Huawei),
	Ingo Molnar, linux-arm-kernel



> -----Original Message-----
> From: Jiri Olsa [mailto:jolsa@redhat.com]
> Sent: Friday, October 2, 2020 10:00 PM
> To: Namhyung Kim <namhyung@kernel.org>; liwei (GF)
> <liwei391@huawei.com>
> Cc: Mark Rutland <mark.rutland@arm.com>; Andi Kleen <ak@linux.intel.com>;
> Alexander Shishkin <alexander.shishkin@linux.intel.com>; Alexey Budankov
> <alexey.budankov@linux.intel.com>; Adrian Hunter
> <adrian.hunter@intel.com>; Arnaldo Carvalho de Melo <acme@kernel.org>;
> linux-kernel <linux-kernel@vger.kernel.org>; Peter Zijlstra
> <peterz@infradead.org>; Andi Kleen <andi@firstfloor.org>; Libin (Huawei)
> <huawei.libin@huawei.com>; Ingo Molnar <mingo@redhat.com>;
> linux-arm-kernel@lists.infradead.org
> Subject: Re: [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu
> events
> 
> On Thu, Sep 24, 2020 at 11:36:23PM +0900, Namhyung Kim wrote:
> > On Wed, Sep 23, 2020 at 10:19:00PM +0200, Jiri Olsa wrote:
> > > On Wed, Sep 23, 2020 at 11:15:06PM +0900, Namhyung Kim wrote:
> > > > I think the problem is that armv8_pmu has a cpumask,
> > > > and the user requested per-task events.
> > > >
> > > > The code tried to open the event with a dummy cpu map
> > > > since it's not a cpu event, but the pmu has cpu map and
> > > > it's passed to evsel.  So there's confusion somewhere
> > > > whether it should use evsel->cpus or a dummy map.
> > >
> > > you're right, I have following cpus file in pmu:
> > >
> > >   # cat /sys/devices/armv8_pmuv3_0/cpus
> > >   0-3
> > >
> > > covering all the cpus.. and once you have cpumask/cpus file,
> > > you're system wide by default in current code, but we should
> > > not crash ;-)
> > >
> > > I tried to cover this case in patch below and I probably broke
> > > some other use cases, but perhaps we could allow to open counters
> > > per cpus for given workload
> > >
> > > I'll try to look at this more tomorrow
> >
> > I'm thinking about a different approach, we can ignore cpu map
> > for the ARM cpu PMU and use the dummy, not tested ;-)
> >
> > Thanks
> > Namhyung
> >
> >
> > diff --git a/tools/lib/perf/evlist.c b/tools/lib/perf/evlist.c
> > index 2208444ecb44..cfcdbd7be066 100644
> > --- a/tools/lib/perf/evlist.c
> > +++ b/tools/lib/perf/evlist.c
> > @@ -45,6 +45,9 @@ static void __perf_evlist__propagate_maps(struct
> perf_evlist *evlist,
> >         if (!evsel->own_cpus || evlist->has_user_cpus) {
> >                 perf_cpu_map__put(evsel->cpus);
> >                 evsel->cpus = perf_cpu_map__get(evlist->cpus);
> > +       } else if (!evsel->system_wide &&
> perf_cpu_map__empty(evlist->cpus)) {
> > +               perf_cpu_map__put(evsel->cpus);
> > +               evsel->cpus = perf_cpu_map__get(evlist->cpus);
> >         } else if (evsel->cpus != evsel->own_cpus) {
> >                 perf_cpu_map__put(evsel->cpus);
> >                 evsel->cpus = perf_cpu_map__get(evsel->own_cpus);
> >
> 
> Wei Li,
> is this fixing your problem?

As I have seen the same crash and suggested another patch:
[PATCH] perf evlist: fix memory corruption for Kernel PMU event
https://lore.kernel.org/lkml/20201001115729.27116-1-song.bao.hua@hisilicon.com/

Also, I have tested Namhyung Kim's patch on ARM64. It does fix the crash for me. So:
Tested-by: Barry Song <song.bao.hua@hisilicon.com>

Please put the below fixes-tag in commit log:
Fixes: 7736627b865d ("perf stat: Use affinity for closing file descriptors")

Thanks
Barry

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2020-10-06  6:53 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-09-22  3:13 [PATCH 0/2] perf stat: Unbreak perf stat with ARMv8 PMU events Wei Li
2020-09-22  3:13 ` [PATCH 1/2] perf stat: Fix segfault when counting armv8_pmu events Wei Li
2020-09-22 19:23   ` Andi Kleen
2020-09-22 19:50     ` Andi Kleen
2020-09-24 14:14       ` liwei (GF)
2020-09-23  5:44   ` Jiri Olsa
2020-09-23 13:49     ` Namhyung Kim
2020-09-23 14:07       ` Jiri Olsa
2020-09-23 14:15         ` Namhyung Kim
2020-09-23 20:19           ` Jiri Olsa
2020-09-24 14:36             ` Namhyung Kim
2020-09-25 21:01               ` Jiri Olsa
2020-10-02  8:59               ` Jiri Olsa
2020-10-06  6:51                 ` Song Bao Hua (Barry Song)
2020-09-22  3:13 ` [PATCH 2/2] perf stat: Unbreak perf stat with " Wei Li

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).