[1/5] perf metricgroup: Support printing metrics for arm64

Message ID 1614784938-27080-2-git-send-email-john.garry@huawei.com
State New, archived
Series
  • perf arm64 metricgroup support

Commit Message

John Garry March 3, 2021, 3:22 p.m. UTC
Calling perf_pmu__find_map(NULL) returns the cpumap for the common CPU
PMU. However arm64 supports heterogeneous-CPU based systems, and so there
may be no common CPU PMU. As such, perf_pmu__find_map(NULL) returns NULL
for arm64.

To support printing metrics for arm64, iterate through all PMUs, looking
for a CPU PMU, and use the cpumap there for determining supported metrics.

For heterogeneous systems (like arm big.LITTLE), supporting metrics has
potential challenges, such as not all CPUs in a system supporting a
specific metric event. So don't support metrics on such systems for now.

Signed-off-by: John Garry <john.garry@huawei.com>
---
 tools/perf/util/metricgroup.c | 24 +++++++++++++++++++++++-
 1 file changed, 23 insertions(+), 1 deletion(-)
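The selection rule described in the commit message (use the first core PMU, but only if it covers every CPU) can be modelled outside perf. This is a minimal, self-contained sketch, not perf code: `struct pmu`, `struct events_map` and `find_cpu_events_map()` are hypothetical stand-ins for `struct perf_pmu`, `struct pmu_events_map` and the patch's `find_cpumap()`, and the `is_core` and `nr_cpus` fields stand in for `is_pmu_core()` and `pmu->cpus->nr`. As in the patch, it assumes that a core PMU spanning fewer CPUs than the system has indicates a heterogeneous system.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for perf's struct pmu_events_map and struct perf_pmu. */
struct events_map {
	const char *cpuid;		/* placeholder CPU identifier string */
};

struct pmu {
	const char *name;
	bool is_core;			/* plays the role of is_pmu_core(pmu->name) */
	int nr_cpus;			/* CPUs in the PMU's cpumask; 0 means "no mask" */
	const struct events_map *map;
};

/*
 * Return the events map of the first core PMU, but only if that PMU covers
 * every CPU. A core PMU spanning fewer CPUs indicates a heterogeneous
 * (big.LITTLE-style) system, for which no map is returned.
 */
static const struct events_map *
find_cpu_events_map(const struct pmu *pmus, size_t nr_pmus, int max_cpu)
{
	assert(pmus != NULL || nr_pmus == 0);

	for (size_t i = 0; i < nr_pmus; i++) {
		const struct pmu *pmu = &pmus[i];

		if (!pmu->is_core)
			continue;		/* skip uncore/system PMUs */

		if (pmu->nr_cpus && pmu->nr_cpus != max_cpu)
			return NULL;		/* core PMU does not cover all CPUs */

		return pmu->map;		/* homogeneous: use this PMU's map */
	}
	return NULL;				/* no core PMU found */
}
```

With one core PMU spanning all 8 CPUs of a hypothetical system, the function returns that PMU's map. With two core PMUs of 4 CPUs each, it returns NULL, so no metrics would be listed. As in the patch, only the first core PMU found is examined.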

Comments

Jiri Olsa March 4, 2021, 8:05 p.m. UTC | #1
On Wed, Mar 03, 2021 at 11:22:14PM +0800, John Garry wrote:
> Calling perf_pmu__find_map(NULL) returns the cpumap for the common CPU
> PMU. However arm64 supports heterogeneous-CPU based systems, and so there
> may be no common CPU PMU. As such, perf_pmu__find_map(NULL) returns NULL
> for arm64.
> 
> To support printing metrics for arm64, iterate through all PMUs, looking
> for a CPU PMU, and use the cpumap there for determining supported metrics.
> 
> For heterogeneous systems (like arm big.LITTLE), supporting metrics has
> potential challenges, such as not all CPUs in a system supporting a
> specific metric event. So don't support metrics on such systems for now.
> 
> Signed-off-by: John Garry <john.garry@huawei.com>
> ---
>  tools/perf/util/metricgroup.c | 24 +++++++++++++++++++++++-
>  1 file changed, 23 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
> index 26c990e32378..9a2a23093961 100644
> --- a/tools/perf/util/metricgroup.c
> +++ b/tools/perf/util/metricgroup.c
> @@ -6,6 +6,7 @@
>  /* Manage metrics and groups of metrics from JSON files */
>  
>  #include "metricgroup.h"
> +#include "cpumap.h"
>  #include "debug.h"
>  #include "evlist.h"
>  #include "evsel.h"
> @@ -615,10 +616,31 @@ static int metricgroup__print_sys_event_iter(struct pmu_event *pe, void *data)
>  				     d->details, d->groups, d->metriclist);
>  }
>  
> +static struct pmu_events_map *find_cpumap(void)
> +{
> +	struct perf_pmu *pmu = NULL;
> +
> +	while ((pmu = perf_pmu__scan(pmu))) {
> +		if (!is_pmu_core(pmu->name))
> +			continue;
> +
> +		/*
> +		 * The cpumap should cover all CPUs. Otherwise, some CPUs may
> +		 * not support some events or have different event IDs.
> +		 */
> +		if (pmu->cpus && pmu->cpus->nr != cpu__max_cpu())
> +			return NULL;
> +
> +		return perf_pmu__find_map(pmu);
> +	}
> +
> +	return NULL;
> +}
> +
>  void metricgroup__print(bool metrics, bool metricgroups, char *filter,
>  			bool raw, bool details)
>  {
> -	struct pmu_events_map *map = perf_pmu__find_map(NULL);
> +	struct pmu_events_map *map = find_cpumap();

so this is just for arm at the moment right?

could we rather make this arch specific code, so we don't need
to do the scanning on archs where this is not needed?

like marking perf_pmu__find_map as __weak and add arm specific
version?
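Jiri's suggestion relies on weak symbols: generic code provides a default definition marked `__weak`, and an architecture that needs different behaviour links in a strong definition with the same signature, which the linker then prefers. This is a minimal single-file sketch, assuming GCC or Clang; `find_events_map()`, `struct events_map` and `current_cpuid()` are hypothetical names, not perf's API. The strong override would live in a separate object file (for example an arm64-specific one), so it is only described in a comment here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * perf's tools tree gets __weak from <linux/compiler.h>; it is defined here
 * only so that this sketch compiles on its own (GCC/Clang syntax).
 */
#ifndef __weak
#define __weak __attribute__((weak))
#endif

struct events_map {
	const char *cpuid;		/* placeholder identifier */
};

static const struct events_map generic_map = { "generic" };

/*
 * Weak default, used by every architecture that does not supply its own
 * definition. A strong definition with the same signature in an arch
 * object file (for example, one that scans for a core PMU on arm64)
 * would replace this one at link time.
 */
__weak const struct events_map *find_events_map(void)
{
	return &generic_map;
}

/* Hypothetical caller, standing in for metricgroup__print() and friends. */
static const char *current_cpuid(void)
{
	const struct events_map *map = find_events_map();

	assert(map != NULL);
	return map->cpuid;
}
```

Because the choice between the weak and strong definitions is made at link time, architectures without an override never execute a PMU scan, which is the cost Jiri wanted to avoid.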

thanks,
jirka

>  	struct pmu_event *pe;
>  	int i;
>  	struct rblist groups;
> -- 
> 2.26.2
>
John Garry March 5, 2021, 11:06 a.m. UTC | #2
Hi Jirka,

>> -	struct pmu_events_map *map = perf_pmu__find_map(NULL);
>> +	struct pmu_events_map *map = find_cpumap();
> so this is just for arm at the moment right?
> 

Yes - but to be more accurate, arm64.

At the moment, from the archs which use pmu-events, only arm64 and nds32 
have versions of get_cpuid_str() which require a non-NULL pmu argument.

But then apparently nds32 only supports a single CPU, so this issue of 
heterogeneous CPUs should not be a concern there :)

> could we rather make this arch specific code, so we don't need
> to do the scanning on archs where this is not needed?
> 
> like marking perf_pmu__find_map as __weak and add arm specific
> version?

Well I was thinking that this code should not be in metricgroup.c anyway.

There is code in the current perf_pmu__find_map() which is common to all
archs.

I could factor that out into a common function, as below. I'm just a bit
worried about perf_pmu__find_map() and perf_pmu__find_pmu_map() being
confused.

Here's how that would look:

+++ b/tools/perf/arch/arm64/util/pmu.c

#include "../../util/cpumap.h"
#include "../../util/pmu.h"

struct pmu_events_map *perf_pmu__find_map(void)
{
	struct perf_pmu *pmu = perf_pmu__find("armv8_pmuv3_0");

	if (!pmu || !pmu->cpus || pmu->cpus->nr != cpu__max_cpu())
		return NULL;

	return perf_pmu__find_pmu_map(pmu);
}

And:

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 26c990e32378..312164ce9299 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -618,7 +618,7 @@ static int metricgroup__print_sys_event_iter(struct pmu_event *pe, void *data)
 void metricgroup__print(bool metrics, bool metricgroups, char *filter,
 			bool raw, bool details)
 {
-	struct pmu_events_map *map = perf_pmu__find_map(NULL);
+	struct pmu_events_map *map = perf_pmu__find_map();
 	struct pmu_event *pe;
 	int i;
 	struct rblist groups;
@@ -1253,8 +1253,7 @@ int metricgroup__parse_groups(const struct option *opt,
 			      struct rblist *metric_events)
 {
 	struct evlist *perf_evlist = *(struct evlist **)opt->value;
-	struct pmu_events_map *map = perf_pmu__find_map(NULL);
-
+	struct pmu_events_map *map = perf_pmu__find_map();
 
 	return parse_groups(perf_evlist, str, metric_no_group,
 			    metric_no_merge, NULL, metric_events, map);
@@ -1273,7 +1272,7 @@ int metricgroup__parse_groups_test(struct evlist *evlist,
 
 bool metricgroup__has_metric(const char *metric)
 {
-	struct pmu_events_map *map = perf_pmu__find_map(NULL);
+	struct pmu_events_map *map = perf_pmu__find_map();
 	struct pmu_event *pe;
 	int i;
 
diff --git a/tools/perf/util/pmu.c b/tools/perf/util/pmu.c
index 44ef28302fc7..d49bf20b6058 100644
--- a/tools/perf/util/pmu.c
+++ b/tools/perf/util/pmu.c
@@ -690,7 +690,7 @@ static char *perf_pmu__getcpuid(struct perf_pmu *pmu)
 	return cpuid;
 }
 
-struct pmu_events_map *perf_pmu__find_map(struct perf_pmu *pmu)
+struct pmu_events_map *perf_pmu__find_pmu_map(struct perf_pmu *pmu)
 {
 	struct pmu_events_map *map;
 	char *cpuid = perf_pmu__getcpuid(pmu);
@@ -717,6 +717,11 @@ struct pmu_events_map *perf_pmu__find_map(struct perf_pmu *pmu)
 	return map;
 }
 
+struct pmu_events_map *__weak perf_pmu__find_map(void)
+{
+	return perf_pmu__find_pmu_map(NULL);
+}
+
 bool pmu_uncore_alias_match(const char *pmu_name, const char *name)
 {
 	char *tmp = NULL, *tok, *str;
@@ -805,7 +810,7 @@ static void pmu_add_cpu_aliases(struct list_head *head, struct perf_pmu *pmu)
 {
 	struct pmu_events_map *map;
 
-	map = perf_pmu__find_map(pmu);
+	map = perf_pmu__find_pmu_map(pmu);
 	if (!map)
 		return;
 

Thoughts?

Thanks!
Jiri Olsa March 6, 2021, 7:34 p.m. UTC | #3
On Fri, Mar 05, 2021 at 11:06:58AM +0000, John Garry wrote:
> 
> Hi Jirka,
> 
> > > -	struct pmu_events_map *map = perf_pmu__find_map(NULL);
> > > +	struct pmu_events_map *map = find_cpumap();
> > so this is just for arm at the moment right?
> > 
> 
> Yes - but to be more accurate, arm64.
> 
> At the moment, from the archs which use pmu-events, only arm64 and nds32
> have versions of get_cpuid_str() which require a non-NULL pmu argument.
> 
> But then apparently nds32 only supports a single CPU, so this issue of
> heterogeneous CPUs should not be a concern there :)
> 
> > could we rather make this arch specific code, so we don't need
> > to do the scanning on archs where this is not needed?
> > 
> > like marking perf_pmu__find_map as __weak and add arm specific
> > version?
> 
> Well I was thinking that this code should not be in metricgroup.c anyway.
> 
> So there is code which is common in current perf_pmu__find_map() for all
> archs.
> 
> I could factor that out into a common function, below. Just a bit worried
> about perf_pmu__find_map() and perf_pmu__find_pmu_map() being confused.

right, so perf_pmu__find_map does not take perf_pmu as argument
anymore, so the prefix does not fit, how about pmu_events_map__find ?

thanks,
jirka

John Garry March 8, 2021, 4:34 p.m. UTC | #4
On 06/03/2021 19:34, Jiri Olsa wrote:
> On Fri, Mar 05, 2021 at 11:06:58AM +0000, John Garry wrote:
>> Hi Jirka,
>>
>>>> -	struct pmu_events_map *map = perf_pmu__find_map(NULL);
>>>> +	struct pmu_events_map *map = find_cpumap();
>>> so this is just for arm at the moment right?
>>>
>> Yes - but to be more accurate, arm64.
>>
>> At the moment, from the archs which use pmu-events, only arm64 and nds32
>> have versions of get_cpuid_str() which require a non-NULL pmu argument.
>>
>> But then apparently nds32 only supports a single CPU, so this issue of
>> heterogeneous CPUs should not be a concern there :)
>>
>>> could we rather make this arch specific code, so we don't need
>>> to do the scanning on archs where this is not needed?
>>>
>>> like marking perf_pmu__find_map as __weak and add arm specific
>>> version?
>> Well I was thinking that this code should not be in metricgroup.c anyway.
>>
>> So there is code which is common in current perf_pmu__find_map() for all
>> archs.
>>
>> I could factor that out into a common function, below. Just a bit worried
>> about perf_pmu__find_map() and perf_pmu__find_pmu_map() being confused.
> right, so perf_pmu__find_map does not take perf_pmu as argument
> anymore, so the prefix does not fit, how about pmu_events_map__find ?

I think it could be ok.

But now I am slightly concerned about putting anything like this in
arch/arm64, given this earlier discussion on a closely related topic:

https://lore.kernel.org/lkml/20190719075450.xcm4i4a5sfaxlfap@willie-the-truck/

Hi Will, Mark,

Do you have any objection to adding arm64-specific code here?

What I originally had in this patch was to iterate over the PMUs in common
code, find the CPU PMU, and use it to match CPU metrics, as long as the
system is not heterogeneous.

The suggestion now is to move that into arch-specific code, as it is not
needed on all archs.

Thanks,
John
John Garry March 11, 2021, 8:47 a.m. UTC | #5
On 06/03/2021 19:34, Jiri Olsa wrote:
> On Fri, Mar 05, 2021 at 11:06:58AM +0000, John Garry wrote:
>> Hi Jirka,
>>
>>>> -	struct pmu_events_map *map = perf_pmu__find_map(NULL);
>>>> +	struct pmu_events_map *map = find_cpumap();
>>> so this is just for arm at the moment right?
>>>
>> Yes - but to be more accurate, arm64.
>>
>> At the moment, from the archs which use pmu-events, only arm64 and nds32
>> have versions of get_cpuid_str() which require a non-NULL pmu argument.
>>
>> But then apparently nds32 only supports a single CPU, so this issue of
>> heterogeneous CPUs should not be a concern there :)
>>
>>> could we rather make this arch specific code, so we don't need
>>> to do the scanning on archs where this is not needed?
>>>
>>> like marking perf_pmu__find_map as __weak and add arm specific
>>> version?
>> Well I was thinking that this code should not be in metricgroup.c anyway.
>>
>> So there is code which is common in current perf_pmu__find_map() for all
>> archs.
>>
>> I could factor that out into a common function, below. Just a bit worried
>> about perf_pmu__find_map() and perf_pmu__find_pmu_map() being confused.
> right, so perf_pmu__find_map does not take perf_pmu as argument
> anymore, so the prefix does not fit, how about pmu_events_map__find ?

I just noticed this series:
https://lore.kernel.org/lkml/1612797946-18784-1-git-send-email-kan.liang@linux.intel.com/

It seems that series adds metricgroup support for heterogeneous system
configurations, while this series adds metricgroup support for homogeneous
system configurations on an arch which also supports heterogeneous
configurations. I need to check further for any conflicts.

@Kan Liang, it would be great if you could cc me on that series. I don't 
subscribe to the general list.

Thanks,
John

Patch

diff --git a/tools/perf/util/metricgroup.c b/tools/perf/util/metricgroup.c
index 26c990e32378..9a2a23093961 100644
--- a/tools/perf/util/metricgroup.c
+++ b/tools/perf/util/metricgroup.c
@@ -6,6 +6,7 @@ 
 /* Manage metrics and groups of metrics from JSON files */
 
 #include "metricgroup.h"
+#include "cpumap.h"
 #include "debug.h"
 #include "evlist.h"
 #include "evsel.h"
@@ -615,10 +616,31 @@  static int metricgroup__print_sys_event_iter(struct pmu_event *pe, void *data)
 				     d->details, d->groups, d->metriclist);
 }
 
+static struct pmu_events_map *find_cpumap(void)
+{
+	struct perf_pmu *pmu = NULL;
+
+	while ((pmu = perf_pmu__scan(pmu))) {
+		if (!is_pmu_core(pmu->name))
+			continue;
+
+		/*
+		 * The cpumap should cover all CPUs. Otherwise, some CPUs may
+		 * not support some events or have different event IDs.
+		 */
+		if (pmu->cpus && pmu->cpus->nr != cpu__max_cpu())
+			return NULL;
+
+		return perf_pmu__find_map(pmu);
+	}
+
+	return NULL;
+}
+
 void metricgroup__print(bool metrics, bool metricgroups, char *filter,
 			bool raw, bool details)
 {
-	struct pmu_events_map *map = perf_pmu__find_map(NULL);
+	struct pmu_events_map *map = find_cpumap();
 	struct pmu_event *pe;
 	int i;
 	struct rblist groups;