linux-perf-users.vger.kernel.org archive mirror
* Some questions about using the perf tool in ARM-SPE
@ 2023-06-09  7:14 蔡沅信
  2023-06-09  9:06 ` James Clark
  0 siblings, 1 reply; 16+ messages in thread
From: 蔡沅信 @ 2023-06-09  7:14 UTC (permalink / raw)
  To: linux-perf-users

hi linux-perf-users

I'm glad the perf tool supports ARM SPE.
However, I have encountered some problems using it.
My perf tool version is 6.1, and I compiled it to run on an Android system.
Here are my questions:

1. I can't use perf c2c to analyze the data. I'm not sure whether it
only supports certain hardware architectures:

    user_shell:/data/local/tmp # ./perf c2c report
    Failed setup nodes

2. I get more information via the command ./perf report --mem-mode, but
I'm confused about the data:

Snoop  Locked  Blocked  Local INSTR Latency
.....  ......  .......  ...................
N/A    No      N/A                        0
N/A    No      N/A                        0
N/A    No      N/A                        0
N/A    No      N/A                        0
...
Across many samples, the Snoop, Locked, Blocked and Local INSTR Latency
columns always show N/A, No, N/A and 0, which makes me suspect these
fields are not supported. Could you give me some information about
them, or point me to an introduction?

Many Thanks
Best Regards
Zack.

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-09  7:14 Some questions about using the perf tool in ARM-SPE 蔡沅信
@ 2023-06-09  9:06 ` James Clark
       [not found]   ` <CALDTKqg01+xJ2xu218c_QH2PbX9wdhYOiJfDieCXL5PHWV-6FQ@mail.gmail.com>
  0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2023-06-09  9:06 UTC (permalink / raw)
  To: 蔡沅信, linux-perf-users



On 09/06/2023 08:14, 蔡沅信 wrote:
> hi linux-perf-users
> 
> I'm glad the perf tool supports ARM SPE.
> However, I have encountered some problems using it.
> My perf tool version is 6.1, and I compiled it to run on an Android system.
> Here are my questions:
> 
> 1. I can't use perf c2c to analyze the data. I'm not sure whether it
> only supports certain hardware architectures:
>
>     user_shell:/data/local/tmp # ./perf c2c report
>     Failed setup nodes
> 
> 2. I get more information via the command ./perf report --mem-mode, but
> I'm confused about the data:
> 
> Snoop  Locked  Blocked  Local INSTR Latency
> .....  ......  .......  ...................
> N/A    No      N/A                        0
> N/A    No      N/A                        0
> N/A    No      N/A                        0
> N/A    No      N/A                        0
> ...
> Across many samples, the Snoop, Locked, Blocked and Local INSTR Latency
> columns always show N/A, No, N/A and 0, which makes me suspect these
> fields are not supported. Could you give me some information about
> them, or point me to an introduction?
> 
> Many Thanks
> Best Regards
> Zack.

Hi Zack,

I wasn't able to reproduce your issue on Ubuntu; it could be something
to do with Android.

It looks like the error "Failed setup nodes" comes from the
setup_nodes() function. Can you debug it to see exactly which part of it
is failing? It seems to involve NUMA nodes, so maybe that doesn't work
on Android, or you need to enable some NUMA options in the kernel
config. Or the function needs to handle missing data differently.

James
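
Following up on the NUMA angle: as far as I can tell from builtin-c2c.c, setup_nodes() relies on the NUMA topology that perf record stores in the perf.data header, and perf gathers that topology from sysfs. A quick diagnostic is therefore to read /sys/devices/system/node/online on the target and count the nodes listed there. The sketch below is hypothetical and not part of perf; count_list_entries() and online_numa_nodes() are illustrative names only.

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Count the entries in a sysfs list string such as "0", "0-3" or "0,2-3".
 * Returns -1 if the string is malformed.
 */
static int count_list_entries(const char *s)
{
	int count = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s)
			return -1;
		s = end;
		if (*s == '-') {
			hi = strtol(s + 1, &end, 10);
			if (end == s + 1 || hi < lo)
				return -1;
			s = end;
		}
		count += (int)(hi - lo + 1);
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return -1;
	}
	return count;
}

/*
 * Hypothetical diagnostic: read a node list file (normally
 * /sys/devices/system/node/online) and return the number of nodes,
 * or -1 if the file is missing or unreadable.
 */
static int online_numa_nodes(const char *path)
{
	char buf[256];
	FILE *f = fopen(path, "r");
	int ok;

	if (!f)
		return -1;
	ok = fgets(buf, sizeof(buf), f) != NULL;
	fclose(f);
	return ok ? count_list_entries(buf) : -1;
}
```

On the device, a result of -1 from online_numa_nodes("/sys/devices/system/node/online") would suggest the kernel exposes no NUMA node information at all (for example when CONFIG_NUMA is not set). That would fit the "failed to write feature NUMA_TOPOLOGY" message perf record prints.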


* Re: Some questions about using the perf tool in ARM-SPE
       [not found]     ` <77773641-26e5-a754-63cf-e7d3443e11fc@arm.com>
@ 2023-06-13 13:23       ` 蔡沅信
  2023-06-14  1:21         ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: 蔡沅信 @ 2023-06-13 13:23 UTC (permalink / raw)
  To: James Clark, linux-perf-users

OK.
I have a new discovery: c2c seems to support only certain Arm
Neoverse CPUs (N1/N2/V1). Could Cortex-X4 be supported as well?

Using the Arm Statistical Profiling Extension to detect false
cache-line sharing | Blog | Linaro

Thanks
Zack



On Tue, Jun 13, 2023 at 9:18 PM James Clark <james.clark@arm.com> wrote:
>
> Hi Zack,
>
> It looks like you replied just to me rather than to the list. Are you
> able to re-send this as a reply-all to the original post so others might
> be able to help as well?
>
> Thanks
> James
>
> On 12/06/2023 06:50, 蔡沅信 wrote:
> > Hi James.
> >
> > After some debugging, I found that there are eight CPUs on my platform
> > (0-7), but I can't get them to work together.
> > The CPUs are divided into three types of cluster, and the DTS
> > configures them as three separate clusters:
> >
> > user_shell:/sys/bus/event_source/devices # cat arm_spe_0/cpumask arm_spe_1/cpumask arm_spe_2/cpumask
> > 7
> > 4-6
> > 0-3
> >
> > This makes it impossible for me to record the whole system (CPUs 0-7)
> > at once; only one cluster can be recorded at a time. Below is the -v
> > output:
> >
> > user_shell:/data/local/tmp # ./perf record -e \
> >   arm_spe/branch_filter=1,load_filter=1,store_filter=1,ts_enable=1,pct_enable=1,pa_enable=1,min_latency=10,jitter=1/ \
> >   -a -v --vmlinux ./vmlinux -o text.data
> >  Warning: option `vmlinux' is being ignored because NO_DWARF=1
> > DEBUGINFOD_URLS=
> > nr_cblocks: 0
> > affinity: SYS
> > mmap flush: 1
> > comp level: 0
> > maps__set_modules_path_dir: cannot open
> > /lib/modules/6.1.25-android14-5-maybe-dirty-mainline dir
> > Problems setting modules path maps, continuing anyway...
> > mmap size 528384B
> > Control descriptor is not initialized
> > ^C[ perf record: Woken up 51 times to write data ]
> > failed to write feature CPUDESC
> > failed to write feature NUMA_TOPOLOGY
> > failed to write feature MEM_TOPOLOGY
> > failed to write feature CPU_PMU_CAPS
> > failed to write feature HYBRID_TOPOLOGY
> > [ perf record: Captured and wrote 23.952 MB text.data ]
> >
> > Could you help me get the three clusters working together? Then I think
> > perf c2c should also work.
> > If I have left out any information you need, please let me know!
> >
> > Many Thanks
> > Best Regards
> > Zack.
> >


* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-13 13:23       ` 蔡沅信
@ 2023-06-14  1:21         ` Leo Yan
       [not found]           ` <CALDTKqgz6=WFs=bMvnFkKv5kt5OP5wtUqQ2uekVbumCxNqeRXw@mail.gmail.com>
  2023-07-03  8:18           ` Suzuki K Poulose
  0 siblings, 2 replies; 16+ messages in thread
From: Leo Yan @ 2023-06-14  1:21 UTC (permalink / raw)
  To: 蔡沅信, Mark Rutland, Suzuki Kuruppassery Poulose
  Cc: James Clark, linux-perf-users

Hi,

On Tue, Jun 13, 2023 at 09:23:33PM +0800, 蔡沅信 wrote:
> OK.
> I have a new discovery: c2c seems to support only certain Arm
> Neoverse CPUs (N1/N2/V1). Could Cortex-X4 be supported as well?

Based on the Cortex-X4 TRM [1], Cortex-X4 uses the same SPE data
source packet format as the Neoverse CPUs.  In theory, we can add
Cortex-X4's MIDR to the neoverse_spe[] array in
tools/perf/util/arm-spe.c to support Cortex-X4.

The Linux master branch lacks a definition for Cortex-X4's MIDR [2].
Mark.R / Suzuki / James, could you confirm whether Arm plans to add,
or already has patches for, Cortex-X4's MIDR?

Coming back to your current issue: as James said, it seems to be
related to NUMA (or CPU topology) support missing from your kernel
config. It's very unlikely to be related to the CPU type; even if
Cortex-X4 is not supported, perf should still handle all SPE packets
except the data source packet.  Your shared log is not related to
decoding anyway, but you can try the change below to rule out the CPU
type:

diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 7b36ba6b4079..3c3a3846f253 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -527,6 +527,7 @@ static u64 arm_spe__synth_data_source(const struct arm_spe_record *record, u64 m
        else
                return 0;

+       is_neoverse = 1;
        if (is_neoverse)
                arm_spe__synth_data_source_neoverse(record, &data_src);
        else
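
Why the columns look the way they do: the N/A / No / N/A pattern in the report is what you would expect if the synthesized data source leaves its snoop, lock and block fields at zero. The sketch below is a simplified model of the report's column formatting, written from my reading of perf's mem-events.c helpers rather than copied from them. The constants mirror PERF_MEM_SNOOP_*, PERF_MEM_LOCK_* and PERF_MEM_BLK_* in linux/perf_event.h; struct data_src_fields and format_row() are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Bit values mirrored from include/uapi/linux/perf_event.h. */
#define SNOOP_NA    0x01 /* PERF_MEM_SNOOP_NA */
#define SNOOP_NONE  0x02 /* PERF_MEM_SNOOP_NONE */
#define SNOOP_HIT   0x04 /* PERF_MEM_SNOOP_HIT */
#define SNOOP_MISS  0x08 /* PERF_MEM_SNOOP_MISS */
#define SNOOP_HITM  0x10 /* PERF_MEM_SNOOP_HITM */
#define LOCK_NA     0x01 /* PERF_MEM_LOCK_NA */
#define LOCK_LOCKED 0x02 /* PERF_MEM_LOCK_LOCKED */
#define BLK_NA      0x01 /* PERF_MEM_BLK_NA */
#define BLK_DATA    0x02 /* PERF_MEM_BLK_DATA */
#define BLK_ADDR    0x04 /* PERF_MEM_BLK_ADDR */

/* Hypothetical subset of the decoded data source fields. */
struct data_src_fields {
	uint64_t snoop;
	uint64_t lock;
	uint64_t blk;
};

/* Snoop column (simplified to one name): zero or NA-only gives "N/A". */
static const char *snoop_str(uint64_t snoop)
{
	if (snoop & SNOOP_HITM)
		return "HitM";
	if (snoop & SNOOP_HIT)
		return "Hit";
	if (snoop & SNOOP_MISS)
		return "Miss";
	if (snoop & SNOOP_NONE)
		return "None";
	return "N/A";
}

/* Locked column: zero renders as "No", not "N/A". */
static const char *lock_str(uint64_t lock)
{
	if (lock & LOCK_NA)
		return "N/A";
	return (lock & LOCK_LOCKED) ? "Yes" : "No";
}

/* Blocked column: zero or NA renders as "N/A". */
static const char *blk_str(uint64_t blk)
{
	if ((blk & BLK_NA) || !(blk & (BLK_DATA | BLK_ADDR)))
		return "N/A";
	if ((blk & BLK_DATA) && (blk & BLK_ADDR))
		return "Data Addr";
	return (blk & BLK_DATA) ? "Data" : "Addr";
}

/* Render the three columns for one sample. */
static void format_row(const struct data_src_fields *d, char *out, size_t len)
{
	snprintf(out, len, "%s %s %s", snoop_str(d->snoop), lock_str(d->lock),
		 blk_str(d->blk));
}
```

With all fields zero, format_row() produces "N/A No N/A". Setting only the snoop field, roughly what forcing is_neoverse does, changes the first column but leaves Locked and Blocked at No / N/A. As far as I can tell, the SPE synthesis in arm-spe.c never fills the lock or block fields.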


[1] https://developer.arm.com/documentation/102484/0001/Statistical-Profiling-Extension-support/Statistical-Profiling-Extension-data-source-packet
[2] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm64/include/asm/cputype.h

> Using the Arm Statistical Profiling Extension to detect false
> cache-line sharing | Blog | Linaro
> 
> Thanks
> Zack


* Re: Some questions about using the perf tool in ARM-SPE
       [not found]           ` <CALDTKqgz6=WFs=bMvnFkKv5kt5OP5wtUqQ2uekVbumCxNqeRXw@mail.gmail.com>
@ 2023-06-14  6:08             ` 蔡沅信
  2023-06-18  9:28               ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: 蔡沅信 @ 2023-06-14  6:08 UTC (permalink / raw)
  To: Leo Yan, James Clark, linux-perf-users, Mark Rutland,
	Suzuki Kuruppassery Poulose

(Resending in plain-text mode.)

Hi,
How do I add NUMA node (or CPU topology) support to the kernel config?
After modifying arm-spe.c, Snoop is working, but Locked, Blocked and
Local INSTR Latency are still always No, N/A and 0.


I merged the 3 clusters into one and can now record the whole system.
user_shell:/sys/bus/event_source/devices/arm_spe_0 # cat cpumask
0-7
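
Before the merge, the three SPE PMU instances reported cpumasks 7, 4-6 and 0-3. The hypothetical helpers below (not perf APIs) parse such cpulist strings into a bitmask and OR them together. The union is 0xff, i.e. CPUs 0-7, the same coverage the single merged arm_spe_0 instance now reports. The sketch assumes at most 64 CPUs.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Parse a cpulist string such as "7", "4-6" or "0,2-3" into a bitmask.
 * Sketch only: CPUs above 63 and malformed input yield 0.
 */
static uint64_t cpulist_to_mask(const char *s)
{
	uint64_t mask = 0;

	while (*s && *s != '\n') {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi = lo;

		if (end == s || lo < 0 || lo > 63)
			return 0;
		s = end;
		if (*s == '-') {
			hi = strtol(s + 1, &end, 10);
			if (end == s + 1 || hi < lo || hi > 63)
				return 0;
			s = end;
		}
		for (long cpu = lo; cpu <= hi; cpu++)
			mask |= UINT64_C(1) << cpu;
		if (*s == ',')
			s++;
		else if (*s && *s != '\n')
			return 0;
	}
	return mask;
}

/* OR together the cpumasks of several PMU instances. */
static uint64_t union_cpumasks(const char *const lists[], size_t n)
{
	uint64_t all = 0;

	for (size_t i = 0; i < n; i++)
		all |= cpulist_to_mask(lists[i]);
	return all;
}
```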



On the c2c side:
user_shell:/data/local/tmp # ./perf c2c report -vvv
coalesce sort   fields: offset,iaddr
coalesce resort fields: offset,tot_peer
coalesce output fields:
cl_num_empty,percent_rmt_peer,percent_lcl_peer,percent_stores_l1hit,percent_stores_l1miss,percent_stores_na,offset,offset_node,dcacheline_count,iaddr,mean_rmt_peer,mean_lcl_peer,mean_load,tot_recs,cpucnt,symbol,dso,cl_srcline,node
Failed setup nodes


On the other hand, the perf binary I use is statically compiled with the
aarch64 cross-compiler, so I can't enable all the features:
Auto-detecting system features:
...                                   dwarf: [ OFF ]
...                      dwarf_getlocations: [ OFF ]
...                                   glibc: [ on  ]
...                                  libbfd: [ OFF ]
...                          libbfd-buildid: [ OFF ]
...                                  libcap: [ OFF ]
...                                  libelf: [ OFF ]
...                                 libnuma: [ OFF ]
...                  numa_num_possible_cpus: [ OFF ]
...                                 libperl: [ OFF ]
...                               libpython: [ OFF ]
...                               libcrypto: [ OFF ]
...                               libunwind: [ OFF ]
...                      libdw-dwarf-unwind: [ OFF ]
...                                    zlib: [ OFF ]
...                                    lzma: [ OFF ]
...                               get_cpuid: [ OFF ]
...                                     bpf: [ on  ]
...                                  libaio: [ on  ]
...                                 libzstd: [ OFF ]

Does it affect the results?

Many Thanks
Best Regards
Zack.




* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-14  6:08             ` 蔡沅信
@ 2023-06-18  9:28               ` Leo Yan
  2023-07-01  5:25                 ` 蔡沅信
  0 siblings, 1 reply; 16+ messages in thread
From: Leo Yan @ 2023-06-18  9:28 UTC (permalink / raw)
  To: 蔡沅信
  Cc: James Clark, linux-perf-users, Mark Rutland, Suzuki Kuruppassery Poulose

On Wed, Jun 14, 2023 at 02:08:36PM +0800, 蔡沅信 wrote:
> (Resending in plain-text mode.)
> 
> Hi,
> How do I add NUMA node (or CPU topology) support to the kernel config?

The following options are enabled in my test kernel:

root@leoy-huangpu:/home/leoy# zcat /proc/config.gz | grep NUMA
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING=y
CONFIG_NUMA_BALANCING_DEFAULT_ENABLED=y
CONFIG_NUMA=y
CONFIG_ACPI_NUMA=y
CONFIG_NUMA_KEEP_MEMINFO=y
CONFIG_USE_PERCPU_NUMA_NODE_ID=y
CONFIG_GENERIC_ARCH_NUMA=y
CONFIG_OF_NUMA=y
CONFIG_DMA_PERNUMA_CMA=y

> After modifying arm-spe.c, Snoop is working, but Locked, Blocked and
> Local INSTR Latency are still always No, N/A and 0.

I don't understand this... What have you modified in arm-spe.c?

If you don't share a more complete perf log, it will be difficult to
understand and locate the issue.

> I merged the 3 clusters into one and can now record the whole system.
> user_shell:/sys/bus/event_source/devices/arm_spe_0 # cat cpumask
> 0-7

This looks fine to me.

> On the c2c side:
> user_shell:/data/local/tmp # ./perf c2c report -vvv
> coalesce sort   fields: offset,iaddr
> coalesce resort fields: offset,tot_peer
> coalesce output fields:
> cl_num_empty,percent_rmt_peer,percent_lcl_peer,percent_stores_l1hit,percent_stores_l1miss,percent_stores_na,offset,offset_node,dcacheline_count,iaddr,mean_rmt_peer,mean_lcl_peer,mean_load,tot_recs,cpucnt,symbol,dso,cl_srcline,node
> Failed setup nodes

Before diving into the "perf c2c" tool, please use "perf script" to
decode the perf data file and check whether you have captured any SPE
trace data; it's also good to dump the header info.  This would be
useful for analyzing the issue.

$ ./perf script --header -I

> On the other hand, the perf binary I use is statically compiled with the
> aarch64 cross-compiler, so I can't enable all the features:

This should be fine.  On my side, I built perf statically on an x86_64
machine with the command:

$ make LDFLAGS=-static NO_LIBELF=1 NO_JVMTI=1  VF=1 DEBUG=1 NO_LIBTRACEEVENT=1

Then I copied the perf binary to my Arm64 machine, and it works pretty
well for Arm SPE.

Thanks,
Leo


* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-18  9:28               ` Leo Yan
@ 2023-07-01  5:25                 ` 蔡沅信
  0 siblings, 0 replies; 16+ messages in thread
From: 蔡沅信 @ 2023-07-01  5:25 UTC (permalink / raw)
  To: Leo Yan, James Clark, linux-perf-users, Mark Rutland,
	Suzuki Kuruppassery Poulose

Hi Leo Yan

On our platform, I cannot enable many of the NUMA options because they
lead to build failures.
Could this be the reason c2c fails? If so, I will try to identify
which part is causing the issue.

This is my platform's config. Could it support c2c?
user_shell:/data/local/tmp # zcat /proc/config.gz | grep NUMA
CONFIG_ARCH_SUPPORTS_NUMA_BALANCING=y
# CONFIG_NUMA is not set
CONFIG_DMA_PERNUMA_CMA=y

> ./perf script --header -I
I think this is the key message:
# ======
# missing features: TRACING_DATA CPUDESC NUMA_TOPOLOGY BRANCH_STACK
GROUP_DESC STAT MEM_TOPOLOGY CLOCKID DIR_FORMAT (null) (null)
COMPRESSED CPU_PMU_CAPS CLOCK_DATA HYBRID_TOPOLOGY
# ========
#
Only instruction-based sampling period is currently supported by Arm SPE.
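
The header dump lists NUMA_TOPOLOGY among the missing features. That lines up with the "failed to write feature NUMA_TOPOLOGY" message from perf record and with CONFIG_NUMA not being set. To script that check, a minimal sketch might look like the following. header_feature_missing() is a hypothetical helper that does a whole-word match on the "missing features:" line, and it assumes the line has not been wrapped.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Return true if `feature` appears as a whole word after "missing features:"
 * in a perf script --header line. Hypothetical helper, not a perf API.
 */
static bool header_feature_missing(const char *line, const char *feature)
{
	static const char prefix[] = "missing features:";
	const char *p = strstr(line, prefix);
	size_t flen = strlen(feature);

	if (!p || flen == 0)
		return false;
	p += sizeof(prefix) - 1;
	while ((p = strstr(p, feature)) != NULL) {
		/* p is past the prefix, so p[-1] is always valid here. */
		bool starts_word = p[-1] == ' ';
		bool ends_word = p[flen] == ' ' || p[flen] == '\0' || p[flen] == '\n';

		if (starts_word && ends_word)
			return true;
		p += flen;
	}
	return false;
}
```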

Many events related to L1D, TLB and memory are also displayed. If you
need this information, I can provide it:

         perf6.1  4725 [000]   129.779296:          1
                                                   l1d-access:
ffffffe012888f64 [unknown] ([unknown])

Many Thanks
Best Regards
Zack.


* Re: Some questions about using the perf tool in ARM-SPE
  2023-06-14  1:21         ` Leo Yan
       [not found]           ` <CALDTKqgz6=WFs=bMvnFkKv5kt5OP5wtUqQ2uekVbumCxNqeRXw@mail.gmail.com>
@ 2023-07-03  8:18           ` Suzuki K Poulose
  2023-07-03  8:24             ` James Clark
  1 sibling, 1 reply; 16+ messages in thread
From: Suzuki K Poulose @ 2023-07-03  8:18 UTC (permalink / raw)
  To: Leo Yan, 蔡沅信, Mark Rutland
  Cc: James Clark, linux-perf-users

Hi Leo
On 14/06/2023 02:21, Leo Yan wrote:
> Hi,
> 
> On Tue, Jun 13, 2023 at 09:23:33PM +0800, 蔡沅信 wrote:
>> OK.
>> I have a new discovery: c2c seems to support only certain Arm
>> Neoverse CPUs (N1/N2/V1). Could Cortex-X4 be supported as well?
> 
> Based on the Cortex-X4 TRM [1], Cortex-X4 uses the same SPE data
> source packet format as the Neoverse CPUs.  In theory, we can add
> Cortex-X4's MIDR to the neoverse_spe[] array in
> tools/perf/util/arm-spe.c to support Cortex-X4.
> 
> The Linux master branch lacks a definition for Cortex-X4's MIDR [2].
> Mark.R / Suzuki / James, could you confirm whether Arm plans to add,
> or already has patches for, Cortex-X4's MIDR?
> 

Is there a particular reason why we need this in the kernel? The usual
policy is that the kernel defines an MIDR only if it ever needs to
distinguish it, e.g. for a CPU erratum specific to that MIDR.

Suzuki



* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03  8:18           ` Suzuki K Poulose
@ 2023-07-03  8:24             ` James Clark
  2023-07-03  9:39               ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2023-07-03  8:24 UTC (permalink / raw)
  To: Suzuki K Poulose, Leo Yan, 蔡沅信, Mark Rutland
  Cc: linux-perf-users



On 03/07/2023 09:18, Suzuki K Poulose wrote:
> Hi Leo
> On 14/06/2023 02:21, Leo Yan wrote:
>> Hi,
>>
>> On Tue, Jun 13, 2023 at 09:23:33PM +0800, 蔡沅信 wrote:
>>> OK.
>>> I have a new discovery: c2c seems to support only certain Arm
>>> Neoverse CPUs (N1/N2/V1). Could Cortex-X4 be supported as well?
>>
>> Based on the Cortex-X4 TRM [1], Cortex-X4 uses the same SPE data
>> source packet format as the Neoverse CPUs.  In theory, we can add
>> Cortex-X4's MIDR to the neoverse_spe[] array in
>> tools/perf/util/arm-spe.c to support Cortex-X4.
>>
>> The Linux master branch lacks a definition for Cortex-X4's MIDR [2].
>> Mark.R / Suzuki / James, could you confirm whether Arm plans to add,
>> or already has patches for, Cortex-X4's MIDR?
>>
> 
> Is there a particular reason why we need this in the kernel? The usual
> policy is that the kernel defines an MIDR only if it ever needs to
> distinguish it, e.g. for a CPU erratum specific to that MIDR.
> 
> Suzuki

This is on the tool side rather than in the kernel. But yes, if the data
source encoding is the same as the existing ones please send a patch
adding X4's MIDR to the list.

Thanks
James


* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03  8:24             ` James Clark
@ 2023-07-03  9:39               ` Leo Yan
  2023-07-03  9:42                 ` James Clark
  0 siblings, 1 reply; 16+ messages in thread
From: Leo Yan @ 2023-07-03  9:39 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, 蔡沅信,
	Mark Rutland, linux-perf-users

Hi Suzuki, James,

On Mon, Jul 03, 2023 at 09:24:27AM +0100, James Clark wrote:

[...]

> >>> I have a new discovery: c2c seems to support only certain Arm
> >>> Neoverse CPUs (N1/N2/V1). Could Cortex-X4 be supported as well?
> >>
> >> Based on the Cortex-X4 TRM [1], Cortex-X4 uses the same SPE data
> >> source packet format as the Neoverse CPUs.  In theory, we can add
> >> Cortex-X4's MIDR to the neoverse_spe[] array in
> >> tools/perf/util/arm-spe.c to support Cortex-X4.
> >>
> >> The Linux master branch lacks a definition for Cortex-X4's MIDR [2].
> >> Mark.R / Suzuki / James, could you confirm whether Arm plans to add,
> >> or already has patches for, Cortex-X4's MIDR?
> > 
> > Is there a particular reason why we need this in the kernel? The usual
> > policy is that the kernel defines an MIDR only if it ever needs to
> > distinguish it, e.g. for a CPU erratum specific to that MIDR.

Just to clarify a bit: the Arm SPE data source packet is
implementation defined, which means different CPU types can use
different data source formats, and some CPU variants don't support the
data source packet at all. For this reason, the perf tool checks the
CPU MIDR to decide whether the data source format is supported.

Currently the perf tool directly includes the kernel header
arch/arm64/include/asm/cputype.h for CPU MIDR definitions, which is
why I am asking whether anyone is working on adding the MIDR for
Cortex-X4.
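
To make the MIDR check concrete: perf's real implementation matches the CPU's MIDR against the neoverse_spe[] range list using helpers from cputype.h. The standalone sketch below only models that idea. It decodes the implementer and part-number fields of MIDR_EL1 and compares them with a small table. The part numbers (0xD0C Neoverse N1, 0xD40 Neoverse V1, 0xD49 Neoverse N2, 0xD82 Cortex-X4) are my assumptions and should be checked against the TRMs. Including Cortex-X4 reflects the proposal in this thread, not what perf 6.1 does.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* MIDR_EL1: Implementer[31:24] Variant[23:20] Arch[19:16] PartNum[15:4] Rev[3:0] */
#define MIDR_IMPLEMENTER(m) (((m) >> 24) & 0xffu)
#define MIDR_PARTNUM(m)     (((m) >> 4) & 0xfffu)

#define IMPL_ARM 0x41u

/* Part numbers as I understand them; verify against the Arm TRMs. */
static const uint16_t neoverse_like_parts[] = {
	0xD0C, /* Neoverse N1 */
	0xD40, /* Neoverse V1 */
	0xD49, /* Neoverse N2 */
	0xD82, /* Cortex-X4 (proposed addition, hypothetical here) */
};

/* True if this MIDR (any variant/revision) uses the Neoverse-style format. */
static bool midr_uses_neoverse_data_src(uint32_t midr)
{
	size_t n = sizeof(neoverse_like_parts) / sizeof(neoverse_like_parts[0]);

	if (MIDR_IMPLEMENTER(midr) != IMPL_ARM)
		return false;
	for (size_t i = 0; i < n; i++)
		if (MIDR_PARTNUM(midr) == neoverse_like_parts[i])
			return true;
	return false;
}
```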

> This is on the tool side rather than in the kernel. But yes, if the data
> source encoding is the same as the existing ones please send a patch
> adding X4's MIDR to the list.

I can do this. As elaborated above, I think we need two patches: a
kernel patch adding the MIDR, and a perf tool patch consuming the X4
MIDR.  I would like to know whether this is what you expect.

Thanks,
Leo


* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03  9:39               ` Leo Yan
@ 2023-07-03  9:42                 ` James Clark
  2023-07-03 10:20                   ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: James Clark @ 2023-07-03  9:42 UTC (permalink / raw)
  To: Leo Yan
  Cc: Suzuki K Poulose, 蔡沅信,
	Mark Rutland, linux-perf-users



On 03/07/2023 10:39, Leo Yan wrote:
> Hi Suzuki, James,
> 
> On Mon, Jul 03, 2023 at 09:24:27AM +0100, James Clark wrote:
> 
> [...]
> 
>>>>> I have a new discovery: c2c seems to support only certain Arm
>>>>> Neoverse CPUs (N1/N2/V1). Could Cortex-X4 be supported as well?
>>>>
>>>> Based on the Cortex-X4 TRM [1], Cortex-X4 uses the same SPE data
>>>> source packet format as the Neoverse CPUs.  In theory, we can add
>>>> Cortex-X4's MIDR to the neoverse_spe[] array in
>>>> tools/perf/util/arm-spe.c to support Cortex-X4.
>>>>
>>>> The Linux master branch lacks a definition for Cortex-X4's MIDR [2].
>>>> Mark.R / Suzuki / James, could you confirm whether Arm plans to add,
>>>> or already has patches for, Cortex-X4's MIDR?
>>>
>>> Is there a particular reason why we need this in the kernel? The usual
>>> policy is that the kernel defines an MIDR only if it ever needs to
>>> distinguish it, e.g. for a CPU erratum specific to that MIDR.
> 
> Just to clarify a bit: the Arm SPE data source packet is
> implementation defined, which means different CPU types can use
> different data source formats, and some CPU variants don't support the
> data source packet at all. For this reason, the perf tool checks the
> CPU MIDR to decide whether the data source format is supported.
> 
> Currently the perf tool directly includes the kernel header
> arch/arm64/include/asm/cputype.h for CPU MIDR definitions, which is
> why I am asking whether anyone is working on adding the MIDR for
> Cortex-X4.
> 
>> This is on the tool side rather than in the kernel. But yes, if the data
>> source encoding is the same as the existing ones please send a patch
>> adding X4's MIDR to the list.
> 
> I can do this. As elaborated above, I think we need two patches: a
> kernel patch adding the MIDR, and a perf tool patch consuming the X4
> MIDR.  I would like to know whether this is what you expect.
> 

Personally, I would add it on the tool side only. If it's not really
needed in the kernel, then it doesn't make sense to add it there.

But I suppose from a consistency point of view we could add it in both
places. I'm not too fussed either way.

> Thanks,
> Leo


* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03  9:42                 ` James Clark
@ 2023-07-03 10:20                   ` Leo Yan
  2023-07-03 13:28                     ` Suzuki K Poulose
  0 siblings, 1 reply; 16+ messages in thread
From: Leo Yan @ 2023-07-03 10:20 UTC (permalink / raw)
  To: James Clark
  Cc: Suzuki K Poulose, 蔡沅信,
	Mark Rutland, linux-perf-users

On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:

[...]

> >> This is on the tool side rather than in the kernel. But yes, if the data
> >> source encoding is the same as the existing ones please send a patch
> >> adding X4's MIDR to the list.
> > 
> > I can do this.  As elaborated above, I think we need two patches: a
> > kernel patch adding the MIDR, and a perf tool patch consuming the X4
> > MIDR.  I would like to know whether this matches what you expect.
> > 
> 
> Personally I would just add it on the tool side only. If it's not really
> needed in the kernel then it doesn't make sense to add it.

Makes sense to me.  I will try to work out a patch, making sure it won't
break the build when the kernel adds the same X4 MIDR definition.

Thanks,
Leo

> But I suppose from a consistency point of view we could add it in both
> places. I'm not too fussed either way.
> 
> > Thanks,
> > Leo
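
One way to meet Leo's goal of not breaking the build, sketched under the
assumption that the tool carries its own Cortex-X4 definitions: the shift
values and MIDR_CPU_MODEL() below are modelled on the kernel's cputype.h,
and 0xD82 as the Cortex-X4 part number is an assumption to check against
the TRM. Guarding each macro with #ifndef lets an already-included header
that defines the same name take precedence instead of clashing.

```c
#include <assert.h>
#include <stdint.h>

/* Field positions, modelled on the kernel's cputype.h. */
#define MIDR_REVISION_SHIFT	0
#define MIDR_PARTNUM_SHIFT	4
#define MIDR_ARCHITECTURE_SHIFT	16
#define MIDR_VARIANT_SHIFT	20
#define MIDR_IMPLEMENTOR_SHIFT	24

/* Build a model MIDR with variant and revision zero and architecture 0xf. */
#define MIDR_CPU_MODEL(imp, partnum)				\
	(((uint32_t)(imp) << MIDR_IMPLEMENTOR_SHIFT) |		\
	 ((uint32_t)0xf << MIDR_ARCHITECTURE_SHIFT) |		\
	 ((uint32_t)(partnum) << MIDR_PARTNUM_SHIFT))

#define ARM_CPU_IMP_ARM		0x41

/* Define the Cortex-X4 names only if no earlier header already did. */
#ifndef ARM_CPU_PART_CORTEX_X4
#define ARM_CPU_PART_CORTEX_X4	0xD82	/* assumed part number */
#endif

#ifndef MIDR_CORTEX_X4
#define MIDR_CORTEX_X4 \
	MIDR_CPU_MODEL(ARM_CPU_IMP_ARM, ARM_CPU_PART_CORTEX_X4)
#endif

/* Compile-time check of the encoding: 0x41 << 24 | 0xf << 16 | 0xD82 << 4. */
_Static_assert(MIDR_CORTEX_X4 == 0x410FD820u, "unexpected Cortex-X4 MIDR");
```

C only allows a macro to be redefined with an identical replacement list,
so even 0xd82 versus 0xD82 counts as a different definition and draws a
"redefined" diagnostic, which -Werror turns into a build failure. The guard
only helps when the other header is seen first, which is one argument for
keeping a single synced copy of the header instead.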


* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03 10:20                   ` Leo Yan
@ 2023-07-03 13:28                     ` Suzuki K Poulose
  2023-07-05 17:33                       ` Namhyung Kim
  0 siblings, 1 reply; 16+ messages in thread
From: Suzuki K Poulose @ 2023-07-03 13:28 UTC (permalink / raw)
  To: Leo Yan, James Clark
  Cc: 蔡沅信, Mark Rutland, linux-perf-users

On 03/07/2023 11:20, Leo Yan wrote:
> On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:
> 
> [...]
> 
>>>> This is on the tool side rather than in the kernel. But yes, if the data
>>>> source encoding is the same as the existing ones please send a patch
>>>> adding X4's MIDR to the list.
>>>
>>> I can do this.  As elaborated above, I think we need two patches: a
>>> kernel patch adding the MIDR, and a perf tool patch consuming the X4
>>> MIDR.  I would like to know whether this matches what you expect.
>>>
>>
>> Personally I would just add it on the tool side only. If it's not really
>> needed in the kernel then it doesn't make sense to add it.
> 
> Makes sense to me.  I will try to work out a patch, making sure it won't
> break the build when the kernel adds the same X4 MIDR definition.

Do we somehow sync (copy) the cputype.h for the perf tool? Or do we keep
them in sync with a patch?

If it is the former, I wouldn't bother about updating the kernel
headers.

Also, how does the perf tool read the MIDR? It is safer to read:

/sys/devices/system/cpu/cpu<N>/regs/identification/midr_el1

than relying on mrs emulation, which could get you migrated to another CPU
and return the completely different MIDR_EL1 of a little CPU.

Suzuki



> 
> Thanks,
> Leo
> 
>> But I suppose from a consistency point of view we could add it in both
>> places. I'm not too fussed either way.
>>
>>> Thanks,
>>> Leo
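
A minimal sketch of the sysfs approach Suzuki suggests, in C. The helper
names are hypothetical, and the file is assumed to contain a single hex
value such as "0x00000000410fd0c0" followed by a newline. Reading the
per-CPU file means the result always describes the CPU named in the path,
not whichever CPU the calling thread happens to be running on.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse text such as "0x00000000410fd0c0\n". Returns 0 on success, -1 on error. */
static int parse_midr(const char *buf, uint64_t *midr)
{
	char *end;
	unsigned long long val;

	assert(buf != NULL && midr != NULL);
	errno = 0;
	val = strtoull(buf, &end, 16);	/* base 16 accepts an optional 0x prefix */
	if (errno != 0 || end == buf)
		return -1;
	while (*end == '\n' || *end == ' ' || *end == '\t')
		end++;			/* tolerate the trailing newline */
	if (*end != '\0')
		return -1;
	*midr = (uint64_t)val;
	return 0;
}

/* Read MIDR_EL1 for @cpu from sysfs rather than executing mrs, so the value
 * describes that CPU even if the calling thread migrates. */
static int read_cpu_midr(int cpu, uint64_t *midr)
{
	char path[128];
	char buf[64];
	FILE *f;

	snprintf(path, sizeof(path),
		 "/sys/devices/system/cpu/cpu%d/regs/identification/midr_el1",
		 cpu);
	f = fopen(path, "r");
	if (!f)
		return -1;
	if (!fgets(buf, sizeof(buf), f)) {
		fclose(f);
		return -1;
	}
	fclose(f);
	return parse_midr(buf, midr);
}
```

On a heterogeneous (big.LITTLE) system, a caller would loop over the online
CPUs and collect each MIDR separately, since big and little cores report
different values.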



* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-03 13:28                     ` Suzuki K Poulose
@ 2023-07-05 17:33                       ` Namhyung Kim
  2023-07-17  5:57                         ` Leo Yan
  0 siblings, 1 reply; 16+ messages in thread
From: Namhyung Kim @ 2023-07-05 17:33 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Leo Yan, James Clark, 蔡沅信,
	Mark Rutland, linux-perf-users, Arnaldo Carvalho de Melo

Hello,

On Mon, Jul 3, 2023 at 6:33 AM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
> On 03/07/2023 11:20, Leo Yan wrote:
> > On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:
> >
> > [...]
> >
> >>>> This is on the tool side rather than in the kernel. But yes, if the data
> >>>> source encoding is the same as the existing ones please send a patch
> >>>> adding X4's MIDR to the list.
> >>>
> >>> I can do this.  As elaborated above, I think we need two patches: a
> >>> kernel patch adding the MIDR, and a perf tool patch consuming the X4
> >>> MIDR.  I would like to know whether this matches what you expect.
> >>>
> >>
> >> Personally I would just add it on the tool side only. If it's not really
> >> needed in the kernel then it doesn't make sense to add it.
> >
> > Makes sense to me.  I will try to work out a patch, making sure it won't
> > break the build when the kernel adds the same X4 MIDR definition.
>
> Do we somehow sync (copy) the cputype.h for the perf tool? Or do we keep
> them in sync with a patch?
>
> If it is the former, I wouldn't bother about updating the kernel
> headers.

In general, the perf tool has a copy of the kernel headers, and there's a
script called check-headers.sh to verify they are in sync.  We don't
recommend that kernel patches touch the tool's copy; that is done
separately by the tool developers.

Thanks,
Namhyung


>
> Also, how does the perf tool read the MIDR? It is safer to read:
>
> /sys/devices/system/cpu/cpu<N>/regs/identification/midr_el1
>
> than relying on mrs emulation, which could get you migrated to another CPU
> and return the completely different MIDR_EL1 of a little CPU.
>
> Suzuki
>
>
>
> >
> > Thanks,
> > Leo
> >
> >> But I suppose from a consistency point of view we could add it in both
> >> places. I'm not too fussed either way.
> >>
> >>> Thanks,
> >>> Leo
>
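
check-headers.sh itself is a shell script built around diff (with some
per-file options, as far as I know). The C sketch below is not that
script; it only illustrates the underlying idea with a strict byte-for-byte
comparison, so it is stricter than the real check. The helper name is
hypothetical.

```c
#include <assert.h>
#include <stdio.h>

/* Return 1 if the two files are byte-for-byte identical, 0 if they differ,
 * or -1 if either cannot be opened. */
static int files_identical(const char *kernel_hdr, const char *tools_hdr)
{
	FILE *a, *b;
	int ret = -1;

	assert(kernel_hdr != NULL && tools_hdr != NULL);
	a = fopen(kernel_hdr, "rb");
	b = fopen(tools_hdr, "rb");
	if (!a || !b)
		goto out;

	for (;;) {
		int ca = fgetc(a);
		int cb = fgetc(b);

		if (ca != cb) {		/* differing byte, or one file is shorter */
			ret = 0;
			break;
		}
		if (ca == EOF) {	/* both files ended together */
			ret = 1;
			break;
		}
	}
out:
	if (a)
		fclose(a);
	if (b)
		fclose(b);
	return ret;
}
```

For example, files_identical("arch/arm64/include/asm/cputype.h",
"tools/arch/arm64/include/asm/cputype.h"), run from the top of a kernel
tree, would report a difference once a kernel patch adds an MIDR that the
tools copy does not yet have.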


* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-05 17:33                       ` Namhyung Kim
@ 2023-07-17  5:57                         ` Leo Yan
  2023-07-17  9:05                           ` Suzuki K Poulose
  0 siblings, 1 reply; 16+ messages in thread
From: Leo Yan @ 2023-07-17  5:57 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Suzuki K Poulose, James Clark, 蔡沅信,
	Mark Rutland, linux-perf-users, Arnaldo Carvalho de Melo

Hi Suzuki,

On Wed, Jul 05, 2023 at 10:33:18AM -0700, Namhyung Kim wrote:
> On Mon, Jul 3, 2023 at 6:33 AM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> > On 03/07/2023 11:20, Leo Yan wrote:
> > > On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:

[...]

> > Do we somehow sync (copy) the cputype.h for the perf tool? Or do we keep
> > them in sync with a patch?
> >
> > If it is the former, I wouldn't bother about updating the kernel
> > headers.
> 
> In general, the perf tool has a copy of the kernel headers, and there's a
> script called check-headers.sh to verify they are in sync.  We don't
> recommend that kernel patches touch the tool's copy; that is done
> separately by the tool developers.

We have the kernel header arch/arm64/include/asm/cputype.h, and the tools
have a copy placed at tools/arch/arm64/include/asm/cputype.h.

As Namhyung explained, the kernel header is usually changed first, and
then the tool developers send a separate patch to sync the tools copy
with it.  You can see a recent example in Arnaldo's patch [1].

Following this working model, I sent a patch series to add the Cortex-X4
CPU part and MIDR definitions [2].  I personally think this is the best
way for us to keep the kernel header and the tools header aligned.
Please let me know whether this is doable.

Sorry for the late response; I was on vacation :)

Thanks,
Leo

[1] https://lore.kernel.org/lkml/ZLFQ%2Ftu%2FATQwDEIW@kernel.org/
[2] https://lore.kernel.org/lkml/20230717054327.79815-1-leo.yan@linaro.org/


* Re: Some questions about using the perf tool in ARM-SPE
  2023-07-17  5:57                         ` Leo Yan
@ 2023-07-17  9:05                           ` Suzuki K Poulose
  0 siblings, 0 replies; 16+ messages in thread
From: Suzuki K Poulose @ 2023-07-17  9:05 UTC (permalink / raw)
  To: Leo Yan, Namhyung Kim
  Cc: James Clark, 蔡沅信,
	Mark Rutland, linux-perf-users, Arnaldo Carvalho de Melo

On 17/07/2023 06:57, Leo Yan wrote:
> Hi Suzuki,
> 
> On Wed, Jul 05, 2023 at 10:33:18AM -0700, Namhyung Kim wrote:
>> On Mon, Jul 3, 2023 at 6:33 AM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>>> On 03/07/2023 11:20, Leo Yan wrote:
>>>> On Mon, Jul 03, 2023 at 10:42:21AM +0100, James Clark wrote:
> 
> [...]
> 
>>> Do we somehow sync (copy) the cputype.h for the perf tool? Or do we keep
>>> them in sync with a patch?
>>>
>>> If it is the former, I wouldn't bother about updating the kernel
>>> headers.
>>
>> In general, the perf tool has a copy of the kernel headers, and there's a
>> script called check-headers.sh to verify they are in sync.  We don't
>> recommend that kernel patches touch the tool's copy; that is done
>> separately by the tool developers.
> 
> We have the kernel header arch/arm64/include/asm/cputype.h, and the tools
> have a copy placed at tools/arch/arm64/include/asm/cputype.h.
>
> As Namhyung explained, the kernel header is usually changed first, and
> then the tool developers send a separate patch to sync the tools copy
> with it.  You can see a recent example in Arnaldo's patch [1].
>
> Following this working model, I sent a patch series to add the Cortex-X4
> CPU part and MIDR definitions [2].  I personally think this is the best
> way for us to keep the kernel header and the tools header aligned.
> Please let me know whether this is doable.
Sure, if the tool relies on syncing the kernel headers, that's fine.

> 
> Sorry for the late response; I was on vacation :)

No worries, I will dig up the series from the mailing list and
respond there. Thanks for sending this out.

Suzuki

> 
> Thanks,
> Leo
> 
> [1] https://lore.kernel.org/lkml/ZLFQ%2Ftu%2FATQwDEIW@kernel.org/
> [2] https://lore.kernel.org/lkml/20230717054327.79815-1-leo.yan@linaro.org/



end of thread, other threads:[~2023-07-17  9:05 UTC | newest]

Thread overview: 16+ messages
2023-06-09  7:14 Some questions about using the perf tool in ARM-SPE 蔡沅信
2023-06-09  9:06 ` James Clark
     [not found]   ` <CALDTKqg01+xJ2xu218c_QH2PbX9wdhYOiJfDieCXL5PHWV-6FQ@mail.gmail.com>
     [not found]     ` <77773641-26e5-a754-63cf-e7d3443e11fc@arm.com>
2023-06-13 13:23       ` 蔡沅信
2023-06-14  1:21         ` Leo Yan
     [not found]           ` <CALDTKqgz6=WFs=bMvnFkKv5kt5OP5wtUqQ2uekVbumCxNqeRXw@mail.gmail.com>
2023-06-14  6:08             ` 蔡沅信
2023-06-18  9:28               ` Leo Yan
2023-07-01  5:25                 ` 蔡沅信
2023-07-03  8:18           ` Suzuki K Poulose
2023-07-03  8:24             ` James Clark
2023-07-03  9:39               ` Leo Yan
2023-07-03  9:42                 ` James Clark
2023-07-03 10:20                   ` Leo Yan
2023-07-03 13:28                     ` Suzuki K Poulose
2023-07-05 17:33                       ` Namhyung Kim
2023-07-17  5:57                         ` Leo Yan
2023-07-17  9:05                           ` Suzuki K Poulose
