* Re: GSoC: perf Linux Profiling Scalability and speed
       [not found]   ` <CANRdyn9H7kzBXUxpvTJ_93G5Tc=51vCRs9ucHbSyVRO_rQf5vA@mail.gmail.com>
@ 2023-03-03  0:29     ` Ian Rogers
       [not found]       ` <CANRdyn9MU1N5xH8MfMbpmxPhRTCqi9J_NWsVOPKXodE97H97rQ@mail.gmail.com>
  0 siblings, 1 reply; 4+ messages in thread
From: Ian Rogers @ 2023-03-03  0:29 UTC
  To: Lukas Molleman, Arnaldo Carvalho de Melo
  Cc: linux-perf-users, Riccardo Mancini

On Wed, Mar 1, 2023 at 5:31 AM Lukas Molleman <lukas.molleman@gmail.com> wrote:
>
> On Tue, Feb 21, 2023 at 17:56, Ian Rogers <irogers@google.com> wrote:
>>
>> +Arnaldo Carvalho de Melo
>>
>> On Mon, Feb 20, 2023 at 5:15 AM Lukas Molleman <lukas.molleman@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> >
>> > My name is Lukas Molleman and I'm pursuing a degree in computer science engineering.
>> >
>> > I believe I'm a very good fit for this project. I'm passionate about writing clean code and optimization. I've got a good amount of experience contributing to mathematical and concurrent models at startups and big tech.
>> >
>> > I'd love to have a chat with you to talk more in depth about this project.
>>
>> Hi Lukas! Thank you for your interest! Perhaps we can schedule a quick
>> chat; what time zone are you in? Namhyung and I are on PST (GMT-8) and
>> Arnaldo is GMT-3 iirc. I don't think we all need to be there to talk,
>> but maybe we can answer your questions by email.
>>
>> A good habit for the perf tool, and Linux in general, is to avoid
>> personal emails and share with the mailing lists instead. You can get a
>> good amount of information from the linux-perf-users mailing list [1],
>> which tends to cover both users and developers.
>>
>> In making an application it is very important to us that:
>> 1) you have a detailed time plan. This lets us know what your strategy
>> for doing work is, how you will work around other commitments, school,
>> etc.
>> 2) you've done something to touch base with the community on
>> linux-perf-users. Some ideas here are to improve the wiki [2] or to
>> check for test failures on your machine by building perf and then
>> running the 'perf test' command. This isn't the main project; what
>> we're aiming to do is avoid contributors who show an interest but
>> then never get going. If you show you can contribute in a small way
>> before projects are selected, we have some confidence that things will
>> go well during the program.
>>
>> Hopefully this is helpful and let me know if email for the chat is okay. Thanks,
>> Ian
>>
>> [1] https://lore.kernel.org/linux-perf-users/
>> [2] https://perf.wiki.kernel.org/index.php/Main_Page
>>
>> >
>> > Kind regards,
>> >
>> > Lukas Molleman
>> >
>
> Hi Ian,
>
>
> Sorry for the late reply. My timezone is GMT+1. Talking over mail is ok.
>
> I had some ideas to add to the documentation but I'm not sure how to propose changes to the documentation. Does this happen on the website or inside the repo? I requested an account a week ago but haven't gotten an update.

Hi Lukas,

Was this a request for the perf wiki? Arnaldo, could you see what's happening?

> I've also found several errors when running perf test. Do I research the issue and give info on why it is failing (Could it be normal?) or do I write a patch to try to fix it?

Perhaps if you share what the errors are then we can talk about how to
fix them. There are also tests that currently say skip but don't give
a reason, it'd be nice to improve this.

> I've drafted a plan, but to make it more detailed I need to know exactly what is expected of me (how we want to do things, what we want to achieve, what performance improvement we're aiming for, etc.) and where we are now.
>
> Now - start date: Research
>
> Get a deeper understanding by reading the book "Parallel and high performance computing".
> Understand necessary modules, libraries and tools.
> Understand the code that I need to know to succeed.
>
> Week 1: Design
>
> Based on my research, I'll make a more detailed plan.
> Consider the different parallel processing models such as task-based parallelism, data parallelism or message passing.
> Determine which parallelism model will best suit the identified areas of the perf tool. (Not sure if I'll decide this or if someone else will make the decisions for me?)
>
> Week 2-6: Refactor and Code
>
> Code. (Not exactly sure. I need more details on where we are now and what we want to achieve and how we want to do it.)

So the perf tool is written in somewhat low-level C code, in fact we
try to adopt the Linux kernel's conventions so that code between the
tool and the kernel can easily be shared. Frameworks for different
kinds of parallelism would need to be added. In the past Riccardo
Mancini looked at adding workqueues;
https://lore.kernel.org/lkml/3c4f8dd64d07373d876990ceb16e469b4029363f.camel@gmail.com/
We'd like to merge this work, but it needs rebasing onto the current
perf development tree. One problem Riccardo encountered was with
reference counts. To this end I wrote a reference count checker:
https://lore.kernel.org/lkml/20220211103415.2737789-1-irogers@google.com/
A number of fixes from it are now merged into the tree, but not the
actual checking framework itself. A first task may be to work on the
reference count checker and then to bring in Riccardo's work. I
started rebasing the checker and should be able to send it out again
soon.
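
To make the workqueue idea concrete, here is a rough, language-neutral
model of it in Python. This is only an illustration: it is not
Riccardo's patch series and not a perf API (perf itself is C), and the
names WorkQueue, queue_work, flush, destroy and run_demo are all
hypothetical. A fixed pool of worker threads drains a shared FIFO of
work items:

```python
import queue
import threading


class WorkQueue:
    """Toy workqueue: a fixed pool of threads draining a shared FIFO of work items."""

    def __init__(self, nr_threads):
        self._items = queue.Queue()
        self.errors = []  # exceptions raised by work items, kept for inspection
        self._threads = [
            threading.Thread(target=self._worker, daemon=True)
            for _ in range(nr_threads)
        ]
        for thread in self._threads:
            thread.start()

    def _worker(self):
        while True:
            item = self._items.get()
            if item is None:  # sentinel queued by destroy(): stop this worker
                self._items.task_done()
                return
            func, args = item
            try:
                func(*args)
            except Exception as err:
                self.errors.append(err)
            finally:
                self._items.task_done()

    def queue_work(self, func, *args):
        """Queue func(*args) to run on some worker thread."""
        self._items.put((func, args))

    def flush(self):
        """Block until every queued item has finished."""
        self._items.join()

    def destroy(self):
        """Stop all workers after the already queued items have run."""
        for _ in self._threads:
            self._items.put(None)
        for thread in self._threads:
            thread.join()


def run_demo(nr_items=8, nr_threads=4):
    """Square a few numbers on the pool and return the results in sorted order."""
    results = []
    lock = threading.Lock()

    def square(n):
        with lock:  # shared state touched by work items needs its own locking
            results.append(n * n)

    wq = WorkQueue(nr_threads)
    for n in range(nr_items):
        wq.queue_work(square, n)
    wq.flush()
    wq.destroy()
    return sorted(results)
```

The lock inside the work item is the interesting part: any object shared
between work items needs that kind of care, and lifetime mistakes on
shared objects are where reference counting problems tend to show up.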

Once we have a parallelism framework there are different parts of the
perf tool that would benefit. Being able to open, enable, disable and
close events in parallel would be good for most parts of the tool. A
lot of time in 'perf report' is spent time-ordering samples and
processing things like symbols. There is already some support in perf
record for synthesizing events (to create the initial machine state) on
multiple threads, but this should migrate to using workqueues.
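
As a small illustration of why time ordering is a natural target, the
sketch below merges several sample streams into one time-ordered
stream. It assumes each input stream is already sorted by timestamp,
which is an assumption made for the example and not a statement about
how 'perf report' orders samples today. The Sample type and its fields
are hypothetical:

```python
import heapq
from typing import Iterable, Iterator, NamedTuple


class Sample(NamedTuple):
    time: int  # timestamp, e.g. in nanoseconds (hypothetical field)
    cpu: int   # stream the sample came from (hypothetical field)
    ip: int    # sampled instruction pointer (hypothetical field)


def merge_by_time(streams: Iterable[Iterable[Sample]]) -> Iterator[Sample]:
    """Lazily merge streams that are each sorted by time into one sorted stream."""
    return heapq.merge(*streams, key=lambda sample: sample.time)
```

With this shape, sorting each stream is independent work that could be
handed to separate threads, leaving only the final merge serial.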

I hope this helps. Thanks,
Ian

> Week 7-8: Documentation
>
> Document the changes made to the perf tool to support multi-core processing.
> Provide examples and usage instructions for users who want to take advantage of the new functionality.
>
> Week 9-10: Performance Tuning
>
> Fine-tune the parallelized parts of the perf tool to optimize performance.
> Use profiling tools to identify areas that can be further optimized.
> Test on different systems (e.g. a server with a large number of cores).
>
> Week 11-12: Compatibility Testing
>
> Test the perf tool with different hardware configurations and operating systems to ensure compatibility.
> Make any necessary changes to ensure that the tool works seamlessly across different environments.
>
> Week 13-15: Extra
>
> For when things take longer than expected.
>
>
>
> Kind regards,
>
> Lukas Molleman
>


* perf test results on ARM64. was Re: GSoC: perf Linux Profiling Scalability and speed
       [not found]       ` <CANRdyn9MU1N5xH8MfMbpmxPhRTCqi9J_NWsVOPKXodE97H97rQ@mail.gmail.com>
@ 2023-03-06 22:10         ` Arnaldo Carvalho de Melo
  2023-03-07  1:28           ` Leo Yan
  2023-03-07 21:56         ` Ian Rogers
  1 sibling, 1 reply; 4+ messages in thread
From: Arnaldo Carvalho de Melo @ 2023-03-06 22:10 UTC
  To: Lukas Molleman; +Cc: Ian Rogers, linux-perf-users, rickyman7

On Mon, Mar 06, 2023 at 12:18:30PM +0100, Lukas Molleman wrote:
> > Perhaps if you share what the errors are then we can talk about how to
> > fix them. There are also tests that currently say skip but don't give
> > a reason, it'd be nice to improve this.
 
>  1: vmlinux symtab matches kallsyms            : FAILED!
>  2: Detect openat syscall event                : FAILED!
>  3: Detect openat syscall event on all cpus    : FAILED!
>  4: Read samples using the mmap interface      : FAILED!
>  5: Test data source output                    : Ok
>  6: Parse event definition strings             : FAILED!
>  7: Simple expression parser                   : Ok
>  8: PERF_RECORD_* events & perf_sample fields  : FAILED!
>  9: Parse perf pmu format                      : Ok
> 10: PMU events                                 :
> 10.1: PMU event table sanity                           : Ok
> 10.2: PMU event map aliases                            : Ok
> 10.3: Parsing of PMU event table metrics               : Ok
> 10.4: Parsing of PMU event table metrics with fake PMUs: Ok
> 11: DSO data read                              : Ok
> 12: DSO data cache                             : Ok
> 13: DSO data reopen                            : Ok
> 14: Roundtrip evsel->name                      : Ok
> 15: Parse sched tracepoints fields             : FAILED!
> 16: syscalls:sys_enter_openat event fields     : FAILED!
> 17: Setup struct perf_event_attr               : Skip
> 18: Match and link multiple hists              : Ok
> 19: 'import perf' in python                    : Ok
> 22: Breakpoint accounting                      : Skip
> 23: Watchpoint                                 :
> 23.1: Read Only Watchpoint                     : FAILED!
> 23.2: Write Only Watchpoint                    : FAILED!
> 23.3: Read / Write Watchpoint                  : FAILED!
> 23.4: Modify Watchpoint                        : FAILED!
> 24: Number of exit events of a simple workload : FAILED!
> 25: Software clock events period values        : FAILED!
> 26: Object code reading                        : FAILED!
> 27: Sample parsing                             : Ok
> 28: Use a dummy software event to keep tracking: Skip
> 29: Parse with no sample_id_all bit set        : Ok
> 30: Filter hist entries                        : Ok
> 31: Lookup mmap thread                         : Ok
> 32: Share thread maps                          : Ok
> 33: Sort output of hist entries                : Ok
> 34: Cumulate child hist entries                : Ok
> 35: Track with sched_switch                    : Ok
> 36: Filter fds with revents mask in a fdarray  : Ok
> 37: Add fd to a fdarray, making it autogrow    : Ok
> 38: kmod_path__parse                           : Ok
> 39: Thread map                                 : Ok
> 40: LLVM search and compile                    :
> 40.1: Basic BPF llvm compile                    : Skip
> 40.2: kbuild searching                          : Skip
> 40.3: Compile source for BPF prologue generation: Skip
> 40.4: Compile source for BPF relocation         : Skip
> 41: Session topology                           : Ok
> 42: BPF filter                                 :
> 42.1: Basic BPF filtering                      : Skip
> 42.2: BPF pinning                              : Skip
> 42.3: BPF prologue generation                  : Skip
> 43: Synthesize thread map                      : Ok
> 44: Remove thread map                          : Ok
> 45: Synthesize cpu map                         : Ok
> 46: Synthesize stat config                     : Ok
> 47: Synthesize stat                            : Ok
> 48: Synthesize stat round                      : Ok
> 49: Synthesize attr update                     : Ok
> 50: Event times                                : FAILED!
> 51: Read backward ring buffer                  : Skip
> 52: Print cpu map                              : Ok
> 53: Merge cpu map                              : Ok
> 54: Probe SDT events                           : Skip
> 55: is_printable_array                         : Ok
> 56: Print bitmap                               : Ok
> 57: perf hooks                                 : Ok
> 58: builtin clang support                      : Skip (not compiled in)
> 59: unit_number__scnprintf                     : Ok
> 60: mem2node                                   : Ok
> 61: time utils                                 : Ok
> 62: Test jit_write_elf                         : Ok
> 63: Test libpfm4 support                       : Skip (not compiled in)
> 64: Test api io                                : Ok
> 65: maps__merge_in                             : Ok
> 66: Demangle Java                              : Ok
> 67: Demangle OCaml                             : Ok
> 68: Parse and process metrics                  : Ok
> 69: PE file support                            : Skip
> 70: Event expansion for cgroups                : Ok
> 71: Convert perf time to TSC                   : FAILED!
> 72: dlfilter C API                             : Skip
> 73: DWARF unwind                               : Ok
> failed to open shell test directory: /usr/libexec/perf-core/tests/shell
> 
> I'm using Ubuntu 22.04 kernel 5.15.0-60-generic ARM64.
> perf version 5.15.78
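
A listing this long is easier to discuss once it is reduced to counts
and a list of failures. The following is a minimal sketch that parses
'perf test' output in the layout shown above ("<number>: <name> :
<status>"), tolerating the "> " quote prefix. The function names are
made up for this example, and the regular expression only covers the
three statuses that appear in this thread (Ok, FAILED!, Skip):

```python
import re
from collections import Counter

# "<id>: <name> : <status> (<optional reason>)"; group headers have no status.
_LINE = re.compile(
    r"^(\d+(?:\.\d+)?):\s+(.*?)\s*:\s*(Ok|FAILED!|Skip)?(?:\s+\((.*)\))?\s*$"
)


def parse_perf_test(output):
    """Return (test id, test name, status) tuples for every line with a status."""
    results = []
    for raw in output.splitlines():
        line = raw.lstrip("> \t").rstrip()  # drop email quoting and indentation
        match = _LINE.match(line)
        if match and match.group(3):
            results.append((match.group(1), match.group(2), match.group(3)))
    return results


def summarize(results):
    """Count how many tests ended in each status."""
    return Counter(status for _, _, status in results)


def failed(results):
    """Return the names of the tests that failed."""
    return [name for _, name, status in results if status == "FAILED!"]
```

Group header lines such as "10: PMU events :" carry no status and are
skipped, as is the trailing "failed to open shell test directory"
message.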

So, here, on a Libre Computer Firefly RK3399PC board and using what will
soon be on the perf-tools branch to go to Linus:

root@roc-rk3399-pc:~# cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.1 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.1 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy
root@roc-rk3399-pc:~# perf -v
perf version 6.2.rc7.g5b201a82cd9d
root@roc-rk3399-pc:~# perf -vv
perf version 6.2.rc7.g5b201a82cd9d
                 dwarf: [ on  ]  # HAVE_DWARF_SUPPORT
    dwarf_getlocations: [ on  ]  # HAVE_DWARF_GETLOCATIONS_SUPPORT
                 glibc: [ on  ]  # HAVE_GLIBC_SUPPORT
         syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT
                libbfd: [ on  ]  # HAVE_LIBBFD_SUPPORT
            debuginfod: [ OFF ]  # HAVE_DEBUGINFOD_SUPPORT
                libelf: [ on  ]  # HAVE_LIBELF_SUPPORT
               libnuma: [ on  ]  # HAVE_LIBNUMA_SUPPORT
numa_num_possible_cpus: [ on  ]  # HAVE_LIBNUMA_SUPPORT
               libperl: [ on  ]  # HAVE_LIBPERL_SUPPORT
             libpython: [ on  ]  # HAVE_LIBPYTHON_SUPPORT
              libslang: [ on  ]  # HAVE_SLANG_SUPPORT
             libcrypto: [ on  ]  # HAVE_LIBCRYPTO_SUPPORT
             libunwind: [ on  ]  # HAVE_LIBUNWIND_SUPPORT
    libdw-dwarf-unwind: [ on  ]  # HAVE_DWARF_SUPPORT
                  zlib: [ on  ]  # HAVE_ZLIB_SUPPORT
                  lzma: [ on  ]  # HAVE_LZMA_SUPPORT
             get_cpuid: [ on  ]  # HAVE_AUXTRACE_SUPPORT
                   bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
                   aio: [ on  ]  # HAVE_AIO_SUPPORT
                  zstd: [ on  ]  # HAVE_ZSTD_SUPPORT
               libpfm4: [ OFF ]  # HAVE_LIBPFM
         libtraceevent: [ on  ]  # HAVE_LIBTRACEEVENT
root@roc-rk3399-pc:~#

acme@roc-rk3399-pc:~/git/perf-tools$ sudo su -
[sudo] password for acme: 
root@roc-rk3399-pc:~# set -o vi
root@roc-rk3399-pc:~# export PATH=$PATH:~/bin
root@roc-rk3399-pc:~# perf test
  1: vmlinux symtab matches kallsyms                                 : Ok
  2: Detect openat syscall event                                     : Ok
  3: Detect openat syscall event on all cpus                         : Ok
  4: mmap interface tests                                            :
  4.1: Read samples using the mmap interface                         : Ok
  4.2: User space counter reading of instructions                    : Skip (permissions)
  4.3: User space counter reading of cycles                          : Skip (permissions)
  5: Test data source output                                         : Ok
  6: Parse event definition strings                                  :
  6.1: Test event parsing                                            : Ok
  6.2: Test parsing of "hybrid" CPU events                           : Skip (not hybrid)
  6.3: Parsing of all PMU events from sysfs                          : Skip (permissions)
  6.4: Parsing of given PMU events from sysfs                        : Skip (permissions)
  6.5: Parsing of aliased events from sysfs                          : Skip (no aliases in sysfs)
  6.6: Parsing of aliased events                                     : Ok
  6.7: Parsing of terms (event modifiers)                            : Ok
  7: Simple expression parser                                        : Ok
  8: PERF_RECORD_* events & perf_sample fields                       : Ok
  9: Parse perf pmu format                                           : Ok
 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 11: DSO data read                                                   : Ok
 12: DSO data cache                                                  : Ok
 13: DSO data reopen                                                 : Ok
 14: Roundtrip evsel->name                                           : Ok
 15: Parse sched tracepoints fields                                  : Ok
 16: syscalls:sys_enter_openat event fields                          : Ok
 17: Setup struct perf_event_attr                                    : FAILED!
 18: Match and link multiple hists                                   : Ok
 19: 'import perf' in python                                         : Ok
 20: Breakpoint overflow signal handler                              : Skip
 21: Breakpoint overflow sampling                                    : Skip
 22: Breakpoint accounting                                           : Ok
 23: Watchpoint                                                      :
 23.1: Read Only Watchpoint                                          : Ok
 23.2: Write Only Watchpoint                                         : Ok
 23.3: Read / Write Watchpoint                                       : Ok
 23.4: Modify Watchpoint                                             : Ok
 24: Number of exit events of a simple workload                      : FAILED!
 25: Software clock events period values                             : Ok
 26: Object code reading                                             : Ok
 27: Sample parsing                                                  : Ok
 28: Use a dummy software event to keep tracking                     : Ok
 29: Parse with no sample_id_all bit set                             : Ok
 30: Filter hist entries                                             : Ok
 31: Lookup mmap thread                                              : Ok
 32: Share thread maps                                               : Ok
 33: Sort output of hist entries                                     : Ok
 34: Cumulate child hist entries                                     : Ok
 35: Track with sched_switch                                         : Ok
 36: Filter fds with revents mask in a fdarray                       : Ok
 37: Add fd to a fdarray, making it autogrow                         : Ok
 38: kmod_path__parse                                                : Ok
 39: Thread map                                                      : Ok
 40: LLVM search and compile                                         :
 40.1: Basic BPF llvm compile                                        : Ok
 40.2: kbuild searching                                              : FAILED!
 40.3: Compile source for BPF prologue generation                    : FAILED!
 40.4: Compile source for BPF relocation                             : Ok
 41: Session topology                                                : Ok
 42: BPF filter                                                      :
 42.1: Basic BPF filtering                                           : Ok
 42.2: BPF pinning                                                   : Ok
 42.3: BPF prologue generation                                       : FAILED!
 43: Synthesize thread map                                           : Ok
 44: Remove thread map                                               : Ok
 45: Synthesize cpu map                                              : Ok
 46: Synthesize stat config                                          : Ok
 47: Synthesize stat                                                 : Ok
 48: Synthesize stat round                                           : Ok
 49: Synthesize attr update                                          : Ok
 50: Event times                                                     : Ok
 51: Read backward ring buffer                                       : Ok
 52: Print cpu map                                                   : Ok
 53: Merge cpu map                                                   : Ok
 54: Probe SDT events                                                : Ok
 55: is_printable_array                                              : Ok
 56: Print bitmap                                                    : Ok
 57: perf hooks                                                      : Ok
 58: builtin clang support                                           :
 58.1: builtin clang compile C source to IR                          : Skip (not compiled in)
 58.2: builtin clang compile C source to ELF object                  : Skip (not compiled in)
 59: unit_number__scnprintf                                          : Ok
 60: mem2node                                                        : Ok
 61: time utils                                                      : Ok
 62: Test jit_write_elf                                              : Ok
 63: Test libpfm4 support                                            :
 63.1: test of individual --pfm-events                               : Skip (not compiled in)
 63.2: test groups of --pfm-events                                   : Skip (not compiled in)
 64: Test api io                                                     : Ok
 65: maps__merge_in                                                  : Ok
 66: Demangle Java                                                   : Ok
 67: Demangle OCaml                                                  : Ok
 68: Parse and process metrics                                       : Ok
 69: PE file support                                                 : FAILED!
 70: Event expansion for cgroups                                     : Ok
 71: Convert perf time to TSC                                        :
 71.1: TSC support                                                   : Ok
 71.2: Perf time to TSC                                              : Ok
 72: dlfilter C API                                                  : Ok
 73: Sigtrap                                                         : Skip
 74: Event groups                                                    : Skip
 75: Symbols                                                         : Ok
 76: Test dwarf unwind                                               : Ok
 77: build id cache operations                                       : FAILED!
 78: CoreSight / ASM Pure Loop                                       : FAILED!
 79: CoreSight / Memcpy 16k 10 Threads                               : FAILED!
 80: CoreSight / Thread Loop 10 Threads - Check TID                  : FAILED!
 81: CoreSight / Thread Loop 2 Threads - Check TID                   : FAILED!
 82: CoreSight / Unroll Loop Thread 10                               : FAILED!
 83: daemon operations                                               : Ok
 84: kernel lock contention analysis test                            : Ok
 85: perf pipe recording and injection test                          : Ok
 86: Add vfs_getname probe to get syscall args filenames             : FAILED!
 87: probe libc's inet_pton & backtrace it with ping                 : Ok
 88: Use vfs_getname probe to get syscall args filenames             : FAILED!
 89: Zstd perf.data compression/decompression                        : Ok
 90: perf record tests                                               : FAILED!
 91: perf record offcpu profiling tests                              : FAILED!
 92: perf stat CSV output linter                                     : Ok
 93: perf stat csv summary test                                      : Ok
 94: perf stat JSON output linter                                    : FAILED!
 95: perf stat metrics (shadow stat) test                            : Ok
 96: perf stat tests                                                 : Ok
 97: perf all metricgroups test                                      : Ok
 98: perf all metrics test                                           : Ok
 99: perf all PMU test                                               : Ok
100: perf stat --bpf-counters test                                   : FAILED!
101: perf stat --bpf-counters --for-each-cgroup test                 : FAILED!
102: Check Arm64 callgraphs are complete in fp mode                  : Ok
103: Check Arm CoreSight trace data recording and synthesized samples: FAILED!
104: Check Arm SPE trace data recording and synthesized samples      : Skip
105: Check Arm SPE doesn't hang when there are forks                 : Skip
106: Check branch stack sampling                                     : Skip
107: Test data symbol                                                : Skip
108: Miscellaneous Intel PT testing                                  : Skip
109: Test java symbol                                                : Ok
110: perf script task-analyzer tests                                 : Ok
111: Check open filename arg using perf trace + vfs_getname          : FAILED!
root@roc-rk3399-pc:~# 
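
Since test numbers shift between perf versions, comparing two runs is
more reliable by test name than by number. A minimal sketch, assuming
each run has already been reduced to a name-to-status mapping (the
function name diff_runs is hypothetical):

```python
def diff_runs(old, new):
    """List (name, old status, new status) for tests present in both runs
    whose status changed, sorted by test name."""
    changed = []
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            changed.append((name, old[name], new[name]))
    return changed
```

Tests that exist in only one of the two runs are ignored here; listing
them separately would be a natural extension.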



 
> > So the perf tool is written in somewhat low-level C code, in fact we
> > try to adopt the Linux kernel's conventions so that code between the
> > tool and the kernel can easily be shared. Frameworks for different
> > kinds of parallelism would need to be added. In the past Riccardo
> > Mancini looked at adding workqueues;
> > https://lore.kernel.org/lkml/3c4f8dd64d07373d876990ceb16e469b4029363f.camel@gmail.com/
> > We'd like to merge this work, but it needs rebasing onto the current
> > perf development tree. One problem Riccardo encountered was with
> > reference counts. To this end I wrote a reference count checker:
> > https://lore.kernel.org/lkml/20220211103415.2737789-1-irogers@google.com/
> > A number of fixes from it are now merged into the tree, but not the
> > actual checking framework itself. A first task may be to work on the
> > reference count checker and then to bring in Riccardo's work. I
> > started rebasing the checker and should be able to send it out again
> > soon.
> 
> Interesting. I've revised my plan as follows:
> 
> Now - start date: Research
> 
>    - Get a deeper understanding of workqueues by reading chapter 6 of
>    "Linux Kernel Development" and chapter 7 of "Linux Device Drivers".
>    - Understand the necessary modules, libraries and tools.
>    - Understand the code that I need to know to succeed.
>    - Work on fixing the issues with reference counts and rebase the
>    workqueues patch. This task can be worked on between 3 April and 16 April.
> 
> Week 2 - 3: Code
> 
> 
>    - Implement workqueue for processing smaller/easier data structures.
>    Working further on previous contributions.
> 
> Week 4: Evaluation
> 
>    - Evaluate the effectiveness of the workqueues on a smaller scale
>    and devise strategies to implement this on a bigger scale.
> 
> Week 5 - 6: Code
> 
>    - Implement workqueue for processing bigger/more complex data structures.

-- 

- Arnaldo


* Re: perf test results on ARM64. was Re: GSoC: perf Linux Profiling Scalability and speed
  2023-03-06 22:10         ` perf test results on ARM64. was " Arnaldo Carvalho de Melo
@ 2023-03-07  1:28           ` Leo Yan
  0 siblings, 0 replies; 4+ messages in thread
From: Leo Yan @ 2023-03-07  1:28 UTC
  To: Arnaldo Carvalho de Melo, James Clark
  Cc: Lukas Molleman, Ian Rogers, linux-perf-users, rickyman7

Hi Lukas,

[ +James ]

On Mon, Mar 06, 2023 at 07:10:05PM -0300, Arnaldo Carvalho de Melo wrote:

[...]

> > I'm using Ubuntu 22.04 kernel 5.15.0-60-generic ARM64.
> > perf version 5.15.78

Just a reminder: if you test perf on Arm64 platforms, you can refer to
page [1], section "LKFT Integration with Perf testing on Arm64".  Please
confirm that the kernel configuration options (e.g. eBPF, Ftrace, etc.)
have been enabled and make sure the environment for eBPF and Python is
set up, especially if you are using a self-built kernel image.
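
One quick way to check this is to look the options up in the kernel
config file. The sketch below is only an illustration: the option names
in the usage note (CONFIG_PERF_EVENTS, CONFIG_BPF_SYSCALL,
CONFIG_FTRACE) are my own pick of examples, not the list from the LKFT
page, and the helper names are hypothetical. The config is looked for
in /boot/config-<release> and then /proc/config.gz, either of which may
be absent depending on the distribution and kernel build:

```python
import gzip
import os
import platform


def parse_kconfig(text):
    """Map CONFIG_* names to their values; "# ... is not set" lines are ignored."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def missing_options(text, wanted):
    """Return the wanted options that are neither built in (y) nor modules (m)."""
    options = parse_kconfig(text)
    return [name for name in wanted if options.get(name) not in ("y", "m")]


def load_running_config():
    """Return the running kernel's config text, or None if it cannot be found."""
    boot_config = f"/boot/config-{platform.release()}"
    if os.path.exists(boot_config):
        with open(boot_config) as config:
            return config.read()
    if os.path.exists("/proc/config.gz"):
        with gzip.open("/proc/config.gz", "rt") as config:
            return config.read()
    return None
```

For example, missing_options(load_running_config() or "",
["CONFIG_PERF_EVENTS", "CONFIG_BPF_SYSCALL", "CONFIG_FTRACE"]) lists
whichever of those three are not enabled; if no config file is found it
simply reports all of them.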

I have not checked the perf tests for a long while, so I don't want to
mislead you into thinking I have confirmed that they all pass on Arm64
platforms. Arnaldo's test results below still show some unexpected
failures, which we need to look into in detail.

Thanks,
Leo

[1] https://lkft.linaro.org/tests/

> So, here, on a Libre Computer Firefly RK3399PC board and using what will
> soon be on the perf-tools branch to go to Linus:
> 
> root@roc-rk3399-pc:~# cat /etc/os-release
> PRETTY_NAME="Ubuntu 22.04.1 LTS"
> NAME="Ubuntu"
> VERSION_ID="22.04"
> VERSION="22.04.1 LTS (Jammy Jellyfish)"
> VERSION_CODENAME=jammy
> ID=ubuntu
> ID_LIKE=debian
> HOME_URL="https://www.ubuntu.com/"
> SUPPORT_URL="https://help.ubuntu.com/"
> BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
> PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
> UBUNTU_CODENAME=jammy
> root@roc-rk3399-pc:~# perf -v
> perf version 6.2.rc7.g5b201a82cd9d
> root@roc-rk3399-pc:~# perf -vv
> perf version 6.2.rc7.g5b201a82cd9d
>                  dwarf: [ on  ]  # HAVE_DWARF_SUPPORT
>     dwarf_getlocations: [ on  ]  # HAVE_DWARF_GETLOCATIONS_SUPPORT
>                  glibc: [ on  ]  # HAVE_GLIBC_SUPPORT
>          syscall_table: [ on  ]  # HAVE_SYSCALL_TABLE_SUPPORT
>                 libbfd: [ on  ]  # HAVE_LIBBFD_SUPPORT
>             debuginfod: [ OFF ]  # HAVE_DEBUGINFOD_SUPPORT
>                 libelf: [ on  ]  # HAVE_LIBELF_SUPPORT
>                libnuma: [ on  ]  # HAVE_LIBNUMA_SUPPORT
> numa_num_possible_cpus: [ on  ]  # HAVE_LIBNUMA_SUPPORT
>                libperl: [ on  ]  # HAVE_LIBPERL_SUPPORT
>              libpython: [ on  ]  # HAVE_LIBPYTHON_SUPPORT
>               libslang: [ on  ]  # HAVE_SLANG_SUPPORT
>              libcrypto: [ on  ]  # HAVE_LIBCRYPTO_SUPPORT
>              libunwind: [ on  ]  # HAVE_LIBUNWIND_SUPPORT
>     libdw-dwarf-unwind: [ on  ]  # HAVE_DWARF_SUPPORT
>                   zlib: [ on  ]  # HAVE_ZLIB_SUPPORT
>                   lzma: [ on  ]  # HAVE_LZMA_SUPPORT
>              get_cpuid: [ on  ]  # HAVE_AUXTRACE_SUPPORT
>                    bpf: [ on  ]  # HAVE_LIBBPF_SUPPORT
>                    aio: [ on  ]  # HAVE_AIO_SUPPORT
>                   zstd: [ on  ]  # HAVE_ZSTD_SUPPORT
>                libpfm4: [ OFF ]  # HAVE_LIBPFM
>          libtraceevent: [ on  ]  # HAVE_LIBTRACEEVENT
> root@roc-rk3399-pc:~#
> 
> acme@roc-rk3399-pc:~/git/perf-tools$ sudo su -
> [sudo] password for acme: 
> root@roc-rk3399-pc:~# set -o vi
> root@roc-rk3399-pc:~# export PATH=$PATH:~/bin
> root@roc-rk3399-pc:~# perf test
>   1: vmlinux symtab matches kallsyms                                 : Ok
>   2: Detect openat syscall event                                     : Ok
>   3: Detect openat syscall event on all cpus                         : Ok
>   4: mmap interface tests                                            :
>   4.1: Read samples using the mmap interface                         : Ok
>   4.2: User space counter reading of instructions                    : Skip (permissions)
>   4.3: User space counter reading of cycles                          : Skip (permissions)
>   5: Test data source output                                         : Ok
>   6: Parse event definition strings                                  :
>   6.1: Test event parsing                                            : Ok
>   6.2: Test parsing of "hybrid" CPU events                           : Skip (not hybrid)
>   6.3: Parsing of all PMU events from sysfs                          : Skip (permissions)
>   6.4: Parsing of given PMU events from sysfs                        : Skip (permissions)
>   6.5: Parsing of aliased events from sysfs                          : Skip (no aliases in sysfs)
>   6.6: Parsing of aliased events                                     : Ok
>   6.7: Parsing of terms (event modifiers)                            : Ok
>   7: Simple expression parser                                        : Ok
>   8: PERF_RECORD_* events & perf_sample fields                       : Ok
>   9: Parse perf pmu format                                           : Ok
>  10: PMU events                                                      :
>  10.1: PMU event table sanity                                        : Ok
>  10.2: PMU event map aliases                                         : Ok
>  10.3: Parsing of PMU event table metrics                            : Ok
>  10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
>  11: DSO data read                                                   : Ok
>  12: DSO data cache                                                  : Ok
>  13: DSO data reopen                                                 : Ok
>  14: Roundtrip evsel->name                                           : Ok
>  15: Parse sched tracepoints fields                                  : Ok
>  16: syscalls:sys_enter_openat event fields                          : Ok
>  17: Setup struct perf_event_attr                                    : FAILED!
>  18: Match and link multiple hists                                   : Ok
>  19: 'import perf' in python                                         : Ok
>  20: Breakpoint overflow signal handler                              : Skip
>  21: Breakpoint overflow sampling                                    : Skip
>  22: Breakpoint accounting                                           : Ok
>  23: Watchpoint                                                      :
>  23.1: Read Only Watchpoint                                          : Ok
>  23.2: Write Only Watchpoint                                         : Ok
>  23.3: Read / Write Watchpoint                                       : Ok
>  23.4: Modify Watchpoint                                             : Ok
>  24: Number of exit events of a simple workload                      : FAILED!
>  25: Software clock events period values                             : Ok
>  26: Object code reading                                             : Ok
>  27: Sample parsing                                                  : Ok
>  28: Use a dummy software event to keep tracking                     : Ok
>  29: Parse with no sample_id_all bit set                             : Ok
>  30: Filter hist entries                                             : Ok
>  31: Lookup mmap thread                                              : Ok
>  32: Share thread maps                                               : Ok
>  33: Sort output of hist entries                                     : Ok
>  34: Cumulate child hist entries                                     : Ok
>  35: Track with sched_switch                                         : Ok
>  36: Filter fds with revents mask in a fdarray                       : Ok
>  37: Add fd to a fdarray, making it autogrow                         : Ok
>  38: kmod_path__parse                                                : Ok
>  39: Thread map                                                      : Ok
>  40: LLVM search and compile                                         :
>  40.1: Basic BPF llvm compile                                        : Ok
>  40.2: kbuild searching                                              : FAILED!
>  40.3: Compile source for BPF prologue generation                    : FAILED!
>  40.4: Compile source for BPF relocation                             : Ok
>  41: Session topology                                                : Ok
>  42: BPF filter                                                      :
>  42.1: Basic BPF filtering                                           : Ok
>  42.2: BPF pinning                                                   : Ok
>  42.3: BPF prologue generation                                       : FAILED!
>  43: Synthesize thread map                                           : Ok
>  44: Remove thread map                                               : Ok
>  45: Synthesize cpu map                                              : Ok
>  46: Synthesize stat config                                          : Ok
>  47: Synthesize stat                                                 : Ok
>  48: Synthesize stat round                                           : Ok
>  49: Synthesize attr update                                          : Ok
>  50: Event times                                                     : Ok
>  51: Read backward ring buffer                                       : Ok
>  52: Print cpu map                                                   : Ok
>  53: Merge cpu map                                                   : Ok
>  54: Probe SDT events                                                : Ok
>  55: is_printable_array                                              : Ok
>  56: Print bitmap                                                    : Ok
>  57: perf hooks                                                      : Ok
>  58: builtin clang support                                           :
>  58.1: builtin clang compile C source to IR                          : Skip (not compiled in)
>  58.2: builtin clang compile C source to ELF object                  : Skip (not compiled in)
>  59: unit_number__scnprintf                                          : Ok
>  60: mem2node                                                        : Ok
>  61: time utils                                                      : Ok
>  62: Test jit_write_elf                                              : Ok
>  63: Test libpfm4 support                                            :
>  63.1: test of individual --pfm-events                               : Skip (not compiled in)
>  63.2: test groups of --pfm-events                                   : Skip (not compiled in)
>  64: Test api io                                                     : Ok
>  65: maps__merge_in                                                  : Ok
>  66: Demangle Java                                                   : Ok
>  67: Demangle OCaml                                                  : Ok
>  68: Parse and process metrics                                       : Ok
>  69: PE file support                                                 : FAILED!
>  70: Event expansion for cgroups                                     : Ok
>  71: Convert perf time to TSC                                        :
>  71.1: TSC support                                                   : Ok
>  71.2: Perf time to TSC                                              : Ok
>  72: dlfilter C API                                                  : Ok
>  73: Sigtrap                                                         : Skip
>  74: Event groups                                                    : Skip
>  75: Symbols                                                         : Ok
>  76: Test dwarf unwind                                               : Ok
>  77: build id cache operations                                       : FAILED!
>  78: CoreSight / ASM Pure Loop                                       : FAILED!
>  79: CoreSight / Memcpy 16k 10 Threads                               : FAILED!
>  80: CoreSight / Thread Loop 10 Threads - Check TID                  : FAILED!
>  81: CoreSight / Thread Loop 2 Threads - Check TID                   : FAILED!
>  82: CoreSight / Unroll Loop Thread 10                               : FAILED!
>  83: daemon operations                                               : Ok
>  84: kernel lock contention analysis test                            : Ok
>  85: perf pipe recording and injection test                          : Ok
>  86: Add vfs_getname probe to get syscall args filenames             : FAILED!
>  87: probe libc's inet_pton & backtrace it with ping                 : Ok
>  88: Use vfs_getname probe to get syscall args filenames             : FAILED!
>  89: Zstd perf.data compression/decompression                        : Ok
>  90: perf record tests                                               : FAILED!
>  91: perf record offcpu profiling tests                              : FAILED!
>  92: perf stat CSV output linter                                     : Ok
>  93: perf stat csv summary test                                      : Ok
>  94: perf stat JSON output linter                                    : FAILED!
>  95: perf stat metrics (shadow stat) test                            : Ok
>  96: perf stat tests                                                 : Ok
>  97: perf all metricgroups test                                      : Ok
>  98: perf all metrics test                                           : Ok
>  99: perf all PMU test                                               : Ok
> 100: perf stat --bpf-counters test                                   : FAILED!
> 101: perf stat --bpf-counters --for-each-cgroup test                 : FAILED!
> 102: Check Arm64 callgraphs are complete in fp mode                  : Ok
> 103: Check Arm CoreSight trace data recording and synthesized samples: FAILED!
> 104: Check Arm SPE trace data recording and synthesized samples      : Skip
> 105: Check Arm SPE doesn't hang when there are forks                 : Skip
> 106: Check branch stack sampling                                     : Skip
> 107: Test data symbol                                                : Skip
> 108: Miscellaneous Intel PT testing                                  : Skip
> 109: Test java symbol                                                : Ok
> 110: perf script task-analyzer tests                                 : Ok
> 111: Check open filename arg using perf trace + vfs_getname          : FAILED!
> root@roc-rk3399-pc:~# 
> 
> 
> 
>  
> > > So the perf tool is written in somewhat low-level C code, in fact we
> > > try to adopt the Linux kernel's conventions so that code between the
> > > tool and the kernel can easily be shared. Frameworks for different
> > > kinds of parallelism would need to be added. In the past Riccardo
> > > Mancini looked at adding workqueues;
> > > https://lore.kernel.org/lkml/3c4f8dd64d07373d876990ceb16e469b4029363f.camel@gmail.com/
> > > We'd like to merge this work but it needs rebasing on to the current
> > > perf development tree. One problem encountered by Riccardo was issues
> > > with reference counts. To this end I wrote a reference count checker:
> > > https://lore.kernel.org/lkml/20220211103415.2737789-1-irogers@google.com/
> > > Which has a number of fixes now merged into the tree but not the
> > > actual checking framework itself. A first task may be to work on the
> > > reference count checker and then to bring in Riccardo's work. I
> > > started a rebase on the checker and I should work to send it out again
> > > soon.
> > 
> > Interesting. I revised my plan as follows:
> > 
> > Now - start date: Research
> > 
> >    - Get a deeper understanding of workqueues by reading chapter 6 of
> >    "Linux Kernel Development" and chapter 7 of "Linux Device Drivers".
> >    - Understand the necessary modules, libraries and tools.
> >    - Understand the code that I need to know to succeed.
> >    - Work on fixing the issues with reference counts and rebase the
> >    workqueue patch. This task can be worked on between 3 April and
> >    16 April.
> > 
> > Week 2 - 3: Code
> > 
> >    - Implement workqueues for processing smaller/easier data structures,
> >    continuing the work from previous contributions.
> > 
> > Week 4: Evaluation
> > 
> >    - Evaluate the effectiveness of the workqueues on a smaller scale
> >    and devise strategies to implement this on a bigger scale.
> > 
> > Week 5 - 6: Code
> > 
> >    - Implement workqueues for processing bigger/more complex data structures.
> 
> -- 
> 
> - Arnaldo


* Re: GSoC: perf Linux Profiling Scalability and speed
       [not found]       ` <CANRdyn9MU1N5xH8MfMbpmxPhRTCqi9J_NWsVOPKXodE97H97rQ@mail.gmail.com>
  2023-03-06 22:10         ` perf test results on ARM64. was " Arnaldo Carvalho de Melo
@ 2023-03-07 21:56         ` Ian Rogers
  1 sibling, 0 replies; 4+ messages in thread
From: Ian Rogers @ 2023-03-07 21:56 UTC (permalink / raw)
  To: Lukas Molleman; +Cc: acme, linux-perf-users, rickyman7

On Mon, Mar 6, 2023 at 3:18 AM Lukas Molleman <lukas.molleman@gmail.com> wrote:
>
> > Perhaps if you share what the errors are then we can talk about how to
> > fix them. There are also tests that currently say skip but don't give
> > a reason, it'd be nice to improve this.
>
>  1: vmlinux symtab matches kallsyms            : FAILED!
>  2: Detect openat syscall event                : FAILED!
>  3: Detect openat syscall event on all cpus    : FAILED!
>  4: Read samples using the mmap interface      : FAILED!
>  5: Test data source output                    : Ok
>  6: Parse event definition strings             : FAILED!
>  7: Simple expression parser                   : Ok
>  8: PERF_RECORD_* events & perf_sample fields  : FAILED!
>  9: Parse perf pmu format                      : Ok
> 10: PMU events                                 :
> 10.1: PMU event table sanity                           : Ok
> 10.2: PMU event map aliases                            : Ok
> 10.3: Parsing of PMU event table metrics               : Ok
> 10.4: Parsing of PMU event table metrics with fake PMUs: Ok
> 11: DSO data read                              : Ok
> 12: DSO data cache                             : Ok
> 13: DSO data reopen                            : Ok
> 14: Roundtrip evsel->name                      : Ok
> 15: Parse sched tracepoints fields             : FAILED!
> 16: syscalls:sys_enter_openat event fields     : FAILED!
> 17: Setup struct perf_event_attr               : Skip
> 18: Match and link multiple hists              : Ok
> 19: 'import perf' in python                    : Ok
> 22: Breakpoint accounting                      : Skip
> 23: Watchpoint                                 :
> 23.1: Read Only Watchpoint                     : FAILED!
> 23.2: Write Only Watchpoint                    : FAILED!
> 23.3: Read / Write Watchpoint                  : FAILED!
> 23.4: Modify Watchpoint                        : FAILED!
> 24: Number of exit events of a simple workload : FAILED!
> 25: Software clock events period values        : FAILED!
> 26: Object code reading                        : FAILED!
> 27: Sample parsing                             : Ok
> 28: Use a dummy software event to keep tracking: Skip
> 29: Parse with no sample_id_all bit set        : Ok
> 30: Filter hist entries                        : Ok
> 31: Lookup mmap thread                         : Ok
> 32: Share thread maps                          : Ok
> 33: Sort output of hist entries                : Ok
> 34: Cumulate child hist entries                : Ok
> 35: Track with sched_switch                    : Ok
> 36: Filter fds with revents mask in a fdarray  : Ok
> 37: Add fd to a fdarray, making it autogrow    : Ok
> 38: kmod_path__parse                           : Ok
> 39: Thread map                                 : Ok
> 40: LLVM search and compile                    :
> 40.1: Basic BPF llvm compile                    : Skip
> 40.2: kbuild searching                          : Skip
> 40.3: Compile source for BPF prologue generation: Skip
> 40.4: Compile source for BPF relocation         : Skip
> 41: Session topology                           : Ok
> 42: BPF filter                                 :
> 42.1: Basic BPF filtering                      : Skip
> 42.2: BPF pinning                              : Skip
> 42.3: BPF prologue generation                  : Skip
> 43: Synthesize thread map                      : Ok
> 44: Remove thread map                          : Ok
> 45: Synthesize cpu map                         : Ok
> 46: Synthesize stat config                     : Ok
> 47: Synthesize stat                            : Ok
> 48: Synthesize stat round                      : Ok
> 49: Synthesize attr update                     : Ok
> 50: Event times                                : FAILED!
> 51: Read backward ring buffer                  : Skip
> 52: Print cpu map                              : Ok
> 53: Merge cpu map                              : Ok
> 54: Probe SDT events                           : Skip
> 55: is_printable_array                         : Ok
> 56: Print bitmap                               : Ok
> 57: perf hooks                                 : Ok
> 58: builtin clang support                      : Skip (not compiled in)
> 59: unit_number__scnprintf                     : Ok
> 60: mem2node                                   : Ok
> 61: time utils                                 : Ok
> 62: Test jit_write_elf                         : Ok
> 63: Test libpfm4 support                       : Skip (not compiled in)
> 64: Test api io                                : Ok
> 65: maps__merge_in                             : Ok
> 66: Demangle Java                              : Ok
> 67: Demangle OCaml                             : Ok
> 68: Parse and process metrics                  : Ok
> 69: PE file support                            : Skip
> 70: Event expansion for cgroups                : Ok
> 71: Convert perf time to TSC                   : FAILED!
> 72: dlfilter C API                             : Skip
> 73: DWARF unwind                               : Ok
> failed to open shell test directory: /usr/libexec/perf-core/tests/shell
>
> I'm using Ubuntu 22.04 kernel 5.15.0-60-generic ARM64.
> perf version 5.15.78

Thanks for checking this, Lukas. With `perf test` you can run an
individual test by putting its number after `perf test`, and you can
add the verbose flags (-v, -vv, -vvv) to see what the issue is. The -F
flag runs the test without forking, which simplifies debugging with,
say, gdb. I would also suggest using the latest version of the perf
tool from the maintainer's tree:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/
as you may be seeing issues that are already fixed.
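
As a rough sketch of that workflow, the script below pulls the numbers
of the tests marked FAILED! out of saved `perf test` output and prints
a verbose, non-forking rerun command for each. The helper names
`failed_tests` and `rerun_commands` are hypothetical, and the script
assumes the output has the "NN: description : result" layout shown in
the listings in this thread. It only prints the commands, it does not
run them.

```shell
#!/bin/sh
# Hypothetical helpers, not part of perf itself.

# Print the top-level numbers of tests marked FAILED! in saved
# `perf test` output. Reads the files given as arguments, or stdin
# if none are given. A leading "> " quote prefix is tolerated, and
# subtests such as 40.2 are reported under their parent number (40).
failed_tests() {
    sed -n 's/^[> ]*\([0-9][0-9]*\)[.0-9]*: .*: FAILED! *$/\1/p' "$@" |
        sort -n -u
}

# Print one rerun command per failed test: -F avoids forking (easier
# to debug with gdb), -vv raises the verbosity.
rerun_commands() {
    failed_tests "$@" | while read -r n; do
        echo "perf test -F -vv $n"
    done
}
```

For example, assuming the results were saved with
`perf test 2>&1 | tee perf-test.log` (the file name is arbitrary),
`rerun_commands perf-test.log` lists the commands to try next.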

Thanks,
Ian


> > So the perf tool is written in somewhat low-level C code, in fact we
> > try to adopt the Linux kernel's conventions so that code between the
> > tool and the kernel can easily be shared. Frameworks for different
> > kinds of parallelism would need to be added. In the past Riccardo
> > Mancini looked at adding workqueues;
> > https://lore.kernel.org/lkml/3c4f8dd64d07373d876990ceb16e469b4029363f.camel@gmail.com/
> > We'd like to merge this work but it needs rebasing on to the current
> > perf development tree. One problem encountered by Riccardo was issues
> > with reference counts. To this end I wrote a reference count checker:
> > https://lore.kernel.org/lkml/20220211103415.2737789-1-irogers@google.com/
> > Which has a number of fixes now merged into the tree but not the
> > actual checking framework itself. A first task may be to work on the
> > reference count checker and then to bring in Riccardo's work. I
> > started a rebase on the checker and I should work to send it out again
> > soon.
>
> Interesting. I revised my plan as follows:
>
> Now - start date: Research
>
>    - Get a deeper understanding of workqueues by reading chapter 6 of
>    "Linux Kernel Development" and chapter 7 of "Linux Device Drivers".
>    - Understand the necessary modules, libraries and tools.
>    - Understand the code that I need to know to succeed.
>    - Work on fixing the issues with reference counts and rebase the
>    workqueue patch. This task can be worked on between 3 April and
>    16 April.
>
> Week 2 - 3: Code
>
>    - Implement workqueues for processing smaller/easier data structures,
>    continuing the work from previous contributions.
>
> Week 4: Evaluation
>
>    - Evaluate the effectiveness of the workqueues on a smaller scale
>    and devise strategies to implement this on a bigger scale.
>
> Week 5 - 6: Code
>
>    - Implement workqueues for processing bigger/more complex data structures.


end of thread, other threads:[~2023-03-07 21:56 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <CANRdyn9kXT623Pbu9hc+c4YrDu_h06a2Ch9fJpt=O0S7yKXDEg@mail.gmail.com>
     [not found] ` <CAP-5=fU30f1j5o7c05ohaygpgV4=Hx7yS7f8i3vZ1j1Gk=HgcA@mail.gmail.com>
     [not found]   ` <CANRdyn9H7kzBXUxpvTJ_93G5Tc=51vCRs9ucHbSyVRO_rQf5vA@mail.gmail.com>
2023-03-03  0:29     ` GSoC: perf Linux Profiling Scalability and speed Ian Rogers
     [not found]       ` <CANRdyn9MU1N5xH8MfMbpmxPhRTCqi9J_NWsVOPKXodE97H97rQ@mail.gmail.com>
2023-03-06 22:10         ` perf test results on ARM64. was " Arnaldo Carvalho de Melo
2023-03-07  1:28           ` Leo Yan
2023-03-07 21:56         ` Ian Rogers
