[6.1.7][6.2-rc5] perf all metrics test: FAILED!

From: Sedat Dilek @ 2023-01-29  9:58 UTC
  To: Arnaldo Carvalho de Melo, Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel,
	Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings

[-- Attachment #1: Type: text/plain, Size: 4876 bytes --]

[ CC LLVM linux folks + Ben from Debian kernel team ]

Hi,

I am playing with LLVM version 16.0.0-rc1, which was released yesterday, and perf.

After building my own LLVM toolchain, I built perf and ran some
perf tests here on my Intel Sandy Bridge CPU (see details below).

perf all metrics test: FAILED!

...with both Debian's perf version 6.1.7 and my self-built version 6.2-rc5.

Just noticed:

Couldn't bump rlimit(MEMLOCK), failures may take place when creating
BPF maps, etc

Running the tests below with `sudo` made this warning go away, but the
test still FAILED.
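
To see which locked-memory limit the non-root shell actually has, a small check like the one below may help. It is only a sketch: the function name memlock_report is made up for illustration, and it assumes the warning comes from an unprivileged user not being allowed to raise RLIMIT_MEMLOCK.

```shell
#!/bin/sh
# Hypothetical helper (not part of perf): report the locked-memory limit
# of the current shell. `ulimit -l` prints the soft RLIMIT_MEMLOCK value
# in KiB, or the word "unlimited".
memlock_report() {
    limit=$(ulimit -l)
    case "$limit" in
        unlimited) echo "RLIMIT_MEMLOCK: unlimited" ;;
        *)         echo "RLIMIT_MEMLOCK: ${limit} KiB" ;;
    esac
}
```

Calling memlock_report once as a normal user and once in a root shell should show whether the two limits differ.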

But maybe I forgot to enable something in sysfs/debug or elsewhere?

The last perf version that was OK:

~/bin/perf -v
perf version 6.0.0

echo "linux-perf: Adjust limited access to performance monitoring and observability operations"
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict \
    /proc/sys/kernel/perf_event_paranoid
0

~/bin/perf test 10 86 92 93 94 95
10: PMU events                                                      :
10.1: PMU event table sanity                                        : Ok
10.2: PMU event map aliases                                         : Ok
10.3: Parsing of PMU event table metrics                            : Ok
10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
86: perf record tests                                               : Ok
92: perf stat tests                                                 : Ok
93: perf all metricgroups test                                      : Ok
94: perf all metrics test                                           : Ok
95: perf all PMU test                                               : Ok

echo 1 | sudo tee /proc/sys/kernel/kptr_restrict \
    /proc/sys/kernel/perf_event_paranoid
echo "linux-perf: Reset limited access to performance monitoring and observability operations"
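
Writing a fixed 1 back assumes that both files held 1 before the test run, which may not be true on every system. A possible alternative is to remember the current values and restore exactly those afterwards. The following is a minimal sketch; the helper names save_settings and restore_settings and the SUDO variable are hypothetical and chosen only for this example.

```shell
#!/bin/sh
# Hypothetical helpers (not part of perf): remember the current contents
# of some settings files and write them back later.

# Print one "path=value" line for each file given as an argument.
save_settings() {
    for f in "$@"; do
        printf '%s=%s\n' "$f" "$(cat "$f")"
    done
}

# Read "path=value" lines on stdin and write each value back to its path.
# Set SUDO=sudo when the files need root, as the /proc/sys ones do.
restore_settings() {
    while IFS='=' read -r path value; do
        echo "$value" | ${SUDO:-} tee "$path" >/dev/null
    done
}

# Possible usage around a test run:
#   saved=$(save_settings /proc/sys/kernel/kptr_restrict \
#                         /proc/sys/kernel/perf_event_paranoid)
#   ... lower the values and run the perf tests ...
#   printf '%s\n' "$saved" | SUDO=sudo restore_settings
```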

If you need further information, please let me know.

Thanks.

Regards,
-Sedat-

P.S. Instructions

[ REPRODUCER ]

LLVM_MVER="16"

# Debian LLVM
##LLVM_TOOLCHAIN_PATH="/usr/lib/llvm-${LLVM_MVER}/bin"
# Selfmade LLVM
LLVM_TOOLCHAIN_PATH="/opt/llvm/bin"
if [ -d "${LLVM_TOOLCHAIN_PATH}" ]; then
   export PATH="${LLVM_TOOLCHAIN_PATH}:${PATH}"
fi

PYTHON_VER="3.11"
MAKE="make"
MAKE_OPTS="V=1 -j1 HOSTCC=clang-$LLVM_MVER HOSTLD=ld.lld
HOSTAR=llvm-ar CC=clang-$LLVM_MVER LD=ld.lld AR=llvm-ar
STRIP=llvm-strip"

echo "LLVM MVER ........ $LLVM_MVER"
echo "Path settings .... $PATH"
echo "Python version ... $PYTHON_VER"
echo "make line ........ $MAKE $MAKE_OPTS"

LANG=C LC_ALL=C make -C tools/perf clean 2>&1 | tee ../make-log_perf-clean.txt

LANG=C LC_ALL=C $MAKE $MAKE_OPTS -C tools/perf \
    PYTHON=python${PYTHON_VER} install-bin 2>&1 | tee \
    ../make-log_perf-install_bin_python${PYTHON_VER}_llvm${LLVM_MVER}.txt

[ TESTS ]

[ TESTS - START ]

echo 0 | sudo tee /proc/sys/kernel/kptr_restrict \
    /proc/sys/kernel/perf_event_paranoid

[ TESTS - DEBIAN ]

/usr/bin/perf -v
perf version 6.1.7

/usr/bin/perf test 10 92 98 99 100 101

 10: PMU events                                                      :
 10.1: PMU event table sanity                                        : Ok
 10.2: PMU event map aliases                                         : Ok
 10.3: Parsing of PMU event table metrics                            : Ok
 10.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 92: perf record tests                                               : Ok
 98: perf stat tests                                                 : Ok
 99: perf all metricgroups test                                      : Ok
100: perf all metrics test                                           : FAILED!
101: perf all PMU test                                               : Ok

[ TESTS - DILEKS ]

~/bin/perf -v
perf version 6.2.0-rc5

~/bin/perf test 7 87 93 94 95 96

  7: PMU events                                                      :
  7.1: PMU event table sanity                                        : Ok
  7.2: PMU event map aliases                                         : Ok
  7.3: Parsing of PMU event table metrics                            : Ok
  7.4: Parsing of PMU event table metrics with fake PMUs             : Ok
 87: perf record tests                                               : Ok
 93: perf stat tests                                                 : Ok
 94: perf all metricgroups test                                      : Ok
 95: perf all metrics test                                           : FAILED!
 96: perf all PMU test                                               : Ok

[ TESTS - FAILED ]

/usr/bin/perf test --verbose 100 2>&1 | tee \
    perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt

~/bin/perf test --verbose 95 2>&1 | tee \
    perf-test-verbose-95-perf-all-metrics-test_dileks-perf-6-2-rc5.txt
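
For a quick summary of which metrics failed, the verbose logs can be filtered for the "Metric '...' not printed in:" lines. This is a sketch that assumes the line format seen in the attached logs; the function name failed_metrics is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper (not part of perf): list the metrics that a verbose
# "perf all metrics test" log reports as not printed.
# Assumes lines of the form:  Metric 'NAME' not printed in:
# Reads the given log files, or stdin when no file is given.
failed_metrics() {
    sed -n "s/^Metric '\([^']*\)' not printed in:.*/\1/p" "$@" | sort -u
}
```

For example, `failed_metrics perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt` prints one failing metric name per line.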

[ TESTS - STOP ]

echo 1 | sudo tee /proc/sys/kernel/kptr_restrict \
    /proc/sys/kernel/perf_event_paranoid

- EOT -

[-- Attachment #2: debian-perf-6-1-7_test-verbose-100-perf-all-metrics-test.txt --]
[-- Type: text/plain, Size: 3732 bytes --]

Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
100: perf all metrics test                                           :
--- start ---
test child forked, pid 39432
Testing Average_Frequency
Testing C2_Pkg_Residency
Testing C3_Core_Residency
Testing C3_Pkg_Residency
Testing C6_Core_Residency
Testing C6_Pkg_Residency
Testing C7_Core_Residency
Testing C7_Pkg_Residency
Testing CLKS
Testing CORE_CLKS
Testing CPI
Testing CPU_Utilization
Testing CoreIPC
Testing DRAM_BW_Use
Testing DSB_Coverage
Testing Execute_per_Issue
Testing FLOPc
Testing GFLOPs
Testing ILP
Testing IPC
Testing Instructions
Testing IpFarBranch
Testing Kernel_CPI
Testing Kernel_Utilization
Testing MEM_Parallel_Requests
Testing MEM_Request_Latency
Testing Retire
Testing SLOTS
Testing SMT_2T_Utilization
Testing Turbo_Utilization
Testing UPI
Testing tma_backend_bound
Testing tma_bad_speculation
Testing tma_branch_mispredicts
Testing tma_branch_resteers
Testing tma_core_bound
Testing tma_divider
Testing tma_dram_bound
Metric 'tma_dram_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 203.922 usec (+- 0.191 usec)
  Average num. events: 30.000 (+- 0.000)
  Average time per event 6.797 usec
  Average data synthesis took: 219.730 usec (+- 0.216 usec)
  Average num. events: 159.000 (+- 0.000)
  Average time per event 1.382 usec

 Performance counter stats for 'perf bench internals synthesize':

     <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_HIT                                     (0,00%)
     <not counted>      CYCLE_ACTIVITY.STALLS_L2_PENDING                                     (0,00%)
     <not counted>      CPU_CLK_UNHALTED.THREAD                                       (0,00%)
     <not counted>      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS                                     (0,00%)

       4,456375532 seconds time elapsed

       1,415829000 seconds user
       3,027083000 seconds sys
Testing tma_dsb_switches
Testing tma_dtlb_load
Testing tma_fetch_bandwidth
Testing tma_fetch_latency
Testing tma_fp_arith
Testing tma_fp_scalar
Testing tma_fp_vector
Testing tma_frontend_bound
Testing tma_heavy_operations
Testing tma_itlb_misses
Testing tma_l3_bound
Metric 'tma_l3_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 204.199 usec (+- 0.228 usec)
  Average num. events: 30.000 (+- 0.000)
  Average time per event 6.807 usec
  Average data synthesis took: 219.934 usec (+- 0.232 usec)
  Average num. events: 159.000 (+- 0.000)
  Average time per event 1.383 usec

 Performance counter stats for 'perf bench internals synthesize':

     <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_HIT                                     (0,00%)
     <not counted>      CYCLE_ACTIVITY.STALLS_L2_PENDING                                     (0,00%)
     <not counted>      CPU_CLK_UNHALTED.THREAD                                       (0,00%)
     <not counted>      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS                                     (0,00%)

       4,458943453 seconds time elapsed

       1,468251000 seconds user
       2,976400000 seconds sys
Testing tma_lcp
Testing tma_light_operations
Testing tma_machine_clears
Testing tma_mem_bandwidth
Testing tma_mem_latency
Testing tma_memory_bound
Testing tma_microcode_sequencer
Testing tma_ms_switches
Testing tma_ports_utilization
Testing tma_retiring
Testing tma_store_bound
Testing tma_x87_use
test child finished with -1
---- end ----
perf all metrics test: FAILED!

[-- Attachment #3: dileks-perf-6-2-rc5-test-verbose-95-perf-all-metrics-test.txt --]
[-- Type: text/plain, Size: 3816 bytes --]

Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
 95: perf all metrics test                                           :
--- start ---
test child forked, pid 39198
Testing ILP
Testing tma_core_bound
Testing tma_memory_bound
Testing tma_branch_mispredicts
Testing tma_machine_clears
Testing tma_itlb_misses
Testing IpFarBranch
Testing tma_l3_bound
Metric 'tma_l3_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 208.033 usec (+- 0.214 usec)
  Average num. events: 30.000 (+- 0.000)
  Average time per event 6.934 usec
  Average data synthesis took: 216.728 usec (+- 0.182 usec)
  Average num. events: 162.000 (+- 0.000)
  Average time per event 1.338 usec

 Performance counter stats for 'perf bench internals synthesize':

     <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_HIT                                           (0,00%)
     <not counted>      CYCLE_ACTIVITY.STALLS_L2_PENDING                                        (0,00%)
     <not counted>      CPU_CLK_UNHALTED.THREAD                                                 (0,00%)
     <not counted>      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS                                        (0,00%)

       4,555228480 seconds time elapsed

       1,504137000 seconds user
       3,040193000 seconds sys
Testing tma_fp_scalar
Testing tma_fp_vector
Testing tma_x87_use
Testing Execute_per_Issue
Testing GFLOPs
Testing DSB_Coverage
Testing tma_dsb_switches
Testing tma_fetch_bandwidth
Testing tma_branch_resteers
Testing tma_lcp
Testing tma_ms_switches
Testing FLOPc
Testing tma_fetch_latency
Testing CPU_Utilization
Testing DRAM_BW_Use
Testing tma_fp_arith
Testing CPI
Testing MEM_Parallel_Requests
Testing MEM_Request_Latency
Testing tma_mem_bandwidth
Testing tma_dram_bound
Metric 'tma_dram_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
  Average synthesis took: 207.680 usec (+- 0.176 usec)
  Average num. events: 30.000 (+- 0.000)
  Average time per event 6.923 usec
  Average data synthesis took: 217.833 usec (+- 0.202 usec)
  Average num. events: 161.000 (+- 0.000)
  Average time per event 1.353 usec

 Performance counter stats for 'perf bench internals synthesize':

     <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_HIT                                           (0,00%)
     <not counted>      CYCLE_ACTIVITY.STALLS_L2_PENDING                                        (0,00%)
     <not counted>      CPU_CLK_UNHALTED.THREAD                                                 (0,00%)
     <not counted>      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS                                        (0,00%)

       4,555698863 seconds time elapsed

       1,481769000 seconds user
       3,063387000 seconds sys
Testing tma_store_bound
Testing tma_mem_latency
Testing tma_dtlb_load
Testing tma_microcode_sequencer
Testing Kernel_CPI
Testing Kernel_Utilization
Testing tma_frontend_bound
Testing CLKS
Testing Retire
Testing UPI
Testing tma_ports_utilization
Testing Average_Frequency
Testing C2_Pkg_Residency
Testing C3_Core_Residency
Testing C3_Pkg_Residency
Testing C6_Core_Residency
Testing C6_Pkg_Residency
Testing C7_Core_Residency
Testing C7_Pkg_Residency
Testing Turbo_Utilization
Testing CoreIPC
Testing IPC
Testing tma_heavy_operations
Testing tma_light_operations
Testing CORE_CLKS
Testing SMT_2T_Utilization
Testing Socket_CLKS
Testing UNCORE_FREQ
Testing Instructions
Testing tma_backend_bound
Testing tma_bad_speculation
Testing tma_retiring
Testing tma_divider
Testing SLOTS
test child finished with -1
---- end ----
perf all metrics test: FAILED!
