* [6.1.7][6.2-rc5] perf all metrics test: FAILED!
@ 2023-01-29 9:58 Sedat Dilek
  2023-01-29 23:21 ` Ian Rogers
  0 siblings, 1 reply; 13+ messages in thread
From: Sedat Dilek @ 2023-01-29 9:58 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Ian Rogers
  Cc: Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
      Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel,
      Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings

[-- Attachment #1: Type: text/plain, Size: 4876 bytes --]

[ CC LLVM linux folks + Ben from Debian kernel team ]

Hi,

I am playing with perf and LLVM version 16.0.0-rc1, which was released yesterday.

After building my self-made LLVM toolchain, I built perf and ran some
perf tests on my Intel Sandy Bridge CPU (details below).

perf all metrics test: FAILED!

...with both Debian's perf version 6.1.7 and my self-made version 6.2-rc5.

Just noticed:

Couldn't bump rlimit(MEMLOCK), failures may take place when creating
BPF maps, etc

Running the tests below with `sudo` made this message go away, but the test still FAILED.

But maybe I forgot to enable something in sysfs/debug or elsewhere?

The last perf version that was OK:

~/bin/perf -v
perf version 6.0.0

echo "linux-perf: Adjust limited access to performance monitoring and observability operations"
echo 0 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
0

~/bin/perf test 10 86 92 93 94 95
10: PMU events :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
86: perf record tests : Ok
92: perf stat tests : Ok
93: perf all metricgroups test : Ok
94: perf all metrics test : Ok
95: perf all PMU test : Ok

echo 1 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
echo "linux-perf: Reset limited access to performance monitoring and observability operations"

If you need further information, please let me know.

Thanks.

Regards,
-Sedat-

P.S. Instructions

[ REPRODUCER ]

LLVM_MVER="16"

# Debian LLVM
##LLVM_TOOLCHAIN_PATH="/usr/lib/llvm-${LLVM_MVER}/bin"
# Selfmade LLVM
LLVM_TOOLCHAIN_PATH="/opt/llvm/bin"
if [ -d "${LLVM_TOOLCHAIN_PATH}" ]; then
    export PATH="${LLVM_TOOLCHAIN_PATH}:${PATH}"
fi

PYTHON_VER="3.11"
MAKE="make"
MAKE_OPTS="V=1 -j1 HOSTCC=clang-$LLVM_MVER HOSTLD=ld.lld HOSTAR=llvm-ar CC=clang-$LLVM_MVER LD=ld.lld AR=llvm-ar STRIP=llvm-strip"

echo "LLVM MVER ........ $LLVM_MVER"
echo "Path settings .... $PATH"
echo "Python version ... $PYTHON_VER"
echo "make line ........ $MAKE $MAKE_OPTS"

LANG=C LC_ALL=C make -C tools/perf clean 2>&1 | tee ../make-log_perf-clean.txt

LANG=C LC_ALL=C $MAKE $MAKE_OPTS -C tools/perf PYTHON=python${PYTHON_VER} install-bin 2>&1 | tee ../make-log_perf-install_bin_python${PYTHON_VER}_llvm${LLVM_MVER}.txt

[ TESTS ]

[ TESTS - START ]

echo 0 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid

[ TESTS - DEBIAN ]

/usr/bin/perf -v
perf version 6.1.7

/usr/bin/perf test 10 92 98 99 100 101

10: PMU events :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
92: perf record tests : Ok
98: perf stat tests : Ok
99: perf all metricgroups test : Ok
100: perf all metrics test : FAILED!
101: perf all PMU test : Ok

[ TESTS - DILEKS ]

~/bin/perf -v
perf version 6.2.0-rc5

~/bin/perf test 7 87 93 94 95 96

7: PMU events :
7.1: PMU event table sanity : Ok
7.2: PMU event map aliases : Ok
7.3: Parsing of PMU event table metrics : Ok
7.4: Parsing of PMU event table metrics with fake PMUs : Ok
87: perf record tests : Ok
93: perf stat tests : Ok
94: perf all metricgroups test : Ok
95: perf all metrics test : FAILED!
96: perf all PMU test : Ok

[ TESTS - FAILED ]

/usr/bin/perf test --verbose 100 2>&1 | tee perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt

~/bin/perf test --verbose 95 2>&1 | tee perf-test-verbose-95-perf-all-metrics-test_dileks-perf-6-2-rc5.txt

[ TESTS - STOP ]

echo 1 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid

- EOT -

[-- Attachment #2: debian-perf-6-1-7_test-verbose-100-perf-all-metrics-test.txt --]
[-- Type: text/plain, Size: 3732 bytes --]

Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
100: perf all metrics test :
--- start ---
test child forked, pid 39432
Testing Average_Frequency
Testing C2_Pkg_Residency
Testing C3_Core_Residency
Testing C3_Pkg_Residency
Testing C6_Core_Residency
Testing C6_Pkg_Residency
Testing C7_Core_Residency
Testing C7_Pkg_Residency
Testing CLKS
Testing CORE_CLKS
Testing CPI
Testing CPU_Utilization
Testing CoreIPC
Testing DRAM_BW_Use
Testing DSB_Coverage
Testing Execute_per_Issue
Testing FLOPc
Testing GFLOPs
Testing ILP
Testing IPC
Testing Instructions
Testing IpFarBranch
Testing Kernel_CPI
Testing Kernel_Utilization
Testing MEM_Parallel_Requests
Testing MEM_Request_Latency
Testing Retire
Testing SLOTS
Testing SMT_2T_Utilization
Testing Turbo_Utilization
Testing UPI
Testing tma_backend_bound
Testing tma_bad_speculation
Testing tma_branch_mispredicts
Testing tma_branch_resteers
Testing tma_core_bound
Testing tma_divider
Testing tma_dram_bound
Metric 'tma_dram_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself:
Average synthesis took: 203.922 usec (+- 0.191 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 6.797 usec
Average data synthesis took: 219.730 usec (+- 0.216 usec)
Average num. events: 159.000 (+- 0.000)
Average time per event 1.382 usec

Performance counter stats for 'perf bench internals synthesize':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

4,456375532 seconds time elapsed

1,415829000 seconds user
3,027083000 seconds sys

Testing tma_dsb_switches
Testing tma_dtlb_load
Testing tma_fetch_bandwidth
Testing tma_fetch_latency
Testing tma_fp_arith
Testing tma_fp_scalar
Testing tma_fp_vector
Testing tma_frontend_bound
Testing tma_heavy_operations
Testing tma_itlb_misses
Testing tma_l3_bound
Metric 'tma_l3_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself:
Average synthesis took: 204.199 usec (+- 0.228 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 6.807 usec
Average data synthesis took: 219.934 usec (+- 0.232 usec)
Average num. events: 159.000 (+- 0.000)
Average time per event 1.383 usec

Performance counter stats for 'perf bench internals synthesize':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

4,458943453 seconds time elapsed

1,468251000 seconds user
2,976400000 seconds sys

Testing tma_lcp
Testing tma_light_operations
Testing tma_machine_clears
Testing tma_mem_bandwidth
Testing tma_mem_latency
Testing tma_memory_bound
Testing tma_microcode_sequencer
Testing tma_ms_switches
Testing tma_ports_utilization
Testing tma_retiring
Testing tma_store_bound
Testing tma_x87_use
test child finished with -1
---- end ----
perf all metrics test: FAILED!

[-- Attachment #3: dileks-perf-6-2-rc5-test-verbose-95-perf-all-metrics-test.txt --]
[-- Type: text/plain, Size: 3816 bytes --]

Couldn't bump rlimit(MEMLOCK), failures may take place when creating BPF maps, etc
95: perf all metrics test :
--- start ---
test child forked, pid 39198
Testing ILP
Testing tma_core_bound
Testing tma_memory_bound
Testing tma_branch_mispredicts
Testing tma_machine_clears
Testing tma_itlb_misses
Testing IpFarBranch
Testing tma_l3_bound
Metric 'tma_l3_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself:
Average synthesis took: 208.033 usec (+- 0.214 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 6.934 usec
Average data synthesis took: 216.728 usec (+- 0.182 usec)
Average num. events: 162.000 (+- 0.000)
Average time per event 1.338 usec

Performance counter stats for 'perf bench internals synthesize':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

4,555228480 seconds time elapsed

1,504137000 seconds user
3,040193000 seconds sys

Testing tma_fp_scalar
Testing tma_fp_vector
Testing tma_x87_use
Testing Execute_per_Issue
Testing GFLOPs
Testing DSB_Coverage
Testing tma_dsb_switches
Testing tma_fetch_bandwidth
Testing tma_branch_resteers
Testing tma_lcp
Testing tma_ms_switches
Testing FLOPc
Testing tma_fetch_latency
Testing CPU_Utilization
Testing DRAM_BW_Use
Testing tma_fp_arith
Testing CPI
Testing MEM_Parallel_Requests
Testing MEM_Request_Latency
Testing tma_mem_bandwidth
Testing tma_dram_bound
Metric 'tma_dram_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself:
Average synthesis took: 207.680 usec (+- 0.176 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 6.923 usec
Average data synthesis took: 217.833 usec (+- 0.202 usec)
Average num. events: 161.000 (+- 0.000)
Average time per event 1.353 usec

Performance counter stats for 'perf bench internals synthesize':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

4,555698863 seconds time elapsed

1,481769000 seconds user
3,063387000 seconds sys

Testing tma_store_bound
Testing tma_mem_latency
Testing tma_dtlb_load
Testing tma_microcode_sequencer
Testing Kernel_CPI
Testing Kernel_Utilization
Testing tma_frontend_bound
Testing CLKS
Testing Retire
Testing UPI
Testing tma_ports_utilization
Testing Average_Frequency
Testing C2_Pkg_Residency
Testing C3_Core_Residency
Testing C3_Pkg_Residency
Testing C6_Core_Residency
Testing C6_Pkg_Residency
Testing C7_Core_Residency
Testing C7_Pkg_Residency
Testing Turbo_Utilization
Testing CoreIPC
Testing IPC
Testing tma_heavy_operations
Testing tma_light_operations
Testing CORE_CLKS
Testing SMT_2T_Utilization
Testing Socket_CLKS
Testing UNCORE_FREQ
Testing Instructions
Testing tma_backend_bound
Testing tma_bad_speculation
Testing tma_retiring
Testing tma_divider
Testing SLOTS
test child finished with -1
---- end ----
perf all metrics test: FAILED!

^ permalink raw reply [flat|nested] 13+ messages in thread
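
Note on the sysctl handling in the reproducer above: the script writes 0 to kptr_restrict and perf_event_paranoid before the tests and writes 1 back afterwards, whatever the values were before. A minimal sketch of a variant that saves and restores the previous values follows. The names run_relaxed and SYSCTL_FILES are hypothetical (they are not part of perf or of the original script), and writing to /proc/sys needs root.

```shell
#!/bin/sh
# Hypothetical helper, not part of perf: run a command with the listed
# sysctl files temporarily set to 0, then restore their previous values.
# SYSCTL_FILES may be overridden, e.g. to point at scratch files.
: "${SYSCTL_FILES:=/proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid}"

run_relaxed() {
    rr_saved=""
    for rr_file in $SYSCTL_FILES; do
        # Remember "path=old_value" before overwriting the setting.
        rr_saved="$rr_saved $rr_file=$(cat "$rr_file")"
        printf '0\n' > "$rr_file"
    done

    # Run the wrapped command and keep its exit status.
    "$@"
    rr_rc=$?

    # Write every saved value back to the file it came from.
    for rr_entry in $rr_saved; do
        printf '%s\n' "${rr_entry#*=}" > "${rr_entry%%=*}"
    done
    return "$rr_rc"
}

# Example (as root), reusing a command from the message above:
#   run_relaxed ~/bin/perf test --verbose 95
```

The helper returns the exit status of the wrapped command, so a failing test run is still visible to the caller after the settings are restored.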
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!
  2023-01-29 9:58 [6.1.7][6.2-rc5] perf all metrics test: FAILED! Sedat Dilek
@ 2023-01-29 23:21 ` Ian Rogers
  2023-01-30 2:24 ` Sedat Dilek
  0 siblings, 1 reply; 13+ messages in thread
From: Ian Rogers @ 2023-01-29 23:21 UTC (permalink / raw)
  To: sedat.dilek
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland,
      Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users,
      linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings

On Sun, Jan 29, 2023 at 1:59 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>
> [ CC LLVM linux folks + Ben from Debian kernel team ]
>
> Hi,
>
> I am playing with LLVM version 16.0.0-rc1 which was released yesterday and PERF.
>
> After building my selfmade LLVM toolchain, I built perf and run some
> perf tests here on my Intel SandyBridge CPU (details see below).
>
> perf all metrics test: FAILED!
>
> ...with both Debian's perf version 6.1.7 and my selfmade version 6.2-rc5.
>
> Just noticed:
>
> Couldn't bump rlimit(MEMLOCK), failures may take place when creating
> BPF maps, etc
>
> Run the below tests with `sudo` - made this go away - still FAILED.
>
> But maybe I am missing to activate some sysfs/debug or whatever other stuff?

Hi Sedat,

things have been improving wrt metrics and so this failure may have
just been because of the addition of a previously missing metric. The
rlimit thing shouldn't affect things but maybe file descriptors?
Looking at the test output the issue is:

```
Metric 'tma_dram_bound' not printed in:
# Running 'internals/synthesize' benchmark:
Computing performance of single threaded perf event synthesis by synthesizing events on the perf process itself:
Average synthesis took: 207.680 usec (+- 0.176 usec)
Average num. events: 30.000 (+- 0.000)
Average time per event 6.923 usec
Average data synthesis took: 217.833 usec (+- 0.202 usec)
Average num. events: 161.000 (+- 0.000)
Average time per event 1.353 usec

Performance counter stats for 'perf bench internals synthesize':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)
```

So the test was checking to see whether the tma_dram_bound metric
could be computed on your Sandybridge and it failed. The event counts
below show that every event came back "<not counted>" which is usually
indicative of a permissions problem - it is also not surprising given
this that the metric wasn't computed. You could try repeating the
command the test is trying with something like
"perf stat -M tma_dram_bound -a sleep 1", but running as root should
have resolved that issue. Does that give you enough to keep exploring?

Thanks,
Ian

^ permalink raw reply [flat|nested] 13+ messages in thread
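
Note: the observation in the reply above, that every event came back "<not counted>", can be checked mechanically. A minimal sketch, assuming perf stat output in the layout shown in this thread; the function name not_counted_events is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read `perf stat` output on stdin and print the name
# of each event reported as "<not counted>", one per line.
not_counted_events() {
    # "<not" and "counted>" are fields 1 and 2, the event name is field 3.
    awk '/<not counted>/ { print $3 }'
}

# Example, using the command suggested in the reply (perf stat prints its
# report on stderr, hence the redirection):
#   perf stat -M tma_dram_bound -a sleep 1 2>&1 | not_counted_events
```

An empty result would mean that no event in the report was marked "<not counted>".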
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!
  2023-01-29 23:21 ` Ian Rogers
@ 2023-01-30 2:24 ` Sedat Dilek
  2023-01-30 10:04 ` James Clark
  0 siblings, 1 reply; 13+ messages in thread
From: Sedat Dilek @ 2023-01-30 2:24 UTC (permalink / raw)
  To: Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland,
      Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users,
      linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings

On Mon, Jan 30, 2023 at 12:21 AM Ian Rogers <irogers@google.com> wrote:
>
> On Sun, Jan 29, 2023 at 1:59 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >
> > [ CC LLVM linux folks + Ben from Debian kernel team ]
> >
> > Hi,
> >
> > I am playing with LLVM version 16.0.0-rc1 which was released yesterday and PERF.
> >
> > After building my selfmade LLVM toolchain, I built perf and run some
> > perf tests here on my Intel SandyBridge CPU (details see below).
> >
> > perf all metrics test: FAILED!
> >
> > ...with both Debian's perf version 6.1.7 and my selfmade version 6.2-rc5.
> >
> > Just noticed:
> >
> > Couldn't bump rlimit(MEMLOCK), failures may take place when creating
> > BPF maps, etc
> >
> > Run the below tests with `sudo` - made this go away - still FAILED.
> >
> > But maybe I am missing to activate some sysfs/debug or whatever other stuff?
>
> Hi Sedat,
>
> things have been improving wrt metrics and so this failure may have
> just been because of the addition of a previously missing metric. The
> rlimit thing shouldn't affect things but maybe file descriptors?
> Looking at the test output the issue is:
>
> ```
> Metric 'tma_dram_bound' not printed in:
> # Running 'internals/synthesize' benchmark:
> Computing performance of single threaded perf event synthesis by
> synthesizing events on the perf process itself:
> Average synthesis took: 207.680 usec (+- 0.176 usec)
> Average num. events: 30.000 (+- 0.000)
> Average time per event 6.923 usec
> Average data synthesis took: 217.833 usec (+- 0.202 usec)
> Average num. events: 161.000 (+- 0.000)
> Average time per event 1.353 usec
>
> Performance counter stats for 'perf bench internals synthesize':
>
> <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT
> (0,00%)
> <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING
> (0,00%)
> <not counted> CPU_CLK_UNHALTED.THREAD
> (0,00%)
> <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS
> (0,00%)
> ```
>
> So the test was checking to see whether the tma_dram_bound metric
> could be computed on your Sandybridge and it failed. The event counts
> below show that every event came back "<not counted>" which is usually
> indicative of a permissions problem - it is also not surprising given
> this that the metric wasn't computed. You could try repeating the
> command the test is trying with something like "perf stat -M
> tma_dram_bound -a sleep 1", but running as root should have resolved
> that issue. Does that give you enough to keep exploring?
>

Hi Ian,

Thanks for your feedback!

I booted into my Debian kernel - just to see what happens.

# cat /proc/version
Linux version 6.1.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-12 (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 SMP PREEMPT_DYNAMIC Debian 6.1.7-1 (2023-01-18)

All things run as root...

# echo 0 | tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
0

# /usr/bin/perf test 10 92 98 99 100 101
10: PMU events :
10.1: PMU event table sanity : Ok
10.2: PMU event map aliases : Ok
10.3: Parsing of PMU event table metrics : Ok
10.4: Parsing of PMU event table metrics with fake PMUs : Ok
92: perf record tests : Ok
98: perf stat tests : Ok
99: perf all metricgroups test : Ok
100: perf all metrics test : FAILED!
101: perf all PMU test : Ok

# perf stat -M tma_dram_bound -a sleep 1

Performance counter stats for 'system wide':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

1,002148600 seconds time elapsed

Hmm... looking at... Metric 'tma_l3_bound' ...

Running...

# perf stat --verbose -M tma_l3_bound -a sleep 1
Using CPUID GenuineIntel-6-2A-7
metric expr (MEM_LOAD_UOPS_RETIRED.LLC_HIT / (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS)) * CYCLE_ACTIVITY.STALLS_L2_PENDING / CLKS for tma_l3_bound
metric expr CPU_CLK_UNHALTED.THREAD for CLKS

found event MEM_LOAD_UOPS_RETIRED.LLC_HIT
found event CYCLE_ACTIVITY.STALLS_L2_PENDING
found event CPU_CLK_UNHALTED.THREAD
found event MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS

Parsing metric events '{MEM_LOAD_UOPS_RETIRED.LLC_HIT/metric-id=MEM_LOAD_UOPS_RETIRED.LLC_HIT/,CYCLE_ACTIVITY.STALLS_L2_PENDING/metric-id=CYCLE_ACTIVITY.STALLS_L2_PENDING/,CPU_CLK_UNHALTED.THREAD/metric-id=CPU_CLK_UNHALTED.THREAD/,MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/metric-id=MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/}:W'
MEM_LOAD_UOPS_RETIRED.LLC_HIT -> cpu/event=0xd1,period=0xc365,umask=0x4/
CYCLE_ACTIVITY.STALLS_L2_PENDING -> cpu/event=0xa3,cmask=0x5,period=0x1e8483,umask=0x5/
CPU_CLK_UNHALTED.THREAD -> cpu/event=0x3c,period=0x1e8483/
MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS -> cpu/event=0xd4,period=0x186a7,umask=0x2/

Control descriptor is not initialized

MEM_LOAD_UOPS_RETIRED.LLC_HIT: 0 4007421228 0
CYCLE_ACTIVITY.STALLS_L2_PENDING: 0 4007421228 0
CPU_CLK_UNHALTED.THREAD: 0 4007421228 0
MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS: 0 4007421228 0

Performance counter stats for 'system wide':

<not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT (0,00%)
<not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING (0,00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0,00%)
<not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS (0,00%)

1,002310013 seconds time elapsed

So those events/metric-ids resulting in "<not counted>" are all found.

What does "Control descriptor is not initialized" mean?

To summarize:

These two metrics in "100: perf all metrics test" FAILED:

1. tma_dram_bound
2. tma_l3_bound

Best regards,
-Sedat-

^ permalink raw reply [flat|nested] 13+ messages in thread
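
Note: the summary of failing metrics in the message above can also be pulled straight out of a verbose test log. A minimal sketch, assuming the "Metric '...' not printed in:" wording seen in the attached logs; the function name failed_metrics is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a `perf test --verbose` log on stdin and print
# each metric named in a "Metric '...' not printed in:" line, sorted and
# de-duplicated.
failed_metrics() {
    sed -n "s/.*Metric '\([^']*\)' not printed in:.*/\1/p" | sort -u
}

# Example, reusing a log file name from the first message:
#   failed_metrics < perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt
```

Each printed name could then be passed to `perf stat -M <name> -a sleep 1`, the command suggested earlier in the thread, to look at one metric at a time.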
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!
  2023-01-30 2:24 ` Sedat Dilek
@ 2023-01-30 10:04 ` James Clark
  2023-01-31 0:20 ` Ian Rogers
  0 siblings, 1 reply; 13+ messages in thread
From: James Clark @ 2023-01-30 10:04 UTC (permalink / raw)
  To: sedat.dilek, Ian Rogers
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland,
      Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users,
      linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings

On 30/01/2023 02:24, Sedat Dilek wrote:
> On Mon, Jan 30, 2023 at 12:21 AM Ian Rogers <irogers@google.com> wrote:
>>
>> On Sun, Jan 29, 2023 at 1:59 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
>>>
>>> [ CC LLVM linux folks + Ben from Debian kernel team ]
>>>
>>> Hi,
>>>
>>> I am playing with LLVM version 16.0.0-rc1 which was released yesterday and PERF.
>>>
>>> After building my selfmade LLVM toolchain, I built perf and run some
>>> perf tests here on my Intel SandyBridge CPU (details see below).
>>>
>>> perf all metrics test: FAILED!
>>>
>>> ...with both Debian's perf version 6.1.7 and my selfmade version 6.2-rc5.
>>>
>>> Just noticed:
>>>
>>> Couldn't bump rlimit(MEMLOCK), failures may take place when creating
>>> BPF maps, etc
>>>
>>> Run the below tests with `sudo` - made this go away - still FAILED.
>>>
>>> But maybe I am missing to activate some sysfs/debug or whatever other stuff?
>>
>> Hi Sedat,
>>
>> things have been improving wrt metrics and so this failure may have
>> just been because of the addition of a previously missing metric. The
>> rlimit thing shouldn't affect things but maybe file descriptors?
>> Looking at the test output the issue is:
>>
>> ```
>> Metric 'tma_dram_bound' not printed in:
>> # Running 'internals/synthesize' benchmark:
>> Computing performance of single threaded perf event synthesis by
>> synthesizing events on the perf process itself:
>> Average synthesis took: 207.680 usec (+- 0.176 usec)
>> Average num. events: 30.000 (+- 0.000)
>> Average time per event 6.923 usec
>> Average data synthesis took: 217.833 usec (+- 0.202 usec)
>> Average num. events: 161.000 (+- 0.000)
>> Average time per event 1.353 usec
>>
>> Performance counter stats for 'perf bench internals synthesize':
>>
>> <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT
>> (0,00%)
>> <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING
>> (0,00%)
>> <not counted> CPU_CLK_UNHALTED.THREAD
>> (0,00%)
>> <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS
>> (0,00%)
>> ```
>>
>> So the test was checking to see whether the tma_dram_bound metric
>> could be computed on your Sandybridge and it failed. The event counts
>> below show that every event came back "<not counted>" which is usually
>> indicative of a permissions problem - it is also not surprising given
>> this that the metric wasn't computed. You could try repeating the
>> command the test is trying with something like "perf stat -M
>> tma_dram_bound -a sleep 1", but running as root should have resolved
>> that issue. Does that give you enough to keep exploring?
>>
>
> Hi Ian,
>
> Thanks for your feedback!
>
> I booted into my Debian kernel - just to see what happens.
>
> # cat /proc/version
> Linux version 6.1.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-12
> (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1
> SMP PREEMPT_DYNAMIC Debian 6.1.7-1 (2023-01-18)
>
> All things run as root...
>
> # echo 0 | tee /proc/sys/kernel/kptr_restrict
> /proc/sys/kernel/perf_event_paranoid
> 0
>
> # /usr/bin/perf test 10 92 98 99 100 101
> 10: PMU events :
> 10.1: PMU event table sanity : Ok
> 10.2: PMU event map aliases : Ok
> 10.3: Parsing of PMU event table metrics : Ok
> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok
> 92: perf record tests : Ok
> 98: perf stat tests : Ok
> 99: perf all metricgroups test : Ok
> 100: perf all metrics test : FAILED!
> 101: perf all PMU test : Ok
>
> # perf stat -M tma_dram_bound -a sleep 1
>
> Performance counter stats for 'system wide':
>
> <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT
> (0,00%)
> <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING
> (0,00%)
> <not counted> CPU_CLK_UNHALTED.THREAD
> (0,00%)
> <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS
> (0,00%)
>

Hi Sedat,

I also had this failure and did a git bisect, but it led me to the
conclusion that it is a stale build issue rather than a regression.
There was a recent commit that renamed/removed some json PMU files which
the build system can't cope with. I think the tests end up iterating
over a different set of event names than were generated by the build
system.

If you do a clean build the issue should go away. I don't know if there
is anything more we can do to stop this from happening.

James

> 1,002148600 seconds time elapsed
>
> Hmm... looking at... Metric 'tma_l3_bound' ...
>
> Running...
>
> # perf stat --verbose -M tma_l3_bound -a sleep 1
> Using CPUID GenuineIntel-6-2A-7
> metric expr (MEM_LOAD_UOPS_RETIRED.LLC_HIT /
> (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 *
> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS)) *
> CYCLE_ACTIVITY.STALLS_L2_PENDING / CLKS for tma_l3_bound
> metric expr CPU_CLK_UNHALTED.THREAD for CLKS
>
> found event MEM_LOAD_UOPS_RETIRED.LLC_HIT
> found event CYCLE_ACTIVITY.STALLS_L2_PENDING
> found event CPU_CLK_UNHALTED.THREAD
> found event MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS
>
> Parsing metric events
> '{MEM_LOAD_UOPS_RETIRED.LLC_HIT/metric-id=MEM_LOAD_UOPS_RETIRED.LLC_HIT/,CYCLE_ACTIVITY.STALLS_L2_PENDING/metric-id=CYCLE_ACTIVITY.STALLS_L2_PEND
> ING/,CPU_CLK_UNHALTED.THREAD/metric-id=CPU_CLK_UNHALTED.THREAD/,MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/metric-id=MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/}:W'
> MEM_LOAD_UOPS_RETIRED.LLC_HIT -> cpu/event=0xd1,period=0xc365,umask=0x4/
> CYCLE_ACTIVITY.STALLS_L2_PENDING ->
> cpu/event=0xa3,cmask=0x5,period=0x1e8483,umask=0x5/
> CPU_CLK_UNHALTED.THREAD ->
cpu/event=0x3c,period=0x1e8483/ > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS -> cpu/event=0xd4,period=0x186a7,umask=0x2/ > > Control descriptor is not initialized > > MEM_LOAD_UOPS_RETIRED.LLC_HIT: 0 4007421228 0 > CYCLE_ACTIVITY.STALLS_L2_PENDING: 0 4007421228 0 > CPU_CLK_UNHALTED.THREAD: 0 4007421228 0 > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS: 0 4007421228 0 > > Performance counter stats for 'system wide': > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > (0,00%) > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > (0,00%) > <not counted> CPU_CLK_UNHALTED.THREAD > (0,00%) > <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > (0,00%) > > 1,002310013 seconds time elapsed > > So those events/metric-ids resulting in "<not counted>" are all found. > > What means "Control descriptor is not initialized"? > > To summarize: > > Those two tests in "100: perf all metrics test" FAILED: > > 1. tma_dram_bound > 2. tma_l3_bound > > Best regards, > -Sedat- > >> Thanks, >> Ian >> >>> Last perf version which was OK: >>> >>> ~/bin/perf -v >>> perf version 6.0.0 >>> >>> echo "linux-perf: Adjust limited access to performance monitoring and >>> observability operations" >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict >>> /proc/sys/kernel/perf_event_paranoid >>> 0 >>> >>> ~/bin/perf test 10 86 92 93 94 95 >>> 10: PMU events : >>> 10.1: PMU event table sanity : Ok >>> 10.2: PMU event map aliases : Ok >>> 10.3: Parsing of PMU event table metrics : Ok >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok >>> 86: perf record tests : Ok >>> 92: perf stat tests : Ok >>> 93: perf all metricgroups test : Ok >>> 94: perf all metrics test : Ok >>> 95: perf all PMU test : Ok >>> >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict >>> /proc/sys/kernel/perf_event_paranoid >>> echo "linux-perf: Reset limited access to performance monitoring and >>> observability operations" >>> >>> If you need further information, please let me know. >>> >>> Thanks. >>> >>> Regards, >>> -Sedat- >>> >>> P.S. 
Instructions >>> >>> [ REPRODUCER ] >>> >>> LLVM_MVER="16" >>> >>> # Debian LLVM >>> ##LLVM_TOOLCHAIN_PATH="/usr/lib/llvm-${LLVM_MVER}/bin" >>> # Selfmade LLVM >>> LLVM_TOOLCHAIN_PATH="/opt/llvm/bin" >>> if [ -d ${LLVM_TOOLCHAIN_PATH} ]; then >>> export PATH="${LLVM_TOOLCHAIN_PATH}:${PATH}" >>> fi >>> >>> PYTHON_VER="3.11" >>> MAKE="make" >>> MAKE_OPTS="V=1 -j1 HOSTCC=clang-$LLVM_MVER HOSTLD=ld.lld >>> HOSTAR=llvm-ar CC=clang-$LLVM_MVER LD=ld.lld AR=llvm-ar >>> STRIP=llvm-strip" >>> >>> echo "LLVM MVER ........ $LLVM_MVER" >>> echo "Path settings .... $PATH" >>> echo "Python version ... $PYTHON_VER" >>> echo "make line ........ $MAKE $MAKE_OPTS" >>> >>> LANG=C LC_ALL=C make -C tools/perf clean 2>&1 | tee ../make-log_perf-clean.txt >>> >>> LANG=C LC_ALL=C $MAKE $MAKE_OPTS -C tools/perf >>> PYTHON=python${PYTHON_VER} install-bin 2>&1 | tee >>> ../make-log_perf-install_bin_python${PYTHON_VER}_llvm${LLVM_MVER}.txt >>> >>> >>> [ TESTS ] >>> >>> [ TESTS - START ] >>> >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict >>> /proc/sys/kernel/perf_event_paranoid >>> >>> [ TESTS - DEBIAN ] >>> >>> /usr/bin/perf -v >>> perf version 6.1.7 >>> >>> /usr/bin/perf test 10 92 98 99 100 101 >>> >>> 10: PMU events : >>> 10.1: PMU event table sanity : Ok >>> 10.2: PMU event map aliases : Ok >>> 10.3: Parsing of PMU event table metrics : Ok >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok >>> 92: perf record tests : Ok >>> 98: perf stat tests : Ok >>> 99: perf all metricgroups test : Ok >>> 100: perf all metrics test : FAILED! 
>>> 101: perf all PMU test : Ok >>> >>> [ TESTS - DILEKS ] >>> >>> ~/bin/perf -v >>> perf version 6.2.0-rc5 >>> >>> ~/bin/perf test 7 87 93 94 95 96 >>> >>> 7: PMU events : >>> 7.1: PMU event table sanity : Ok >>> 7.2: PMU event map aliases : Ok >>> 7.3: Parsing of PMU event table metrics : Ok >>> 7.4: Parsing of PMU event table metrics with fake PMUs : Ok >>> 87: perf record tests : Ok >>> 93: perf stat tests : Ok >>> 94: perf all metricgroups test : Ok >>> 95: perf all metrics test : FAILED! >>> 96: perf all PMU test : Ok >>> >>> [ TESTS - FAILED ] >>> >>> /usr/bin/perf test --verbose 100 2>&1 | tee >>> perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt >>> >>> ~/bin/perf test --verbose 95 2>&1 | tee >>> perf-test-verbose-95-perf-all-metrics-test_dileks-perf-6-2-rc5.txt >>> >>> [ TESTS - STOP ] >>> >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict >>> /proc/sys/kernel/perf_event_paranoid >>> >>> - EOT - ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED!
  2023-01-30 10:04     ` James Clark
@ 2023-01-31  0:20       ` Ian Rogers
  2023-01-31  3:45         ` Sedat Dilek
  2023-02-01  6:51         ` Ravi Bangoria
  0 siblings, 2 replies; 13+ messages in thread
From: Ian Rogers @ 2023-01-31  0:20 UTC (permalink / raw)
  To: Liang, Kan, Xing, Zhengjun, sedat.dilek
  Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-perf-users, linux-kernel, Nick Desaulniers,
	Nathan Chancellor, llvm, Ben Hutchings, James Clark,
	Stephane Eranian

On Mon, Jan 30, 2023 at 2:04 AM James Clark <james.clark@arm.com> wrote:
>
>
>
> On 30/01/2023 02:24, Sedat Dilek wrote:
> >
> > On Mon, Jan 30, 2023 at 12:21 AM Ian Rogers <irogers@google.com> wrote:
> >>
> >> On Sun, Jan 29, 2023 at 1:59 AM Sedat Dilek <sedat.dilek@gmail.com> wrote:
> >>>
> >>> [ CC LLVM linux folks + Ben from Debian kernel team ]
> >>>
> >>> Hi,
> >>>
> >>> I am playing with LLVM version 16.0.0-rc1 which was released yesterday and PERF.
> >>>
> >>> After building my selfmade LLVM toolchain, I built perf and run some
> >>> perf tests here on my Intel SandyBridge CPU (details see below).
> >>>
> >>> perf all metrics test: FAILED!
> >>>
> >>> ...with both Debian's perf version 6.1.7 and my selfmade version 6.2-rc5.
> >>>
> >>> Just noticed:
> >>>
> >>> Couldn't bump rlimit(MEMLOCK), failures may take place when creating
> >>> BPF maps, etc
> >>>
> >>> Run the below tests with `sudo` - made this go away - still FAILED.
> >>>
> >>> But maybe I am missing to activate some sysfs/debug or whatever other stuff?
> >>
> >> Hi Sedat,
> >>
> >> things have been improving wrt metrics and so this failure may have
> >> just been because of the addition of a previously missing metric. The
> >> rlimit thing shouldn't affect things but maybe file descriptors?
> >> Looking at the test output the issue is:
> >>
> >> ```
> >> Metric 'tma_dram_bound' not printed in:
> >> # Running 'internals/synthesize' benchmark:
> >> Computing performance of single threaded perf event synthesis by
> >> synthesizing events on the perf process itself:
> >>   Average synthesis took: 207.680 usec (+- 0.176 usec)
> >>   Average num. events: 30.000 (+- 0.000)
> >>   Average time per event 6.923 usec
> >>   Average data synthesis took: 217.833 usec (+- 0.202 usec)
> >>   Average num. events: 161.000 (+- 0.000)
> >>   Average time per event 1.353 usec
> >>
> >>  Performance counter stats for 'perf bench internals synthesize':
> >>
> >>      <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_HIT            (0,00%)
> >>      <not counted>      CYCLE_ACTIVITY.STALLS_L2_PENDING         (0,00%)
> >>      <not counted>      CPU_CLK_UNHALTED.THREAD                  (0,00%)
> >>      <not counted>      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS      (0,00%)
> >> ```
> >>
> >> So the test was checking to see whether the tma_dram_bound metric
> >> could be computed on your Sandybridge and it failed. The event counts
> >> below show that every event came back "<not counted>" which is usually
> >> indicative of a permissions problem - it is also not surprising given
> >> this that the metric wasn't computed. You could try repeating the
> >> command the test is trying with something like "perf stat -M
> >> tma_dram_bound -a sleep 1", but running as root should have resolved
> >> that issue. Does that give you enough to keep exploring?
> >>
> >
> > Hi Ian,
> >
> > Thanks for your feedback!
> >
> > I booted into my Debian kernel - just to see what happens.
> >
> > # cat /proc/version
> > Linux version 6.1.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-12
> > (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1
> > SMP PREEMPT_DYNAMIC Debian 6.1.7-1 (2023-01-18)
> >
> > All things run as root...
> >
> > # echo 0 | tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
> > 0
> >
> > # /usr/bin/perf test 10 92 98 99 100 101
> >  10: PMU events                                                 :
> >  10.1: PMU event table sanity                                   : Ok
> >  10.2: PMU event map aliases                                    : Ok
> >  10.3: Parsing of PMU event table metrics                       : Ok
> >  10.4: Parsing of PMU event table metrics with fake PMUs        : Ok
> >  92: perf record tests                                          : Ok
> >  98: perf stat tests                                            : Ok
> >  99: perf all metricgroups test                                 : Ok
> > 100: perf all metrics test                                      : FAILED!
> > 101: perf all PMU test                                          : Ok
> >
> > # perf stat -M tma_dram_bound -a sleep 1
> >
> >  Performance counter stats for 'system wide':
> >
> >      <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_HIT            (0,00%)
> >      <not counted>      CYCLE_ACTIVITY.STALLS_L2_PENDING         (0,00%)
> >      <not counted>      CPU_CLK_UNHALTED.THREAD                  (0,00%)
> >      <not counted>      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS      (0,00%)
> >
>
> Hi Sedat,
>
> I also had this failure and did a git bisect, but it led me to the
> conclusion that it is a stale build issue rather than a regression.
>
> There was a recent commit that renamed/removed some json PMU files which
> the build system can't cope with. I think the tests end up iterating
> over a different set of event names than were generated by the build system.
>
> If you do a clean build the issue should go away. I don't know if there
> is anything more we can do to stop this from happening.
>
> James

So I think this is a kernel bug triggering a perf tool bug. The kernel
bug can be worked around in the perf tool. I only had an Ivybridge to
test with (hence slightly different events) but what I see is both
tma_dram_bound and tma_l3_bound using the same 4 events.
I could work around the "<not counted>" by adding the --metric-no-group flag:

```
$ perf stat -M tma_l3_bound --metric-no-group -a sleep 1

 Performance counter stats for 'system wide':

           400,404      MEM_LOAD_UOPS_RETIRED.LLC_HIT      #      4.3 %  tma_l3_bound     (74.99%)
       128,937,891      CYCLE_ACTIVITY.STALLS_L2_PENDING                                  (87.46%)
           167,459      MEM_LOAD_UOPS_RETIRED.LLC_MISS                                    (74.99%)
       759,574,967      CPU_CLK_UNHALTED.THREAD                                           (87.47%)

       1.001526438 seconds time elapsed

$ perf stat -M tma_dram_bound -a --metric-no-group sleep 1

 Performance counter stats for 'system wide':

           259,954      MEM_LOAD_UOPS_RETIRED.LLC_HIT      #     15.2 %  tma_dram_bound   (74.99%)
       118,807,043      CYCLE_ACTIVITY.STALLS_L2_PENDING                                  (87.46%)
           111,699      MEM_LOAD_UOPS_RETIRED.LLC_MISS                                    (74.95%)
       587,571,060      CPU_CLK_UNHALTED.THREAD                                           (87.45%)

       1.001518093 seconds time elapsed
```

The issue is that perf metrics use weak groups of events. A weak group
is the same as a group of events initially. We want to use groups of
events with metrics so that all the counters are scheduled in and out
at the same time, and not multiplexed independently. Imagine measuring
IPC but the counts for instructions and cycles are measured at
different periods, the resultant IPC value would be unlikely to be
accurate. If perf_event_open fails then the perf tool retries the
events without the group.
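To make the IPC point concrete, here is a toy sketch that is not part of the original thread. The per-slice instruction and cycle counts are invented, and the names `scaled_count` and `ipc` are hypothetical helpers. It only assumes that a multiplexed count is scaled up by enabled time over running time, which is what the percentages in the perf stat output reflect.

```python
# Toy model of counter multiplexing; all numbers are invented for illustration.
# Each list entry is what one time slice of the workload actually produced.
INSTRUCTIONS = [4000, 300, 4000, 300]
CYCLES = [1000, 600, 1000, 600]


def scaled_count(per_slice, running):
    """Estimate the full-run count from the slices the counter was running in.

    Mirrors the usual multiplexing correction: raw * enabled / running.
    """
    raw = sum(per_slice[i] for i in running)
    return raw * len(per_slice) / len(running)


def ipc(instr_slices, cycle_slices):
    """IPC from scaled counts, given the slices each counter was running in."""
    return scaled_count(INSTRUCTIONS, instr_slices) / scaled_count(CYCLES, cycle_slices)


true_ipc = sum(INSTRUCTIONS) / sum(CYCLES)

# Grouped: both counters are scheduled in and out together (slices 0 and 1).
grouped_ipc = ipc([0, 1], [0, 1])

# Ungrouped: instructions counted in slices 0 and 2, cycles in slices 1 and 3.
ungrouped_ipc = ipc([0, 2], [1, 3])

print(f"true IPC      {true_ipc:.2f}")
print(f"grouped IPC   {grouped_ipc:.2f}")
print(f"ungrouped IPC {ungrouped_ipc:.2f}")
```

With these made-up numbers the grouped estimate matches the true IPC (2.69), while the ungrouped one comes out at 6.67, which is higher than the IPC of any single slice (at most 4.0). That kind of skew is why metrics prefer groups and only fall back to ungrouped events when the group cannot be opened.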
If I try just 3 of the events in a weak group then the failure can be seen:

```
$ perf stat -e "{MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING}:W" -a sleep 1

 Performance counter stats for 'system wide':

     <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_HIT            (0.00%)
     <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_MISS           (0.00%)
     <not counted>      CYCLE_ACTIVITY.STALLS_L2_PENDING         (0.00%)

       1.001458485 seconds time elapsed
```

The kernel should have failed the perf_event_open on opening the third
event and then measured without the group, which it can do with
multiplexing as in the following:

```
$ perf stat -e "MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING" -a sleep 1

 Performance counter stats for 'system wide':

         1,239,397      MEM_LOAD_UOPS_RETIRED.LLC_HIT            (79.06%)
           174,826      MEM_LOAD_UOPS_RETIRED.LLC_MISS           (64.60%)
       124,026,024      CYCLE_ACTIVITY.STALLS_L2_PENDING         (81.16%)

       1.001483434 seconds time elapsed
```

When the --metric-no-group flag is given to perf then it doesn't
produce the initial weak group, which works around the bug of the
kernel not failing on the 3rd perf_event_open. I've added Kan and
Zhengjun to the e-mail as they work on the Intel kernel PMU code.

There's a question about what we should do in the perf test about
this? I have a few solutions:

1) try metric tests again with the --metric-no-group flag and don't
fail the test if this succeeds. This allows kernel bugs to hide, so
I'm not a huge fan.

2) add a new metric flag/constraint to say not to group, this way the
metric will automatically apply the "--metric-no-group" flag. It is a
bit of work to wire this up but this kind of failure is common enough
in PMUs that it is probably worthwhile. We also need to add the flag
to metrics and I'm not sure how to get a good list of the metrics that
currently fail and require it. This is okay but error prone.

3) fix the kernel bug and let the perf test fail until an adequate
kernel is installed.
Probably the best option.

Thanks,
Ian

^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED! 2023-01-31 0:20 ` Ian Rogers @ 2023-01-31 3:45 ` Sedat Dilek 2023-01-31 3:55 ` Ian Rogers 2023-02-01 6:51 ` Ravi Bangoria 1 sibling, 1 reply; 13+ messages in thread From: Sedat Dilek @ 2023-01-31 3:45 UTC (permalink / raw) To: Ian Rogers Cc: Liang, Kan, Xing, Zhengjun, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings, James Clark, Stephane Eranian On Tue, Jan 31, 2023 at 1:20 AM Ian Rogers <irogers@google.com> wrote: > > On Mon, Jan 30, 2023 at 2:04 AM James Clark <james.clark@arm.com> wrote: > > > > > > > > On 30/01/2023 02:24, Sedat Dilek wrote: > > > ? > > > > > > On Mon, Jan 30, 2023 at 12:21 AM Ian Rogers <irogers@google.com> wrote: > > >> > > >> On Sun, Jan 29, 2023 at 1:59 AM Sedat Dilek <sedat.dilek@gmail.com> wrote: > > >>> > > >>> [ CC LLVM linux folks + Ben from Debian kernel team ] > > >>> > > >>> Hi, > > >>> > > >>> I am playing with LLVM version 16.0.0-rc1 which was released yesterday and PERF. > > >>> > > >>> After building my selfmade LLVM toolchain, I built perf and run some > > >>> perf tests here on my Intel SandyBridge CPU (details see below). > > >>> > > >>> perf all metrics test: FAILED! > > >>> > > >>> ...with both Debian's perf version 6.1.7 and my selfmade version 6.2-rc5. > > >>> > > >>> Just noticed: > > >>> > > >>> Couldn't bump rlimit(MEMLOCK), failures may take place when creating > > >>> BPF maps, etc > > >>> > > >>> Run the below tests with `sudo` - made this go away - still FAILED. > > >>> > > >>> But maybe I am missing to activate some sysfs/debug or whatever other stuff? > > >> > > >> Hi Sedat, > > >> > > >> things have been improving wrt metrics and so this failure may have > > >> just been because of the addition of a previously missing metric. The > > >> rlimit thing shouldn't affect things but maybe file descriptors? 
> > >> Looking at the test output the issue is: > > >> > > >> ``` > > >> Metric 'tma_dram_bound' not printed in: > > >> # Running 'internals/synthesize' benchmark: > > >> Computing performance of single threaded perf event synthesis by > > >> synthesizing events on the perf process itself: > > >> Average synthesis took: 207.680 usec (+- 0.176 usec) > > >> Average num. events: 30.000 (+- 0.000) > > >> Average time per event 6.923 usec > > >> Average data synthesis took: 217.833 usec (+- 0.202 usec) > > >> Average num. events: 161.000 (+- 0.000) > > >> Average time per event 1.353 usec > > >> > > >> Performance counter stats for 'perf bench internals synthesize': > > >> > > >> <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > >> (0,00%) > > >> <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > >> (0,00%) > > >> <not counted> CPU_CLK_UNHALTED.THREAD > > >> (0,00%) > > >> <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > >> (0,00%) > > >> ``` > > >> > > >> So the test was checking to see whether the tma_dram_bound metric > > >> could be computed on your Sandybridge and it failed. The event counts > > >> below show that every event came back "<not counted>" which is usually > > >> indicative of a permissions problem - it is also not surprising given > > >> this that the metric wasn't computed. You could try repeating the > > >> command the test is trying with something like "perf stat -M > > >> tma_dram_bound -a sleep 1", but running as root should have resolved > > >> that issue. Does that give you enough to keep exploring? > > >> > > > > > > Hi Ian, > > > > > > Thanks for your feedback! > > > > > > I booted into my Debian kernel - just to see what happens. > > > > > > # cat /proc/version > > > Linux version 6.1.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-12 > > > (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 > > > SMP PREEMPT_DYNAMIC Debian 6.1.7-1 (2023-01-18) > > > > > > All things run as root... 
> > > > > > # echo 0 | tee /proc/sys/kernel/kptr_restrict > > > /proc/sys/kernel/perf_event_paranoid > > > 0 > > > > > > # /usr/bin/perf test 10 92 98 99 100 101 > > > 10: PMU events : > > > 10.1: PMU event table sanity : Ok > > > 10.2: PMU event map aliases : Ok > > > 10.3: Parsing of PMU event table metrics : Ok > > > 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > 92: perf record tests : Ok > > > 98: perf stat tests : Ok > > > 99: perf all metricgroups test : Ok > > > 100: perf all metrics test : FAILED! > > > 101: perf all PMU test : Ok > > > > > > # perf stat -M tma_dram_bound -a sleep 1 > > > > > > Performance counter stats for 'system wide': > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > (0,00%) > > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > (0,00%) > > > <not counted> CPU_CLK_UNHALTED.THREAD > > > (0,00%) > > > <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > (0,00%) > > > > > > > Hi Sedat, > > > > I also had this failure and did a git bisect, but it led me to the > > conclusion that it is a stale build issue rather than a regression. > > > > There was a recent commit that renamed/removed some json PMU files which > > the build system can't cope with. I think the tests end up iterating > > over a different set of event names than were generated by the build system. > > > > If you do a clean build the issue should go away. I don't know if there > > is anything more we can do to stop this from happening. > > > > James > > So I think this is a kernel bug triggering a perf tool bug. The kernel > bug can be worked around in the perf tool. I only had an Ivybridge to > test with (hence slightly different events) but what I see is both > tma_dram_bound and tma_l3_bound using the same 4 events. 
I could work > around the "<not counted>" by adding the --metric-no-group flag: > > ``` > $ perf stat -M tma_l3_bound --metric-no-group -a sleep 1 > > Performance counter stats for 'system wide': > > 400,404 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 4.3 % > tma_l3_bound (74.99%) > 128,937,891 CYCLE_ACTIVITY.STALLS_L2_PENDING > (87.46%) > 167,459 MEM_LOAD_UOPS_RETIRED.LLC_MISS > (74.99%) > 759,574,967 CPU_CLK_UNHALTED.THREAD > (87.47%) > > 1.001526438 seconds time elapsed > > $ perf stat -M tma_dram_bound -a --metric-no-group sleep 1 > > Performance counter stats for 'system wide': > > 259,954 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 15.2 % > tma_dram_bound (74.99%) > 118,807,043 CYCLE_ACTIVITY.STALLS_L2_PENDING > (87.46%) > 111,699 MEM_LOAD_UOPS_RETIRED.LLC_MISS > (74.95%) > 587,571,060 CPU_CLK_UNHALTED.THREAD > (87.45%) > > 1.001518093 seconds time elapsed > ``` > > The issue is that perf metrics use weak groups of events. A weak group > is the same as a group of events initially. We want to use groups of > events with metrics so that all the counters are scheduled in and out > at the same time, and not multiplexed independently. Imagine measuring > IPC but the counts for instructions and cycles are measured at > different periods, the resultant IPC value would be unlikely to be > accurate. If perf_event_open fails then the perf tool retries the > events without the group. 
If I try just 3 of the events in a weak > group then the failure can be seen: > > ``` > $ perf stat -e "{MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING}:W" > -a sleep 1 > > Performance counter stats for 'system wide': > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > (0.00%) > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_MISS > (0.00%) > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > (0.00%) > > 1.001458485 seconds time elapsed > ``` > > The kernel should have failed the perf_event_open on opening the third > event and then measured without the group, which it can do with > multiplexing as in the following: > > ``` > $ perf stat -e "MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING" > -a sleep 1 > > Performance counter stats for 'system wide': > > 1,239,397 MEM_LOAD_UOPS_RETIRED.LLC_HIT > (79.06%) > 174,826 MEM_LOAD_UOPS_RETIRED.LLC_MISS > (64.60%) > 124,026,024 CYCLE_ACTIVITY.STALLS_L2_PENDING > (81.16%) > > 1.001483434 seconds time elapsed > ``` > > When the --metric-no-group flag is given to perf then it doesn't > produce the initial weak group, which works around the bug of the > kernel not failing on the 3rd perf_event_open. I've added Kan and > Zhengjun to the e-mail as they work on the Intel kernel PMU code. > > There's a question about what we should do in the perf test about > this? I have a few solutions: > > 1) try metric tests again with the --metric-no-group flag and don't > fail the test if this succeeds. This allows kernel bugs to hide, so > I'm not a huge fan. > > 2) add a new metric flag/constraint to say not to group, this way the > metric will automatically apply the "--metric-no-group" flag. It is a > bit of work to wire this up but this kind of failure is common enough > in PMUs that it is probably worthwhile. We also need to add the flag > to metrics and I'm not sure how to get a good list of the metrics that > currently fail and require it. 
This is okay but error prone.
>
> 3) fix the kernel bug and let the perf test fail until an adequate
> kernel is installed. Probably the best option.
>

Hi Ian,

I can confirm:

$ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
0

$ ~/bin/perf stat -M tma_l3_bound --metric-no-group -a sleep 1

 Performance counter stats for 'system wide':

         2.058.892      MEM_LOAD_UOPS_RETIRED.LLC_HIT         #      1,5 %  tma_l3_bound     (99,30%)
       173.254.697      CYCLE_ACTIVITY.STALLS_L2_PENDING                                     (99,10%)
     2.396.130.501      CPU_CLK_UNHALTED.THREAD                                              (99,60%)
         1.110.486      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS                                  (99,53%)

       1,001989022 seconds time elapsed

$ ~/bin/perf stat -M tma_dram_bound --metric-no-group -a sleep 1

 Performance counter stats for 'system wide':

         1.729.208      MEM_LOAD_UOPS_RETIRED.LLC_HIT         #      1,2 %  tma_dram_bound   (99,50%)
        50.346.734      CYCLE_ACTIVITY.STALLS_L2_PENDING                                     (99,50%)
     2.354.963.862      CPU_CLK_UNHALTED.THREAD                                              (99,80%)
           306.500      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS                                  (99,61%)

       1,001981392 seconds time elapsed

Thanks!

BR,
-Sedat-

> Thanks,
> Ian
>
> > > 1,002148600 seconds time elapsed
> > >
> > > Hmm... looking at... Metric 'tma_l3_bound' ...
> > >
> > > Running...
> > > > > > # perf stat --verbose -M tma_l3_bound -a sleep 1 > > > Using CPUID GenuineIntel-6-2A-7 > > > metric expr (MEM_LOAD_UOPS_RETIRED.LLC_HIT / > > > (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS)) * > > > CYCLE_ACTIVITY.STALLS_L2_PENDING / CLKS for tma_l3_bound > > > metric expr CPU_CLK_UNHALTED.THREAD for CLKS > > > > > > found event MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > found event CYCLE_ACTIVITY.STALLS_L2_PENDING > > > found event CPU_CLK_UNHALTED.THREAD > > > found event MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > > > Parsing metric events > > > '{MEM_LOAD_UOPS_RETIRED.LLC_HIT/metric-id=MEM_LOAD_UOPS_RETIRED.LLC_HIT/,CYCLE_ACTIVITY.STALLS_L2_PENDING/metric-id=CYCLE_ACTIVITY.STALLS_L2_PEND > > > ING/,CPU_CLK_UNHALTED.THREAD/metric-id=CPU_CLK_UNHALTED.THREAD/,MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/metric-id=MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/}:W' > > > MEM_LOAD_UOPS_RETIRED.LLC_HIT -> cpu/event=0xd1,period=0xc365,umask=0x4/ > > > CYCLE_ACTIVITY.STALLS_L2_PENDING -> > > > cpu/event=0xa3,cmask=0x5,period=0x1e8483,umask=0x5/ > > > CPU_CLK_UNHALTED.THREAD -> cpu/event=0x3c,period=0x1e8483/ > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS -> cpu/event=0xd4,period=0x186a7,umask=0x2/ > > > > > > Control descriptor is not initialized > > > > > > MEM_LOAD_UOPS_RETIRED.LLC_HIT: 0 4007421228 0 > > > CYCLE_ACTIVITY.STALLS_L2_PENDING: 0 4007421228 0 > > > CPU_CLK_UNHALTED.THREAD: 0 4007421228 0 > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS: 0 4007421228 0 > > > > > > Performance counter stats for 'system wide': > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > (0,00%) > > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > (0,00%) > > > <not counted> CPU_CLK_UNHALTED.THREAD > > > (0,00%) > > > <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > (0,00%) > > > > > > 1,002310013 seconds time elapsed > > > > > > So those events/metric-ids resulting in "<not counted>" are all found. 
> > > > > > What means "Control descriptor is not initialized"? > > > > > > To summarize: > > > > > > Those two tests in "100: perf all metrics test" FAILED: > > > > > > 1. tma_dram_bound > > > 2. tma_l3_bound > > > > > > Best regards, > > > -Sedat- > > > > > >> Thanks, > > >> Ian > > >> > > >>> Last perf version which was OK: > > >>> > > >>> ~/bin/perf -v > > >>> perf version 6.0.0 > > >>> > > >>> echo "linux-perf: Adjust limited access to performance monitoring and > > >>> observability operations" > > >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > >>> /proc/sys/kernel/perf_event_paranoid > > >>> 0 > > >>> > > >>> ~/bin/perf test 10 86 92 93 94 95 > > >>> 10: PMU events : > > >>> 10.1: PMU event table sanity : Ok > > >>> 10.2: PMU event map aliases : Ok > > >>> 10.3: Parsing of PMU event table metrics : Ok > > >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > >>> 86: perf record tests : Ok > > >>> 92: perf stat tests : Ok > > >>> 93: perf all metricgroups test : Ok > > >>> 94: perf all metrics test : Ok > > >>> 95: perf all PMU test : Ok > > >>> > > >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict > > >>> /proc/sys/kernel/perf_event_paranoid > > >>> echo "linux-perf: Reset limited access to performance monitoring and > > >>> observability operations" > > >>> > > >>> If you need further information, please let me know. > > >>> > > >>> Thanks. > > >>> > > >>> Regards, > > >>> -Sedat- > > >>> > > >>> P.S. 
Instructions > > >>> > > >>> [ REPRODUCER ] > > >>> > > >>> LLVM_MVER="16" > > >>> > > >>> # Debian LLVM > > >>> ##LLVM_TOOLCHAIN_PATH="/usr/lib/llvm-${LLVM_MVER}/bin" > > >>> # Selfmade LLVM > > >>> LLVM_TOOLCHAIN_PATH="/opt/llvm/bin" > > >>> if [ -d ${LLVM_TOOLCHAIN_PATH} ]; then > > >>> export PATH="${LLVM_TOOLCHAIN_PATH}:${PATH}" > > >>> fi > > >>> > > >>> PYTHON_VER="3.11" > > >>> MAKE="make" > > >>> MAKE_OPTS="V=1 -j1 HOSTCC=clang-$LLVM_MVER HOSTLD=ld.lld > > >>> HOSTAR=llvm-ar CC=clang-$LLVM_MVER LD=ld.lld AR=llvm-ar > > >>> STRIP=llvm-strip" > > >>> > > >>> echo "LLVM MVER ........ $LLVM_MVER" > > >>> echo "Path settings .... $PATH" > > >>> echo "Python version ... $PYTHON_VER" > > >>> echo "make line ........ $MAKE $MAKE_OPTS" > > >>> > > >>> LANG=C LC_ALL=C make -C tools/perf clean 2>&1 | tee ../make-log_perf-clean.txt > > >>> > > >>> LANG=C LC_ALL=C $MAKE $MAKE_OPTS -C tools/perf > > >>> PYTHON=python${PYTHON_VER} install-bin 2>&1 | tee > > >>> ../make-log_perf-install_bin_python${PYTHON_VER}_llvm${LLVM_MVER}.txt > > >>> > > >>> > > >>> [ TESTS ] > > >>> > > >>> [ TESTS - START ] > > >>> > > >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > >>> /proc/sys/kernel/perf_event_paranoid > > >>> > > >>> [ TESTS - DEBIAN ] > > >>> > > >>> /usr/bin/perf -v > > >>> perf version 6.1.7 > > >>> > > >>> /usr/bin/perf test 10 92 98 99 100 101 > > >>> > > >>> 10: PMU events : > > >>> 10.1: PMU event table sanity : Ok > > >>> 10.2: PMU event map aliases : Ok > > >>> 10.3: Parsing of PMU event table metrics : Ok > > >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > >>> 92: perf record tests : Ok > > >>> 98: perf stat tests : Ok > > >>> 99: perf all metricgroups test : Ok > > >>> 100: perf all metrics test : FAILED! 
> > >>> 101: perf all PMU test : Ok
> > >>>
> > >>> [ TESTS - DILEKS ]
> > >>>
> > >>> ~/bin/perf -v
> > >>> perf version 6.2.0-rc5
> > >>>
> > >>> ~/bin/perf test 7 87 93 94 95 96
> > >>>
> > >>> 7: PMU events :
> > >>> 7.1: PMU event table sanity : Ok
> > >>> 7.2: PMU event map aliases : Ok
> > >>> 7.3: Parsing of PMU event table metrics : Ok
> > >>> 7.4: Parsing of PMU event table metrics with fake PMUs : Ok
> > >>> 87: perf record tests : Ok
> > >>> 93: perf stat tests : Ok
> > >>> 94: perf all metricgroups test : Ok
> > >>> 95: perf all metrics test : FAILED!
> > >>> 96: perf all PMU test : Ok
> > >>>
> > >>> [ TESTS - FAILED ]
> > >>>
> > >>> /usr/bin/perf test --verbose 100 2>&1 | tee
> > >>> perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt
> > >>>
> > >>> ~/bin/perf test --verbose 95 2>&1 | tee
> > >>> perf-test-verbose-95-perf-all-metrics-test_dileks-perf-6-2-rc5.txt
> > >>>
> > >>> [ TESTS - STOP ]
> > >>>
> > >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict
> > >>> /proc/sys/kernel/perf_event_paranoid
> > >>>
> > >>> - EOT -
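The tma_l3_bound expression printed by the verbose run in this thread can be checked by hand against the counts from the --metric-no-group run. What follows is only a minimal sketch, assuming a POSIX awk is available; the function name tma_l3_bound and its argument order are made up for illustration and are not part of perf.

```shell
#!/bin/sh
# Evaluate the tma_l3_bound expression shown by `perf stat --verbose`:
#   (LLC_HIT / (LLC_HIT + 7 * LLC_MISS)) * STALLS_L2_PENDING / CLKS
# Arguments: LLC_HIT LLC_MISS STALLS_L2_PENDING CLKS (raw event counts).
# Prints the metric as a percentage with one decimal place, or "nan" when
# a denominator would be zero. LC_ALL=C keeps "." as the decimal separator.
tma_l3_bound() {
    LC_ALL=C awk -v hit="$1" -v miss="$2" -v stalls="$3" -v clks="$4" 'BEGIN {
        if (clks == 0 || hit + 7 * miss == 0) { print "nan"; exit }
        printf "%.1f\n", 100 * (hit / (hit + 7 * miss)) * stalls / clks
    }'
}

# Counts from the SandyBridge --metric-no-group run quoted in this thread,
# with the thousands separators removed.
tma_l3_bound 2058892 1110486 173254697 2396130501   # prints 1.5
```

The result agrees with the "1,5 %" that perf printed for that run, which suggests the expression is evaluated as written once the events are actually counted, and that the "<not counted>" runs fail before the formula is ever applied.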
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED! 2023-01-31 3:45 ` Sedat Dilek @ 2023-01-31 3:55 ` Ian Rogers 2023-01-31 6:14 ` Sedat Dilek 2023-02-01 15:27 ` Liang, Kan 0 siblings, 2 replies; 13+ messages in thread From: Ian Rogers @ 2023-01-31 3:55 UTC (permalink / raw) To: sedat.dilek Cc: Liang, Kan, Xing, Zhengjun, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings, James Clark, Stephane Eranian On Mon, Jan 30, 2023 at 7:45 PM Sedat Dilek <sedat.dilek@gmail.com> wrote: > > On Tue, Jan 31, 2023 at 1:20 AM Ian Rogers <irogers@google.com> wrote: > > > > On Mon, Jan 30, 2023 at 2:04 AM James Clark <james.clark@arm.com> wrote: > > > > > > > > > > > > On 30/01/2023 02:24, Sedat Dilek wrote: > > > > ? > > > > > > > > On Mon, Jan 30, 2023 at 12:21 AM Ian Rogers <irogers@google.com> wrote: > > > >> > > > >> On Sun, Jan 29, 2023 at 1:59 AM Sedat Dilek <sedat.dilek@gmail.com> wrote: > > > >>> > > > >>> [ CC LLVM linux folks + Ben from Debian kernel team ] > > > >>> > > > >>> Hi, > > > >>> > > > >>> I am playing with LLVM version 16.0.0-rc1 which was released yesterday and PERF. > > > >>> > > > >>> After building my selfmade LLVM toolchain, I built perf and run some > > > >>> perf tests here on my Intel SandyBridge CPU (details see below). > > > >>> > > > >>> perf all metrics test: FAILED! > > > >>> > > > >>> ...with both Debian's perf version 6.1.7 and my selfmade version 6.2-rc5. > > > >>> > > > >>> Just noticed: > > > >>> > > > >>> Couldn't bump rlimit(MEMLOCK), failures may take place when creating > > > >>> BPF maps, etc > > > >>> > > > >>> Run the below tests with `sudo` - made this go away - still FAILED. > > > >>> > > > >>> But maybe I am missing to activate some sysfs/debug or whatever other stuff? 
> > > >> > > > >> Hi Sedat, > > > >> > > > >> things have been improving wrt metrics and so this failure may have > > > >> just been because of the addition of a previously missing metric. The > > > >> rlimit thing shouldn't affect things but maybe file descriptors? > > > >> Looking at the test output the issue is: > > > >> > > > >> ``` > > > >> Metric 'tma_dram_bound' not printed in: > > > >> # Running 'internals/synthesize' benchmark: > > > >> Computing performance of single threaded perf event synthesis by > > > >> synthesizing events on the perf process itself: > > > >> Average synthesis took: 207.680 usec (+- 0.176 usec) > > > >> Average num. events: 30.000 (+- 0.000) > > > >> Average time per event 6.923 usec > > > >> Average data synthesis took: 217.833 usec (+- 0.202 usec) > > > >> Average num. events: 161.000 (+- 0.000) > > > >> Average time per event 1.353 usec > > > >> > > > >> Performance counter stats for 'perf bench internals synthesize': > > > >> > > > >> <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > >> (0,00%) > > > >> <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > >> (0,00%) > > > >> <not counted> CPU_CLK_UNHALTED.THREAD > > > >> (0,00%) > > > >> <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > >> (0,00%) > > > >> ``` > > > >> > > > >> So the test was checking to see whether the tma_dram_bound metric > > > >> could be computed on your Sandybridge and it failed. The event counts > > > >> below show that every event came back "<not counted>" which is usually > > > >> indicative of a permissions problem - it is also not surprising given > > > >> this that the metric wasn't computed. You could try repeating the > > > >> command the test is trying with something like "perf stat -M > > > >> tma_dram_bound -a sleep 1", but running as root should have resolved > > > >> that issue. Does that give you enough to keep exploring? > > > >> > > > > > > > > Hi Ian, > > > > > > > > Thanks for your feedback! 
> > > > > > > > I booted into my Debian kernel - just to see what happens. > > > > > > > > # cat /proc/version > > > > Linux version 6.1.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-12 > > > > (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 > > > > SMP PREEMPT_DYNAMIC Debian 6.1.7-1 (2023-01-18) > > > > > > > > All things run as root... > > > > > > > > # echo 0 | tee /proc/sys/kernel/kptr_restrict > > > > /proc/sys/kernel/perf_event_paranoid > > > > 0 > > > > > > > > # /usr/bin/perf test 10 92 98 99 100 101 > > > > 10: PMU events : > > > > 10.1: PMU event table sanity : Ok > > > > 10.2: PMU event map aliases : Ok > > > > 10.3: Parsing of PMU event table metrics : Ok > > > > 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > > 92: perf record tests : Ok > > > > 98: perf stat tests : Ok > > > > 99: perf all metricgroups test : Ok > > > > 100: perf all metrics test : FAILED! > > > > 101: perf all PMU test : Ok > > > > > > > > # perf stat -M tma_dram_bound -a sleep 1 > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > (0,00%) > > > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > (0,00%) > > > > <not counted> CPU_CLK_UNHALTED.THREAD > > > > (0,00%) > > > > <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > (0,00%) > > > > > > > > > > Hi Sedat, > > > > > > I also had this failure and did a git bisect, but it led me to the > > > conclusion that it is a stale build issue rather than a regression. > > > > > > There was a recent commit that renamed/removed some json PMU files which > > > the build system can't cope with. I think the tests end up iterating > > > over a different set of event names than were generated by the build system. > > > > > > If you do a clean build the issue should go away. I don't know if there > > > is anything more we can do to stop this from happening. 
> > > > > > James > > > > So I think this is a kernel bug triggering a perf tool bug. The kernel > > bug can be worked around in the perf tool. I only had an Ivybridge to > > test with (hence slightly different events) but what I see is both > > tma_dram_bound and tma_l3_bound using the same 4 events. I could work > > around the "<not counted>" by adding the --metric-no-group flag: > > > > ``` > > $ perf stat -M tma_l3_bound --metric-no-group -a sleep 1 > > > > Performance counter stats for 'system wide': > > > > 400,404 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 4.3 % > > tma_l3_bound (74.99%) > > 128,937,891 CYCLE_ACTIVITY.STALLS_L2_PENDING > > (87.46%) > > 167,459 MEM_LOAD_UOPS_RETIRED.LLC_MISS > > (74.99%) > > 759,574,967 CPU_CLK_UNHALTED.THREAD > > (87.47%) > > > > 1.001526438 seconds time elapsed > > > > $ perf stat -M tma_dram_bound -a --metric-no-group sleep 1 > > > > Performance counter stats for 'system wide': > > > > 259,954 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 15.2 % > > tma_dram_bound (74.99%) > > 118,807,043 CYCLE_ACTIVITY.STALLS_L2_PENDING > > (87.46%) > > 111,699 MEM_LOAD_UOPS_RETIRED.LLC_MISS > > (74.95%) > > 587,571,060 CPU_CLK_UNHALTED.THREAD > > (87.45%) > > > > 1.001518093 seconds time elapsed > > ``` > > > > The issue is that perf metrics use weak groups of events. A weak group > > is the same as a group of events initially. We want to use groups of > > events with metrics so that all the counters are scheduled in and out > > at the same time, and not multiplexed independently. Imagine measuring > > IPC but the counts for instructions and cycles are measured at > > different periods, the resultant IPC value would be unlikely to be > > accurate. If perf_event_open fails then the perf tool retries the > > events without the group. 
If I try just 3 of the events in a weak > > group then the failure can be seen: > > > > ``` > > $ perf stat -e "{MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING}:W" > > -a sleep 1 > > > > Performance counter stats for 'system wide': > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > (0.00%) > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_MISS > > (0.00%) > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > (0.00%) > > > > 1.001458485 seconds time elapsed > > ``` > > > > The kernel should have failed the perf_event_open on opening the third > > event and then measured without the group, which it can do with > > multiplexing as in the following: > > > > ``` > > $ perf stat -e "MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING" > > -a sleep 1 > > > > Performance counter stats for 'system wide': > > > > 1,239,397 MEM_LOAD_UOPS_RETIRED.LLC_HIT > > (79.06%) > > 174,826 MEM_LOAD_UOPS_RETIRED.LLC_MISS > > (64.60%) > > 124,026,024 CYCLE_ACTIVITY.STALLS_L2_PENDING > > (81.16%) > > > > 1.001483434 seconds time elapsed > > ``` > > > > When the --metric-no-group flag is given to perf then it doesn't > > produce the initial weak group, which works around the bug of the > > kernel not failing on the 3rd perf_event_open. I've added Kan and > > Zhengjun to the e-mail as they work on the Intel kernel PMU code. > > > > There's a question about what we should do in the perf test about > > this? I have a few solutions: > > > > 1) try metric tests again with the --metric-no-group flag and don't > > fail the test if this succeeds. This allows kernel bugs to hide, so > > I'm not a huge fan. > > > > 2) add a new metric flag/constraint to say not to group, this way the > > metric will automatically apply the "--metric-no-group" flag. It is a > > bit of work to wire this up but this kind of failure is common enough > > in PMUs that it is probably worthwhile. 
We also need to add the flag > > to metrics and I'm not sure how to get a good list of the metrics that > > currently fail and require it. This is okay but error prone. > > > > 3) fix the kernel bug and let the perf test fail until an adequate > > kernel is installed. Probably the best option. > > > > Hi Ian, > > I can confirm: > > $ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > /proc/sys/kernel/perf_event_paranoid > 0 > > $ ~/bin/perf stat -M tma_l3_bound --metric-no-group -a sleep 1 > > Performance counter stats for 'system wide': > > 2.058.892 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,5 % > tma_l3_bound (99,30%) > 173.254.697 CYCLE_ACTIVITY.STALLS_L2_PENDING > (99,10%) > 2.396.130.501 CPU_CLK_UNHALTED.THREAD > (99,60%) > 1.110.486 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > (99,53%) > > 1,001989022 seconds time elapsed > > $ ~/bin/perf stat -M tma_dram_bound --metric-no-group -a sleep 1 > > Performance counter stats for 'system wide': > > 1.729.208 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,2 % > tma_dram_bound (99,50%) > 50.346.734 CYCLE_ACTIVITY.STALLS_L2_PENDING > (99,50%) > 2.354.963.862 CPU_CLK_UNHALTED.THREAD > (99,80%) > 306.500 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > (99,61%) > > 1,001981392 seconds time elapsed > > Thanks! Thanks, apparently it is an issue with SandyBridge/IvyBridge that some counters on one hyperthread will limit what can be on the other. I believe that's the comment related to EXCL access here: https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/core.c#L124 So you may have more success with the metric if you disable hyperthreading, but I imagine that's not a popular option. Thanks, Ian > BR, > -Sedat- > > > Thanks, > > Ian > > > > > > 1,002148600 seconds time elapsed > > > > > > > > Hmm... looking at... Metric 'tma_l3_bound' ... > > > > > > > > Running... 
> > > > > > > > # perf stat --verbose -M tma_l3_bound -a sleep 1 > > > > Using CPUID GenuineIntel-6-2A-7 > > > > metric expr (MEM_LOAD_UOPS_RETIRED.LLC_HIT / > > > > (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * > > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS)) * > > > > CYCLE_ACTIVITY.STALLS_L2_PENDING / CLKS for tma_l3_bound > > > > metric expr CPU_CLK_UNHALTED.THREAD for CLKS > > > > > > > > found event MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > found event CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > found event CPU_CLK_UNHALTED.THREAD > > > > found event MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > > > > > Parsing metric events > > > > '{MEM_LOAD_UOPS_RETIRED.LLC_HIT/metric-id=MEM_LOAD_UOPS_RETIRED.LLC_HIT/,CYCLE_ACTIVITY.STALLS_L2_PENDING/metric-id=CYCLE_ACTIVITY.STALLS_L2_PEND > > > > ING/,CPU_CLK_UNHALTED.THREAD/metric-id=CPU_CLK_UNHALTED.THREAD/,MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/metric-id=MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/}:W' > > > > MEM_LOAD_UOPS_RETIRED.LLC_HIT -> cpu/event=0xd1,period=0xc365,umask=0x4/ > > > > CYCLE_ACTIVITY.STALLS_L2_PENDING -> > > > > cpu/event=0xa3,cmask=0x5,period=0x1e8483,umask=0x5/ > > > > CPU_CLK_UNHALTED.THREAD -> cpu/event=0x3c,period=0x1e8483/ > > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS -> cpu/event=0xd4,period=0x186a7,umask=0x2/ > > > > > > > > Control descriptor is not initialized > > > > > > > > MEM_LOAD_UOPS_RETIRED.LLC_HIT: 0 4007421228 0 > > > > CYCLE_ACTIVITY.STALLS_L2_PENDING: 0 4007421228 0 > > > > CPU_CLK_UNHALTED.THREAD: 0 4007421228 0 > > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS: 0 4007421228 0 > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > (0,00%) > > > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > (0,00%) > > > > <not counted> CPU_CLK_UNHALTED.THREAD > > > > (0,00%) > > > > <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > (0,00%) > > > > > > > > 1,002310013 seconds time elapsed > > > > > > > > So those events/metric-ids resulting 
in "<not counted>" are all found. > > > > > > > > What means "Control descriptor is not initialized"? > > > > > > > > To summarize: > > > > > > > > Those two tests in "100: perf all metrics test" FAILED: > > > > > > > > 1. tma_dram_bound > > > > 2. tma_l3_bound > > > > > > > > Best regards, > > > > -Sedat- > > > > > > > >> Thanks, > > > >> Ian > > > >> > > > >>> Last perf version which was OK: > > > >>> > > > >>> ~/bin/perf -v > > > >>> perf version 6.0.0 > > > >>> > > > >>> echo "linux-perf: Adjust limited access to performance monitoring and > > > >>> observability operations" > > > >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > > >>> /proc/sys/kernel/perf_event_paranoid > > > >>> 0 > > > >>> > > > >>> ~/bin/perf test 10 86 92 93 94 95 > > > >>> 10: PMU events : > > > >>> 10.1: PMU event table sanity : Ok > > > >>> 10.2: PMU event map aliases : Ok > > > >>> 10.3: Parsing of PMU event table metrics : Ok > > > >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > >>> 86: perf record tests : Ok > > > >>> 92: perf stat tests : Ok > > > >>> 93: perf all metricgroups test : Ok > > > >>> 94: perf all metrics test : Ok > > > >>> 95: perf all PMU test : Ok > > > >>> > > > >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict > > > >>> /proc/sys/kernel/perf_event_paranoid > > > >>> echo "linux-perf: Reset limited access to performance monitoring and > > > >>> observability operations" > > > >>> > > > >>> If you need further information, please let me know. > > > >>> > > > >>> Thanks. > > > >>> > > > >>> Regards, > > > >>> -Sedat- > > > >>> > > > >>> P.S. 
Instructions > > > >>> > > > >>> [ REPRODUCER ] > > > >>> > > > >>> LLVM_MVER="16" > > > >>> > > > >>> # Debian LLVM > > > >>> ##LLVM_TOOLCHAIN_PATH="/usr/lib/llvm-${LLVM_MVER}/bin" > > > >>> # Selfmade LLVM > > > >>> LLVM_TOOLCHAIN_PATH="/opt/llvm/bin" > > > >>> if [ -d ${LLVM_TOOLCHAIN_PATH} ]; then > > > >>> export PATH="${LLVM_TOOLCHAIN_PATH}:${PATH}" > > > >>> fi > > > >>> > > > >>> PYTHON_VER="3.11" > > > >>> MAKE="make" > > > >>> MAKE_OPTS="V=1 -j1 HOSTCC=clang-$LLVM_MVER HOSTLD=ld.lld > > > >>> HOSTAR=llvm-ar CC=clang-$LLVM_MVER LD=ld.lld AR=llvm-ar > > > >>> STRIP=llvm-strip" > > > >>> > > > >>> echo "LLVM MVER ........ $LLVM_MVER" > > > >>> echo "Path settings .... $PATH" > > > >>> echo "Python version ... $PYTHON_VER" > > > >>> echo "make line ........ $MAKE $MAKE_OPTS" > > > >>> > > > >>> LANG=C LC_ALL=C make -C tools/perf clean 2>&1 | tee ../make-log_perf-clean.txt > > > >>> > > > >>> LANG=C LC_ALL=C $MAKE $MAKE_OPTS -C tools/perf > > > >>> PYTHON=python${PYTHON_VER} install-bin 2>&1 | tee > > > >>> ../make-log_perf-install_bin_python${PYTHON_VER}_llvm${LLVM_MVER}.txt > > > >>> > > > >>> > > > >>> [ TESTS ] > > > >>> > > > >>> [ TESTS - START ] > > > >>> > > > >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > > >>> /proc/sys/kernel/perf_event_paranoid > > > >>> > > > >>> [ TESTS - DEBIAN ] > > > >>> > > > >>> /usr/bin/perf -v > > > >>> perf version 6.1.7 > > > >>> > > > >>> /usr/bin/perf test 10 92 98 99 100 101 > > > >>> > > > >>> 10: PMU events : > > > >>> 10.1: PMU event table sanity : Ok > > > >>> 10.2: PMU event map aliases : Ok > > > >>> 10.3: Parsing of PMU event table metrics : Ok > > > >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > >>> 92: perf record tests : Ok > > > >>> 98: perf stat tests : Ok > > > >>> 99: perf all metricgroups test : Ok > > > >>> 100: perf all metrics test : FAILED! 
> > > >>> 101: perf all PMU test : Ok
> > > >>>
> > > >>> [ TESTS - DILEKS ]
> > > >>>
> > > >>> ~/bin/perf -v
> > > >>> perf version 6.2.0-rc5
> > > >>>
> > > >>> ~/bin/perf test 7 87 93 94 95 96
> > > >>>
> > > >>> 7: PMU events :
> > > >>> 7.1: PMU event table sanity : Ok
> > > >>> 7.2: PMU event map aliases : Ok
> > > >>> 7.3: Parsing of PMU event table metrics : Ok
> > > >>> 7.4: Parsing of PMU event table metrics with fake PMUs : Ok
> > > >>> 87: perf record tests : Ok
> > > >>> 93: perf stat tests : Ok
> > > >>> 94: perf all metricgroups test : Ok
> > > >>> 95: perf all metrics test : FAILED!
> > > >>> 96: perf all PMU test : Ok
> > > >>>
> > > >>> [ TESTS - FAILED ]
> > > >>>
> > > >>> /usr/bin/perf test --verbose 100 2>&1 | tee
> > > >>> perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt
> > > >>>
> > > >>> ~/bin/perf test --verbose 95 2>&1 | tee
> > > >>> perf-test-verbose-95-perf-all-metrics-test_dileks-perf-6-2-rc5.txt
> > > >>>
> > > >>> [ TESTS - STOP ]
> > > >>>
> > > >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict
> > > >>> /proc/sys/kernel/perf_event_paranoid
> > > >>>
> > > >>> - EOT -
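On the suggestion to disable hyperthreading: kernels that expose /sys/devices/system/cpu/smt/control allow SMT to be switched at runtime, without a trip into the BIOS. The following is a minimal sketch, assuming that file exists on the running kernel. The helper names smt_state and smt_set and the SMT_CONTROL override variable are hypothetical, introduced here only for illustration. Whether switching SMT off actually lets the grouped metric count on SandyBridge is not confirmed anywhere in this thread.

```shell
#!/bin/sh
# Path of the kernel's SMT control file. It can be overridden through the
# (made-up) SMT_CONTROL variable so the helpers can be tried on a scratch file.
SMT_CONTROL="${SMT_CONTROL:-/sys/devices/system/cpu/smt/control}"

# Print the current SMT state as reported by the kernel (for example "on",
# "off", "forceoff" or "notsupported"), or "unknown" if the file is missing.
smt_state() {
    if [ -r "$SMT_CONTROL" ]; then
        cat "$SMT_CONTROL"
    else
        echo unknown
    fi
}

# Write "on" or "off" to the control file. On a real system this needs root.
# Any other argument is rejected with exit status 2.
smt_set() {
    case "$1" in
        on|off) printf '%s\n' "$1" > "$SMT_CONTROL" ;;
        *) echo "usage: smt_set on|off" >&2; return 2 ;;
    esac
}
```

A possible session, run as root, would be `smt_state`, then `smt_set off`, then the grouped `perf stat -M tma_l3_bound -a sleep 1` from this thread, and finally `smt_set on` to restore the previous state.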
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED! 2023-01-31 3:55 ` Ian Rogers @ 2023-01-31 6:14 ` Sedat Dilek 2023-01-31 6:20 ` Sedat Dilek 2023-02-01 15:27 ` Liang, Kan 1 sibling, 1 reply; 13+ messages in thread From: Sedat Dilek @ 2023-01-31 6:14 UTC (permalink / raw) To: Ian Rogers Cc: Liang, Kan, Xing, Zhengjun, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings, James Clark, Stephane Eranian On Tue, Jan 31, 2023 at 4:55 AM Ian Rogers <irogers@google.com> wrote: > > On Mon, Jan 30, 2023 at 7:45 PM Sedat Dilek <sedat.dilek@gmail.com> wrote: > > > > On Tue, Jan 31, 2023 at 1:20 AM Ian Rogers <irogers@google.com> wrote: > > > > > > On Mon, Jan 30, 2023 at 2:04 AM James Clark <james.clark@arm.com> wrote: > > > > > > > > > > > > > > > > On 30/01/2023 02:24, Sedat Dilek wrote: > > > > > ? > > > > > > > > > > On Mon, Jan 30, 2023 at 12:21 AM Ian Rogers <irogers@google.com> wrote: > > > > >> > > > > >> On Sun, Jan 29, 2023 at 1:59 AM Sedat Dilek <sedat.dilek@gmail.com> wrote: > > > > >>> > > > > >>> [ CC LLVM linux folks + Ben from Debian kernel team ] > > > > >>> > > > > >>> Hi, > > > > >>> > > > > >>> I am playing with LLVM version 16.0.0-rc1 which was released yesterday and PERF. > > > > >>> > > > > >>> After building my selfmade LLVM toolchain, I built perf and run some > > > > >>> perf tests here on my Intel SandyBridge CPU (details see below). > > > > >>> > > > > >>> perf all metrics test: FAILED! > > > > >>> > > > > >>> ...with both Debian's perf version 6.1.7 and my selfmade version 6.2-rc5. > > > > >>> > > > > >>> Just noticed: > > > > >>> > > > > >>> Couldn't bump rlimit(MEMLOCK), failures may take place when creating > > > > >>> BPF maps, etc > > > > >>> > > > > >>> Run the below tests with `sudo` - made this go away - still FAILED. 
> > > > >>> > > > > >>> But maybe I am missing to activate some sysfs/debug or whatever other stuff? > > > > >> > > > > >> Hi Sedat, > > > > >> > > > > >> things have been improving wrt metrics and so this failure may have > > > > >> just been because of the addition of a previously missing metric. The > > > > >> rlimit thing shouldn't affect things but maybe file descriptors? > > > > >> Looking at the test output the issue is: > > > > >> > > > > >> ``` > > > > >> Metric 'tma_dram_bound' not printed in: > > > > >> # Running 'internals/synthesize' benchmark: > > > > >> Computing performance of single threaded perf event synthesis by > > > > >> synthesizing events on the perf process itself: > > > > >> Average synthesis took: 207.680 usec (+- 0.176 usec) > > > > >> Average num. events: 30.000 (+- 0.000) > > > > >> Average time per event 6.923 usec > > > > >> Average data synthesis took: 217.833 usec (+- 0.202 usec) > > > > >> Average num. events: 161.000 (+- 0.000) > > > > >> Average time per event 1.353 usec > > > > >> > > > > >> Performance counter stats for 'perf bench internals synthesize': > > > > >> > > > > >> <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > >> (0,00%) > > > > >> <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > >> (0,00%) > > > > >> <not counted> CPU_CLK_UNHALTED.THREAD > > > > >> (0,00%) > > > > >> <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > >> (0,00%) > > > > >> ``` > > > > >> > > > > >> So the test was checking to see whether the tma_dram_bound metric > > > > >> could be computed on your Sandybridge and it failed. The event counts > > > > >> below show that every event came back "<not counted>" which is usually > > > > >> indicative of a permissions problem - it is also not surprising given > > > > >> this that the metric wasn't computed. 
You could try repeating the > > > > >> command the test is trying with something like "perf stat -M > > > > >> tma_dram_bound -a sleep 1", but running as root should have resolved > > > > >> that issue. Does that give you enough to keep exploring? > > > > >> > > > > > > > > > > Hi Ian, > > > > > > > > > > Thanks for your feedback! > > > > > > > > > > I booted into my Debian kernel - just to see what happens. > > > > > > > > > > # cat /proc/version > > > > > Linux version 6.1.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-12 > > > > > (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 > > > > > SMP PREEMPT_DYNAMIC Debian 6.1.7-1 (2023-01-18) > > > > > > > > > > All things run as root... > > > > > > > > > > # echo 0 | tee /proc/sys/kernel/kptr_restrict > > > > > /proc/sys/kernel/perf_event_paranoid > > > > > 0 > > > > > > > > > > # /usr/bin/perf test 10 92 98 99 100 101 > > > > > 10: PMU events : > > > > > 10.1: PMU event table sanity : Ok > > > > > 10.2: PMU event map aliases : Ok > > > > > 10.3: Parsing of PMU event table metrics : Ok > > > > > 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > > > 92: perf record tests : Ok > > > > > 98: perf stat tests : Ok > > > > > 99: perf all metricgroups test : Ok > > > > > 100: perf all metrics test : FAILED! > > > > > 101: perf all PMU test : Ok > > > > > > > > > > # perf stat -M tma_dram_bound -a sleep 1 > > > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > > (0,00%) > > > > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > > (0,00%) > > > > > <not counted> CPU_CLK_UNHALTED.THREAD > > > > > (0,00%) > > > > > <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > > (0,00%) > > > > > > > > > > > > > Hi Sedat, > > > > > > > > I also had this failure and did a git bisect, but it led me to the > > > > conclusion that it is a stale build issue rather than a regression. 
> > > > > > > > There was a recent commit that renamed/removed some json PMU files which > > > > the build system can't cope with. I think the tests end up iterating > > > > over a different set of event names than were generated by the build system. > > > > > > > > If you do a clean build the issue should go away. I don't know if there > > > > is anything more we can do to stop this from happening. > > > > > > > > James > > > > > > So I think this is a kernel bug triggering a perf tool bug. The kernel > > > bug can be worked around in the perf tool. I only had an Ivybridge to > > > test with (hence slightly different events) but what I see is both > > > tma_dram_bound and tma_l3_bound using the same 4 events. I could work > > > around the "<not counted>" by adding the --metric-no-group flag: > > > > > > ``` > > > $ perf stat -M tma_l3_bound --metric-no-group -a sleep 1 > > > > > > Performance counter stats for 'system wide': > > > > > > 400,404 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 4.3 % > > > tma_l3_bound (74.99%) > > > 128,937,891 CYCLE_ACTIVITY.STALLS_L2_PENDING > > > (87.46%) > > > 167,459 MEM_LOAD_UOPS_RETIRED.LLC_MISS > > > (74.99%) > > > 759,574,967 CPU_CLK_UNHALTED.THREAD > > > (87.47%) > > > > > > 1.001526438 seconds time elapsed > > > > > > $ perf stat -M tma_dram_bound -a --metric-no-group sleep 1 > > > > > > Performance counter stats for 'system wide': > > > > > > 259,954 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 15.2 % > > > tma_dram_bound (74.99%) > > > 118,807,043 CYCLE_ACTIVITY.STALLS_L2_PENDING > > > (87.46%) > > > 111,699 MEM_LOAD_UOPS_RETIRED.LLC_MISS > > > (74.95%) > > > 587,571,060 CPU_CLK_UNHALTED.THREAD > > > (87.45%) > > > > > > 1.001518093 seconds time elapsed > > > ``` > > > > > > The issue is that perf metrics use weak groups of events. A weak group > > > is the same as a group of events initially. 
We want to use groups of > > > events with metrics so that all the counters are scheduled in and out > > > at the same time, and not multiplexed independently. Imagine measuring > > > IPC but the counts for instructions and cycles are measured at > > > different periods, the resultant IPC value would be unlikely to be > > > accurate. If perf_event_open fails then the perf tool retries the > > > events without the group. If I try just 3 of the events in a weak > > > group then the failure can be seen: > > > > > > ``` > > > $ perf stat -e "{MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING}:W" > > > -a sleep 1 > > > > > > Performance counter stats for 'system wide': > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > (0.00%) > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_MISS > > > (0.00%) > > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > (0.00%) > > > > > > 1.001458485 seconds time elapsed > > > ``` > > > > > > The kernel should have failed the perf_event_open on opening the third > > > event and then measured without the group, which it can do with > > > multiplexing as in the following: > > > > > > ``` > > > $ perf stat -e "MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING" > > > -a sleep 1 > > > > > > Performance counter stats for 'system wide': > > > > > > 1,239,397 MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > (79.06%) > > > 174,826 MEM_LOAD_UOPS_RETIRED.LLC_MISS > > > (64.60%) > > > 124,026,024 CYCLE_ACTIVITY.STALLS_L2_PENDING > > > (81.16%) > > > > > > 1.001483434 seconds time elapsed > > > ``` > > > > > > When the --metric-no-group flag is given to perf then it doesn't > > > produce the initial weak group, which works around the bug of the > > > kernel not failing on the 3rd perf_event_open. I've added Kan and > > > Zhengjun to the e-mail as they work on the Intel kernel PMU code. 
> > > > > > There's a question about what we should do in the perf test about > > > this? I have a few solutions: > > > > > > 1) try metric tests again with the --metric-no-group flag and don't > > > fail the test if this succeeds. This allows kernel bugs to hide, so > > > I'm not a huge fan. > > > > > > 2) add a new metric flag/constraint to say not to group, this way the > > > metric will automatically apply the "--metric-no-group" flag. It is a > > > bit of work to wire this up but this kind of failure is common enough > > > in PMUs that it is probably worthwhile. We also need to add the flag > > > to metrics and I'm not sure how to get a good list of the metrics that > > > currently fail and require it. This is okay but error prone. > > > > > > 3) fix the kernel bug and let the perf test fail until an adequate > > > kernel is installed. Probably the best option. > > > > > > > Hi Ian, > > > > I can confirm: > > > > $ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > /proc/sys/kernel/perf_event_paranoid > > 0 > > > > $ ~/bin/perf stat -M tma_l3_bound --metric-no-group -a sleep 1 > > > > Performance counter stats for 'system wide': > > > > 2.058.892 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,5 % > > tma_l3_bound (99,30%) > > 173.254.697 CYCLE_ACTIVITY.STALLS_L2_PENDING > > (99,10%) > > 2.396.130.501 CPU_CLK_UNHALTED.THREAD > > (99,60%) > > 1.110.486 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > (99,53%) > > > > 1,001989022 seconds time elapsed > > > > $ ~/bin/perf stat -M tma_dram_bound --metric-no-group -a sleep 1 > > > > Performance counter stats for 'system wide': > > > > 1.729.208 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,2 % > > tma_dram_bound (99,50%) > > 50.346.734 CYCLE_ACTIVITY.STALLS_L2_PENDING > > (99,50%) > > 2.354.963.862 CPU_CLK_UNHALTED.THREAD > > (99,80%) > > 306.500 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > (99,61%) > > > > 1,001981392 seconds time elapsed > > > > Thanks! 
> > Thanks, apparently it is an issue with SandyBridge/IvyBridge that some > counters on one hyperthread will limit what can be on the other. I > believe that's the comment related to EXCL access here: > https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/core.c#L124 > So you may have more success with the metric if you disable > hyperthreading, but I imagine that's not a popular option. > Hi Ian, LOL. Yesterday, I played a bit with perf... and did some research. I was rather thinking that the "formula" in the metrics calculation is somehow BROKEN (not executed properly). The first thing I did to fix the issue was to recompile perf after throwing out the blocks in the snb-metrics JSON file containing the two tests. AFAICS, on stackoverflow or wherever, I found something about haswell vs. sandy bridge and the topic "SMT / HT (hyper-threading)". That's why my initial LOL... I am not in front of my Linux machine, I guess it was... Link: https://stackoverflow.com/questions/33677367/understanding-cycle-activity-haswell-performance-monitoring-events I hope I do not have to disable HT in my BIOS before running these tests. Can I do this at runtime (proc / sysfs / etc.)? Furthermore, I passed * mitigations=off * as kernel-boot-parameter for testing purposes. ( I wanted to test it for a long time - independent of this issue. ) Of course, I played with perf -e MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING ... (comma-separated, no "...", no {...}), passing -C 1 (number of CPUs), etc. Had to check my bash-history... Again, as no one has explained it: Control descriptor is not initialized In some tests I have seen this (all with passing -v or -vv for more verbosity) - in some not. Worth reading: Brendan Gregg's Homepage! https://www.brendangregg.com/linuxperf.html https://www.brendangregg.com/perf.html Thanks Ian for digging into this! If you have any news or anything for testing, please let me know. 
BR, -Sedat- > > > BR, > > -Sedat- > > > > > Thanks, > > > Ian > > > > > > > > 1,002148600 seconds time elapsed > > > > > > > > > > Hmm... looking at... Metric 'tma_l3_bound' ... > > > > > > > > > > Running... > > > > > > > > > > # perf stat --verbose -M tma_l3_bound -a sleep 1 > > > > > Using CPUID GenuineIntel-6-2A-7 > > > > > metric expr (MEM_LOAD_UOPS_RETIRED.LLC_HIT / > > > > > (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * > > > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS)) * > > > > > CYCLE_ACTIVITY.STALLS_L2_PENDING / CLKS for tma_l3_bound > > > > > metric expr CPU_CLK_UNHALTED.THREAD for CLKS > > > > > > > > > > found event MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > > found event CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > > found event CPU_CLK_UNHALTED.THREAD > > > > > found event MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > > > > > > > Parsing metric events > > > > > '{MEM_LOAD_UOPS_RETIRED.LLC_HIT/metric-id=MEM_LOAD_UOPS_RETIRED.LLC_HIT/,CYCLE_ACTIVITY.STALLS_L2_PENDING/metric-id=CYCLE_ACTIVITY.STALLS_L2_PEND > > > > > ING/,CPU_CLK_UNHALTED.THREAD/metric-id=CPU_CLK_UNHALTED.THREAD/,MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/metric-id=MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/}:W' > > > > > MEM_LOAD_UOPS_RETIRED.LLC_HIT -> cpu/event=0xd1,period=0xc365,umask=0x4/ > > > > > CYCLE_ACTIVITY.STALLS_L2_PENDING -> > > > > > cpu/event=0xa3,cmask=0x5,period=0x1e8483,umask=0x5/ > > > > > CPU_CLK_UNHALTED.THREAD -> cpu/event=0x3c,period=0x1e8483/ > > > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS -> cpu/event=0xd4,period=0x186a7,umask=0x2/ > > > > > > > > > > Control descriptor is not initialized > > > > > > > > > > MEM_LOAD_UOPS_RETIRED.LLC_HIT: 0 4007421228 0 > > > > > CYCLE_ACTIVITY.STALLS_L2_PENDING: 0 4007421228 0 > > > > > CPU_CLK_UNHALTED.THREAD: 0 4007421228 0 > > > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS: 0 4007421228 0 > > > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > > (0,00%) > > > > > <not counted> 
CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > > (0,00%) > > > > > <not counted> CPU_CLK_UNHALTED.THREAD > > > > > (0,00%) > > > > > <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > > (0,00%) > > > > > > > > > > 1,002310013 seconds time elapsed > > > > > > > > > > So those events/metric-ids resulting in "<not counted>" are all found. > > > > > > > > > > What means "Control descriptor is not initialized"? > > > > > > > > > > To summarize: > > > > > > > > > > Those two tests in "100: perf all metrics test" FAILED: > > > > > > > > > > 1. tma_dram_bound > > > > > 2. tma_l3_bound > > > > > > > > > > Best regards, > > > > > -Sedat- > > > > > > > > > >> Thanks, > > > > >> Ian > > > > >> > > > > >>> Last perf version which was OK: > > > > >>> > > > > >>> ~/bin/perf -v > > > > >>> perf version 6.0.0 > > > > >>> > > > > >>> echo "linux-perf: Adjust limited access to performance monitoring and > > > > >>> observability operations" > > > > >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > > > >>> /proc/sys/kernel/perf_event_paranoid > > > > >>> 0 > > > > >>> > > > > >>> ~/bin/perf test 10 86 92 93 94 95 > > > > >>> 10: PMU events : > > > > >>> 10.1: PMU event table sanity : Ok > > > > >>> 10.2: PMU event map aliases : Ok > > > > >>> 10.3: Parsing of PMU event table metrics : Ok > > > > >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > > >>> 86: perf record tests : Ok > > > > >>> 92: perf stat tests : Ok > > > > >>> 93: perf all metricgroups test : Ok > > > > >>> 94: perf all metrics test : Ok > > > > >>> 95: perf all PMU test : Ok > > > > >>> > > > > >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict > > > > >>> /proc/sys/kernel/perf_event_paranoid > > > > >>> echo "linux-perf: Reset limited access to performance monitoring and > > > > >>> observability operations" > > > > >>> > > > > >>> If you need further information, please let me know. > > > > >>> > > > > >>> Thanks. 
> > > > >>> > > > > >>> Regards, > > > > >>> -Sedat- > > > > >>> > > > > >>> P.S. Instructions > > > > >>> > > > > >>> [ REPRODUCER ] > > > > >>> > > > > >>> LLVM_MVER="16" > > > > >>> > > > > >>> # Debian LLVM > > > > >>> ##LLVM_TOOLCHAIN_PATH="/usr/lib/llvm-${LLVM_MVER}/bin" > > > > >>> # Selfmade LLVM > > > > >>> LLVM_TOOLCHAIN_PATH="/opt/llvm/bin" > > > > >>> if [ -d ${LLVM_TOOLCHAIN_PATH} ]; then > > > > >>> export PATH="${LLVM_TOOLCHAIN_PATH}:${PATH}" > > > > >>> fi > > > > >>> > > > > >>> PYTHON_VER="3.11" > > > > >>> MAKE="make" > > > > >>> MAKE_OPTS="V=1 -j1 HOSTCC=clang-$LLVM_MVER HOSTLD=ld.lld > > > > >>> HOSTAR=llvm-ar CC=clang-$LLVM_MVER LD=ld.lld AR=llvm-ar > > > > >>> STRIP=llvm-strip" > > > > >>> > > > > >>> echo "LLVM MVER ........ $LLVM_MVER" > > > > >>> echo "Path settings .... $PATH" > > > > >>> echo "Python version ... $PYTHON_VER" > > > > >>> echo "make line ........ $MAKE $MAKE_OPTS" > > > > >>> > > > > >>> LANG=C LC_ALL=C make -C tools/perf clean 2>&1 | tee ../make-log_perf-clean.txt > > > > >>> > > > > >>> LANG=C LC_ALL=C $MAKE $MAKE_OPTS -C tools/perf > > > > >>> PYTHON=python${PYTHON_VER} install-bin 2>&1 | tee > > > > >>> ../make-log_perf-install_bin_python${PYTHON_VER}_llvm${LLVM_MVER}.txt > > > > >>> > > > > >>> > > > > >>> [ TESTS ] > > > > >>> > > > > >>> [ TESTS - START ] > > > > >>> > > > > >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > > > >>> /proc/sys/kernel/perf_event_paranoid > > > > >>> > > > > >>> [ TESTS - DEBIAN ] > > > > >>> > > > > >>> /usr/bin/perf -v > > > > >>> perf version 6.1.7 > > > > >>> > > > > >>> /usr/bin/perf test 10 92 98 99 100 101 > > > > >>> > > > > >>> 10: PMU events : > > > > >>> 10.1: PMU event table sanity : Ok > > > > >>> 10.2: PMU event map aliases : Ok > > > > >>> 10.3: Parsing of PMU event table metrics : Ok > > > > >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > > >>> 92: perf record tests : Ok > > > > >>> 98: perf stat tests : Ok > > > > >>> 99: perf all 
metricgroups test : Ok > > > > >>> 100: perf all metrics test : FAILED! > > > > >>> 101: perf all PMU test : Ok > > > > >>> > > > > >>> [ TESTS - DILEKS ] > > > > >>> > > > > >>> ~/bin/perf -v > > > > >>> perf version 6.2.0-rc5 > > > > >>> > > > > >>> ~/bin/perf test 7 87 93 94 95 96 > > > > >>> > > > > >>> 7: PMU events : > > > > >>> 7.1: PMU event table sanity : Ok > > > > >>> 7.2: PMU event map aliases : Ok > > > > >>> 7.3: Parsing of PMU event table metrics : Ok > > > > >>> 7.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > > >>> 87: perf record tests : Ok > > > > >>> 93: perf stat tests : Ok > > > > >>> 94: perf all metricgroups test : Ok > > > > >>> 95: perf all metrics test : FAILED! > > > > >>> 96: perf all PMU test : Ok > > > > >>> > > > > >>> [ TESTS - FAILED ] > > > > >>> > > > > >>> /usr/bin/perf test --verbose 100 2>&1 | tee > > > > >>> perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt > > > > >>> > > > > >>> ~/bin/perf test --verbose 95 2>&1 | tee > > > > >>> perf-test-verbose-95-perf-all-metrics-test_dileks-perf-6-2-rc5.txt > > > > >>> > > > > >>> [ TESTS - STOP ] > > > > >>> > > > > >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict > > > > >>> /proc/sys/kernel/perf_event_paranoid > > > > >>> > > > > >>> - EOT - ^ permalink raw reply [flat|nested] 13+ messages in thread
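On the question in the message above about turning hyper-threading off at runtime instead of in the BIOS: recent kernels expose an SMT control file under /sys/devices/system/cpu/smt. The following is only a minimal sketch of reading and writing it. The helper names `smt_state` and `disable_smt` are made up for illustration, writing the file needs root, and whether switching SMT off this way actually lets the weak group schedule on SandyBridge/IvyBridge was not tested anywhere in this thread.

```python
from pathlib import Path

# sysfs directory holding the SMT control files on kernels that support it.
SMT_DIR = Path("/sys/devices/system/cpu/smt")


def smt_state(smt_dir=SMT_DIR):
    """Return (control, active) read from sysfs, or None if the files are absent.

    control is the raw string from the "control" file (e.g. "on", "off");
    active is True when the "active" file contains "1".
    """
    control = Path(smt_dir) / "control"
    active = Path(smt_dir) / "active"
    if not control.exists() or not active.exists():
        return None
    return control.read_text().strip(), active.read_text().strip() == "1"


def disable_smt(smt_dir=SMT_DIR):
    """Write "off" to the control file (hypothetical helper; needs root)."""
    (Path(smt_dir) / "control").write_text("off\n")
```

Calling `smt_state()` before and after `disable_smt()` shows whether the kernel accepted the change; writing "on" to the same file re-enables SMT.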
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED! 2023-01-31 6:14 ` Sedat Dilek @ 2023-01-31 6:20 ` Sedat Dilek 0 siblings, 0 replies; 13+ messages in thread From: Sedat Dilek @ 2023-01-31 6:20 UTC (permalink / raw) To: Ian Rogers Cc: Liang, Kan, Xing, Zhengjun, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings, James Clark, Stephane Eranian On Tue, Jan 31, 2023 at 7:14 AM Sedat Dilek <sedat.dilek@gmail.com> wrote: > > On Tue, Jan 31, 2023 at 4:55 AM Ian Rogers <irogers@google.com> wrote: > > > > On Mon, Jan 30, 2023 at 7:45 PM Sedat Dilek <sedat.dilek@gmail.com> wrote: > > > > > > On Tue, Jan 31, 2023 at 1:20 AM Ian Rogers <irogers@google.com> wrote: > > > > > > > > On Mon, Jan 30, 2023 at 2:04 AM James Clark <james.clark@arm.com> wrote: > > > > > > > > > > > > > > > > > > > > On 30/01/2023 02:24, Sedat Dilek wrote: > > > > > > ? > > > > > > > > > > > > On Mon, Jan 30, 2023 at 12:21 AM Ian Rogers <irogers@google.com> wrote: > > > > > >> > > > > > >> On Sun, Jan 29, 2023 at 1:59 AM Sedat Dilek <sedat.dilek@gmail.com> wrote: > > > > > >>> > > > > > >>> [ CC LLVM linux folks + Ben from Debian kernel team ] > > > > > >>> > > > > > >>> Hi, > > > > > >>> > > > > > >>> I am playing with LLVM version 16.0.0-rc1 which was released yesterday and PERF. > > > > > >>> > > > > > >>> After building my selfmade LLVM toolchain, I built perf and run some > > > > > >>> perf tests here on my Intel SandyBridge CPU (details see below). > > > > > >>> > > > > > >>> perf all metrics test: FAILED! > > > > > >>> > > > > > >>> ...with both Debian's perf version 6.1.7 and my selfmade version 6.2-rc5. 
> > > > > >>> > > > > > >>> Just noticed: > > > > > >>> > > > > > >>> Couldn't bump rlimit(MEMLOCK), failures may take place when creating > > > > > >>> BPF maps, etc > > > > > >>> > > > > > >>> Run the below tests with `sudo` - made this go away - still FAILED. > > > > > >>> > > > > > >>> But maybe I am missing to activate some sysfs/debug or whatever other stuff? > > > > > >> > > > > > >> Hi Sedat, > > > > > >> > > > > > >> things have been improving wrt metrics and so this failure may have > > > > > >> just been because of the addition of a previously missing metric. The > > > > > >> rlimit thing shouldn't affect things but maybe file descriptors? > > > > > >> Looking at the test output the issue is: > > > > > >> > > > > > >> ``` > > > > > >> Metric 'tma_dram_bound' not printed in: > > > > > >> # Running 'internals/synthesize' benchmark: > > > > > >> Computing performance of single threaded perf event synthesis by > > > > > >> synthesizing events on the perf process itself: > > > > > >> Average synthesis took: 207.680 usec (+- 0.176 usec) > > > > > >> Average num. events: 30.000 (+- 0.000) > > > > > >> Average time per event 6.923 usec > > > > > >> Average data synthesis took: 217.833 usec (+- 0.202 usec) > > > > > >> Average num. events: 161.000 (+- 0.000) > > > > > >> Average time per event 1.353 usec > > > > > >> > > > > > >> Performance counter stats for 'perf bench internals synthesize': > > > > > >> > > > > > >> <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > > >> (0,00%) > > > > > >> <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > > >> (0,00%) > > > > > >> <not counted> CPU_CLK_UNHALTED.THREAD > > > > > >> (0,00%) > > > > > >> <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > > >> (0,00%) > > > > > >> ``` > > > > > >> > > > > > >> So the test was checking to see whether the tma_dram_bound metric > > > > > >> could be computed on your Sandybridge and it failed. 
The event counts > > > > > >> below show that every event came back "<not counted>" which is usually > > > > > >> indicative of a permissions problem - it is also not surprising given > > > > > >> this that the metric wasn't computed. You could try repeating the > > > > > >> command the test is trying with something like "perf stat -M > > > > > >> tma_dram_bound -a sleep 1", but running as root should have resolved > > > > > >> that issue. Does that give you enough to keep exploring? > > > > > >> > > > > > > > > > > > > Hi Ian, > > > > > > > > > > > > Thanks for your feedback! > > > > > > > > > > > > I booted into my Debian kernel - just to see what happens. > > > > > > > > > > > > # cat /proc/version > > > > > > Linux version 6.1.0-2-amd64 (debian-kernel@lists.debian.org) (gcc-12 > > > > > > (Debian 12.2.0-14) 12.2.0, GNU ld (GNU Binutils for Debian) 2.40) #1 > > > > > > SMP PREEMPT_DYNAMIC Debian 6.1.7-1 (2023-01-18) > > > > > > > > > > > > All things run as root... > > > > > > > > > > > > # echo 0 | tee /proc/sys/kernel/kptr_restrict > > > > > > /proc/sys/kernel/perf_event_paranoid > > > > > > 0 > > > > > > > > > > > > # /usr/bin/perf test 10 92 98 99 100 101 > > > > > > 10: PMU events : > > > > > > 10.1: PMU event table sanity : Ok > > > > > > 10.2: PMU event map aliases : Ok > > > > > > 10.3: Parsing of PMU event table metrics : Ok > > > > > > 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > > > > 92: perf record tests : Ok > > > > > > 98: perf stat tests : Ok > > > > > > 99: perf all metricgroups test : Ok > > > > > > 100: perf all metrics test : FAILED! 
> > > > > > 101: perf all PMU test : Ok > > > > > > > > > > > > # perf stat -M tma_dram_bound -a sleep 1 > > > > > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > > > (0,00%) > > > > > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > > > (0,00%) > > > > > > <not counted> CPU_CLK_UNHALTED.THREAD > > > > > > (0,00%) > > > > > > <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > > > (0,00%) > > > > > > > > > > > > > > > > Hi Sedat, > > > > > > > > > > I also had this failure and did a git bisect, but it led me to the > > > > > conclusion that it is a stale build issue rather than a regression. > > > > > > > > > > There was a recent commit that renamed/removed some json PMU files which > > > > > the build system can't cope with. I think the tests end up iterating > > > > > over a different set of event names than were generated by the build system. > > > > > > > > > > If you do a clean build the issue should go away. I don't know if there > > > > > is anything more we can do to stop this from happening. > > > > > > > > > > James > > > > > > > > So I think this is a kernel bug triggering a perf tool bug. The kernel > > > > bug can be worked around in the perf tool. I only had an Ivybridge to > > > > test with (hence slightly different events) but what I see is both > > > > tma_dram_bound and tma_l3_bound using the same 4 events. 
I could work > > > > around the "<not counted>" by adding the --metric-no-group flag: > > > > > > > > ``` > > > > $ perf stat -M tma_l3_bound --metric-no-group -a sleep 1 > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > 400,404 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 4.3 % > > > > tma_l3_bound (74.99%) > > > > 128,937,891 CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > (87.46%) > > > > 167,459 MEM_LOAD_UOPS_RETIRED.LLC_MISS > > > > (74.99%) > > > > 759,574,967 CPU_CLK_UNHALTED.THREAD > > > > (87.47%) > > > > > > > > 1.001526438 seconds time elapsed > > > > > > > > $ perf stat -M tma_dram_bound -a --metric-no-group sleep 1 > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > 259,954 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 15.2 % > > > > tma_dram_bound (74.99%) > > > > 118,807,043 CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > (87.46%) > > > > 111,699 MEM_LOAD_UOPS_RETIRED.LLC_MISS > > > > (74.95%) > > > > 587,571,060 CPU_CLK_UNHALTED.THREAD > > > > (87.45%) > > > > > > > > 1.001518093 seconds time elapsed > > > > ``` > > > > > > > > The issue is that perf metrics use weak groups of events. A weak group > > > > is the same as a group of events initially. We want to use groups of > > > > events with metrics so that all the counters are scheduled in and out > > > > at the same time, and not multiplexed independently. Imagine measuring > > > > IPC but the counts for instructions and cycles are measured at > > > > different periods, the resultant IPC value would be unlikely to be > > > > accurate. If perf_event_open fails then the perf tool retries the > > > > events without the group. 
If I try just 3 of the events in a weak > > > > group then the failure can be seen: > > > > > > > > ``` > > > > $ perf stat -e "{MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING}:W" > > > > -a sleep 1 > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > (0.00%) > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_MISS > > > > (0.00%) > > > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > (0.00%) > > > > > > > > 1.001458485 seconds time elapsed > > > > ``` > > > > > > > > The kernel should have failed the perf_event_open on opening the third > > > > event and then measured without the group, which it can do with > > > > multiplexing as in the following: > > > > > > > > ``` > > > > $ perf stat -e "MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING" > > > > -a sleep 1 > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > 1,239,397 MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > (79.06%) > > > > 174,826 MEM_LOAD_UOPS_RETIRED.LLC_MISS > > > > (64.60%) > > > > 124,026,024 CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > (81.16%) > > > > > > > > 1.001483434 seconds time elapsed > > > > ``` > > > > > > > > When the --metric-no-group flag is given to perf then it doesn't > > > > produce the initial weak group, which works around the bug of the > > > > kernel not failing on the 3rd perf_event_open. I've added Kan and > > > > Zhengjun to the e-mail as they work on the Intel kernel PMU code. > > > > > > > > There's a question about what we should do in the perf test about > > > > this? I have a few solutions: > > > > > > > > 1) try metric tests again with the --metric-no-group flag and don't > > > > fail the test if this succeeds. This allows kernel bugs to hide, so > > > > I'm not a huge fan. 
> > > > > > > > 2) add a new metric flag/constraint to say not to group, this way the > > > > metric will automatically apply the "--metric-no-group" flag. It is a > > > > bit of work to wire this up but this kind of failure is common enough > > > > in PMUs that it is probably worthwhile. We also need to add the flag > > > > to metrics and I'm not sure how to get a good list of the metrics that > > > > currently fail and require it. This is okay but error prone. > > > > > > > > 3) fix the kernel bug and let the perf test fail until an adequate > > > > kernel is installed. Probably the best option. > > > > > > > > > > Hi Ian, > > > > > > I can confirm: > > > > > > $ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > > /proc/sys/kernel/perf_event_paranoid > > > 0 > > > > > > $ ~/bin/perf stat -M tma_l3_bound --metric-no-group -a sleep 1 > > > > > > Performance counter stats for 'system wide': > > > > > > 2.058.892 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,5 % > > > tma_l3_bound (99,30%) > > > 173.254.697 CYCLE_ACTIVITY.STALLS_L2_PENDING > > > (99,10%) > > > 2.396.130.501 CPU_CLK_UNHALTED.THREAD > > > (99,60%) > > > 1.110.486 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > (99,53%) > > > > > > 1,001989022 seconds time elapsed > > > > > > $ ~/bin/perf stat -M tma_dram_bound --metric-no-group -a sleep 1 > > > > > > Performance counter stats for 'system wide': > > > > > > 1.729.208 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,2 % > > > tma_dram_bound (99,50%) > > > 50.346.734 CYCLE_ACTIVITY.STALLS_L2_PENDING > > > (99,50%) > > > 2.354.963.862 CPU_CLK_UNHALTED.THREAD > > > (99,80%) > > > 306.500 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > (99,61%) > > > > > > 1,001981392 seconds time elapsed > > > > > > Thanks! > > > > Thanks, apparently it is an issue with SandyBridge/IvyBridge that some > > counters on one hyperthread will limit what can be on the other. 
I > > believe that's the comment related to EXCL access here: > > https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/core.c#L124 > > So you may have more success with the metric if you disable > > hyperthreading, but I imagine that's not a popular option. > > > > Hi Ian, > > LOL. > > Yesterday, I played a bit with perf... and did some research. > > I was rather thinking that the "formula" in the metrics calculation is > somehow BROKEN (not executed properly). > > The first thing I did to fix the issue was to recompile perf after throwing > out the blocks in the snb-metrics JSON file containing the two tests. > > AFAICS, on stackoverflow or wherever, I found something about haswell vs. > sandy bridge and the topic "SMT / HT (hyper-threading)". > That's why my initial LOL... > > I am not in front of my Linux machine, I guess it was... > > Link: https://stackoverflow.com/questions/33677367/understanding-cycle-activity-haswell-performance-monitoring-events > To quote from the above Link: > There are three important differences: > > Some of the Haswell events can only valid when HT is disabled. All SNB events are valid even when HT is enabled. > > CYCLE_ACTIVITY.STALLS_L2_PENDING on HSW counts the number of load misses at L2, but on SNB, it counts the number of cycles during which there was at least one demand load miss at L2. > > The HSW events include all accesses, not just demand loads. In contrast, the SNB events only occur for demand loads. -Sedat- > I hope I do not have to disable HT in my BIOS before running these tests. > Can I do this at runtime (proc / sysfs / etc.)? > > Furthermore, I passed * mitigations=off * as kernel-boot-parameter for > testing purposes. > ( I wanted to test it for a long time - independent of this issue. ) > > Of course, I played with perf -e > MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING > ... (comma-separated, no "...", no {...}), passing -C 1 (number of > CPUs), etc. 
> Had to check my bash-history... > > Again as noone explained: > > Control descriptor is not initialized > > In some tests I have seen this (all with passing -v or -vv for more > verbosity) - in some not. > > Worth reading: Brendan Gregg's Homepage! > > https://www.brendangregg.com/linuxperf.html > > https://www.brendangregg.com/perf.html > > Thanks Ian for digging into this! > > If you have any news or anything for testing, please let me know. > > BR, > -Sedat- > > > > > > BR, > > > -Sedat- > > > > > > > Thanks, > > > > Ian > > > > > > > > > > 1,002148600 seconds time elapsed > > > > > > > > > > > > Hmm... looking at... Metric 'tma_l3_bound' ... > > > > > > > > > > > > Running... > > > > > > > > > > > > # perf stat --verbose -M tma_l3_bound -a sleep 1 > > > > > > Using CPUID GenuineIntel-6-2A-7 > > > > > > metric expr (MEM_LOAD_UOPS_RETIRED.LLC_HIT / > > > > > > (MEM_LOAD_UOPS_RETIRED.LLC_HIT + 7 * > > > > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS)) * > > > > > > CYCLE_ACTIVITY.STALLS_L2_PENDING / CLKS for tma_l3_bound > > > > > > metric expr CPU_CLK_UNHALTED.THREAD for CLKS > > > > > > > > > > > > found event MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > > > found event CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > > > found event CPU_CLK_UNHALTED.THREAD > > > > > > found event MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > > > > > > > > > Parsing metric events > > > > > > '{MEM_LOAD_UOPS_RETIRED.LLC_HIT/metric-id=MEM_LOAD_UOPS_RETIRED.LLC_HIT/,CYCLE_ACTIVITY.STALLS_L2_PENDING/metric-id=CYCLE_ACTIVITY.STALLS_L2_PEND > > > > > > ING/,CPU_CLK_UNHALTED.THREAD/metric-id=CPU_CLK_UNHALTED.THREAD/,MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/metric-id=MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS/}:W' > > > > > > MEM_LOAD_UOPS_RETIRED.LLC_HIT -> cpu/event=0xd1,period=0xc365,umask=0x4/ > > > > > > CYCLE_ACTIVITY.STALLS_L2_PENDING -> > > > > > > cpu/event=0xa3,cmask=0x5,period=0x1e8483,umask=0x5/ > > > > > > CPU_CLK_UNHALTED.THREAD -> cpu/event=0x3c,period=0x1e8483/ > > > > > > 
MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS -> cpu/event=0xd4,period=0x186a7,umask=0x2/ > > > > > > > > > > > > Control descriptor is not initialized > > > > > > > > > > > > MEM_LOAD_UOPS_RETIRED.LLC_HIT: 0 4007421228 0 > > > > > > CYCLE_ACTIVITY.STALLS_L2_PENDING: 0 4007421228 0 > > > > > > CPU_CLK_UNHALTED.THREAD: 0 4007421228 0 > > > > > > MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS: 0 4007421228 0 > > > > > > > > > > > > Performance counter stats for 'system wide': > > > > > > > > > > > > <not counted> MEM_LOAD_UOPS_RETIRED.LLC_HIT > > > > > > (0,00%) > > > > > > <not counted> CYCLE_ACTIVITY.STALLS_L2_PENDING > > > > > > (0,00%) > > > > > > <not counted> CPU_CLK_UNHALTED.THREAD > > > > > > (0,00%) > > > > > > <not counted> MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > > > > > > (0,00%) > > > > > > > > > > > > 1,002310013 seconds time elapsed > > > > > > > > > > > > So those events/metric-ids resulting in "<not counted>" are all found. > > > > > > > > > > > > What means "Control descriptor is not initialized"? > > > > > > > > > > > > To summarize: > > > > > > > > > > > > Those two tests in "100: perf all metrics test" FAILED: > > > > > > > > > > > > 1. tma_dram_bound > > > > > > 2. 
tma_l3_bound > > > > > > > > > > > > Best regards, > > > > > > -Sedat- > > > > > > > > > > > >> Thanks, > > > > > >> Ian > > > > > >> > > > > > >>> Last perf version which was OK: > > > > > >>> > > > > > >>> ~/bin/perf -v > > > > > >>> perf version 6.0.0 > > > > > >>> > > > > > >>> echo "linux-perf: Adjust limited access to performance monitoring and > > > > > >>> observability operations" > > > > > >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > > > > >>> /proc/sys/kernel/perf_event_paranoid > > > > > >>> 0 > > > > > >>> > > > > > >>> ~/bin/perf test 10 86 92 93 94 95 > > > > > >>> 10: PMU events : > > > > > >>> 10.1: PMU event table sanity : Ok > > > > > >>> 10.2: PMU event map aliases : Ok > > > > > >>> 10.3: Parsing of PMU event table metrics : Ok > > > > > >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > > > >>> 86: perf record tests : Ok > > > > > >>> 92: perf stat tests : Ok > > > > > >>> 93: perf all metricgroups test : Ok > > > > > >>> 94: perf all metrics test : Ok > > > > > >>> 95: perf all PMU test : Ok > > > > > >>> > > > > > >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict > > > > > >>> /proc/sys/kernel/perf_event_paranoid > > > > > >>> echo "linux-perf: Reset limited access to performance monitoring and > > > > > >>> observability operations" > > > > > >>> > > > > > >>> If you need further information, please let me know. > > > > > >>> > > > > > >>> Thanks. > > > > > >>> > > > > > >>> Regards, > > > > > >>> -Sedat- > > > > > >>> > > > > > >>> P.S. 
Instructions > > > > > >>> > > > > > >>> [ REPRODUCER ] > > > > > >>> > > > > > >>> LLVM_MVER="16" > > > > > >>> > > > > > >>> # Debian LLVM > > > > > >>> ##LLVM_TOOLCHAIN_PATH="/usr/lib/llvm-${LLVM_MVER}/bin" > > > > > >>> # Selfmade LLVM > > > > > >>> LLVM_TOOLCHAIN_PATH="/opt/llvm/bin" > > > > > >>> if [ -d ${LLVM_TOOLCHAIN_PATH} ]; then > > > > > >>> export PATH="${LLVM_TOOLCHAIN_PATH}:${PATH}" > > > > > >>> fi > > > > > >>> > > > > > >>> PYTHON_VER="3.11" > > > > > >>> MAKE="make" > > > > > >>> MAKE_OPTS="V=1 -j1 HOSTCC=clang-$LLVM_MVER HOSTLD=ld.lld > > > > > >>> HOSTAR=llvm-ar CC=clang-$LLVM_MVER LD=ld.lld AR=llvm-ar > > > > > >>> STRIP=llvm-strip" > > > > > >>> > > > > > >>> echo "LLVM MVER ........ $LLVM_MVER" > > > > > >>> echo "Path settings .... $PATH" > > > > > >>> echo "Python version ... $PYTHON_VER" > > > > > >>> echo "make line ........ $MAKE $MAKE_OPTS" > > > > > >>> > > > > > >>> LANG=C LC_ALL=C make -C tools/perf clean 2>&1 | tee ../make-log_perf-clean.txt > > > > > >>> > > > > > >>> LANG=C LC_ALL=C $MAKE $MAKE_OPTS -C tools/perf > > > > > >>> PYTHON=python${PYTHON_VER} install-bin 2>&1 | tee > > > > > >>> ../make-log_perf-install_bin_python${PYTHON_VER}_llvm${LLVM_MVER}.txt > > > > > >>> > > > > > >>> > > > > > >>> [ TESTS ] > > > > > >>> > > > > > >>> [ TESTS - START ] > > > > > >>> > > > > > >>> echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > > > > > >>> /proc/sys/kernel/perf_event_paranoid > > > > > >>> > > > > > >>> [ TESTS - DEBIAN ] > > > > > >>> > > > > > >>> /usr/bin/perf -v > > > > > >>> perf version 6.1.7 > > > > > >>> > > > > > >>> /usr/bin/perf test 10 92 98 99 100 101 > > > > > >>> > > > > > >>> 10: PMU events : > > > > > >>> 10.1: PMU event table sanity : Ok > > > > > >>> 10.2: PMU event map aliases : Ok > > > > > >>> 10.3: Parsing of PMU event table metrics : Ok > > > > > >>> 10.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > > > >>> 92: perf record tests : Ok > > > > > >>> 98: perf stat tests : Ok > > > > > 
>>> 99: perf all metricgroups test : Ok > > > > > >>> 100: perf all metrics test : FAILED! > > > > > >>> 101: perf all PMU test : Ok > > > > > >>> > > > > > >>> [ TESTS - DILEKS ] > > > > > >>> > > > > > >>> ~/bin/perf -v > > > > > >>> perf version 6.2.0-rc5 > > > > > >>> > > > > > >>> ~/bin/perf test 7 87 93 94 95 96 > > > > > >>> > > > > > >>> 7: PMU events : > > > > > >>> 7.1: PMU event table sanity : Ok > > > > > >>> 7.2: PMU event map aliases : Ok > > > > > >>> 7.3: Parsing of PMU event table metrics : Ok > > > > > >>> 7.4: Parsing of PMU event table metrics with fake PMUs : Ok > > > > > >>> 87: perf record tests : Ok > > > > > >>> 93: perf stat tests : Ok > > > > > >>> 94: perf all metricgroups test : Ok > > > > > >>> 95: perf all metrics test : FAILED! > > > > > >>> 96: perf all PMU test : Ok > > > > > >>> > > > > > >>> [ TESTS - FAILED ] > > > > > >>> > > > > > >>> /usr/bin/perf test --verbose 100 2>&1 | tee > > > > > >>> perf-test-verbose-100-perf-all-metrics-test_debian-perf-6-1-7.txt > > > > > >>> > > > > > >>> ~/bin/perf test --verbose 95 2>&1 | tee > > > > > >>> perf-test-verbose-95-perf-all-metrics-test_dileks-perf-6-2-rc5.txt > > > > > >>> > > > > > >>> [ TESTS - STOP ] > > > > > >>> > > > > > >>> echo 1 | sudo tee /proc/sys/kernel/kptr_restrict > > > > > >>> /proc/sys/kernel/perf_event_paranoid > > > > > >>> > > > > > >>> - EOT - ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED! 2023-01-31 3:55 ` Ian Rogers 2023-01-31 6:14 ` Sedat Dilek @ 2023-02-01 15:27 ` Liang, Kan 2023-02-01 17:02 ` Ian Rogers 1 sibling, 1 reply; 13+ messages in thread From: Liang, Kan @ 2023-02-01 15:27 UTC (permalink / raw) To: Ian Rogers, sedat.dilek Cc: Xing, Zhengjun, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings, James Clark, Stephane Eranian Hi Ian, On 2023-01-30 10:55 p.m., Ian Rogers wrote: >>> There's a question about what we should do in the perf test about >>> this? I have a few solutions: >>> >>> 1) try metric tests again with the --metric-no-group flag and don't >>> fail the test if this succeeds. This allows kernel bugs to hide, so >>> I'm not a huge fan. >>> >>> 2) add a new metric flag/constraint to say not to group, this way the >>> metric will automatically apply the "--metric-no-group" flag. It is a >>> bit of work to wire this up but this kind of failure is common enough >>> in PMUs that it is probably worthwhile. We also need to add the flag >>> to metrics and I'm not sure how to get a good list of the metrics that >>> currently fail and require it. This is okay but error prone. >>> >>> 3) fix the kernel bug and let the perf test fail until an adequate >>> kernel is installed. Probably the best option. 
>>>
>> Hi Ian,
>>
>> I can confirm:
>>
>> $ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict /proc/sys/kernel/perf_event_paranoid
>> 0
>>
>> $ ~/bin/perf stat -M tma_l3_bound --metric-no-group -a sleep 1
>>
>> Performance counter stats for 'system wide':
>>
>>      2.058.892      MEM_LOAD_UOPS_RETIRED.LLC_HIT         #  1,5 %  tma_l3_bound  (99,30%)
>>    173.254.697      CYCLE_ACTIVITY.STALLS_L2_PENDING      (99,10%)
>>  2.396.130.501      CPU_CLK_UNHALTED.THREAD               (99,60%)
>>      1.110.486      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS   (99,53%)
>>
>> 1,001989022 seconds time elapsed
>>
>> $ ~/bin/perf stat -M tma_dram_bound --metric-no-group -a sleep 1
>>
>> Performance counter stats for 'system wide':
>>
>>      1.729.208      MEM_LOAD_UOPS_RETIRED.LLC_HIT         #  1,2 %  tma_dram_bound  (99,50%)
>>     50.346.734      CYCLE_ACTIVITY.STALLS_L2_PENDING      (99,50%)
>>  2.354.963.862      CPU_CLK_UNHALTED.THREAD               (99,80%)
>>        306.500      MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS   (99,61%)
>>
>> 1,001981392 seconds time elapsed
>>
>> Thanks!
> Thanks, apparently it is an issue with SandyBridge/IvyBridge that some
> counters on one hyperthread will limit what can be on the other. I
> believe that's the comment related to EXCL access here:
> https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/core.c#L124
> So you may have more success with the metric if you disable
> hyperthreading, but I imagine that's not a popular option.

Thanks for debugging the issue. Yes, it's caused by the HT workaround for SNB/IVB/HSW.

The weak group check in the kernel is in validate_group(). It only does a sanity check. It doesn't check all the workarounds or the current status of the counters (e.g., whether a fixed counter is occupied by the NMI watchdog). It's possible that a false positive is returned to the perf tool. I once tried to fix the NMI watchdog check in the kernel, but the proposal was rejected. That is why the metric constraint was introduced.

For this issue, I think option 2 above is the better and more practical choice.
The issue is only observed on old machines, which usually have a stable kernel running on them. I don't think users want to update their kernel just to work around an issue affecting several metrics, but it should be much easier for them to update the perf tool.

We know that the events below are the problematic ones:

/* MEM_UOPS_RETIRED.* */
/* MEM_LOAD_UOPS_RETIRED.* */
/* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
/* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */

Can we update the converter script to apply the "--metric-no-group" flag, or add a new constraint, if the above events are detected on SNB/IVB/HSW?

Thanks,
Kan
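The check proposed above could look roughly like the following Python sketch. It is an illustration only, not code from the actual converter script: the function name `needs_no_group`, the short platform labels and the boolean return value are all assumptions. Only the four event-name patterns come from the message.

```python
import re

# Event-name patterns quoted in the message above.
PROBLEM_EVENT_PATTERNS = [
    re.compile(r"^MEM_UOPS_RETIRED\."),
    re.compile(r"^MEM_LOAD_UOPS_RETIRED\."),
    re.compile(r"^MEM_LOAD_UOPS_LLC_HIT_RETIRED\."),
    re.compile(r"^MEM_LOAD_UOPS_LLC_MISS_RETIRED\."),
]

# Hypothetical short labels for the affected platforms.
AFFECTED_MODELS = {"SNB", "IVB", "HSW"}


def needs_no_group(model, metric_events):
    """Return True if a metric on this model uses any problematic event."""
    if model not in AFFECTED_MODELS:
        return False
    return any(
        pattern.match(event)
        for event in metric_events
        for pattern in PROBLEM_EVENT_PATTERNS
    )


# Events taken from the tma_l3_bound output quoted earlier in the thread.
events = ["MEM_LOAD_UOPS_RETIRED.LLC_HIT", "CPU_CLK_UNHALTED.THREAD"]
print(needs_no_group("IVB", events))
print(needs_no_group("SKL", events))  # "SKL" is just an example of an unaffected label
```

Matching on name prefixes keeps the sketch short; a real implementation might match on event encodings instead.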
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED! 2023-02-01 15:27 ` Liang, Kan @ 2023-02-01 17:02 ` Ian Rogers 2023-02-01 19:06 ` Liang, Kan 0 siblings, 1 reply; 13+ messages in thread From: Ian Rogers @ 2023-02-01 17:02 UTC (permalink / raw) To: Liang, Kan Cc: sedat.dilek, Xing, Zhengjun, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings, James Clark, Stephane Eranian On Wed, Feb 1, 2023 at 7:28 AM Liang, Kan <kan.liang@linux.intel.com> wrote: > > Hi Ian, > > On 2023-01-30 10:55 p.m., Ian Rogers wrote: > >>> There's a question about what we should do in the perf test about > >>> this? I have a few solutions: > >>> > >>> 1) try metric tests again with the --metric-no-group flag and don't > >>> fail the test if this succeeds. This allows kernel bugs to hide, so > >>> I'm not a huge fan. > >>> > >>> 2) add a new metric flag/constraint to say not to group, this way the > >>> metric will automatically apply the "--metric-no-group" flag. It is a > >>> bit of work to wire this up but this kind of failure is common enough > >>> in PMUs that it is probably worthwhile. We also need to add the flag > >>> to metrics and I'm not sure how to get a good list of the metrics that > >>> currently fail and require it. This is okay but error prone. > >>> > >>> 3) fix the kernel bug and let the perf test fail until an adequate > >>> kernel is installed. Probably the best option. 
> >>> > >> Hi Ian, > >> > >> I can confirm: > >> > >> $ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict > >> /proc/sys/kernel/perf_event_paranoid > >> 0 > >> > >> $ ~/bin/perf stat -M tma_l3_bound --metric-no-group -a sleep 1 > >> > >> Performance counter stats for 'system wide': > >> > >> 2.058.892 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,5 % > >> tma_l3_bound (99,30%) > >> 173.254.697 CYCLE_ACTIVITY.STALLS_L2_PENDING > >> (99,10%) > >> 2.396.130.501 CPU_CLK_UNHALTED.THREAD > >> (99,60%) > >> 1.110.486 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > >> (99,53%) > >> > >> 1,001989022 seconds time elapsed > >> > >> $ ~/bin/perf stat -M tma_dram_bound --metric-no-group -a sleep 1 > >> > >> Performance counter stats for 'system wide': > >> > >> 1.729.208 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,2 % > >> tma_dram_bound (99,50%) > >> 50.346.734 CYCLE_ACTIVITY.STALLS_L2_PENDING > >> (99,50%) > >> 2.354.963.862 CPU_CLK_UNHALTED.THREAD > >> (99,80%) > >> 306.500 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS > >> (99,61%) > >> > >> 1,001981392 seconds time elapsed > >> > >> Thanks! > > Thanks, apparently it is an issue with SandyBridge/IvyBridge that some > > counters on one hyperthread will limit what can be on the other. I > > believe that's the comment related to EXCL access here: > > https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/core.c#L124 > > So you may have more success with the metric if you disable > > hyperthreading, but I imagine that's not a popular option. > > Thanks for debugging the issue. Yes, it's caused by the HT workaround > for SNB/IVB/HSW. > > The weak group check in the kernel is in validate_group(). It only does > a sanity check. It doesn't check all the workarounds and the current > status of counters (e.g., whether the fixed counter is occupied by NMI > watchdog.) It's possible that a false positive is returned to the perf > tool. I once tried to fix the NMI watchdog check in the kernel, but the > proposal was rejected. So the metric constraint is introduced. 
>
> For this issue, I think the above option2 should be a better and
> practical choice. The issue is only observed on old machines, which
> usually has a stable kernel running on it. I don't think the user wants
> to update their kernel just to workaround an issue for several metrics.
> But it should be much easier for them to update the perf tool.
>
> We know that the below events are the problematic events.
> /* MEM_UOPS_RETIRED.* */
> /* MEM_LOAD_UOPS_RETIRED.* */
> /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
> /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
> Can we update the convertor script and apply the "--metric-no-group"
> flag or add a new constraint if the above events are detected in
> SNB/IVB/HSW?
>
> Thanks,
> Kan

Thanks Kan,

We absolutely can do that! In this case, should --metric-no-group be applied only when SMT is enabled? I can write some patches, but I would like to know whether we need both SMT and non-SMT versions of --metric-no-group. Also, should we just keep a list of metrics that need the flag, or try to automate detection? Two warts in detection are that event names vary between Ivybridge and Sandybridge, and that it is unclear how to determine which events conflict. For example, the perfmon event data:

MEM_LOAD_UOPS_RETIRED.LLC_HIT
https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L5368
MEM_LOAD_UOPS_RETIRED.LLC_MISS
https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L5431
CYCLE_ACTIVITY.STALLS_L2_PENDING
https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L3541

The events list all counters, and there are no errata fields. Should the event data be updated first, with the converter script then handling that? If I am shown an example, I can modify the script accordingly.

It is also hard for me to test anything other than SMT on Ivybridge.

Thanks,
Ian
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED! 2023-02-01 17:02 ` Ian Rogers @ 2023-02-01 19:06 ` Liang, Kan 0 siblings, 0 replies; 13+ messages in thread From: Liang, Kan @ 2023-02-01 19:06 UTC (permalink / raw) To: Ian Rogers Cc: sedat.dilek, Xing, Zhengjun, Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings, James Clark, Stephane Eranian On 2023-02-01 12:02 p.m., Ian Rogers wrote: > On Wed, Feb 1, 2023 at 7:28 AM Liang, Kan <kan.liang@linux.intel.com> wrote: >> >> Hi Ian, >> >> On 2023-01-30 10:55 p.m., Ian Rogers wrote: >>>>> There's a question about what we should do in the perf test about >>>>> this? I have a few solutions: >>>>> >>>>> 1) try metric tests again with the --metric-no-group flag and don't >>>>> fail the test if this succeeds. This allows kernel bugs to hide, so >>>>> I'm not a huge fan. >>>>> >>>>> 2) add a new metric flag/constraint to say not to group, this way the >>>>> metric will automatically apply the "--metric-no-group" flag. It is a >>>>> bit of work to wire this up but this kind of failure is common enough >>>>> in PMUs that it is probably worthwhile. We also need to add the flag >>>>> to metrics and I'm not sure how to get a good list of the metrics that >>>>> currently fail and require it. This is okay but error prone. >>>>> >>>>> 3) fix the kernel bug and let the perf test fail until an adequate >>>>> kernel is installed. Probably the best option. 
>>>>> >>>> Hi Ian, >>>> >>>> I can confirm: >>>> >>>> $ echo 0 | sudo tee /proc/sys/kernel/kptr_restrict >>>> /proc/sys/kernel/perf_event_paranoid >>>> 0 >>>> >>>> $ ~/bin/perf stat -M tma_l3_bound --metric-no-group -a sleep 1 >>>> >>>> Performance counter stats for 'system wide': >>>> >>>> 2.058.892 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,5 % >>>> tma_l3_bound (99,30%) >>>> 173.254.697 CYCLE_ACTIVITY.STALLS_L2_PENDING >>>> (99,10%) >>>> 2.396.130.501 CPU_CLK_UNHALTED.THREAD >>>> (99,60%) >>>> 1.110.486 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS >>>> (99,53%) >>>> >>>> 1,001989022 seconds time elapsed >>>> >>>> $ ~/bin/perf stat -M tma_dram_bound --metric-no-group -a sleep 1 >>>> >>>> Performance counter stats for 'system wide': >>>> >>>> 1.729.208 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 1,2 % >>>> tma_dram_bound (99,50%) >>>> 50.346.734 CYCLE_ACTIVITY.STALLS_L2_PENDING >>>> (99,50%) >>>> 2.354.963.862 CPU_CLK_UNHALTED.THREAD >>>> (99,80%) >>>> 306.500 MEM_LOAD_UOPS_MISC_RETIRED.LLC_MISS >>>> (99,61%) >>>> >>>> 1,001981392 seconds time elapsed >>>> >>>> Thanks! >>> Thanks, apparently it is an issue with SandyBridge/IvyBridge that some >>> counters on one hyperthread will limit what can be on the other. I >>> believe that's the comment related to EXCL access here: >>> https://github.com/torvalds/linux/blob/master/arch/x86/events/intel/core.c#L124 >>> So you may have more success with the metric if you disable >>> hyperthreading, but I imagine that's not a popular option. >> >> Thanks for debugging the issue. Yes, it's caused by the HT workaround >> for SNB/IVB/HSW. >> >> The weak group check in the kernel is in validate_group(). It only does >> a sanity check. It doesn't check all the workarounds and the current >> status of counters (e.g., whether the fixed counter is occupied by NMI >> watchdog.) It's possible that a false positive is returned to the perf >> tool. I once tried to fix the NMI watchdog check in the kernel, but the >> proposal was rejected. 
>> So the metric constraint is introduced.
>>
>> For this issue, I think the above option2 should be a better and
>> practical choice. The issue is only observed on old machines, which
>> usually has a stable kernel running on it. I don't think the user wants
>> to update their kernel just to workaround an issue for several metrics.
>> But it should be much easier for them to update the perf tool.
>>
>> We know that the below events are the problematic events.
>> /* MEM_UOPS_RETIRED.* */
>> /* MEM_LOAD_UOPS_RETIRED.* */
>> /* MEM_LOAD_UOPS_LLC_HIT_RETIRED.* */
>> /* MEM_LOAD_UOPS_LLC_MISS_RETIRED.* */
>> Can we update the convertor script and apply the "--metric-no-group"
>> flag or add a new constraint if the above events are detected in
>> SNB/IVB/HSW?
>>
>> Thanks,
>> Kan
>
> Thanks Kan,
>
> We absolutely can do that! In this case should it be --metric-no-group
> only when SMT is enabled? I can do some patches but would like to know
> about whether we need SMT and not SMT versions of --metric-no-group.

The kernel workaround is disabled when SMT is off, so I think we only need the SMT version of --metric-no-group.
https://lore.kernel.org/all/1416251225-17721-13-git-send-email-eranian@google.com/T/#u

> Also, should we just have a list of metrics that need the flag or try
> to automate detection?

I don't think Intel will update the metrics or events for the old SNB/IVB/HSW platforms. Hard-coding a list of metrics may be simpler than automated detection.

> Some warts in detection are the names of the
> events that vary between Ivybridge and Sandybridge, and how to
> determine which events conflict.
> For example, the perfmon event data:
>
> MEM_LOAD_UOPS_RETIRED.LLC_HIT
> https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L5368
> MEM_LOAD_UOPS_RETIRED.LLC_MISS
> https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L5431
> CYCLE_ACTIVITY.STALLS_L2_PENDING
> https://github.com/intel/perfmon/blob/main/IVB/events/ivybridge_core.json#L3541
>

The problematic events should have the same names across platforms. If matching on the event name doesn't work, the event encoding is exactly the same across those platforms.

> The events list all counters, there are no errata fields.. Should the
> event data be updated and then in the converter script handle that? If
> I get shown an example I can modify the script accordingly.

If it helps the converter script, I think we can update the errata field. Here is the errata information:
* SNB: BJ122
* IVB: BV98
* HSW: HSD29

Here are the details regarding the issue (please search for BV98):
https://www.intel.com/content/www/us/en/content-details/619604/desktop-3rd-generation-intel-core-processor-family-specification-update.html

>
> It is also hard for me to test anything other than SMT on Ivybridge.
>

I think it's OK to only test on Ivybridge. The original kernel patch indicates the issue is the same on SNB, IVB and HSW.
https://lore.kernel.org/all/1416251225-17721-7-git-send-email-eranian@google.com/T/#u

Thanks,
Kan
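The two conclusions above (apply the no-group behaviour only when SMT is on, and hard-code the list of affected metrics) can be sketched in Python as follows. This is a hedged illustration, not perf code: the helper names and the contents of `NO_GROUP_METRICS_WITH_SMT` are hypothetical (the two metrics are simply the ones shown failing in this thread), and treating an unreadable sysfs file as "SMT off" is an assumption of the sketch. The sysfs file `/sys/devices/system/cpu/smt/active` is a real Linux interface that reads `1` when SMT is active.

```python
from pathlib import Path

# Linux sysfs file that reads "1" when SMT is currently active.
SMT_ACTIVE_PATH = Path("/sys/devices/system/cpu/smt/active")

# Hypothetical hard-coded list; these two metrics are the ones discussed in the thread.
NO_GROUP_METRICS_WITH_SMT = {"tma_l3_bound", "tma_dram_bound"}


def parse_smt_active(text):
    """Interpret the contents of the smt/active file."""
    return text.strip() == "1"


def smt_active(path=SMT_ACTIVE_PATH):
    """Return True if SMT is active; assume it is off if the file is unreadable."""
    try:
        return parse_smt_active(path.read_text())
    except OSError:
        return False


def should_skip_grouping(metric, smt_on):
    """Skip event grouping only for listed metrics, and only with SMT on."""
    return smt_on and metric in NO_GROUP_METRICS_WITH_SMT


print(should_skip_grouping("tma_l3_bound", smt_active()))
```

Keeping the SMT probe separate from the metric lookup makes each part easy to test without touching sysfs.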
* Re: [6.1.7][6.2-rc5] perf all metrics test: FAILED! 2023-01-31 0:20 ` Ian Rogers 2023-01-31 3:45 ` Sedat Dilek @ 2023-02-01 6:51 ` Ravi Bangoria 1 sibling, 0 replies; 13+ messages in thread From: Ravi Bangoria @ 2023-02-01 6:51 UTC (permalink / raw) To: Ian Rogers, Liang, Kan, Xing, Zhengjun, sedat.dilek Cc: Arnaldo Carvalho de Melo, Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim, linux-perf-users, linux-kernel, Nick Desaulniers, Nathan Chancellor, llvm, Ben Hutchings, James Clark, Stephane Eranian, Ravi Bangoria Hi Ian, > So I think this is a kernel bug triggering a perf tool bug. The kernel > bug can be worked around in the perf tool. I only had an Ivybridge to > test with (hence slightly different events) but what I see is both > tma_dram_bound and tma_l3_bound using the same 4 events. I could work > around the "<not counted>" by adding the --metric-no-group flag: > > ``` > $ perf stat -M tma_l3_bound --metric-no-group -a sleep 1 > > Performance counter stats for 'system wide': > > 400,404 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 4.3 % > tma_l3_bound (74.99%) > 128,937,891 CYCLE_ACTIVITY.STALLS_L2_PENDING > (87.46%) > 167,459 MEM_LOAD_UOPS_RETIRED.LLC_MISS > (74.99%) > 759,574,967 CPU_CLK_UNHALTED.THREAD > (87.47%) > > 1.001526438 seconds time elapsed > > $ perf stat -M tma_dram_bound -a --metric-no-group sleep 1 > > Performance counter stats for 'system wide': > > 259,954 MEM_LOAD_UOPS_RETIRED.LLC_HIT # 15.2 % > tma_dram_bound (74.99%) > 118,807,043 CYCLE_ACTIVITY.STALLS_L2_PENDING > (87.46%) > 111,699 MEM_LOAD_UOPS_RETIRED.LLC_MISS > (74.95%) > 587,571,060 CPU_CLK_UNHALTED.THREAD > (87.45%) > > 1.001518093 seconds time elapsed > ``` > > The issue is that perf metrics use weak groups of events. A weak group > is the same as a group of events initially. We want to use groups of > events with metrics so that all the counters are scheduled in and out > at the same time, and not multiplexed independently. 
> Imagine measuring IPC but the counts for instructions and cycles are
> measured at different periods, the resultant IPC value would be unlikely
> to be accurate. If perf_event_open fails then the perf tool retries the
> events without the group. If I try just 3 of the events in a weak
> group then the failure can be seen:
>
> ```
> $ perf stat -e "{MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING}:W" -a sleep 1
>
> Performance counter stats for 'system wide':
>
>      <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_HIT      (0.00%)
>      <not counted>      MEM_LOAD_UOPS_RETIRED.LLC_MISS     (0.00%)
>      <not counted>      CYCLE_ACTIVITY.STALLS_L2_PENDING   (0.00%)
>
> 1.001458485 seconds time elapsed
> ```
>
> The kernel should have failed the perf_event_open on opening the third
> event and then measured without the group,

IIUC, the kernel should not fail to open the 3rd event, because there are 4 general-purpose counters on Intel and all three events can be scheduled on any of the 4 counters (I checked IvyBridge). However, what I don't understand is why the kernel failed to schedule the group. Unless someone has pre-occupied 2 or more GP counters, the group should get scheduled fine.

> which it can do with
> multiplexing as in the following:
>
> ```
> $ perf stat -e "MEM_LOAD_UOPS_RETIRED.LLC_HIT,MEM_LOAD_UOPS_RETIRED.LLC_MISS,CYCLE_ACTIVITY.STALLS_L2_PENDING" -a sleep 1
>
> Performance counter stats for 'system wide':
>
>          1,239,397      MEM_LOAD_UOPS_RETIRED.LLC_HIT      (79.06%)
>            174,826      MEM_LOAD_UOPS_RETIRED.LLC_MISS     (64.60%)
>        124,026,024      CYCLE_ACTIVITY.STALLS_L2_PENDING   (81.16%)
>
> 1.001483434 seconds time elapsed
> ```
>
> When the --metric-no-group flag is given to perf then it doesn't
> produce the initial weak group, which works around the bug of the
> kernel not failing on the 3rd perf_event_open. I've added Kan and
> Zhengjun to the e-mail as they work on the Intel kernel PMU code.
>
> There's a question about what we should do in the perf test about
> this? I have a few solutions:
>
> 1) try metric tests again with the --metric-no-group flag and don't
> fail the test if this succeeds. This allows kernel bugs to hide, so
> I'm not a huge fan.
>
> 2) add a new metric flag/constraint to say not to group, this way the
> metric will automatically apply the "--metric-no-group" flag. It is a
> bit of work to wire this up but this kind of failure is common enough
> in PMUs that it is probably worthwhile. We also need to add the flag
> to metrics and I'm not sure how to get a good list of the metrics that
> currently fail and require it. This is okay but error prone.
>
> 3) fix the kernel bug and let the perf test fail until an adequate
> kernel is installed. Probably the best option.

Thanks,
Ravi
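The quoted explanation of why metrics prefer groups rests on how multiplexed counts are scaled. The Python sketch below shows that arithmetic. It is an illustration with hypothetical function names and made-up numbers, not perf source code; it assumes the usual interpretation that the percentage perf stat prints in parentheses is the share of enabled time the event was actually running on a counter.

```python
from typing import Optional


def scaled_count(raw: int, time_enabled: int, time_running: int) -> Optional[float]:
    """Extrapolate a multiplexed count to the whole enabled interval."""
    if time_running == 0:
        # The event never got a counter; perf stat reports "<not counted>".
        return None
    return raw * time_enabled / time_running


def running_percent(time_enabled: int, time_running: int) -> float:
    """Share of the enabled time during which the event was on a counter."""
    if time_enabled == 0:
        return 0.0
    return 100.0 * time_running / time_enabled


# Hypothetical values: the event ran for a quarter of the enabled time.
print(scaled_count(500, 1_000_000, 250_000))
print(running_percent(1_000_000, 250_000))
```

When events are multiplexed independently, each one is extrapolated from a different time window, which is why a ratio such as IPC built from them can be inaccurate. Events in one group share the same window.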
Thread overview: 13+ messages
2023-01-29  9:58 [6.1.7][6.2-rc5] perf all metrics test: FAILED! Sedat Dilek
2023-01-29 23:21 ` Ian Rogers
2023-01-30  2:24 ` Sedat Dilek
2023-01-30 10:04 ` James Clark
2023-01-31  0:20 ` Ian Rogers
2023-01-31  3:45 ` Sedat Dilek
2023-01-31  3:55 ` Ian Rogers
2023-01-31  6:14 ` Sedat Dilek
2023-01-31  6:20 ` Sedat Dilek
2023-02-01 15:27 ` Liang, Kan
2023-02-01 17:02 ` Ian Rogers
2023-02-01 19:06 ` Liang, Kan
2023-02-01  6:51 ` Ravi Bangoria