Greeting,

FYI, we noticed a -2.1% regression of will-it-scale.per_process_ops due to commit:

commit: b77491648e6eb2f26b6edf5eaea859adc17f4dcc ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")
https://github.com/0day-ci/linux/commits/roman-sudarikov-linux-intel-com/perf-x86-Exposing-IO-stack-to-IO-PMON-mapping-through-sysfs/20200118-075508

in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
with following parameters:

        nr_task: 100%
        mode: process
        test: signal1
        cpufreq_governor: performance
        ucode: 0xb000038

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-bdw-ep6/signal1/will-it-scale/0xb000038

commit:
  v5.4
  b77491648e ("perf x86: Infrastructure for exposing an Uncore unit to PMON mapping")

            v5.4 b77491648e6eb2f26b6edf5eaea
---------------- ---------------------------
       %stddev     %change         %stddev
           \          |                \
     47986            -2.1%      46989        will-it-scale.per_process_ops
   4222852            -2.1%    4135110        will-it-scale.workload
    427194 ±  9%     +13.8%     486344 ±  4%  numa-vmstat.node1.numa_local
     12.88 ±  2%      -8.5%      11.79 ±  4%  turbostat.RAMWatt
      8846 ± 10%     +23.9%      10964 ±  9%  softirqs.CPU0.SCHED
     14442 ±  4%      -5.2%      13697 ±  5%  softirqs.CPU71.RCU
     78696 ±  9%     +14.4%      89993 ±  8%  sched_debug.cfs_rq:/.min_vruntime.stddev
     78411 ±  9%     +14.5%      89817 ±  8%  sched_debug.cfs_rq:/.spread0.stddev
      9.77 ±  4%     +15.0%      11.23 ±  3%  sched_debug.cpu.clock.stddev
      9.77 ±  4%     +15.0%      11.23 ±  3%  sched_debug.cpu.clock_task.stddev
 4.072e+09            -1.9%  3.996e+09        perf-stat.i.branch-instructions
  44948352            -1.8%   44159252        perf-stat.i.branch-misses
     35.25            +4.3       39.56        perf-stat.i.cache-miss-rate%
  12569960            +5.2%   13223444        perf-stat.i.cache-misses
  35888855 ±  2%      -6.2%   33680305 ±  2%  perf-stat.i.cache-references
     11.75            +1.8%      11.96        perf-stat.i.cpi
     19377            -5.0%      18403        perf-stat.i.cycles-between-cache-misses
  27157347            -2.1%   26595986        perf-stat.i.dTLB-load-misses
 6.739e+09            -2.0%  6.602e+09        perf-stat.i.dTLB-loads
  27809165            -1.9%   27268405        perf-stat.i.dTLB-store-misses
 5.461e+09            -1.9%  5.356e+09        perf-stat.i.dTLB-stores
 2.072e+10            -1.9%  2.034e+10        perf-stat.i.instructions
      0.09            -1.7%       0.08        perf-stat.i.ipc
    917994            +2.6%     941599        perf-stat.i.node-load-misses
     96.93            -1.1       95.81        perf-stat.i.node-store-miss-rate%
   5499191            +5.0%    5774707        perf-stat.i.node-store-misses
    169716 ±  8%     +45.2%     246479 ±  6%  perf-stat.i.node-stores
      1.73 ±  2%      -4.4%       1.66 ±  2%  perf-stat.overall.MPKI
     35.03            +4.2       39.27        perf-stat.overall.cache-miss-rate%
     11.77            +1.8%      11.98        perf-stat.overall.cpi
     19401            -5.0%      18428        perf-stat.overall.cycles-between-cache-misses
      0.08            -1.8%       0.08        perf-stat.overall.ipc
     97.01            -1.1       95.91        perf-stat.overall.node-store-miss-rate%
 4.058e+09            -1.8%  3.983e+09        perf-stat.ps.branch-instructions
  44798305            -1.7%   44014351        perf-stat.ps.branch-misses
  12526500            +5.2%   13178368        perf-stat.ps.cache-misses
  35771706 ±  2%      -6.2%   33569906 ±  2%  perf-stat.ps.cache-references
  27063288            -2.1%   26505363        perf-stat.ps.dTLB-load-misses
 6.716e+09            -2.0%   6.58e+09        perf-stat.ps.dTLB-loads
  27712662            -1.9%   27175399        perf-stat.ps.dTLB-store-misses
 5.442e+09            -1.9%  5.338e+09        perf-stat.ps.dTLB-stores
 2.065e+10            -1.9%  2.027e+10        perf-stat.ps.instructions
    914841            +2.6%     938399        perf-stat.ps.node-load-misses
   5480102            +5.0%    5754996        perf-stat.ps.node-store-misses
    169148 ±  8%     +45.2%     245649 ±  6%  perf-stat.ps.node-stores
 6.242e+12            -1.6%  6.142e+12        perf-stat.total.instructions
    481.50 ± 26%     -41.7%     280.75 ± 28%  interrupts.37:IR-PCI-MSI.1572868-edge.eth0-TxRx-3
    772.75 ± 63%     -70.0%     231.75 ± 28%  interrupts.CPU1.RES:Rescheduling_interrupts
    481.50 ± 26%     -41.7%     280.75 ± 28%  interrupts.CPU16.37:IR-PCI-MSI.1572868-edge.eth0-TxRx-3
    954.25 ± 10%     -71.8%     269.50 ± 76%  interrupts.CPU19.RES:Rescheduling_interrupts
    932.50 ± 48%     -68.4%     294.75 ± 72%  interrupts.CPU20.RES:Rescheduling_interrupts
    583.75 ± 59%     -79.5%     119.75 ± 54%  interrupts.CPU21.RES:Rescheduling_interrupts
    513.00 ± 42%    +145.8%       1261 ± 17%  interrupts.CPU22.RES:Rescheduling_interrupts
    256.25 ± 40%    +253.9%     906.75 ± 39%  interrupts.CPU24.RES:Rescheduling_interrupts
    475.25 ± 19%    +133.5%       1109 ± 41%  interrupts.CPU26.RES:Rescheduling_interrupts
    734.50 ± 36%     +99.1%       1462 ± 26%  interrupts.CPU27.RES:Rescheduling_interrupts
    905.75 ± 48%     -64.9%     318.00 ± 85%  interrupts.CPU3.RES:Rescheduling_interrupts
    363.00 ± 35%    +114.3%     777.75 ± 26%  interrupts.CPU30.RES:Rescheduling_interrupts
      6915 ± 24%     -29.1%       4904 ± 34%  interrupts.CPU37.NMI:Non-maskable_interrupts
      6915 ± 24%     -29.1%       4904 ± 34%  interrupts.CPU37.PMI:Performance_monitoring_interrupts
    436.50 ± 48%    +166.7%       1164 ± 41%  interrupts.CPU38.RES:Rescheduling_interrupts
      6950 ± 24%     -29.1%       4926 ± 34%  interrupts.CPU39.NMI:Non-maskable_interrupts
      6950 ± 24%     -29.1%       4926 ± 34%  interrupts.CPU39.PMI:Performance_monitoring_interrupts
      6906 ± 24%     -28.9%       4910 ± 35%  interrupts.CPU41.NMI:Non-maskable_interrupts
      6906 ± 24%     -28.9%       4910 ± 35%  interrupts.CPU41.PMI:Performance_monitoring_interrupts
    216.00 ± 70%     -76.6%      50.50 ± 22%  interrupts.CPU46.RES:Rescheduling_interrupts
      2607 ± 47%     +51.4%       3948 ±  8%  interrupts.CPU50.CAL:Function_call_interrupts
      3220 ± 10%     +22.4%       3940 ±  8%  interrupts.CPU51.CAL:Function_call_interrupts
      4914 ± 34%     +59.9%       7855        interrupts.CPU56.NMI:Non-maskable_interrupts
      4914 ± 34%     +59.9%       7855        interrupts.CPU56.PMI:Performance_monitoring_interrupts
      4937 ± 34%     +59.7%       7885        interrupts.CPU58.NMI:Non-maskable_interrupts
      4937 ± 34%     +59.7%       7885        interrupts.CPU58.PMI:Performance_monitoring_interrupts
      4919 ± 34%     +59.6%       7849        interrupts.CPU59.NMI:Non-maskable_interrupts
      4919 ± 34%     +59.6%       7849        interrupts.CPU59.PMI:Performance_monitoring_interrupts
      4925 ± 34%     +59.9%       7878        interrupts.CPU61.NMI:Non-maskable_interrupts
      4925 ± 34%     +59.9%       7878        interrupts.CPU61.PMI:Performance_monitoring_interrupts
      4906 ± 33%     +60.3%       7867        interrupts.CPU63.NMI:Non-maskable_interrupts
      4906 ± 33%     +60.3%       7867        interrupts.CPU63.PMI:Performance_monitoring_interrupts
    890.00 ± 75%     -82.0%     160.00 ± 46%  interrupts.CPU63.RES:Rescheduling_interrupts
    135.00 ± 52%    +911.7%       1365 ± 76%  interrupts.CPU70.RES:Rescheduling_interrupts
    110.25 ± 14%    +388.7%     538.75 ± 30%  interrupts.CPU71.RES:Rescheduling_interrupts
      3285 ±  3%     +15.4%       3791 ±  3%  interrupts.CPU73.CAL:Function_call_interrupts
    186.50 ± 60%    +274.4%     698.25 ± 77%  interrupts.CPU81.RES:Rescheduling_interrupts
      1.22 ±  2%      -0.2        1.02        perf-profile.calltrace.cycles-pp.recalc_sigpending.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop
      3.95            -0.2        3.79        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64
      4.07            -0.2        3.92        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
      1.93            -0.1        1.79        perf-profile.calltrace.cycles-pp.fpu__clear.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.66            -0.1        0.59 ±  3%  perf-profile.calltrace.cycles-pp.__set_task_blocked.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.65 ±  2%      -0.1        0.57 ±  2%  perf-profile.calltrace.cycles-pp.recalc_sigpending.__set_task_blocked.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64
      0.85            -0.1        0.79 ±  2%  perf-profile.calltrace.cycles-pp.__set_current_blocked.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.03            -0.0        0.98        perf-profile.calltrace.cycles-pp.signal_setup_done.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.81            -0.0        0.76 ±  2%  perf-profile.calltrace.cycles-pp.fpregs_mark_activate.fpu__clear.do_signal.exit_to_usermode_loop.do_syscall_64
      0.98            -0.0        0.94        perf-profile.calltrace.cycles-pp.__set_current_blocked.signal_setup_done.do_signal.exit_to_usermode_loop.do_syscall_64
      1.10            -0.0        1.07        perf-profile.calltrace.cycles-pp.copy_fpstate_to_sigframe.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.52            +0.0        0.55 ±  3%  perf-profile.calltrace.cycles-pp.fpregs_mark_activate.__fpu__restore_sig.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.87            +0.1        1.95        perf-profile.calltrace.cycles-pp.__fpu__restore_sig.__x64_sys_rt_sigreturn.do_syscall_64.entry_SYSCALL_64_after_hwframe
     23.79            +0.3       24.06        perf-profile.calltrace.cycles-pp.__sigqueue_alloc.__send_signal.do_send_sig_info.do_send_specific.do_tkill
     24.02            +0.3       24.29        perf-profile.calltrace.cycles-pp.__send_signal.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill
     89.84            +0.3       90.14        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     89.46            +0.3       89.78        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.84            +0.4       37.20        perf-profile.calltrace.cycles-pp.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.43            +0.4       36.80        perf-profile.calltrace.cycles-pp.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.64            +0.4       26.09        perf-profile.calltrace.cycles-pp.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.58            +0.5       26.04        perf-profile.calltrace.cycles-pp.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
     25.35            +0.5       25.82        perf-profile.calltrace.cycles-pp.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64.entry_SYSCALL_64_after_hwframe
     24.66            +0.5       25.18        perf-profile.calltrace.cycles-pp.do_send_sig_info.do_send_specific.do_tkill.__x64_sys_tgkill.do_syscall_64
     30.40            +0.6       30.97        perf-profile.calltrace.cycles-pp.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop.do_syscall_64
     31.58            +0.6       32.18        perf-profile.calltrace.cycles-pp.get_signal.do_signal.exit_to_usermode_loop.do_syscall_64.entry_SYSCALL_64_after_hwframe
     29.13            +0.8       29.91        perf-profile.calltrace.cycles-pp.__dequeue_signal.dequeue_signal.get_signal.do_signal.exit_to_usermode_loop
     28.90            +0.8       29.68        perf-profile.calltrace.cycles-pp.__sigqueue_free.__dequeue_signal.dequeue_signal.get_signal.do_signal
      3.46            -0.2        3.21 ±  2%  perf-profile.children.cycles-pp.recalc_sigpending
      3.95            -0.2        3.79        perf-profile.children.cycles-pp.entry_SYSCALL_64
      4.42            -0.2        4.26        perf-profile.children.cycles-pp.syscall_return_via_sysret
      1.93            -0.1        1.80 ±  2%  perf-profile.children.cycles-pp.fpu__clear
      3.62            -0.1        3.54        perf-profile.children.cycles-pp.__set_current_blocked
      0.27            -0.1        0.21 ±  3%  perf-profile.children.cycles-pp.fpregs_assert_state_consistent
      0.84            -0.0        0.79 ±  2%  perf-profile.children.cycles-pp._copy_from_user
      1.03            -0.0        0.99        perf-profile.children.cycles-pp.signal_setup_done
      0.34            -0.0        0.30 ±  5%  perf-profile.children.cycles-pp.restore_altstack
      0.73            -0.0        0.70        perf-profile.children.cycles-pp.__might_fault
      1.11            -0.0        1.08        perf-profile.children.cycles-pp.copy_fpstate_to_sigframe
      0.37 ±  2%      -0.0        0.35        perf-profile.children.cycles-pp.___might_sleep
      0.27            -0.0        0.26        perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
      1.89            +0.1        1.96        perf-profile.children.cycles-pp.__fpu__restore_sig
      0.29 ±  7%      +0.2        0.53 ±  6%  perf-profile.children.cycles-pp.__lock_task_sighand
      0.29 ±  7%      +0.2        0.53 ±  5%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     24.03            +0.3       24.29        perf-profile.children.cycles-pp.__send_signal
     23.80            +0.3       24.06        perf-profile.children.cycles-pp.__sigqueue_alloc
     90.00            +0.3       90.30        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     89.60            +0.3       89.92        perf-profile.children.cycles-pp.do_syscall_64
     36.86            +0.4       37.22        perf-profile.children.cycles-pp.exit_to_usermode_loop
     36.45            +0.4       36.81        perf-profile.children.cycles-pp.do_signal
     25.65            +0.4       26.10        perf-profile.children.cycles-pp.__x64_sys_tgkill
     25.59            +0.5       26.04        perf-profile.children.cycles-pp.do_tkill
     25.36            +0.5       25.82        perf-profile.children.cycles-pp.do_send_specific
     24.67            +0.5       25.19        perf-profile.children.cycles-pp.do_send_sig_info
     30.41            +0.6       30.98        perf-profile.children.cycles-pp.dequeue_signal
     31.60            +0.6       32.20        perf-profile.children.cycles-pp.get_signal
     29.14            +0.8       29.92        perf-profile.children.cycles-pp.__dequeue_signal
     28.90            +0.8       29.69        perf-profile.children.cycles-pp.__sigqueue_free
     19.11            -0.4       18.75        perf-profile.self.cycles-pp.do_syscall_64
      2.58            -0.2        2.34        perf-profile.self.cycles-pp.recalc_sigpending
      3.95            -0.2        3.79        perf-profile.self.cycles-pp.entry_SYSCALL_64
      4.41            -0.2        4.25        perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.95            -0.1        0.86 ±  2%  perf-profile.self.cycles-pp.fpu__clear
      0.25            -0.1        0.19 ±  2%  perf-profile.self.cycles-pp.fpregs_assert_state_consistent
      0.15 ±  2%      -0.0        0.12 ±  6%  perf-profile.self.cycles-pp._copy_from_user
      0.74            -0.0        0.71        perf-profile.self.cycles-pp.copy_fpstate_to_sigframe
      0.34            -0.0        0.31        perf-profile.self.cycles-pp.__x64_sys_rt_sigprocmask
      0.46 ±  2%      -0.0        0.44 ±  2%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.36 ±  3%      -0.0        0.34        perf-profile.self.cycles-pp.___might_sleep
      0.26            -0.0        0.24 ±  2%  perf-profile.self.cycles-pp.copy_user_enhanced_fast_string
      1.10            +0.0        1.15        perf-profile.self.cycles-pp.__fpu__restore_sig
      0.28 ±  6%      +0.2        0.53 ±  5%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
     23.71            +0.3       23.96        perf-profile.self.cycles-pp.__sigqueue_alloc
     12.65            +0.6       13.24        perf-profile.self.cycles-pp.__sigqueue_free

[Plot: will-it-scale.per_process_ops, y-axis 44000 to 52000]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen