Greetings,

FYI, we noticed a -3.0% regression of will-it-scale.per_process_ops due to commit:

commit: 47b8ff194c1fd73d58dc339b597d466fe48c8958 ("entry: Explicitly flush pending rcuog wakeup before last rescheduling point")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with the following parameters:

	nr_task: 100%
	mode: process
	test: futex3
	cpufreq_governor: performance
	ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add the following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/futex3/will-it-scale/0x5003006

commit:
  f8bb5cae96 ("rcu/nocb: Trigger self-IPI on late deferred wake up before user resume")
  47b8ff194c ("entry: Explicitly flush pending rcuog wakeup before last rescheduling point")

f8bb5cae9616224a 47b8ff194c1fd73d58dc339b597
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  1.25e+09            -3.0%  1.212e+09        will-it-scale.192.processes
   6508984            -3.0%    6314032        will-it-scale.per_process_ops
  1.25e+09            -3.0%  1.212e+09        will-it-scale.workload
     68.00            -1.5%      67.00        vmstat.cpu.sy
     30.00            +3.3%      31.00        vmstat.cpu.us
 8.622e+10            +1.2%  8.728e+10        perf-stat.i.branch-instructions
      0.38            -0.0        0.36        perf-stat.i.branch-miss-rate%
 3.206e+08            -3.7%  3.088e+08        perf-stat.i.branch-misses
      0.98            +1.1%       0.99        perf-stat.i.cpi
    263518            +2.3%     269550        perf-stat.i.dTLB-store-misses
 1.135e+11            -1.9%  1.113e+11        perf-stat.i.dTLB-stores
 3.257e+08            -4.8%    3.1e+08        perf-stat.i.iTLB-load-misses
 5.718e+11            -1.1%  5.656e+11        perf-stat.i.instructions
      1758            +3.9%       1827        perf-stat.i.instructions-per-iTLB-miss
      1.03            -1.1%       1.01        perf-stat.i.ipc
      0.37            -0.0        0.35        perf-stat.overall.branch-miss-rate%
      0.97            +1.1%       0.99        perf-stat.overall.cpi
      0.00            +0.0        0.00        perf-stat.overall.dTLB-store-miss-rate%
      1755            +3.9%       1824        perf-stat.overall.instructions-per-iTLB-miss
      1.03            -1.1%       1.02        perf-stat.overall.ipc
    138016            +2.0%     140712        perf-stat.overall.path-length
 8.592e+10            +1.2%  8.698e+10        perf-stat.ps.branch-instructions
 3.195e+08            -3.7%  3.078e+08        perf-stat.ps.branch-misses
    262973            +2.3%     269022        perf-stat.ps.dTLB-store-misses
 1.131e+11            -1.9%  1.109e+11        perf-stat.ps.dTLB-stores
 3.246e+08            -4.8%   3.09e+08        perf-stat.ps.iTLB-load-misses
 5.698e+11            -1.1%  5.637e+11        perf-stat.ps.instructions
 1.725e+14            -1.1%  1.706e+14        perf-stat.total.instructions
     32.11            -1.0       31.08        perf-profile.calltrace.cycles-pp.__entry_text_start.syscall
     36.13            -0.3       35.81        perf-profile.calltrace.cycles-pp.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     39.88            -0.3       39.58        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
     30.75            -0.2       30.57        perf-profile.calltrace.cycles-pp.do_futex.__x64_sys_futex.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      2.22            -0.1        2.16        perf-profile.calltrace.cycles-pp.testcase
      2.15            -0.1        2.09        perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.syscall
      2.21            -0.0        2.17        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.syscall
      6.22            +0.1        6.32        perf-profile.calltrace.cycles-pp.get_futex_key.futex_wake.do_futex.__x64_sys_futex.do_syscall_64
      1.17            +0.1        1.31        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.syscall
     52.27            +1.4       53.62        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.syscall
      3.53            +1.5        5.00        perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.syscall
      0.00            +1.5        1.55        perf-profile.calltrace.cycles-pp.rcu_nocb_flush_deferred_wakeup.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.syscall
      5.58            +1.9        7.47        perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.syscall
     20.72            -0.6       20.09        perf-profile.children.cycles-pp.__entry_text_start
     17.34            -0.5       16.87        perf-profile.children.cycles-pp.syscall_return_via_sysret
     40.05            -0.3       39.75        perf-profile.children.cycles-pp.do_syscall_64
     36.43            -0.2       36.20        perf-profile.children.cycles-pp.__x64_sys_futex
     31.18            -0.2       30.98        perf-profile.children.cycles-pp.do_futex
      2.36            -0.1        2.30        perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack
      2.41            -0.1        2.34        perf-profile.children.cycles-pp.testcase
      2.19            -0.1        2.12        perf-profile.children.cycles-pp.syscall_enter_from_user_mode
     97.88            +0.1       97.94        perf-profile.children.cycles-pp.syscall
      6.46            +0.1        6.58        perf-profile.children.cycles-pp.get_futex_key
      1.19            +0.1        1.33        perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
     52.69            +1.4       54.05        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.00            +1.6        1.60        perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
      3.58            +1.7        5.33        perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      6.16            +1.9        8.07        perf-profile.children.cycles-pp.syscall_exit_to_user_mode
     15.88            -0.5       15.33        perf-profile.self.cycles-pp.syscall
     17.22            -0.5       16.75        perf-profile.self.cycles-pp.syscall_return_via_sysret
      6.00            -0.3        5.70        perf-profile.self.cycles-pp.do_futex
      9.33            -0.2        9.09        perf-profile.self.cycles-pp.__entry_text_start
      6.58            -0.2        6.35        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      8.29            -0.1        8.22        perf-profile.self.cycles-pp.hash_futex
      2.36            -0.1        2.29        perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack
      1.86            -0.1        1.80        perf-profile.self.cycles-pp.syscall_enter_from_user_mode
      1.96            -0.0        1.91        perf-profile.self.cycles-pp.testcase
      1.14            +0.1        1.27        perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      6.04            +0.2        6.19        perf-profile.self.cycles-pp.get_futex_key
      3.29            +0.4        3.65        perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      0.00            +1.4        1.39        perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup

[ASCII plot: will-it-scale.192.processes per sample, y-axis 0 to 1.4e+09]
[ASCII plot: will-it-scale.per_process_ops per sample, y-axis 0 to 7e+06]
[ASCII plot: will-it-scale.workload per sample, y-axis 0 to 1.4e+09]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang
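
For anyone cross-checking the headline number: the -3.0% appears to be the relative difference between the two will-it-scale.per_process_ops values in the comparison table. The sketch below only redoes that arithmetic from the two reported values. The helper name pct_change is hypothetical and is not part of lkp-tests, and the actual tooling may derive %change differently (for example from averages over several runs).

```python
# Values copied from the will-it-scale.per_process_ops row of the table above.
BASE_OPS = 6508984     # parent commit f8bb5cae96
PATCHED_OPS = 6314032  # commit under test 47b8ff194c


def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent (hypothetical helper)."""
    return (new - base) / base * 100.0


change = pct_change(BASE_OPS, PATCHED_OPS)
print(f"will-it-scale.per_process_ops change: {change:+.1f}%")
```

Rows whose change column has no percent sign (the branch-miss-rate% and perf-profile rows) look like absolute differences in percentage points rather than relative changes, so this helper would not apply to them.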