Greeting,

FYI, we noticed a 5.6% improvement of will-it-scale.per_process_ops due to commit:

commit: 9bc0bb50727c8ac69fbb33fb937431cf3518ff37 ("objtool/x86: Rewrite retpoline thunk calls")
https://git.kernel.org/cgit/linux/kernel/git/tip/tip.git x86/core

in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

	nr_task: 16
	mode: process
	test: eventfd1
	cpufreq_governor: performance
	ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install                job.yaml  # job file is attached in this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run                    compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2ap2/eventfd1/will-it-scale/0x5003006

commit:
  50e7b4a1a1 ("objtool: Skip magical retpoline .altinstr_replacement")
  9bc0bb5072 ("objtool/x86: Rewrite retpoline thunk calls")

50e7b4a1a1b264fc 9bc0bb50727c8ac69fbb33fb937
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  46843229            +5.6%   49479323        will-it-scale.16.processes
   2927701            +5.6%    3092457        will-it-scale.per_process_ops
  46843229            +5.6%   49479323        will-it-scale.workload
      8251 ±  8%     -19.6%       6635 ± 11%  numa-vmstat.node0.nr_slab_reclaimable
     33007 ±  8%     -19.6%      26543 ± 11%  numa-meminfo.node0.KReclaimable
     33007 ±  8%     -19.6%      26543 ± 11%  numa-meminfo.node0.SReclaimable
      1172 ± 12%     -68.4%     370.67 ±141%  perf-sched.wait_and_delay.avg.ms.futex_wait_queue_me.futex_wait.do_futex.__x64_sys_futex
      1172 ± 12%     -68.4%     370.65 ±141%  perf-sched.wait_time.avg.ms.futex_wait_queue_me.futex_wait.do_futex.__x64_sys_futex
    112.67 ± 20%     -32.7%      75.83 ± 19%  interrupts.CPU115.NMI:Non-maskable_interrupts
    112.67 ± 20%     -32.7%      75.83 ± 19%  interrupts.CPU115.PMI:Performance_monitoring_interrupts
    154.00 ± 43%     -41.8%      89.67 ± 40%  interrupts.CPU135.NMI:Non-maskable_interrupts
    154.00 ± 43%     -41.8%      89.67 ± 40%  interrupts.CPU135.PMI:Performance_monitoring_interrupts
    128.50 ± 16%     -39.7%      77.50 ± 33%  interrupts.CPU151.NMI:Non-maskable_interrupts
    128.50 ± 16%     -39.7%      77.50 ± 33%  interrupts.CPU151.PMI:Performance_monitoring_interrupts
    126.50 ± 19%     -39.1%      77.00 ± 34%  interrupts.CPU152.NMI:Non-maskable_interrupts
    126.50 ± 19%     -39.1%      77.00 ± 34%  interrupts.CPU152.PMI:Performance_monitoring_interrupts
    150.67 ± 49%     -52.7%      71.33 ± 33%  interrupts.CPU153.NMI:Non-maskable_interrupts
    150.67 ± 49%     -52.7%      71.33 ± 33%  interrupts.CPU153.PMI:Performance_monitoring_interrupts
    134.67 ± 30%     -45.5%      73.33 ± 33%  interrupts.CPU154.NMI:Non-maskable_interrupts
    134.67 ± 30%     -45.5%      73.33 ± 33%  interrupts.CPU154.PMI:Performance_monitoring_interrupts
    229.00 ± 82%     -64.9%      80.33 ± 38%  interrupts.CPU57.NMI:Non-maskable_interrupts
    229.00 ± 82%     -64.9%      80.33 ± 38%  interrupts.CPU57.PMI:Performance_monitoring_interrupts
      9305 ± 16%     +30.4%      12133 ± 20%  softirqs.CPU116.RCU
      9674 ±  8%     +17.7%      11391 ± 11%  softirqs.CPU121.RCU
     10950 ±  8%     +13.3%      12402 ±  7%  softirqs.CPU160.RCU
     11054 ±  8%     +14.6%      12663 ±  5%  softirqs.CPU161.RCU
     10764 ±  6%     +16.6%      12548 ±  6%  softirqs.CPU163.RCU
     11073 ±  8%     +20.4%      13337 ±  4%  softirqs.CPU164.RCU
     10840 ±  7%     +18.1%      12797 ±  6%  softirqs.CPU165.RCU
     10935 ±  9%     +19.5%      13066 ±  7%  softirqs.CPU166.RCU
     10791 ±  8%     +17.0%      12629 ±  8%  softirqs.CPU168.RCU
     10152 ±  6%     +17.1%      11892 ±  5%  softirqs.CPU171.RCU
     10644 ±  6%     +13.0%      12032 ±  5%  softirqs.CPU172.RCU
     14639 ± 11%     +20.5%      17644 ± 10%  softirqs.CPU3.RCU
     11177 ±  8%     +13.4%      12671 ±  7%  softirqs.CPU64.RCU
     11039 ±  6%     +15.3%      12730 ±  6%  softirqs.CPU67.RCU
     11218 ±  9%     +17.9%      13225 ±  5%  softirqs.CPU68.RCU
     15014 ± 11%     +17.8%      17688 ±  6%  softirqs.CPU7.RCU
     11300 ±  9%     +17.4%      13267 ±  7%  softirqs.CPU70.RCU
     11094 ±  6%     +18.1%      13099 ±  7%  softirqs.CPU71.RCU
     10930 ±  8%     +15.5%      12620 ±  5%  softirqs.CPU72.RCU
     10800 ±  7%     +15.8%      12509 ±  8%  softirqs.CPU75.RCU
     10822 ±  8%     +14.7%      12412 ±  6%  softirqs.CPU76.RCU
     24155 ± 13%     +26.9%      30649 ± 14%  softirqs.CPU99.SCHED
 1.633e+10            +3.7%  1.694e+10        perf-stat.i.branch-instructions
      1.18 ±  9%       -0.6       0.59 ± 15%  perf-stat.i.branch-miss-rate%
 1.881e+08 ±  5%     -47.4%   98885807 ± 15%  perf-stat.i.branch-misses
   4905715 ± 12%     -56.2%    2149255 ± 70%  perf-stat.i.cache-misses
      0.72 ±  5%     -11.5%       0.64 ±  2%  perf-stat.i.cpi
     11933 ± 13%    +264.7%      43517 ± 54%  perf-stat.i.cycles-between-cache-misses
 2.352e+10            +5.6%  2.484e+10        perf-stat.i.dTLB-loads
 1.574e+10            +5.7%  1.664e+10        perf-stat.i.dTLB-stores
 1.748e+08           -47.8%   91212835 ±  6%  perf-stat.i.iTLB-load-misses
 8.088e+10            +5.6%  8.541e+10        perf-stat.i.instructions
    464.80          +104.0%     948.14 ±  6%  perf-stat.i.instructions-per-iTLB-miss
      1.42           +10.6%       1.57 ±  2%  perf-stat.i.ipc
    289.77            +5.1%     304.52        perf-stat.i.metric.M/sec
     54032 ± 36%     -49.7%      27155 ± 26%  perf-stat.i.node-loads
      1.15 ±  5%       -0.6       0.58 ± 16%  perf-stat.overall.branch-miss-rate%
      0.70            -9.3%       0.64 ±  2%  perf-stat.overall.cpi
     11810 ± 13%    +229.4%      38908 ± 51%  perf-stat.overall.cycles-between-cache-misses
    462.63          +103.3%     940.67 ±  6%  perf-stat.overall.instructions-per-iTLB-miss
      1.42           +10.4%       1.57 ±  2%  perf-stat.overall.ipc
 1.627e+10            +3.7%  1.688e+10        perf-stat.ps.branch-instructions
 1.875e+08 ±  5%     -47.4%   98557001 ± 15%  perf-stat.ps.branch-misses
   4889627 ± 12%     -56.2%    2142836 ± 70%  perf-stat.ps.cache-misses
 2.344e+10            +5.6%  2.476e+10        perf-stat.ps.dTLB-loads
 1.569e+10            +5.7%  1.659e+10        perf-stat.ps.dTLB-stores
 1.742e+08           -47.8%   90889010 ±  6%  perf-stat.ps.iTLB-load-misses
 8.061e+10            +5.6%  8.512e+10        perf-stat.ps.instructions
     53915 ± 36%     -49.6%      27175 ± 26%  perf-stat.ps.node-loads
 2.442e+13            +5.3%  2.571e+13        perf-stat.total.instructions
     14.71 ±  7%       -2.0      12.67 ±  8%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.write
      3.80 ± 26%       -1.5       2.29 ± 12%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
      8.34 ±  7%       -1.2       7.13 ±  8%  perf-profile.calltrace.cycles-pp.eventfd_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.32 ± 31%       -1.0       0.30 ±103%  perf-profile.calltrace.cycles-pp.menu_select.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      4.89 ±  7%       -0.9       3.98 ±  9%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
      3.72 ±  6%       -0.8       2.94 ±  7%  perf-profile.calltrace.cycles-pp.security_file_permission.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.70 ±  7%       -0.6       2.15 ± 10%  perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_read.ksys_read.do_syscall_64
      2.85 ±  6%       -0.5       2.36 ±  7%  perf-profile.calltrace.cycles-pp.common_file_perm.security_file_permission.vfs_write.ksys_write.do_syscall_64
      0.72 ±  8%       -0.4       0.28 ±100%  perf-profile.calltrace.cycles-pp.___might_sleep.__might_fault._copy_from_user.eventfd_write.vfs_write
      1.23 ±  8%       -0.3       0.97 ± 10%  perf-profile.calltrace.cycles-pp.__might_fault._copy_from_user.eventfd_write.vfs_write.ksys_write
     14.85 ±  7%       -2.0      12.81 ±  8%  perf-profile.children.cycles-pp.vfs_write
      8.61 ±  6%       -1.7       6.93 ±  8%  perf-profile.children.cycles-pp.security_file_permission
      8.45 ±  7%       -1.2       7.24 ±  8%  perf-profile.children.cycles-pp.eventfd_write
      5.70 ±  7%       -1.1       4.64 ±  9%  perf-profile.children.cycles-pp.common_file_perm
      1.33 ± 31%       -0.8       0.48 ± 28%  perf-profile.children.cycles-pp.menu_select
      3.46 ± 15%       -0.8       2.68 ± 10%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
      0.71 ±  7%       -0.4       0.27 ± 10%  perf-profile.children.cycles-pp.apparmor_file_permission
      2.46 ±  7%       -0.3       2.11 ±  9%  perf-profile.children.cycles-pp.__might_fault
      1.33 ±  7%       -0.2       1.13 ±  9%  perf-profile.children.cycles-pp.___might_sleep
      0.38 ±  8%       +0.1       0.48 ±  9%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      1.57 ± 55%       -1.3       0.30 ± 21%  perf-profile.self.cycles-pp.cpuidle_enter_state
      4.39 ±  7%       -1.1       3.27 ±  8%  perf-profile.self.cycles-pp.common_file_perm
      2.15 ±  6%       -0.8       1.32 ± 11%  perf-profile.self.cycles-pp.eventfd_write
      0.98 ± 42%       -0.8       0.20 ± 44%  perf-profile.self.cycles-pp.menu_select
      2.25 ±  7%       -0.6       1.61 ±  8%  perf-profile.self.cycles-pp.eventfd_read
      0.57 ±  7%       -0.3       0.27 ± 10%  perf-profile.self.cycles-pp.apparmor_file_permission
      1.32 ±  7%       -0.2       1.12 ±  9%  perf-profile.self.cycles-pp.___might_sleep
      0.43 ±  7%       -0.1       0.35 ±  9%  perf-profile.self.cycles-pp.__might_fault
      0.11 ± 12%       -0.0       0.08 ± 16%  perf-profile.self.cycles-pp.read_tsc
      0.07 ±  5%       +0.1       0.13 ± 11%  perf-profile.self.cycles-pp.__x64_sys_write
      0.07 ± 12%       +0.1       0.14 ± 11%  perf-profile.self.cycles-pp.__x64_sys_read
      0.26 ± 10%       +0.1       0.37 ±  8%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare

[ASCII plot: will-it-scale.per_process_ops, y-axis from 2.92e+06 to 3.12e+06; point layout lost in extraction]

[*] bisect-good sample
[O] bisect-bad  sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang