Greetings,

FYI, we noticed a -6.4% regression of will-it-scale.per_process_ops due to commit:

commit: 8f159f1dfa1ea29d70a84335fe6a8bd501a9eecd ("x86/entry/common: Protect against instrumentation")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 192 threads Cooper Lake with 128G memory
with the following parameters:

        nr_task: 100%
        mode: process
        test: lseek1
        cpufreq_governor: performance
        ucode: 0x86000017

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add the following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-20191114.cgz/lkp-cpx-4s1/lseek1/will-it-scale/0x86000017

commit:
  1723be30e4 ("x86/entry: Mark enter_from_user_mode() noinstr")
  8f159f1dfa ("x86/entry/common: Protect against instrumentation")

1723be30e46fbda0 8f159f1dfa1ea29d70a84335fe6
---------------- ---------------------------
         %stddev   %change          %stddev
            \         |                \
   9977836           -6.4%    9334708        will-it-scale.per_process_ops
 1.916e+09           -6.4%  1.792e+09        will-it-scale.workload
      1612          -75.1%     401.75 ±173%  meminfo.Mlocked
     38.00           +2.6%      39.00        vmstat.cpu.us
     30383 ± 27%    +51.8%      46125 ± 12%  numa-meminfo.node1.KReclaimable
     30383 ± 27%    +51.8%      46125 ± 12%  numa-meminfo.node1.SReclaimable
     26574 ± 23%    -29.2%      18820 ± 14%  numa-meminfo.node2.KReclaimable
     26574 ± 23%    -29.2%      18820 ± 14%  numa-meminfo.node2.SReclaimable
     82840 ± 12%    -14.1%      71200 ±  4%  numa-meminfo.node2.SUnreclaim
    100.00 ± 26%    -79.0%      21.00 ±173%  numa-vmstat.node1.nr_mlock
      7595 ± 27%    +51.8%      11531 ± 12%  numa-vmstat.node1.nr_slab_reclaimable
    115.50 ± 26%    -81.8%      21.00 ±173%  numa-vmstat.node2.nr_mlock
      6643 ± 23%    -29.2%       4704 ± 14%  numa-vmstat.node2.nr_slab_reclaimable
     20710 ± 12%    -14.1%      17799 ±  4%  numa-vmstat.node2.nr_slab_unreclaimable
     15093 ±  4%    +14.5%      17281 ±  4%  sched_debug.cpu.sched_count.max
      7918 ± 11%    +14.3%       9046 ±  4%  sched_debug.cpu.ttwu_count.max
      0.49 ± 56%    -96.8%       0.02 ±160%  sched_debug.rt_rq:/.rt_time.avg
     93.99 ± 56%    -96.8%       3.00 ±160%  sched_debug.rt_rq:/.rt_time.max
      6.77 ± 56%    -96.8%       0.22 ±160%  sched_debug.rt_rq:/.rt_time.stddev
    296.75 ± 23%   +263.2%       1077 ± 67%  interrupts.32:PCI-MSI.524290-edge.eth0-TxRx-1
    296.75 ± 23%   +263.2%       1077 ± 67%  interrupts.CPU10.32:PCI-MSI.524290-edge.eth0-TxRx-1
    899.00 ±  7%    -10.5%     805.00        interrupts.CPU141.CAL:Function_call_interrupts
      1204 ± 36%    +83.6%       2211 ± 38%  interrupts.CPU170.CAL:Function_call_interrupts
      1324 ± 28%    -30.8%     916.00 ± 23%  interrupts.CPU2.CAL:Function_call_interrupts
      3042 ± 36%    -53.3%       1419 ± 27%  interrupts.CPU24.CAL:Function_call_interrupts
      1061 ± 24%    +83.7%       1950 ± 32%  interrupts.CPU72.CAL:Function_call_interrupts
     77.25 ±165%    -97.1%       2.25 ± 19%  interrupts.CPU93.TLB:TLB_shootdowns
    769.00 ± 23%    -36.9%     485.00 ± 11%  interrupts.TLB:TLB_shootdowns
     21833 ±  3%    +18.8%      25926 ±  7%  softirqs.CPU0.RCU
     20599 ±  4%    +13.5%      23371 ±  8%  softirqs.CPU107.RCU
     22896 ± 11%    +21.8%      27893 ±  5%  softirqs.CPU125.RCU
     21380 ±  6%    +18.5%      25341 ±  7%  softirqs.CPU163.RCU
     21890 ±  9%    +15.1%      25191 ±  6%  softirqs.CPU166.RCU
     20047 ±  5%    +17.0%      23453 ±  8%  softirqs.CPU176.RCU
     21786 ±  3%    +16.2%      25318 ±  8%  softirqs.CPU25.RCU
     23213 ±  4%    +14.6%      26602 ±  6%  softirqs.CPU35.RCU
     21272 ±  5%    +17.4%      24975 ±  8%  softirqs.CPU71.RCU
     20159 ±  4%    +16.1%      23400 ±  7%  softirqs.CPU76.RCU
 1.176e+11           +2.7%  1.208e+11        perf-stat.i.branch-instructions
      1.65            -0.1       1.51        perf-stat.i.branch-miss-rate%
 1.934e+09           -6.5%  1.808e+09        perf-stat.i.branch-misses
      1.26           -5.7%       1.19        perf-stat.i.cpi
      0.00 ±  5%      +0.0       0.00 ±  5%  perf-stat.i.dTLB-load-miss-rate%
    441221 ±  6%   +606.9%    3119036 ±  5%  perf-stat.i.dTLB-load-misses
   1.7e+11           +7.2%  1.823e+11        perf-stat.i.dTLB-loads
     16104 ±  2%     -4.2%      15432 ±  3%  perf-stat.i.dTLB-store-misses
 9.743e+10          +17.4%  1.144e+11        perf-stat.i.dTLB-stores
 2.243e+09          -24.3%  1.697e+09 ±  2%  perf-stat.i.iTLB-load-misses
  46888822           +9.4%   51286197        perf-stat.i.iTLB-loads
 5.555e+11           +5.5%  5.861e+11        perf-stat.i.instructions
    257.71          +37.7%     354.92        perf-stat.i.instructions-per-iTLB-miss
      0.80           +6.0%       0.84        perf-stat.i.ipc
      1.04           -5.9%       0.98 ±  3%  perf-stat.i.metric.K/sec
      2005           +8.4%       2174        perf-stat.i.metric.M/sec
      0.03           -5.8%       0.03 ±  2%  perf-stat.overall.MPKI
      1.64            -0.1       1.50        perf-stat.overall.branch-miss-rate%
      1.26           -5.7%       1.18        perf-stat.overall.cpi
      0.00 ±  8%      +0.0       0.00 ±  6%  perf-stat.overall.dTLB-load-miss-rate%
      0.00 ±  2%      -0.0       0.00 ±  3%  perf-stat.overall.dTLB-store-miss-rate%
    247.68          +39.5%     345.39 ±  2%  perf-stat.overall.instructions-per-iTLB-miss
      0.80           +6.0%       0.84        perf-stat.overall.ipc
     87575          +12.6%      98570        perf-stat.overall.path-length
 1.172e+11           +2.7%  1.204e+11        perf-stat.ps.branch-instructions
 1.927e+09           -6.5%  1.802e+09        perf-stat.ps.branch-misses
    473764 ±  8%   +561.3%    3132830 ±  6%  perf-stat.ps.dTLB-load-misses
 1.694e+11           +7.2%  1.817e+11        perf-stat.ps.dTLB-loads
  9.71e+10          +17.4%   1.14e+11        perf-stat.ps.dTLB-stores
 2.235e+09          -24.3%  1.692e+09 ±  2%  perf-stat.ps.iTLB-load-misses
  46727836           +9.5%   51154009        perf-stat.ps.iTLB-loads
 5.537e+11           +5.5%  5.841e+11        perf-stat.ps.instructions
 1.678e+14           +5.3%  1.767e+14        perf-stat.total.instructions
     39.88            -3.8      36.04        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
     40.97            -2.5      38.46        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64.lseek64
     22.10            -1.1      20.95        perf-profile.calltrace.cycles-pp.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
      8.75            -0.5       8.27        perf-profile.calltrace.cycles-pp.__fdget_pos.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
      7.52            -0.5       7.07        perf-profile.calltrace.cycles-pp.__fget_light.__fdget_pos.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe
      6.44            -0.4       6.05        perf-profile.calltrace.cycles-pp.shmem_file_llseek.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
      6.69            -0.2       6.54        perf-profile.calltrace.cycles-pp.syscall_return_via_sysret.lseek64
      1.98            -0.1       1.88        perf-profile.calltrace.cycles-pp.generic_file_llseek_size.ksys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
     98.92            +0.2      99.08        perf-profile.calltrace.cycles-pp.lseek64
      2.30            +0.6       2.94        perf-profile.calltrace.cycles-pp.__x64_sys_lseek.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
      0.00            +0.8       0.75        perf-profile.calltrace.cycles-pp.enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
      0.00            +1.7       1.69        perf-profile.calltrace.cycles-pp.fpregs_assert_state_consistent.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
      0.00            +2.3       2.33        perf-profile.calltrace.cycles-pp.__syscall_return_slowpath.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
     47.50            +4.1      51.62        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.lseek64
      0.00            +4.9       4.85        perf-profile.calltrace.cycles-pp.__prepare_exit_to_usermode.do_syscall_64.entry_SYSCALL_64_after_hwframe.lseek64
      0.00            +5.9       5.87        perf-profile.calltrace.cycles-pp.exit_to_user_mode.entry_SYSCALL_64_after_hwframe.lseek64
     40.57            -4.0      36.60        perf-profile.children.cycles-pp.do_syscall_64
     27.37            -1.6      25.76        perf-profile.children.cycles-pp.entry_SYSCALL_64
     22.66            -1.1      21.53        perf-profile.children.cycles-pp.ksys_lseek
     19.98            -0.9      19.10        perf-profile.children.cycles-pp.syscall_return_via_sysret
      9.42            -0.5       8.92        perf-profile.children.cycles-pp.__fdget_pos
      7.52            -0.5       7.07        perf-profile.children.cycles-pp.__fget_light
      6.46            -0.4       6.07        perf-profile.children.cycles-pp.shmem_file_llseek
      2.27            -0.1       2.17        perf-profile.children.cycles-pp.generic_file_llseek_size
      1.18            -0.1       1.11        perf-profile.children.cycles-pp.__x86_indirect_thunk_rax
      0.43 ±  2%      -0.0       0.41 ±  2%  perf-profile.children.cycles-pp.lseek@plt
      1.97            +0.1       2.02        perf-profile.children.cycles-pp.fpregs_assert_state_consistent
      2.43            +0.6       3.06        perf-profile.children.cycles-pp.__x64_sys_lseek
      0.00            +0.8       0.80        perf-profile.children.cycles-pp.enter_from_user_mode
      0.00            +2.3       2.33        perf-profile.children.cycles-pp.__syscall_return_slowpath
     47.97            +2.8      50.81        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.00            +5.1       5.06        perf-profile.children.cycles-pp.__prepare_exit_to_usermode
      0.00            +7.2       7.18        perf-profile.children.cycles-pp.exit_to_user_mode
     13.07            -9.5       3.55        perf-profile.self.cycles-pp.do_syscall_64
     18.01            -1.2      16.84        perf-profile.self.cycles-pp.lseek64
     19.84            -0.9      18.93        perf-profile.self.cycles-pp.syscall_return_via_sysret
     13.61            -0.7      12.89        perf-profile.self.cycles-pp.entry_SYSCALL_64
      7.03            -0.4       6.59        perf-profile.self.cycles-pp.__fget_light
      6.10            -0.4       5.73        perf-profile.self.cycles-pp.shmem_file_llseek
      7.57            -0.3       7.25        perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      2.24            -0.1       2.14        perf-profile.self.cycles-pp.generic_file_llseek_size
      4.57            -0.1       4.47        perf-profile.self.cycles-pp.ksys_lseek
      2.12            -0.1       2.05        perf-profile.self.cycles-pp.__fdget_pos
      0.62            -0.0       0.59        perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.43 ±  2%      -0.0       0.40 ±  3%  perf-profile.self.cycles-pp.lseek@plt
      2.11            +0.6       2.75        perf-profile.self.cycles-pp.__x64_sys_lseek
      0.00            +0.7       0.68        perf-profile.self.cycles-pp.enter_from_user_mode
      0.00            +2.3       2.28        perf-profile.self.cycles-pp.__syscall_return_slowpath
      0.00            +3.1       3.09        perf-profile.self.cycles-pp.__prepare_exit_to_usermode
      0.00            +7.1       7.08        perf-profile.self.cycles-pp.exit_to_user_mode

[Plot: will-it-scale.per_process_ops per bisect sample; y-axis range 8.8e+06 to 1e+07]

[Plot: will-it-scale.workload per bisect sample; y-axis range 1.7e+09 to 1.95e+09]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen
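
P.S. For readers who want to sanity-check the headline numbers, here is a minimal Python sketch that recomputes the %change for will-it-scale.per_process_ops from the two values in the comparison table. It also checks an assumption that is not stated anywhere in this report: that perf-stat.overall.path-length is roughly perf-stat.total.instructions divided by will-it-scale.workload. The helper names `pct_change` and `path_length` are hypothetical and are not part of lkp-tests.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change from base to new, in percent."""
    return (new - base) / base * 100.0


def path_length(total_instructions: float, workload: float) -> float:
    """Instructions per operation.

    Assumption: this mirrors how the report derives path-length;
    the report itself does not state the formula.
    """
    return total_instructions / workload


# Values copied from the comparison table (base commit 1723be30e4,
# tested commit 8f159f1dfa).
base_ops, new_ops = 9977836, 9334708
ops_change = pct_change(base_ops, new_ops)
print(f"per_process_ops change: {ops_change:+.1f}%")

# perf-stat.total.instructions and will-it-scale.workload are rounded to
# four significant digits in the table, so expect only approximate agreement
# with the reported path-length values (87575 and 98570).
base_path = path_length(1.678e14, 1.916e9)
new_path = path_length(1.767e14, 1.792e9)
print(f"path-length: {base_path:.0f} -> {new_path:.0f} "
      f"({pct_change(base_path, new_path):+.1f}%)")
```

If the assumed formula holds, the table suggests each lseek call executes more instructions after the commit (path-length up about 12.6%), which is only partly offset by the 6.0% higher IPC.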