Greeting, FYI, we noticed a 4.6% improvement of will-it-scale.per_thread_ops due to commit: commit: ed610fb7867eff838da286444016ae58fa4f86b2 ("[PATCH] crypto: x86/aes-ni - add missing error checks in XTS code") url: https://github.com/0day-ci/linux/commits/Ard-Biesheuvel/crypto-x86-aes-ni-add-missing-error-checks-in-XTS-code/20210718-100125 base: https://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git master in testcase: will-it-scale on test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory with following parameters: nr_task: 100% mode: thread test: getppid1 cpufreq_governor: performance ucode: 0x5003006 test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two. test-url: https://github.com/antonblanchard/will-it-scale Details are as below: --------------------------------------------------------------------------------------------------> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run bin/lkp run generated-yaml-file ========================================================================================= compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode: gcc-9/performance/x86_64-rhel-8.3/thread/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/getppid1/will-it-scale/0x5003006 commit: 66192b2e3f ("crypto: hisilicon/sec - fix the process of disabling sva prefetching") ed610fb786 ("crypto: x86/aes-ni - add missing error checks in XTS code") 66192b2e3fd8ab97 ed610fb7867eff838da28644401 ---------------- --------------------------- %stddev %change %stddev \ | \ 7.27e+08 +4.6% 7.601e+08 will-it-scale.88.threads 8261617 +4.6% 8637651 will-it-scale.per_thread_ops 7.27e+08 +4.6% 7.601e+08 will-it-scale.workload 2267 ± 19% -19.8% 1818 ± 16% interrupts.CPU33.CAL:Function_call_interrupts 425.50 ± 26% -21.0% 336.29 ± 9% interrupts.CPU37.RES:Rescheduling_interrupts 622.50 ± 53% -43.9% 349.43 ± 20% interrupts.CPU63.RES:Rescheduling_interrupts 6505 ± 28% -14.7% 5546 ± 34% interrupts.CPU75.NMI:Non-maskable_interrupts 6505 ± 28% -14.7% 5546 ± 34% interrupts.CPU75.PMI:Performance_monitoring_interrupts 829.33 ± 54% -56.0% 365.29 ± 27% interrupts.CPU77.RES:Rescheduling_interrupts 2423 ± 3% -21.5% 1903 ± 42% perf-sched.wait_and_delay.avg.ms.do_syslog.part.0.kmsg_read.vfs_read 7265 ± 3% -21.5% 5706 ± 42% perf-sched.wait_and_delay.max.ms.do_syslog.part.0.kmsg_read.vfs_read 23.15 ± 26% +4566.5% 1080 ±237% perf-sched.wait_and_delay.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe 6202 ± 8% -17.1% 5139 ± 10% perf-sched.wait_and_delay.max.ms.smpboot_thread_fn.kthread.ret_from_fork 2421 ± 3% -21.4% 1903 ± 42% perf-sched.wait_time.avg.ms.do_syslog.part.0.kmsg_read.vfs_read 0.16 ± 11% -17.4% 0.13 ± 19% perf-sched.wait_time.avg.ms.exit_to_user_mode_prepare.irqentry_exit_to_user_mode.asm_sysvec_apic_timer_interrupt.[unknown] 7261 ± 3% -21.4% 5705 ± 42% perf-sched.wait_time.max.ms.do_syslog.part.0.kmsg_read.vfs_read 22.98 ± 27% +4602.1% 1080 ±237% perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe 6201 ± 8% -17.1% 5139 ± 10% perf-sched.wait_time.max.ms.smpboot_thread_fn.kthread.ret_from_fork 3.779e+10 +4.6% 3.954e+10 perf-stat.i.branch-instructions 0.49 -0.0 0.47 perf-stat.i.branch-miss-rate% 434873 ± 2% +10.3% 479520 ± 4% perf-stat.i.cache-misses 1.38 -4.5% 1.31 perf-stat.i.cpi 741687 ± 2% -12.1% 652145 ± 6% perf-stat.i.cycles-between-cache-misses 6.022e+10 +4.6% 6.299e+10 perf-stat.i.dTLB-loads 0.00 -0.0 0.00 perf-stat.i.dTLB-store-miss-rate% 149137 -1.6% 146697 perf-stat.i.dTLB-store-misses 4.21e+10 +4.5% 4.401e+10 perf-stat.i.dTLB-stores 2.354e+08 ± 2% -4.9% 2.238e+08 perf-stat.i.iTLB-load-misses 1.759e+11 +4.6% 1.841e+11 perf-stat.i.instructions 755.18 ± 2% +9.5% 826.97 perf-stat.i.instructions-per-iTLB-miss 0.73 +4.7% 0.76 perf-stat.i.ipc 1592 +4.6% 1665 perf-stat.i.metric.M/sec 0.49 -0.0 0.47 perf-stat.overall.branch-miss-rate% 1.38 -4.5% 1.32 perf-stat.overall.cpi 549393 ± 2% -9.0% 500080 ± 4% perf-stat.overall.cycles-between-cache-misses 0.00 -0.0 0.00 perf-stat.overall.dTLB-store-miss-rate% 748.01 ± 2% +10.0% 822.71 perf-stat.overall.instructions-per-iTLB-miss 0.73 +4.7% 0.76 perf-stat.overall.ipc 3.767e+10 +4.6% 3.941e+10 perf-stat.ps.branch-instructions 439954 ± 2% +9.9% 483612 ± 4% perf-stat.ps.cache-misses 6.002e+10 +4.6% 6.278e+10 perf-stat.ps.dTLB-loads 148856 -1.6% 146461 perf-stat.ps.dTLB-store-misses 4.196e+10 +4.5% 4.386e+10 perf-stat.ps.dTLB-stores 2.346e+08 ± 2% -4.9% 2.231e+08 perf-stat.ps.iTLB-load-misses 1.754e+11 +4.6% 1.835e+11 perf-stat.ps.instructions 5.289e+13 +4.6% 5.534e+13 perf-stat.total.instructions 13.13 -2.6 10.51 ± 3% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.getppid 32.70 -2.4 30.28 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.getppid 40.62 -2.4 38.22 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.getppid 9.17 -1.6 7.55 ± 4% perf-profile.calltrace.cycles-pp.exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.getppid 2.42 ± 2% -0.9 1.50 ± 3% perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.getppid 94.55 -0.3 94.26 perf-profile.calltrace.cycles-pp.getppid 3.00 +0.1 3.12 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_safe_stack.getppid 2.98 +0.2 3.20 perf-profile.calltrace.cycles-pp.syscall_enter_from_user_mode.do_syscall_64.entry_SYSCALL_64_after_hwframe.getppid 8.91 +0.5 9.37 perf-profile.calltrace.cycles-pp.testcase 42.46 +2.1 44.56 perf-profile.calltrace.cycles-pp.__entry_text_start.getppid 14.20 -2.6 11.64 ± 3% perf-profile.children.cycles-pp.syscall_exit_to_user_mode 33.60 -2.4 31.19 perf-profile.children.cycles-pp.do_syscall_64 41.30 -2.3 38.96 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 9.50 -1.6 7.92 ± 4% perf-profile.children.cycles-pp.exit_to_user_mode_prepare 2.60 ± 2% -0.8 1.81 ± 2% perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare 1.21 -0.1 1.07 perf-profile.children.cycles-pp.rcu_read_unlock_strict 3.20 +0.1 3.34 perf-profile.children.cycles-pp.entry_SYSCALL_64_safe_stack 3.06 +0.2 3.28 perf-profile.children.cycles-pp.syscall_enter_from_user_mode 5.28 +0.2 5.52 perf-profile.children.cycles-pp.testcase 22.43 +0.7 23.10 perf-profile.children.cycles-pp.syscall_return_via_sysret 27.43 +1.3 28.77 perf-profile.children.cycles-pp.__entry_text_start 8.32 -1.5 6.81 ± 4% perf-profile.self.cycles-pp.exit_to_user_mode_prepare 2.24 ± 3% -0.8 1.43 ± 2% perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare 3.90 -0.2 3.69 perf-profile.self.cycles-pp.__x64_sys_getppid 0.87 -0.1 0.73 perf-profile.self.cycles-pp.rcu_read_unlock_strict 2.43 -0.1 2.31 perf-profile.self.cycles-pp.syscall_exit_to_user_mode 0.81 ± 3% -0.1 0.72 ± 3% perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup 7.82 +0.1 7.90 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 3.17 +0.1 3.31 perf-profile.self.cycles-pp.entry_SYSCALL_64_safe_stack 2.36 +0.2 2.55 perf-profile.self.cycles-pp.syscall_enter_from_user_mode 12.36 +0.6 12.96 perf-profile.self.cycles-pp.__entry_text_start 22.14 +0.6 22.77 perf-profile.self.cycles-pp.syscall_return_via_sysret 19.17 +1.0 20.14 perf-profile.self.cycles-pp.getppid will-it-scale.per_thread_ops 8.8e+06 +-----------------------------------------------------------------+ | O O OO O | 8.7e+06 |O+O O O | | O O | | OO O OO OOOO OOO | 8.6e+06 |-+ | | O | 8.5e+06 |-+ | |+.++ +.++++.+ | 8.4e+06 |-+ + .++ ++. : : | | +.++++ +.++ +++ : | | : | 8.3e+06 |-+ : + .+ + +. + | | + .+ :+ + +.++ ++++.++++.+ ++.+| 8.2e+06 +-----------------------------------------------------------------+ [*] bisect-good sample [O] bisect-bad sample Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. --- 0DAY/LKP+ Test Infrastructure Open Source Technology Center https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation Thanks, Oliver Sang