Hello, kernel test robot noticed a -10.9% regression of blogbench.read_score on: commit: adfcf4231b8cbc2d9c1e7bfaa965b907e60639eb ("x86: don't use REP_GOOD or ERMS for user memory copies") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master testcase: blogbench test machine: 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory parameters: disk: 1SSD fs: xfs cpufreq_governor: performance test-description: Blogbench is a portable filesystem benchmark that tries to reproduce the load of a real-world busy file server. test-url: https://www.pureftpd.org/project/blogbench In addition to that, the commit also has impact on the following tests: +------------------+------------------------------------------------------------------------------------------------------------+ | testcase: change | phoronix-test-suite: phoronix-test-suite.aio-stress.RandomWrite.mb_s 21.4% improvement | | test machine | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory | | test parameters | cpufreq_governor=performance | | | test=aio-stress-1.1.1 | +------------------+------------------------------------------------------------------------------------------------------------+ | testcase: change | phoronix-test-suite: phoronix-test-suite.incompact3d.input.i3d129CellsPerDirection.seconds 3.6% regression | | test machine | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory | | test parameters | cpufreq_governor=performance | | | option_a=input.i3d 129 Cells Per Direction [Intended For Small Systems] | | | test=incompact3d-2.0.2 | +------------------+------------------------------------------------------------------------------------------------------------+ | testcase: change | phoronix-test-suite: phoronix-test-suite.ior.2MB.DefaultTestDirectory.mb_s 25.7% improvement | | test machine | 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory | | test parameters | cpufreq_governor=performance | | | test=ior-1.1.1 | +------------------+------------------------------------------------------------------------------------------------------------+ | testcase: change | phoronix-test-suite: phoronix-test-suite.iozone.1MB.512MB.WritePerformance.mb_s 20.4% improvement | | test machine | 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory | | test parameters | cpufreq_governor=performance | | | option_a=1MB | | | option_b=512MB | | | option_c=Write Performance | | | test=iozone-1.9.5 | +------------------+------------------------------------------------------------------------------------------------------------+ Since there are both improvements and regressions in various micro benchmarks, we are sending this report to provide some data and hope this can be helpful for further investigation. Thanks. If you fix the issue, kindly add following tag | Reported-by: kernel test robot | Link: https://lore.kernel.org/oe-lkp/202305041446.71d46724-yujie.liu@intel.com Details are as below: ========================================================================================= compiler/cpufreq_governor/disk/fs/kconfig/rootfs/tbox_group/testcase: gcc-11/performance/1SSD/xfs/x86_64-rhel-8.3/debian-11.1-x86_64-20220510.cgz/lkp-csl-2sp9/blogbench commit: 20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing") adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies") 20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96 ---------------- --------------------------- %stddev %change %stddev \ | \ 2071761 ± 4% -10.9% 1846004 blogbench.read_score 8916 ± 6% -11.0% 7932 ± 8% blogbench.time.minor_page_faults 215.23 ± 4% +12.6% 242.40 ± 2% blogbench.time.user_time 0.87 ± 4% +11.9% 0.98 ± 2% iostat.cpu.user 2.25 ± 10% -0.4 1.85 ± 6% mpstat.cpu.all.soft% 16.03 ± 8% -33.3% 10.69 ± 15% sched_debug.cfs_rq:/.load_avg.min 18.59 +8.4% 20.16 turbostat.RAMWatt 180697 -3.0% 175305 vmstat.io.bo 15025 ± 4% +6.8% 16048 ± 3% proc-vmstat.nr_active_anon 20720 ± 3% +6.0% 21962 ± 2% proc-vmstat.nr_shmem 306941 -2.6% 298909 proc-vmstat.nr_slab_unreclaimable 10016536 -3.3% 9689882 proc-vmstat.nr_written 15025 ± 4% +6.8% 16048 ± 3% proc-vmstat.nr_zone_active_anon 22736515 ± 2% -5.3% 21542328 proc-vmstat.numa_hit 22656328 ± 2% -5.3% 21460298 proc-vmstat.numa_local 31365890 ± 2% -7.0% 29177083 proc-vmstat.pgalloc_normal 18590396 ± 4% -10.5% 16641000 proc-vmstat.pgfree 54574610 -2.9% 52984224 proc-vmstat.pgpgout 15.65 ± 2% +66.7% 26.09 perf-stat.i.MPKI 6.909e+09 -8.9% 6.296e+09 perf-stat.i.branch-instructions 0.41 ± 2% +0.0 0.46 perf-stat.i.branch-miss-rate% 74.77 -14.7 60.06 perf-stat.i.cache-miss-rate% 3.627e+08 ± 3% +37.6% 4.992e+08 ± 2% perf-stat.i.cache-misses 4.842e+08 ± 3% +71.3% 8.297e+08 ± 2% perf-stat.i.cache-references 7.87 -2.9% 7.64 perf-stat.i.cpi 916.29 ± 2% -4.4% 875.61 perf-stat.i.cpu-migrations 673.07 ± 4% -27.6% 487.55 perf-stat.i.cycles-between-cache-misses 8.042e+09 +10.5% 8.884e+09 perf-stat.i.dTLB-loads 0.02 ± 25% -0.0 0.01 ± 11% perf-stat.i.dTLB-store-miss-rate% 2.369e+09 ± 3% +67.1% 3.958e+09 perf-stat.i.dTLB-stores 3.092e+10 +2.8% 3.179e+10 perf-stat.i.instructions 0.13 +3.1% 0.14 perf-stat.i.ipc 0.11 ± 16% -31.5% 0.08 ± 19% perf-stat.i.major-faults 202.45 +12.1% 226.90 perf-stat.i.metric.M/sec 73.58 +1.4 74.93 perf-stat.i.node-load-miss-rate% 83206825 ± 3% -56.1% 36533214 perf-stat.i.node-load-misses 29812010 ± 4% -59.1% 12206523 ± 2% perf-stat.i.node-loads 72.86 +20.2 93.08 perf-stat.i.node-store-miss-rate% 13767301 ± 3% +415.9% 71022712 ± 2% perf-stat.i.node-store-misses 15.66 ± 2% +66.7% 26.10 perf-stat.overall.MPKI 0.41 ± 2% +0.0 0.46 perf-stat.overall.branch-miss-rate% 74.90 -14.7 60.16 perf-stat.overall.cache-miss-rate% 7.87 -2.8% 7.65 perf-stat.overall.cpi 672.29 ± 4% -27.5% 487.55 perf-stat.overall.cycles-between-cache-misses 0.02 ± 25% -0.0 0.01 ± 11% perf-stat.overall.dTLB-store-miss-rate% 0.13 +2.9% 0.13 perf-stat.overall.ipc 73.62 +1.3 74.95 perf-stat.overall.node-load-miss-rate% 72.98 +20.3 93.32 perf-stat.overall.node-store-miss-rate% 6.887e+09 -8.9% 6.277e+09 perf-stat.ps.branch-instructions 3.615e+08 ± 3% +37.6% 4.975e+08 ± 2% perf-stat.ps.cache-misses 4.827e+08 ± 3% +71.3% 8.27e+08 ± 2% perf-stat.ps.cache-references 913.07 ± 2% -4.4% 872.48 perf-stat.ps.cpu-migrations 8.017e+09 +10.5% 8.857e+09 perf-stat.ps.dTLB-loads 2.362e+09 ± 3% +67.1% 3.946e+09 perf-stat.ps.dTLB-stores 3.082e+10 +2.8% 3.169e+10 perf-stat.ps.instructions 0.11 ± 16% -31.7% 0.08 ± 19% perf-stat.ps.major-faults 82933317 ± 3% -56.1% 36413295 perf-stat.ps.node-load-misses 29716685 ± 4% -59.0% 12171311 ± 2% perf-stat.ps.node-loads 13721351 ± 3% +415.9% 70784762 ± 2% perf-stat.ps.node-store-misses 9.266e+12 +3.0% 9.547e+12 perf-stat.total.instructions 50.60 -6.5 44.11 perf-profile.calltrace.cycles-pp.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__open64 50.58 -6.5 44.10 perf-profile.calltrace.cycles-pp.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe.__open64 50.69 -6.5 44.22 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__open64 50.68 -6.5 44.20 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__open64 50.78 -6.4 44.33 perf-profile.calltrace.cycles-pp.__open64 41.42 -5.8 35.60 perf-profile.calltrace.cycles-pp._raw_spin_lock.alloc_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64 41.75 -5.8 35.93 perf-profile.calltrace.cycles-pp.alloc_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe 41.20 -5.8 35.40 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.alloc_fd.do_sys_openat2.__x64_sys_openat 36.32 -5.5 30.85 perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close 35.94 -5.5 30.47 perf-profile.calltrace.cycles-pp._raw_spin_lock.close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe 36.11 -5.5 30.64 perf-profile.calltrace.cycles-pp.close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close 35.74 -5.4 30.30 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.close_fd.__x64_sys_close.do_syscall_64 37.31 -5.4 31.87 perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__close 37.32 -5.4 31.89 perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__close 37.39 -5.4 31.96 perf-profile.calltrace.cycles-pp.__close 5.56 ± 2% -0.7 4.84 perf-profile.calltrace.cycles-pp._raw_spin_lock.put_unused_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64 5.58 ± 2% -0.7 4.86 perf-profile.calltrace.cycles-pp.put_unused_fd.do_sys_openat2.__x64_sys_openat.do_syscall_64.entry_SYSCALL_64_after_hwframe 5.52 ± 2% -0.7 4.81 perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.put_unused_fd.do_sys_openat2.__x64_sys_openat 2.47 ± 6% -0.4 2.07 ± 6% perf-profile.calltrace.cycles-pp.smpboot_thread_fn.kthread.ret_from_fork 2.46 ± 6% -0.4 2.06 ± 6% perf-profile.calltrace.cycles-pp.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork 2.45 ± 6% -0.4 2.06 ± 6% perf-profile.calltrace.cycles-pp.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread.ret_from_fork 2.45 ± 6% -0.4 2.06 ± 6% perf-profile.calltrace.cycles-pp.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn.kthread 2.44 ± 6% -0.4 2.05 ± 6% perf-profile.calltrace.cycles-pp.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd.smpboot_thread_fn 1.84 ± 5% -0.3 1.54 ± 5% perf-profile.calltrace.cycles-pp.file_free_rcu.rcu_do_batch.rcu_core.__do_softirq.run_ksoftirqd 0.00 +2.9 2.87 ± 5% perf-profile.calltrace.cycles-pp.__might_resched.__might_fault._copy_to_iter.copy_page_to_iter.filemap_read 0.00 +2.9 2.94 ± 5% perf-profile.calltrace.cycles-pp.__might_fault._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read 4.73 ± 3% +9.0 13.76 ± 4% perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read 5.22 ± 4% +11.7 16.96 ± 4% perf-profile.calltrace.cycles-pp._copy_to_iter.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter 5.27 ± 4% +11.8 17.06 ± 4% perf-profile.calltrace.cycles-pp.copy_page_to_iter.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read 6.44 ± 4% +12.2 18.63 ± 4% perf-profile.calltrace.cycles-pp.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64 6.28 ± 4% +12.2 18.48 ± 4% perf-profile.calltrace.cycles-pp.filemap_read.xfs_file_buffered_read.xfs_file_read_iter.vfs_read.ksys_read 6.60 ± 4% +12.2 18.81 ± 4% perf-profile.calltrace.cycles-pp.xfs_file_read_iter.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe 7.16 ± 3% +12.2 19.38 ± 4% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read 7.20 ± 3% +12.2 19.42 ± 4% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read 7.22 ± 3% +12.2 19.44 ± 4% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read 6.87 ± 3% +12.2 19.12 ± 4% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read 7.32 ± 3% +12.3 19.59 ± 4% perf-profile.calltrace.cycles-pp.__libc_read 0.00 +13.6 13.63 ± 4% perf-profile.calltrace.cycles-pp.copy_user_generic_unrolled.copyout._copy_to_iter.copy_page_to_iter.filemap_read 83.12 -12.0 71.10 perf-profile.children.cycles-pp._raw_spin_lock 82.49 -12.0 70.54 perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 50.60 -6.5 44.11 perf-profile.children.cycles-pp.__x64_sys_openat 50.58 -6.5 44.10 perf-profile.children.cycles-pp.do_sys_openat2 50.80 -6.5 44.34 perf-profile.children.cycles-pp.__open64 41.75 -5.8 35.93 perf-profile.children.cycles-pp.alloc_fd 36.32 -5.5 30.85 perf-profile.children.cycles-pp.__x64_sys_close 36.11 -5.5 30.64 perf-profile.children.cycles-pp.close_fd 37.40 -5.4 31.97 perf-profile.children.cycles-pp.__close 5.58 ± 2% -0.7 4.86 perf-profile.children.cycles-pp.put_unused_fd 2.80 ± 5% -0.4 2.40 ± 5% perf-profile.children.cycles-pp.__do_softirq 2.74 ± 6% -0.4 2.35 ± 5% perf-profile.children.cycles-pp.rcu_core 2.47 ± 6% -0.4 2.07 ± 6% perf-profile.children.cycles-pp.smpboot_thread_fn 2.46 ± 6% -0.4 2.06 ± 6% perf-profile.children.cycles-pp.run_ksoftirqd 2.73 ± 6% -0.4 2.34 ± 6% perf-profile.children.cycles-pp.rcu_do_batch 2.06 ± 5% -0.3 1.74 ± 5% perf-profile.children.cycles-pp.file_free_rcu 0.11 ± 28% -0.1 0.04 ± 75% perf-profile.children.cycles-pp.record__finish_output 0.65 -0.0 0.62 perf-profile.children.cycles-pp.hrtimer_interrupt 0.65 -0.0 0.63 perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt 0.20 ± 3% -0.0 0.18 ± 4% perf-profile.children.cycles-pp.down_read 0.12 ± 4% -0.0 0.11 ± 5% perf-profile.children.cycles-pp.xfs_ilock 0.58 -0.0 0.57 perf-profile.children.cycles-pp.__hrtimer_run_queues 0.07 ± 6% +0.0 0.09 ± 4% perf-profile.children.cycles-pp.touch_atime 0.06 ± 9% +0.0 0.08 ± 8% perf-profile.children.cycles-pp.__call_rcu_common 0.06 ± 14% +0.0 0.08 ± 5% perf-profile.children.cycles-pp.get_obj_cgroup_from_current 0.00 +0.1 0.05 perf-profile.children.cycles-pp.walk_component 0.00 +0.1 0.05 perf-profile.children.cycles-pp.atime_needs_update 0.00 +0.1 0.05 perf-profile.children.cycles-pp.syscall_return_via_sysret 0.00 +0.1 0.05 ± 7% perf-profile.children.cycles-pp.__snprintf_chk 0.47 ± 4% +0.1 0.54 ± 4% perf-profile.children.cycles-pp.kmem_cache_alloc 0.00 +0.1 0.10 ± 9% perf-profile.children.cycles-pp.folio_mark_accessed 0.00 +0.1 0.10 ± 4% perf-profile.children.cycles-pp.copy_user_short_string 0.06 ± 9% +0.2 0.23 ± 3% perf-profile.children.cycles-pp.__might_sleep 95.96 +0.3 96.30 perf-profile.children.cycles-pp.do_syscall_64 96.00 +0.3 96.34 perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe 0.12 ± 4% +2.8 2.96 ± 5% perf-profile.children.cycles-pp.__might_resched 0.08 ± 8% +2.9 2.96 ± 5% perf-profile.children.cycles-pp.__might_fault 4.73 ± 4% +9.0 13.76 ± 4% perf-profile.children.cycles-pp.copyout 5.23 ± 4% +11.7 16.96 ± 4% perf-profile.children.cycles-pp._copy_to_iter 5.27 ± 4% +11.8 17.07 ± 4% perf-profile.children.cycles-pp.copy_page_to_iter 6.44 ± 4% +12.2 18.63 ± 4% perf-profile.children.cycles-pp.xfs_file_buffered_read 6.28 ± 4% +12.2 18.49 ± 4% perf-profile.children.cycles-pp.filemap_read 6.60 ± 4% +12.2 18.81 ± 4% perf-profile.children.cycles-pp.xfs_file_read_iter 7.18 ± 3% +12.2 19.40 ± 4% perf-profile.children.cycles-pp.ksys_read 6.89 ± 3% +12.2 19.14 ± 4% perf-profile.children.cycles-pp.vfs_read 7.36 ± 3% +12.3 19.63 ± 4% perf-profile.children.cycles-pp.__libc_read 0.00 +13.7 13.70 ± 4% perf-profile.children.cycles-pp.copy_user_generic_unrolled 81.72 -11.8 69.91 perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath 2.05 ± 5% -0.3 1.74 ± 5% perf-profile.self.cycles-pp.file_free_rcu 0.43 ± 4% -0.2 0.26 ± 2% perf-profile.self.cycles-pp._copy_to_iter 0.62 ± 2% -0.1 0.56 ± 4% perf-profile.self.cycles-pp._raw_spin_lock 0.20 ± 6% -0.0 0.17 ± 2% perf-profile.self.cycles-pp.fput 0.10 ± 7% -0.0 0.08 ± 5% perf-profile.self.cycles-pp.alloc_empty_file 0.05 +0.0 0.07 ± 5% perf-profile.self.cycles-pp.kmem_cache_free 0.16 ± 3% +0.0 0.18 ± 4% perf-profile.self.cycles-pp.xfs_file_read_iter 0.09 ± 7% +0.0 0.12 ± 4% perf-profile.self.cycles-pp.kmem_cache_alloc 0.00 +0.1 0.05 perf-profile.self.cycles-pp.syscall_return_via_sysret 0.00 +0.1 0.06 ± 7% perf-profile.self.cycles-pp.folio_mark_accessed 0.02 ± 99% +0.1 0.10 ± 4% perf-profile.self.cycles-pp.copy_page_to_iter 0.00 +0.1 0.10 ± 4% perf-profile.self.cycles-pp.copy_user_short_string 0.05 ± 7% +0.2 0.22 ± 4% perf-profile.self.cycles-pp.__might_sleep 0.34 ± 5% +0.3 0.64 ± 3% perf-profile.self.cycles-pp.filemap_read 0.12 ± 4% +2.6 2.76 ± 5% perf-profile.self.cycles-pp.__might_resched 0.00 +13.5 13.52 ± 4% perf-profile.self.cycles-pp.copy_user_generic_unrolled *************************************************************************************************** lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory ========================================================================================= compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase: gcc-11/performance/x86_64-rhel-8.3/debian-x86_64-phoronix/lkp-csl-2sp7/aio-stress-1.1.1/phoronix-test-suite commit: 20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing") adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies") 20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96 ---------------- --------------------------- %stddev %change %stddev \ | \ 3087 +21.4% 3747 phoronix-test-suite.aio-stress.RandomWrite.mb_s 1415581 ± 3% -3.9% 1360489 proc-vmstat.numa_local 18259 ± 34% -86.3% 2501 ±135% proc-vmstat.numa_pte_updates 434049 -2.1% 425149 proc-vmstat.pgfault 5.43 +2.0 7.44 ± 4% perf-stat.i.cache-miss-rate% 3626266 +66.6% 6042976 ± 6% perf-stat.i.cache-misses 69186100 +3.7% 71773613 perf-stat.i.cache-references 1917 +3.1% 1977 perf-stat.i.context-switches 2.524e+08 +12.4% 2.837e+08 ± 2% perf-stat.i.dTLB-stores 69.45 -1.9 67.54 perf-stat.i.iTLB-load-miss-rate% 906278 -2.9% 879960 perf-stat.i.iTLB-load-misses 387815 +5.4% 408574 perf-stat.i.iTLB-loads 2.204e+09 +3.6% 2.283e+09 ± 2% perf-stat.i.instructions 2397 +5.2% 2522 ± 3% perf-stat.i.instructions-per-iTLB-miss 132455 ± 3% -9.0% 120495 ± 9% perf-stat.i.node-loads 245539 ± 4% +623.2% 1775744 ± 13% perf-stat.i.node-stores 3.81 ±109% -3.8 0.00 perf-profile.calltrace.cycles-pp.vsnprintf.seq_printf.show_interrupts.seq_read_iter.proc_reg_read_iter 4.98 ±126% -3.1 1.88 ±142% perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 4.98 ± 81% -3.1 1.88 ±142% perf-profile.calltrace.cycles-pp.asm_exc_page_fault 4.00 ±110% -2.1 1.88 ±142% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault 4.00 ±110% -2.1 1.88 ±142% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 4.00 ±110% -2.1 1.88 ±142% perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 3.81 ±109% -2.1 1.76 ±141% perf-profile.calltrace.cycles-pp.seq_printf.show_interrupts.seq_read_iter.proc_reg_read_iter.vfs_read 6.94 ± 90% -5.1 1.88 ±142% perf-profile.children.cycles-pp.asm_exc_page_fault 5.96 ± 98% -4.1 1.88 ±142% perf-profile.children.cycles-pp.exc_page_fault 3.81 ±109% -3.8 0.00 perf-profile.children.cycles-pp.vsnprintf 4.98 ±126% -3.1 1.88 ±142% perf-profile.children.cycles-pp.do_user_addr_fault 4.98 ±126% -3.1 1.88 ±142% perf-profile.children.cycles-pp.__handle_mm_fault 4.98 ±126% -3.1 1.88 ±142% perf-profile.children.cycles-pp.handle_mm_fault 3.32 ±101% -2.5 0.83 ±223% perf-profile.children.cycles-pp.dput 3.81 ±109% -2.1 1.76 ±141% perf-profile.children.cycles-pp.seq_printf 3.24 ±101% -3.2 0.00 perf-profile.self.cycles-pp.release_pages *************************************************************************************************** lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory ========================================================================================= compiler/cpufreq_governor/kconfig/option_a/rootfs/tbox_group/test/testcase: gcc-11/performance/x86_64-rhel-8.3/input.i3d 129 Cells Per Direction [Intended For Small Systems]/debian-x86_64-phoronix/lkp-csl-2sp7/incompact3d-2.0.2/phoronix-test-suite commit: 20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing") adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies") 20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96 ---------------- --------------------------- %stddev %change %stddev \ | \ 10.21 +3.6% 10.58 phoronix-test-suite.incompact3d.input.i3d129CellsPerDirection.seconds 2634 +1.5% 2675 phoronix-test-suite.time.percent_of_cpu_this_job_got 125.60 +18.0% 148.24 phoronix-test-suite.time.system_time 1365 +2.2% 1395 phoronix-test-suite.time.user_time 2.45 +0.4 2.80 mpstat.cpu.all.sys% 4716584 +1.3% 4778690 proc-vmstat.numa_hint_faults 591104 +1.3% 598784 proc-vmstat.unevictable_pgs_scanned 1484 ± 2% -3.1% 1438 vmstat.io.bi 6485 -2.7% 6310 vmstat.system.cs 59.03 ± 13% -26.8% 43.20 ± 15% sched_debug.cfs_rq:/.load_avg.avg 203.26 ± 9% -17.4% 167.94 ± 11% sched_debug.cfs_rq:/.load_avg.stddev 41.33 ± 17% -33.5% 27.48 ± 29% sched_debug.cfs_rq:/.removed.load_avg.avg 195.00 ± 9% -17.4% 161.00 ± 14% sched_debug.cfs_rq:/.removed.load_avg.stddev 1256 ± 8% +13.9% 1430 ± 4% sched_debug.cpu.nr_switches.stddev 11.80 ± 76% -8.1 3.68 ±157% perf-profile.calltrace.cycles-pp.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe 8.70 ± 86% -5.0 3.68 ±157% perf-profile.calltrace.cycles-pp.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve.do_syscall_64 8.70 ± 86% -5.0 3.68 ±157% perf-profile.calltrace.cycles-pp.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common.__x64_sys_execve 8.70 ± 86% -5.0 3.68 ±157% perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve.do_execveat_common 6.32 ±101% -3.8 2.56 ±223% perf-profile.calltrace.cycles-pp.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm.bprm_execve 4.08 ±106% -2.8 1.28 ±223% perf-profile.calltrace.cycles-pp.__mmput.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler 4.08 ±106% -2.8 1.28 ±223% perf-profile.calltrace.cycles-pp.exit_mmap.__mmput.exec_mmap.begin_new_exec.load_elf_binary 3.82 ±109% -2.5 1.28 ±223% perf-profile.calltrace.cycles-pp.zap_pte_range.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap 3.82 ±109% -2.5 1.28 ±223% perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.exit_mmap.__mmput 3.88 ± 73% -2.0 1.91 ±141% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 3.88 ± 73% -2.0 1.91 ±141% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.read 4.08 ±106% -1.5 2.56 ±223% perf-profile.calltrace.cycles-pp.exec_mmap.begin_new_exec.load_elf_binary.search_binary_handler.exec_binprm 11.80 ± 76% -8.1 3.68 ±157% perf-profile.children.cycles-pp.bprm_execve 8.70 ± 86% -5.0 3.68 ±157% perf-profile.children.cycles-pp.exec_binprm 8.70 ± 86% -5.0 3.68 ±157% perf-profile.children.cycles-pp.search_binary_handler 8.70 ± 86% -5.0 3.68 ±157% perf-profile.children.cycles-pp.load_elf_binary 6.32 ±101% -3.8 2.56 ±223% perf-profile.children.cycles-pp.begin_new_exec 3.88 ± 73% -3.0 0.93 ±223% perf-profile.children.cycles-pp.seq_read_iter 3.82 ±109% -2.5 1.28 ±223% perf-profile.children.cycles-pp.zap_pte_range 3.82 ±109% -2.5 1.28 ±223% perf-profile.children.cycles-pp.unmap_vmas 3.82 ±109% -2.5 1.28 ±223% perf-profile.children.cycles-pp.unmap_page_range 3.82 ±109% -2.5 1.28 ±223% perf-profile.children.cycles-pp.zap_pmd_range 3.88 ± 73% -2.0 1.91 ±141% perf-profile.children.cycles-pp.ksys_read 3.88 ± 73% -2.0 1.91 ±141% perf-profile.children.cycles-pp.vfs_read 4.08 ±106% -1.5 2.56 ±223% perf-profile.children.cycles-pp.exec_mmap 4.167e+09 +5.0% 4.374e+09 ± 3% perf-stat.i.branch-instructions 36.75 +1.1 37.81 perf-stat.i.cache-miss-rate% 3.508e+08 +3.8% 3.643e+08 perf-stat.i.cache-misses 6338 -1.7% 6231 perf-stat.i.context-switches 7.944e+10 +1.4% 8.056e+10 perf-stat.i.cpu-cycles 1.236e+10 +2.9% 1.272e+10 perf-stat.i.dTLB-loads 0.09 -0.0 0.09 perf-stat.i.dTLB-store-miss-rate% 4.764e+09 +7.2% 5.109e+09 perf-stat.i.dTLB-stores 57.38 -1.1 56.31 perf-stat.i.iTLB-load-miss-rate% 1477373 +3.2% 1525290 perf-stat.i.iTLB-loads 4.633e+10 +2.8% 4.761e+10 perf-stat.i.instructions 831488 +1.3% 842562 perf-stat.i.metric.GHz 2.307e+08 +4.1% 2.401e+08 perf-stat.i.metric.M/sec 42.72 ± 2% -5.6 37.15 ± 2% perf-stat.i.node-load-miss-rate% 6830797 -47.3% 3600340 perf-stat.i.node-load-misses 20259060 -11.1% 18007910 perf-stat.i.node-loads 5175098 +4.9% 5428994 perf-stat.i.node-store-misses 64485672 +13.0% 72841822 perf-stat.i.node-stores 13.91 -2.4% 13.58 perf-stat.overall.MPKI 0.79 -0.0 0.75 ± 3% perf-stat.overall.branch-miss-rate% 54.45 +1.9 56.35 perf-stat.overall.cache-miss-rate% 226.37 -2.3% 221.12 perf-stat.overall.cycles-between-cache-misses 0.02 ± 2% -0.0 0.02 perf-stat.overall.dTLB-load-miss-rate% 0.07 -0.0 0.07 perf-stat.overall.dTLB-store-miss-rate% 53.87 -1.0 52.88 perf-stat.overall.iTLB-load-miss-rate% 26883 +3.5% 27836 perf-stat.overall.instructions-per-iTLB-miss 25.21 -8.6 16.66 perf-stat.overall.node-load-miss-rate% 7.43 -0.5 6.94 perf-stat.overall.node-store-miss-rate% 4.104e+09 +5.0% 4.309e+09 ± 3% perf-stat.ps.branch-instructions 3.456e+08 +3.8% 3.589e+08 perf-stat.ps.cache-misses 6219 -1.6% 6121 perf-stat.ps.context-switches 7.825e+10 +1.4% 7.936e+10 perf-stat.ps.cpu-cycles 1.218e+10 +2.9% 1.253e+10 perf-stat.ps.dTLB-loads 4.692e+09 +7.3% 5.033e+09 perf-stat.ps.dTLB-stores 1453670 +3.3% 1501333 perf-stat.ps.iTLB-loads 4.564e+10 +2.8% 4.691e+10 perf-stat.ps.instructions 6728891 -47.3% 3546712 perf-stat.ps.node-load-misses 19959395 -11.1% 17743546 perf-stat.ps.node-loads 5098250 +4.9% 5348852 perf-stat.ps.node-store-misses 63535396 +13.0% 71772875 perf-stat.ps.node-stores 2.592e+12 +4.8% 2.717e+12 perf-stat.total.instructions *************************************************************************************************** lkp-icl-2sp5: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz (Ice Lake) with 128G memory ========================================================================================= class/compiler/cpufreq_governor/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime: cpu/gcc-11/performance/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp5/getrandom/stress-ng/60s commit: 20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing") adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies") 20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96 ---------------- --------------------------- fail:runs %reproduction fail:runs | | | 2:3 -67% :3 dmesg.RIP:_copy_to_iter 1:3 -33% :3 dmesg.RIP:chacha_block_generic :3 33% 1:3 dmesg.RIP:chacha_permute :3 67% 2:3 dmesg.RIP:copy_user_fast_string 1:3 -33% :3 dmesg.WARNING:CPU:#PID:#at_arch/x86/evperf:interrupt_took_too_long(#>#),lowering_kernel.perf_event_max_sample_rate_to %stddev %change %stddev \ | \ 7725056 ± 12% -14.5% 6605482 meminfo.DirectMap2M 3025 ± 3% +8.0% 3268 proc-vmstat.nr_page_table_pages 48.73 ± 3% -3.6% 46.99 ± 2% turbostat.RAMWatt 0.40 -4.2% 0.39 perf-stat.i.MPKI 173.78 +1.4% 176.18 perf-stat.i.cpu-migrations 0.00 -0.0 0.00 perf-stat.i.dTLB-store-miss-rate% 169.53 +1.0% 171.29 perf-stat.ps.cpu-migrations 7.99 -8.0 0.00 perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom 19.23 -0.1 19.09 perf-profile.calltrace.cycles-pp._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe 10.66 -0.1 10.54 perf-profile.calltrace.cycles-pp.copyout._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64 75.51 +0.1 75.64 perf-profile.calltrace.cycles-pp.chacha_block_generic.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64.entry_SYSCALL_64_after_hwframe 66.37 +0.2 66.52 perf-profile.calltrace.cycles-pp.chacha_permute.chacha_block_generic.get_random_bytes_user.__x64_sys_getrandom.do_syscall_64 0.00 +7.9 7.87 perf-profile.calltrace.cycles-pp.copy_user_fast_string.copyout._copy_to_iter.get_random_bytes_user.__x64_sys_getrandom 9.17 -9.2 0.00 perf-profile.children.cycles-pp.copy_user_enhanced_fast_string 20.05 -0.1 19.91 perf-profile.children.cycles-pp._copy_to_iter 10.11 -0.1 9.99 perf-profile.children.cycles-pp.copyout 0.09 ± 5% +0.0 0.10 perf-profile.children.cycles-pp.ktime_get_update_offsets_now 0.37 ± 3% +0.0 0.40 perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt 77.15 +0.1 77.26 perf-profile.children.cycles-pp.chacha_block_generic 67.64 +0.2 67.79 perf-profile.children.cycles-pp.chacha_permute 0.00 +9.0 9.05 perf-profile.children.cycles-pp.copy_user_fast_string 8.92 -8.9 0.00 perf-profile.self.cycles-pp.copy_user_enhanced_fast_string 6.23 -0.0 6.20 perf-profile.self.cycles-pp._copy_to_iter 0.08 ± 5% +0.0 0.10 ± 4% perf-profile.self.cycles-pp.ktime_get_update_offsets_now 0.31 +0.0 0.33 perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 67.15 +0.1 67.29 perf-profile.self.cycles-pp.chacha_permute 0.00 +8.8 8.79 perf-profile.self.cycles-pp.copy_user_fast_string 0.25 ± 78% +117.4% 0.55 ± 2% sched_debug.cfs_rq:/.h_nr_running.avg 3187 ± 35% +183.7% 9043 ± 38% sched_debug.cfs_rq:/.load.avg 1296 ± 29% -40.1% 775.67 ± 22% sched_debug.cfs_rq:/.load_avg.max 205.63 ± 22% -33.8% 136.07 ± 4% sched_debug.cfs_rq:/.load_avg.stddev 1405113 ±137% +195.0% 4144508 sched_debug.cfs_rq:/.min_vruntime.avg 1433314 ±135% +192.5% 4191972 sched_debug.cfs_rq:/.min_vruntime.max 977377 ±137% +194.8% 2880861 sched_debug.cfs_rq:/.min_vruntime.min 0.25 ± 79% +117.7% 0.54 sched_debug.cfs_rq:/.nr_running.avg 853.67 ± 28% -40.0% 512.00 sched_debug.cfs_rq:/.removed.load_avg.max 163.07 ± 30% -36.6% 103.40 ± 23% sched_debug.cfs_rq:/.removed.load_avg.stddev 372.81 ± 48% +71.4% 638.84 sched_debug.cfs_rq:/.runnable_avg.avg 1161 ± 6% +36.9% 1589 ± 9% sched_debug.cfs_rq:/.runnable_avg.max 132.33 ±141% +244.1% 455.33 ± 11% sched_debug.cfs_rq:/.runnable_avg.min 4861 ±272% +1135.9% 60082 ± 20% sched_debug.cfs_rq:/.spread0.avg 32740 ± 92% +230.0% 108041 ± 11% sched_debug.cfs_rq:/.spread0.max 371.90 ± 48% +70.2% 632.85 sched_debug.cfs_rq:/.util_avg.avg 99.71 ±110% +155.3% 254.61 sched_debug.cfs_rq:/.util_est_enqueued.avg 88286 ±137% +215.2% 278294 ± 4% sched_debug.cpu.avg_idle.min 6.10 ± 37% +58.2% 9.65 ± 17% sched_debug.cpu.clock.stddev 1268 ± 99% +147.4% 3137 sched_debug.cpu.curr->pid.avg 4681 ± 15% +22.9% 5754 sched_debug.cpu.curr->pid.max 972.83 ±141% +200.4% 2922 sched_debug.cpu.curr->pid.min 644790 ± 5% -10.0% 580100 sched_debug.cpu.max_idle_balance_cost.max 21765 ± 26% -40.9% 12856 ± 11% sched_debug.cpu.max_idle_balance_cost.stddev 0.00 ± 26% +51.9% 0.00 ± 13% sched_debug.cpu.next_balance.stddev 0.25 ± 80% +123.8% 0.55 ± 2% sched_debug.cpu.nr_running.avg 1.00 +50.0% 1.50 sched_debug.cpu.nr_running.max 232.33 ± 21% +56.1% 362.67 sched_debug.cpu.nr_switches.min *************************************************************************************************** lkp-csl-2sp7: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz (Cascade Lake) with 512G memory ========================================================================================= compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase: gcc-11/performance/x86_64-rhel-8.3/debian-x86_64-phoronix/lkp-csl-2sp7/ior-1.1.1/phoronix-test-suite commit: 20f3337d35 ("x86: don't use REP_GOOD or ERMS for small memory clearing") adfcf4231b ("x86: don't use REP_GOOD or ERMS for user memory copies") 20f3337d350c4e1b adfcf4231b8cbc2d9c1e7bfaa96 ---------------- --------------------------- %stddev %change %stddev \ | \ 2762 +24.7% 3444 phoronix-test-suite.ior.2MB./opt/rootfs.mb_s 2737 +25.7% 3441 phoronix-test-suite.ior.2MB.DefaultTestDirectory.mb_s 40826 ± 13% -21.1% 32218 ± 11% sched_debug.cfs_rq:/.min_vruntime.avg 51961 ± 10% -18.8% 42191 ± 7% sched_debug.cfs_rq:/.min_vruntime.max 4451 ± 11% -32.1% 3022 ± 18% sched_debug.cfs_rq:/.min_vruntime.stddev 4452 ± 11% -32.0% 3026 ± 18% sched_debug.cfs_rq:/.spread0.stddev 811283 ± 26% -22.1% 631725 ± 6% sched_debug.cpu.max_idle_balance_cost.max 5.24 ±124% -3.6 1.67 ±223% perf-profile.calltrace.cycles-pp.asm_exc_page_fault 5.24 ±124% -3.6 1.67 ±223% perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault 5.24 ±124% -3.6 1.67 ±223% perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault 5.24 ±124% -3.6 1.67 ±223% perf-profile.children.cycles-pp.asm_exc_page_fault 5.24 ±124% -3.6 1.67 ±223% perf-profile.children.cycles-pp.exc_page_fault 5.24 ±124% -3.6 1.67 ±223% perf-profile.children.cycles-pp.do_user_addr_fault 3939447 ± 2% +5.8% 4169072 ± 3% perf-stat.i.cache-misses 1787 -3.2% 1730 perf-stat.i.cycles-between-cache-misses 67.42 -1.9 65.54 perf-stat.i.iTLB-load-miss-rate% 1009539 -3.7% 972226 perf-stat.i.iTLB-load-misses 644541 ± 2% +9.1% 703307 ± 4% perf-stat.i.iTLB-loads 142633 ± 3% +13.0% 161225 ± 8% perf-stat.i.node-store-misses 205884 ± 4% +17.7% 242357 ± 10% perf-stat.i.node-stores Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. -- 0-DAY CI Kernel Test Service https://github.com/intel/lkp-tests