Greeting,

FYI, we noticed a -9.0% regression of vm-scalability.throughput due to commit:


commit: 4af77b68c2c6280230daf53fe8f13db706858187 ("mm: readahead: increase maximum readahead window")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: vm-scalability
on test machine: 4 threads Intel(R) Core(TM) i3-3220 CPU @ 3.30GHz with 8G memory
with following parameters:

	runtime: 300s
	test: lru-file-readonce
	cpufreq_governor: performance

test-description: The motivation behind this suite is to exercise functions and regions of the mm/ of the Linux kernel which are of interest to us.
test-url: https://git.kernel.org/cgit/linux/kernel/git/wfg/vm-scalability.git/


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/runtime/tbox_group/test/testcase:
  gcc-7/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/300s/lkp-ivb-d02/lru-file-readonce/vm-scalability

commit:
  214c9de472 ("mm/madvise: enable soft offline of HugeTLB pages at PUD level")
  4af77b68c2 ("mm: readahead: increase maximum readahead window")

214c9de472b94aa8 4af77b68c2c6280230daf53fe8
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
   5587932            -9.0%    5085253        vm-scalability.throughput
   1398617            -8.9%    1273916        vm-scalability.median
    307.25            +2.0%     313.50        vm-scalability.time.percent_of_cpu_this_job_got
    888.71            +2.6%     911.83        vm-scalability.time.system_time
     42.20           -11.3%      37.42        vm-scalability.time.user_time
 1.676e+09            -9.0%  1.526e+09        vm-scalability.workload
      4415 ±  6%     +46.1%       6451 ± 16%  slabinfo.kmalloc-32.active_objs
      4415 ±  6%     +46.4%       6463 ± 15%  slabinfo.kmalloc-32.num_objs
    416118            -9.9%     374847        softirqs.RCU
    586589 ±  3%      -6.2%     550233 ±  4%  softirqs.TIMER
      4099 ±  2%     -50.9%       2013 ± 28%  proc-vmstat.allocstall_movable
      4556           +33.4%       6079 ±  5%  proc-vmstat.kswapd_low_wmark_hit_quickly
      4558           +33.4%       6080 ±  5%  proc-vmstat.pageoutrun
   1819820 ±  2%     -50.8%     896227 ± 24%  proc-vmstat.pgscan_direct
   1819820 ±  2%     -50.8%     896227 ± 24%  proc-vmstat.pgsteal_direct
     49555 ± 41%     +91.1%      94716 ± 18%  sched_debug.cfs_rq:/.MIN_vruntime.avg
    198221 ± 41%     +91.1%     378866 ± 18%  sched_debug.cfs_rq:/.MIN_vruntime.max
     85832 ± 41%     +91.1%     164054 ± 18%  sched_debug.cfs_rq:/.MIN_vruntime.stddev
    241173 ± 11%     -22.8%     186090 ± 10%  sched_debug.cfs_rq:/.load.min
    392.96 ± 16%     -28.3%     281.92 ± 18%  sched_debug.cfs_rq:/.load_avg.min
     49555 ± 41%     +91.1%      94716 ± 18%  sched_debug.cfs_rq:/.max_vruntime.avg
    198222 ± 41%     +91.1%     378867 ± 18%  sched_debug.cfs_rq:/.max_vruntime.max
     85833 ± 41%     +91.1%     164054 ± 18%  sched_debug.cfs_rq:/.max_vruntime.stddev
     13913 ± 16%     +58.1%      22002 ± 25%  sched_debug.cfs_rq:/.min_vruntime.stddev
     13913 ± 16%     +58.1%      22001 ± 25%  sched_debug.cfs_rq:/.spread0.stddev
    241173 ± 11%     -22.8%     186108 ± 10%  sched_debug.cpu.load.min
      2.07 ±  4%     -16.1%       1.74 ± 10%  sched_debug.cpu.nr_running.avg
 5.284e+11            -8.7%  4.823e+11        perf-stat.branch-instructions
      0.35             -0.0       0.31 ±  4%  perf-stat.branch-miss-rate%
 1.864e+09           -19.8%  1.494e+09 ±  3%  perf-stat.branch-misses
     62.27             +5.4      67.71        perf-stat.cache-miss-rate%
 2.965e+10           +11.8%  3.314e+10        perf-stat.cache-misses
 4.762e+10            +2.8%  4.894e+10        perf-stat.cache-references
   9160936            +1.2%    9268581        perf-stat.context-switches
      1.41            +9.9%       1.55        perf-stat.cpi
     19670            -3.1%      19055        perf-stat.cpu-migrations
 8.242e+11            -8.9%  7.512e+11        perf-stat.dTLB-loads
      0.14 ±  4%       -0.0       0.11 ± 10%  perf-stat.dTLB-store-miss-rate%
 8.942e+08 ±  4%     -25.3%  6.682e+08 ±  9%  perf-stat.dTLB-store-misses
 6.421e+11            -8.9%  5.851e+11        perf-stat.dTLB-stores
     93.94             -1.4      92.56        perf-stat.iTLB-load-miss-rate%
 4.418e+08 ±  3%     -17.8%  3.631e+08 ± 10%  perf-stat.iTLB-load-misses
  2.77e+12            -8.8%  2.526e+12        perf-stat.instructions
      0.71            -9.0%       0.64        perf-stat.ipc
     39.34 ±  2%       -4.5      34.84        perf-profile.calltrace.cycles-pp.__do_page_cache_readahead.ondemand_readahead.generic_file_read_iter.xfs_file_buffered_aio_read.xfs_file_read_iter
     39.34 ±  2%       -4.5      34.85        perf-profile.calltrace.cycles-pp.ondemand_readahead.generic_file_read_iter.xfs_file_buffered_aio_read.xfs_file_read_iter.__vfs_read
     31.06 ±  2%       -3.3      27.80        perf-profile.calltrace.cycles-pp.mpage_readpages.__do_page_cache_readahead.ondemand_readahead.generic_file_read_iter.xfs_file_buffered_aio_read
     17.78 ±  2%       -2.1      15.64        perf-profile.calltrace.cycles-pp.do_mpage_readpage.mpage_readpages.__do_page_cache_readahead.ondemand_readahead.generic_file_read_iter
     12.68             -1.3      11.36        perf-profile.calltrace.cycles-pp.add_to_page_cache_lru.mpage_readpages.__do_page_cache_readahead.ondemand_readahead.generic_file_read_iter
      7.54             -0.9       6.68        perf-profile.calltrace.cycles-pp.__add_to_page_cache_locked.add_to_page_cache_lru.mpage_readpages.__do_page_cache_readahead.ondemand_readahead
      5.66 ±  3%       -0.6       5.05 ±  3%  perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.__do_page_cache_readahead.ondemand_readahead.generic_file_read_iter.xfs_file_buffered_aio_read
      6.79 ±  6%       -0.5       6.33        perf-profile.calltrace.cycles-pp.__remove_mapping.shrink_page_list.shrink_inactive_list.shrink_node_memcg.shrink_node
     17.75 ±  6%       -0.1      17.60        perf-profile.calltrace.cycles-pp.shrink_node_memcg.shrink_node.kswapd.kthread.ret_from_fork
     18.09 ±  6%       -0.1      17.94        perf-profile.calltrace.cycles-pp.kswapd.kthread.ret_from_fork
     18.07 ±  6%       -0.1      17.93        perf-profile.calltrace.cycles-pp.shrink_node.kswapd.kthread.ret_from_fork
     18.22 ±  6%       -0.1      18.09        perf-profile.calltrace.cycles-pp.ret_from_fork
     18.22 ±  6%       -0.1      18.09        perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
     17.71 ±  6%       -0.1      17.57        perf-profile.calltrace.cycles-pp.shrink_inactive_list.shrink_node_memcg.shrink_node.kswapd.kthread
     14.98 ±  6%       -0.0      14.93        perf-profile.calltrace.cycles-pp.shrink_page_list.shrink_inactive_list.shrink_node_memcg.shrink_node.kswapd
     69.64             +1.7      71.34        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
     65.92 ±  2%       +2.2      68.13        perf-profile.calltrace.cycles-pp.sys_read.entry_SYSCALL_64_fastpath
     65.47 ±  2%       +2.3      67.78        perf-profile.calltrace.cycles-pp.vfs_read.sys_read.entry_SYSCALL_64_fastpath
     61.42 ±  2%       +2.5      63.92        perf-profile.calltrace.cycles-pp.xfs_file_buffered_aio_read.xfs_file_read_iter.__vfs_read.vfs_read.sys_read
     63.02 ±  2%       +2.6      65.59        perf-profile.calltrace.cycles-pp.__vfs_read.vfs_read.sys_read.entry_SYSCALL_64_fastpath
     60.15 ±  2%       +2.6      62.74        perf-profile.calltrace.cycles-pp.generic_file_read_iter.xfs_file_buffered_aio_read.xfs_file_read_iter.__vfs_read.vfs_read
     62.43 ±  2%       +2.6      65.05        perf-profile.calltrace.cycles-pp.xfs_file_read_iter.__vfs_read.vfs_read.sys_read.entry_SYSCALL_64_fastpath
     13.23             +7.3      20.52        perf-profile.calltrace.cycles-pp.copy_page_to_iter.generic_file_read_iter.xfs_file_buffered_aio_read.xfs_file_read_iter.__vfs_read
     11.92             +7.5      19.46        perf-profile.calltrace.cycles-pp.copyout.copy_page_to_iter.generic_file_read_iter.xfs_file_buffered_aio_read.xfs_file_read_iter
     11.79 ±  2%       +7.6      19.38        perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyout.copy_page_to_iter.generic_file_read_iter.xfs_file_buffered_aio_read
     39.36 ±  2%       -4.5      34.85        perf-profile.children.cycles-pp.ondemand_readahead
     39.34 ±  2%       -4.5      34.84        perf-profile.children.cycles-pp.__do_page_cache_readahead
     31.12 ±  2%       -3.2      27.88        perf-profile.children.cycles-pp.mpage_readpages
     17.93 ±  2%       -2.1      15.80        perf-profile.children.cycles-pp.do_mpage_readpage
     12.73             -1.3      11.40        perf-profile.children.cycles-pp.add_to_page_cache_lru
      7.78 ±  2%       -0.9       6.88        perf-profile.children.cycles-pp.__add_to_page_cache_locked
      6.09 ±  2%       -0.8       5.25 ±  3%  perf-profile.children.cycles-pp.__radix_tree_lookup
      5.85 ±  2%       -0.6       5.24 ±  3%  perf-profile.children.cycles-pp.__alloc_pages_nodemask
      6.92 ±  6%       -0.5       6.45        perf-profile.children.cycles-pp.__remove_mapping
      5.02 ±  2%       -0.5       4.56 ±  4%  perf-profile.children.cycles-pp.get_page_from_freelist
     18.11 ±  6%       -0.2      17.93        perf-profile.children.cycles-pp.shrink_node
     17.78 ±  6%       -0.2      17.60        perf-profile.children.cycles-pp.shrink_node_memcg
     17.75 ±  6%       -0.2      17.58        perf-profile.children.cycles-pp.shrink_inactive_list
     18.09 ±  6%       -0.1      17.94        perf-profile.children.cycles-pp.kswapd
     18.23 ±  6%       -0.1      18.09        perf-profile.children.cycles-pp.ret_from_fork
     18.22 ±  6%       -0.1      18.09        perf-profile.children.cycles-pp.kthread
     15.09 ±  6%       -0.1      15.03        perf-profile.children.cycles-pp.shrink_page_list
     69.83             +1.7      71.55        perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
     66.16 ±  2%       +2.2      68.33        perf-profile.children.cycles-pp.sys_read
     65.64 ±  2%       +2.3      67.94        perf-profile.children.cycles-pp.vfs_read
     61.57 ±  2%       +2.4      64.02        perf-profile.children.cycles-pp.xfs_file_buffered_aio_read
     60.36 ±  2%       +2.6      62.93        perf-profile.children.cycles-pp.generic_file_read_iter
     63.18 ±  2%       +2.6      65.76        perf-profile.children.cycles-pp.__vfs_read
     62.46 ±  2%       +2.6      65.08        perf-profile.children.cycles-pp.xfs_file_read_iter
     13.38             +7.3      20.63        perf-profile.children.cycles-pp.copy_page_to_iter
     11.96             +7.5      19.48        perf-profile.children.cycles-pp.copyout
     11.81 ±  2%       +7.6      19.41        perf-profile.children.cycles-pp.copy_user_enhanced_fast_string
     14.34 ±  2%       -1.7      12.68        perf-profile.self.cycles-pp.do_mpage_readpage
      6.04 ±  2%       -0.8       5.21 ±  3%  perf-profile.self.cycles-pp.__radix_tree_lookup
     11.69             +7.5      19.23        perf-profile.self.cycles-pp.copy_user_enhanced_fast_string


[ASCII plots of per-run samples not reproduced here. Plotted metrics:
 vm-scalability.throughput, vm-scalability.median, vm-scalability.workload,
 perf-stat.cache-misses, perf-stat.branch-instructions, perf-stat.dTLB-loads,
 perf-stat.dTLB-stores, perf-stat.cache-miss-rate_, perf-stat.ipc, perf-stat.cpi,
 vm-scalability.time.percent_of_cpu_this_job_got]

[*] bisect-good sample
[O] bisect-bad sample


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Xiaolong
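

A note for readers checking the figures in this report: the %change value for a counter such as vm-scalability.throughput is the relative difference between the two commits, while rows without a percent sign (the rate and perf-profile rows) appear to be absolute differences. The minimal Python sketch below recomputes three values. It assumes the numbers are copied by hand from the report, uses a hypothetical helper named pct_change, and assumes cache-miss-rate% is cache-misses divided by cache-references and that ipc is the reciprocal of cpi. Because the inputs are already rounded in the report, the derived values can differ from the printed ones in the last digit.

```python
def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent."""
    return (new - base) / base * 100.0


# Values copied from the report: (parent commit 214c9de472, tested commit 4af77b68c2).
throughput = (5587932, 5085253)
cache_misses = (2.965e10, 3.314e10)
cache_refs = (4.762e10, 4.894e10)
cpi = (1.41, 1.55)

# Headline regression figure.
throughput_change = pct_change(*throughput)

# Assumed derivation: miss rate = misses / references, expressed in percent.
miss_rate = tuple(100.0 * m / r for m, r in zip(cache_misses, cache_refs))

# Instructions per cycle is the reciprocal of cycles per instruction.
ipc = tuple(1.0 / c for c in cpi)

print(f"throughput change: {throughput_change:+.1f}%")
print(f"cache miss rate:   {miss_rate[0]:.2f}% -> {miss_rate[1]:.2f}%")
print(f"ipc:               {ipc[0]:.2f} -> {ipc[1]:.2f}")
```

The miss-rate comparison is the more telling one: throughput falls by about 9% while the share of cache references that miss rises by roughly five percentage points, which is consistent with the larger copy_user_enhanced_fast_string share in the profile.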