Greeting,

FYI, we noticed a -4.8% regression of will-it-scale.per_process_ops due to commit:

commit: 9bd0e337c633aed3e8ec3c7397b7ae0b8436f163 ("[PATCH 01/29] iov_iter: Switch to using a table of operations")
url: https://github.com/0day-ci/linux/commits/David-Howells/RFC-iov_iter-Switch-to-using-an-ops-table/20201121-222344
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 27bba9c532a8d21050b94224ffd310ad0058c353

in testcase: will-it-scale
on test machine: 48 threads Intel(R) Xeon(R) CPU E5-2697 v2 @ 2.70GHz with 112G memory
with following parameters:

	nr_task: 50%
	mode: process
	test: pwrite1
	cpufreq_governor: performance
	ucode: 0x42e

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-ivb-2ep1/pwrite1/will-it-scale/0x42e

commit:
  27bba9c532 ("Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi")
  9bd0e337c6 ("iov_iter: Switch to using a table of operations")

27bba9c532a8d210 9bd0e337c633aed3e8ec3c7397b
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  28443113            -4.8%   27064036        will-it-scale.24.processes
   1185129            -4.8%    1127667        will-it-scale.per_process_ops
  28443113            -4.8%   27064036        will-it-scale.workload
     13.84            +1.0%      13.98        boot-time.dhcp
      0.00 ±  9%     -13.5%       0.00 ±  3%  sched_debug.cpu.next_balance.stddev
      1251 ±  9%     -17.2%       1035 ± 10%  slabinfo.dmaengine-unmap-16.active_objs
      1251 ±  9%     -17.2%       1035 ± 10%  slabinfo.dmaengine-unmap-16.num_objs
     24623 ±  5%     -18.0%      20184 ± 15%  softirqs.CPU0.RCU
     28877 ± 10%     -30.6%      20051 ± 15%  softirqs.CPU19.RCU
      5693 ± 31%    +402.3%      28595 ± 22%  softirqs.CPU19.SCHED
     21142 ± 15%     -26.5%      15533 ± 11%  softirqs.CPU27.RCU
     20776 ± 38%     -50.5%      10290 ± 58%  softirqs.CPU3.SCHED
     26618 ± 11%     -35.3%      17214 ±  6%  softirqs.CPU37.RCU
     10894 ± 48%    +175.5%      30012 ± 34%  softirqs.CPU37.SCHED
     17015 ±  4%     +39.2%      23681 ±  7%  softirqs.CPU43.RCU
    411.75 ± 58%     +76.8%     728.00 ± 32%  numa-vmstat.node0.nr_active_anon
     34304 ±  2%     -35.6%      22103 ± 48%  numa-vmstat.node0.nr_anon_pages
     36087 ±  2%     -31.0%      24915 ± 43%  numa-vmstat.node0.nr_inactive_anon
      2233 ± 51%     +60.4%       3582 ±  7%  numa-vmstat.node0.nr_shmem
    411.75 ± 58%     +76.8%     728.00 ± 32%  numa-vmstat.node0.nr_zone_active_anon
     36087 ±  2%     -31.0%      24915 ± 43%  numa-vmstat.node0.nr_zone_inactive_anon
     24265 ±  3%     +51.3%      36707 ± 29%  numa-vmstat.node1.nr_anon_pages
     25441 ±  2%     +44.9%      36858 ± 29%  numa-vmstat.node1.nr_inactive_anon
    537.25 ± 20%     +22.8%     659.50 ± 10%  numa-vmstat.node1.nr_page_table_pages
     25441 ±  2%     +44.9%      36858 ± 29%  numa-vmstat.node1.nr_zone_inactive_anon
      1649 ± 58%     +76.7%       2913 ± 32%  numa-meminfo.node0.Active
      1649 ± 58%     +76.7%       2913 ± 32%  numa-meminfo.node0.Active(anon)
    137223 ±  2%     -35.6%      88410 ± 48%  numa-meminfo.node0.AnonPages
    164997 ±  9%     -28.4%     118095 ± 42%  numa-meminfo.node0.AnonPages.max
    144353 ±  2%     -31.0%      99656 ± 43%  numa-meminfo.node0.Inactive
    144353 ±  2%     -31.0%      99656 ± 43%  numa-meminfo.node0.Inactive(anon)
      8937 ± 51%     +60.3%      14328 ±  7%  numa-meminfo.node0.Shmem
     97072 ±  3%     +51.3%     146858 ± 29%  numa-meminfo.node1.AnonPages
    127410 ±  5%     +43.2%     182468 ± 16%  numa-meminfo.node1.AnonPages.max
    101822 ±  2%     +44.9%     147521 ± 29%  numa-meminfo.node1.Inactive
    101822 ±  2%     +44.9%     147521 ± 29%  numa-meminfo.node1.Inactive(anon)
      2148 ± 20%     +22.9%       2639 ± 10%  numa-meminfo.node1.PageTables
      3431 ± 89%     -85.1%     512.25 ±109%  interrupts.38:PCI-MSI.2621444-edge.eth0-TxRx-3
    348.50 ± 62%    +152.7%     880.75 ± 27%  interrupts.40:PCI-MSI.2621446-edge.eth0-TxRx-5
      1697 ± 63%     -53.1%     796.75 ± 13%  interrupts.CPU13.CAL:Function_call_interrupts
     89.75 ± 36%    +220.3%     287.50 ± 20%  interrupts.CPU13.RES:Rescheduling_interrupts
    745.75 ±  3%    +104.6%       1526 ± 69%  interrupts.CPU19.CAL:Function_call_interrupts
    293.00 ±  5%     -60.0%     117.25 ± 47%  interrupts.CPU19.RES:Rescheduling_interrupts
    778.50 ±  9%    +123.7%       1741 ± 64%  interrupts.CPU22.CAL:Function_call_interrupts
      6450 ± 29%     -38.0%       4000 ±  4%  interrupts.CPU24.NMI:Non-maskable_interrupts
      6450 ± 29%     -38.0%       4000 ±  4%  interrupts.CPU24.PMI:Performance_monitoring_interrupts
      2012 ± 56%     -57.6%     852.75 ±  6%  interrupts.CPU26.CAL:Function_call_interrupts
    184.25 ± 37%     -47.9%      96.00 ± 49%  interrupts.CPU27.RES:Rescheduling_interrupts
      0.50 ±100%  +64250.0%     321.75 ±170%  interrupts.CPU28.TLB:TLB_shootdowns
      3431 ± 89%     -85.1%     512.25 ±109%  interrupts.CPU29.38:PCI-MSI.2621444-edge.eth0-TxRx-3
    348.50 ± 62%    +152.7%     880.75 ± 27%  interrupts.CPU31.40:PCI-MSI.2621446-edge.eth0-TxRx-5
    156.50 ± 51%     -51.3%      76.25 ± 59%  interrupts.CPU33.RES:Rescheduling_interrupts
    883.50 ± 18%     -23.8%     673.25 ± 22%  interrupts.CPU36.CAL:Function_call_interrupts
      7492 ± 13%     -45.6%       4073 ± 63%  interrupts.CPU37.NMI:Non-maskable_interrupts
      7492 ± 13%     -45.6%       4073 ± 63%  interrupts.CPU37.PMI:Performance_monitoring_interrupts
    250.50 ± 19%     -52.5%     119.00 ± 50%  interrupts.CPU37.RES:Rescheduling_interrupts
      4688 ± 27%     +63.5%       7667 ± 15%  interrupts.CPU40.NMI:Non-maskable_interrupts
      4688 ± 27%     +63.5%       7667 ± 15%  interrupts.CPU40.PMI:Performance_monitoring_interrupts
     96.75 ± 92%    +135.1%     227.50 ± 22%  interrupts.CPU43.RES:Rescheduling_interrupts
      2932 ± 36%     +73.4%       5084 ± 21%  interrupts.CPU47.NMI:Non-maskable_interrupts
      2932 ± 36%     +73.4%       5084 ± 21%  interrupts.CPU47.PMI:Performance_monitoring_interrupts
     57.50 ± 78%    +250.4%     201.50 ± 42%  interrupts.CPU47.RES:Rescheduling_interrupts
      4207 ± 61%     +86.0%       7827 ± 11%  interrupts.CPU8.NMI:Non-maskable_interrupts
      4207 ± 61%     +86.0%       7827 ± 11%  interrupts.CPU8.PMI:Performance_monitoring_interrupts
 1.089e+10            -2.3%  1.064e+10        perf-stat.i.branch-instructions
      1.62            +0.7        2.34        perf-stat.i.branch-miss-rate%
 1.741e+08           +42.3%  2.476e+08        perf-stat.i.branch-misses
      1.36            +3.3%       1.41        perf-stat.i.cpi
 1.233e+08 ±  3%      -7.1%  1.146e+08        perf-stat.i.dTLB-load-misses
  2.38e+10            -3.3%  2.302e+10        perf-stat.i.dTLB-loads
  57501510            -4.9%   54711717        perf-stat.i.dTLB-store-misses
 1.828e+10            -3.7%  1.761e+10        perf-stat.i.dTLB-stores
     98.97            -2.9       96.02 ±  2%  perf-stat.i.iTLB-load-miss-rate%
  29795797 ±  4%      -5.0%   28320171        perf-stat.i.iTLB-load-misses
    299268 ±  2%    +298.1%    1191476 ± 50%  perf-stat.i.iTLB-loads
 5.335e+10            -3.7%  5.138e+10        perf-stat.i.instructions
      0.74            -3.7%       0.71        perf-stat.i.ipc
      0.20 ±  8%     +12.1%       0.23        perf-stat.i.major-faults
      1104            -3.2%       1069        perf-stat.i.metric.M/sec
     72308            +2.3%      73975 ±  2%  perf-stat.i.node-stores
      0.10            +7.9%       0.11 ±  8%  perf-stat.overall.MPKI
      1.60            +0.7        2.33        perf-stat.overall.branch-miss-rate%
      1.35            +4.1%       1.41        perf-stat.overall.cpi
     99.00            -3.0       95.98 ±  2%  perf-stat.overall.iTLB-load-miss-rate%
      0.74            -3.9%       0.71        perf-stat.overall.ipc
 1.085e+10            -2.3%   1.06e+10        perf-stat.ps.branch-instructions
 1.735e+08           +42.3%  2.468e+08        perf-stat.ps.branch-misses
 1.229e+08 ±  3%      -7.1%  1.142e+08        perf-stat.ps.dTLB-load-misses
 2.372e+10            -3.3%  2.294e+10        perf-stat.ps.dTLB-loads
  57306258            -4.9%   54525679        perf-stat.ps.dTLB-store-misses
 1.822e+10            -3.7%  1.755e+10        perf-stat.ps.dTLB-stores
  29695158 ±  4%      -5.0%   28224049        perf-stat.ps.iTLB-load-misses
    298257 ±  2%    +298.1%    1187498 ± 50%  perf-stat.ps.iTLB-loads
 5.317e+10            -3.7%   5.12e+10        perf-stat.ps.instructions
      0.20 ±  7%     +12.0%       0.23 ±  2%  perf-stat.ps.major-faults
 1.613e+13            -3.9%   1.55e+13        perf-stat.total.instructions
      8.00 ± 14%      -8.0        0.00        perf-profile.calltrace.cycles-pp.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
      7.38 ± 14%      -7.4        0.00        perf-profile.calltrace.cycles-pp.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
      7.27 ± 14%      -7.3        0.00        perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.iov_iter_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
      0.69 ± 14%      -0.4        0.29 ±100%  perf-profile.calltrace.cycles-pp.up_write.generic_file_write_iter.new_sync_write.vfs_write.ksys_pwrite64
      0.62 ± 15%      -0.3        0.30 ±101%  perf-profile.calltrace.cycles-pp.unlock_page.shmem_write_end.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
      0.85 ±  8%      -0.2        0.66 ± 15%  perf-profile.calltrace.cycles-pp.__fget_light.ksys_pwrite64.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_pwrite
      0.91 ± 11%      -0.1        0.79 ± 12%  perf-profile.calltrace.cycles-pp.file_update_time.__generic_file_write_iter.generic_file_write_iter.new_sync_write.vfs_write
      0.00            +1.0        1.01 ± 13%  perf-profile.calltrace.cycles-pp.__get_user_nocheck_1.xxx_fault_in_readable.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
      0.00            +1.4        1.42 ± 12%  perf-profile.calltrace.cycles-pp.xxx_advance.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
      0.00            +2.1        2.15 ± 13%  perf-profile.calltrace.cycles-pp.xxx_fault_in_readable.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
      0.00            +6.8        6.82 ± 13%  perf-profile.calltrace.cycles-pp.copy_user_enhanced_fast_string.copyin.xxx_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter
      0.00            +6.9        6.92 ± 13%  perf-profile.calltrace.cycles-pp.copyin.xxx_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter
      0.00            +8.1        8.09 ± 14%  perf-profile.calltrace.cycles-pp.xxx_copy_from_user_atomic.generic_perform_write.__generic_file_write_iter.generic_file_write_iter.new_sync_write
      8.03 ± 14%      -8.0        0.00        perf-profile.children.cycles-pp.iov_iter_copy_from_user_atomic
      0.85 ±  8%      -0.2        0.66 ± 15%  perf-profile.children.cycles-pp.__fget_light
      0.69 ± 14%      -0.2        0.52 ± 15%  perf-profile.children.cycles-pp.up_write
      0.62 ± 13%      -0.2        0.46 ± 14%  perf-profile.children.cycles-pp.apparmor_file_permission
      0.94 ± 11%      -0.1        0.82 ± 13%  perf-profile.children.cycles-pp.file_update_time
      0.51 ± 12%      -0.1        0.40 ± 14%  perf-profile.children.cycles-pp.balance_dirty_pages_ratelimited
      0.55 ± 12%      -0.1        0.47 ± 12%  perf-profile.children.cycles-pp.current_time
      0.62 ± 14%      -0.1        0.55 ± 13%  perf-profile.children.cycles-pp.unlock_page
      0.24 ± 13%      -0.0        0.20 ± 16%  perf-profile.children.cycles-pp.timestamp_truncate
      0.18 ± 11%      -0.0        0.14 ± 15%  perf-profile.children.cycles-pp.file_remove_privs
      0.55 ± 14%      +0.3        0.87 ± 15%  perf-profile.children.cycles-pp.__x86_retpoline_rax
      0.00            +1.4        1.42 ± 12%  perf-profile.children.cycles-pp.xxx_advance
      0.00            +2.2        2.22 ± 13%  perf-profile.children.cycles-pp.xxx_fault_in_readable
      0.00            +8.1        8.12 ± 14%  perf-profile.children.cycles-pp.xxx_copy_from_user_atomic
      1.02 ± 16%      -0.2        0.82 ± 12%  perf-profile.self.cycles-pp.shmem_getpage_gfp
      0.82 ±  8%      -0.2        0.63 ± 15%  perf-profile.self.cycles-pp.__fget_light
      0.66 ± 14%      -0.2        0.49 ± 15%  perf-profile.self.cycles-pp.up_write
      0.54 ± 15%      -0.2        0.39 ± 14%  perf-profile.self.cycles-pp.apparmor_file_permission
      0.59 ± 13%      -0.1        0.46 ± 13%  perf-profile.self.cycles-pp.ksys_pwrite64
      0.50 ± 12%      -0.1        0.40 ± 13%  perf-profile.self.cycles-pp.balance_dirty_pages_ratelimited
      0.24 ± 15%      -0.0        0.19 ± 15%  perf-profile.self.cycles-pp.timestamp_truncate
      0.20 ± 13%      -0.0        0.17 ± 12%  perf-profile.self.cycles-pp.current_time
      0.12 ± 14%      +0.1        0.19 ± 14%  perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.43 ± 14%      +0.3        0.68 ± 15%  perf-profile.self.cycles-pp.__x86_retpoline_rax
      0.00            +1.1        1.14 ± 15%  perf-profile.self.cycles-pp.xxx_copy_from_user_atomic
      0.00            +1.2        1.21 ± 12%  perf-profile.self.cycles-pp.xxx_fault_in_readable
      0.00            +1.3        1.28 ± 12%  perf-profile.self.cycles-pp.xxx_advance


will-it-scale.24.processes

  [ASCII trend plot: bisect-good (+) samples cluster around 2.82e+07 to 2.86e+07; bisect-bad (O) samples cluster around 2.70e+07 to 2.72e+07]

will-it-scale.per_process_ops

  [ASCII trend plot: bisect-good (+) samples cluster around 1.17e+06 to 1.19e+06; bisect-bad (O) samples cluster around 1.12e+06 to 1.13e+06]

will-it-scale.workload

  [ASCII trend plot: same shape as will-it-scale.24.processes]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for
informational purposes only. Any difference in system hardware or software design
or configuration may affect actual performance.

Thanks,
Oliver Sang