Greeting,

FYI, we noticed a -4.1% regression of will-it-scale.per_process_ops due to commit:


commit: 755d6edc1aee4489c90975ec093d724d5492cecd ("[PATCH] mm: release the spinlock on zap_pte_range")
url: https://github.com/0day-ci/linux/commits/Minchan-Kim/mm-release-the-spinlock-on-zap_pte_range/20190730-010638


in testcase: will-it-scale
on test machine: 8 threads Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz with 16G memory
with following parameters:

	nr_task: 100%
	mode: process
	test: malloc1
	cpufreq_governor: performance
	ucode: 0x21

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-2019-05-14.cgz/lkp-ivb-d01/malloc1/will-it-scale/0x21

commit:
  v5.3-rc2
  755d6edc1a ("mm: release the spinlock on zap_pte_range")

        v5.3-rc2 755d6edc1aee4489c90975ec093
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          1:5          -20%            :4     dmesg.RIP:__d_lookup_rcu
          1:5          -20%            :4     dmesg.RIP:mnt_drop_write
           :5           20%           1:4     kmsg.ab33a8>]usb_hcd_irq
           :5           20%           1:4     kmsg.b445f28>]usb_hcd_irq
           :5           20%           1:4     kmsg.cdf63ef>]usb_hcd_irq
          1:5          -20%            :4     kmsg.d4af11>]usb_hcd_irq
          1:5          -20%            :4     kmsg.d9>]usb_hcd_irq
           :5           20%           1:4     kmsg.f805d78>]usb_hcd_irq
          5:5           -7%           4:4     perf-profile.calltrace.cycles-pp.error_entry
          7:5          -39%           5:4     perf-profile.children.cycles-pp.error_entry
          0:5           -1%           0:4     perf-profile.children.cycles-pp.error_exit
          5:5          -30%           4:4     perf-profile.self.cycles-pp.error_entry
         %stddev     %change         %stddev
             \          |                \
    119757           -4.1%     114839        will-it-scale.per_process_ops
    958059           -4.1%     918718        will-it-scale.workload
      2429 ± 16%    -34.5%       1591 ± 32%  cpuidle.C1.usage
      0.97 ± 88%     -0.7        0.26        mpstat.cpu.all.idle%
     78.40           +2.0%      80.00        vmstat.cpu.sy
     45.42           +2.1%      46.38        turbostat.CorWatt
     50.46           +2.0%      51.45        turbostat.PkgWatt
      6641 ±  4%     +8.6%       7215 ±  8%  slabinfo.anon_vma_chain.num_objs
      1327 ±  3%    +23.0%       1632 ±  5%  slabinfo.kmalloc-96.active_objs
      1327 ±  3%    +23.0%       1632 ±  5%  slabinfo.kmalloc-96.num_objs
      1235 ± 30%    +37.7%       1700 ± 18%  interrupts.29:PCI-MSI.409600-edge.eth0
      4361 ± 81%   +149.4%      10876 ± 32%  interrupts.CPU0.NMI:Non-maskable_interrupts
      4361 ± 81%   +149.4%      10876 ± 32%  interrupts.CPU0.PMI:Performance_monitoring_interrupts
      1235 ± 30%    +37.7%       1700 ± 18%  interrupts.CPU7.29:PCI-MSI.409600-edge.eth0
     93196           +9.1%     101723 ±  6%  sched_debug.cfs_rq:/.load.min
     15.37 ± 11%    +13.6%      17.46 ±  3%  sched_debug.cfs_rq:/.nr_spread_over.max
      5.01 ± 11%    +14.5%       5.74 ±  4%  sched_debug.cfs_rq:/.nr_spread_over.stddev
     53.80 ± 15%    +41.6%      76.21 ±  7%  sched_debug.cfs_rq:/.util_avg.stddev
     60098           +1.6%      61056        proc-vmstat.nr_active_anon
      6867           -1.2%       6781        proc-vmstat.nr_slab_unreclaimable
     60098           +1.6%      61056        proc-vmstat.nr_zone_active_anon
 5.757e+08           -4.2%  5.517e+08        proc-vmstat.numa_hit
 5.757e+08           -4.2%  5.517e+08        proc-vmstat.numa_local
 5.758e+08           -4.1%   5.52e+08        proc-vmstat.pgalloc_normal
 2.881e+08           -4.1%  2.762e+08        proc-vmstat.pgfault
 5.758e+08           -4.1%   5.52e+08        proc-vmstat.pgfree
 2.861e+09 ± 41%    +41.1%  4.038e+09        perf-stat.i.branch-instructions
  41921318 ± 38%    +34.9%   56552695 ±  2%  perf-stat.i.cache-references
 2.173e+10 ± 41%    +34.9%  2.931e+10        perf-stat.i.cpu-cycles
  2.26e+09 ± 41%    +41.3%  3.194e+09        perf-stat.i.dTLB-stores
     57813 ± 26%    +66.7%      96370 ±  6%  perf-stat.i.iTLB-loads
 1.365e+10 ± 41%    +37.9%  1.882e+10        perf-stat.i.instructions
    661.20 ± 40%    +45.4%     961.52        perf-stat.i.instructions-per-iTLB-miss
      0.47 ± 41%    +37.3%       0.64        perf-stat.i.ipc
    948620           -3.5%     915067        perf-stat.i.minor-faults
    948620           -3.5%     915067        perf-stat.i.page-faults
      0.51 ±  7%     -0.1        0.45        perf-stat.overall.branch-miss-rate%
      1.59           -2.4%       1.56        perf-stat.overall.cpi
      0.38           -0.0        0.35 ±  2%  perf-stat.overall.dTLB-store-miss-rate%
    875.11           +8.7%     950.89        perf-stat.overall.instructions-per-iTLB-miss
      0.63           +2.4%       0.64        perf-stat.overall.ipc
   4337585 ± 41%    +42.3%    6173557        perf-stat.overall.path-length
 2.855e+09 ± 41%    +41.0%  4.028e+09        perf-stat.ps.branch-instructions
  41833739 ± 38%    +34.8%   56408902 ±  2%  perf-stat.ps.cache-references
 2.255e+09 ± 41%    +41.2%  3.186e+09        perf-stat.ps.dTLB-stores
     57677 ± 26%    +66.7%      96124 ±  6%  perf-stat.ps.iTLB-loads
 1.362e+10 ± 41%    +37.8%  1.877e+10        perf-stat.ps.instructions
    946368           -3.6%     912714        perf-stat.ps.minor-faults
    946368           -3.6%     912714        perf-stat.ps.page-faults
 4.155e+12 ± 41%    +36.5%  5.672e+12        perf-stat.total.instructions
     20.10           -0.7       19.42        perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
     17.83           -0.7       17.17        perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      5.47 ±  2%     -0.5        4.92 ±  4%  perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
      5.75 ±  2%     -0.5        5.20 ±  4%  perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
      5.69 ±  2%     -0.5        5.17 ±  4%  perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
      2.61           -0.5        2.16 ± 12%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain.unmap_region
      2.09 ±  2%     -0.4        1.67 ± 15%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_add_drain_cpu.lru_add_drain
      2.81 ±  2%     -0.2        2.56 ±  2%  perf-profile.calltrace.cycles-pp.__anon_vma_prepare.do_anonymous_page.__handle_mm_fault.handle_mm_fault.__do_page_fault
      2.62 ±  2%     -0.2        2.45 ±  2%  perf-profile.calltrace.cycles-pp.flush_tlb_func_common.flush_tlb_mm_range.tlb_flush_mmu.tlb_finish_mmu.unmap_region
      1.89 ±  2%     -0.2        1.73        perf-profile.calltrace.cycles-pp.unlink_anon_vmas.free_pgtables.unmap_region.__do_munmap.__vm_munmap
      3.05 ±  2%     -0.1        2.91        perf-profile.calltrace.cycles-pp.flush_tlb_mm_range.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
      1.07 ±  3%     -0.1        0.95 ±  2%  perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      0.91 ±  3%     -0.1        0.84 ±  4%  perf-profile.calltrace.cycles-pp.native_flush_tlb.flush_tlb_func_common.flush_tlb_mm_range.tlb_flush_mmu.tlb_finish_mmu
      1.94 ±  3%     +0.1        2.06        perf-profile.calltrace.cycles-pp.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      1.31 ±  8%     +0.1        1.45        perf-profile.calltrace.cycles-pp.arch_get_unmapped_area_topdown.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
      0.31 ± 81%     +0.2        0.54 ±  3%  perf-profile.calltrace.cycles-pp.mem_cgroup_commit_charge.do_anonymous_page.__handle_mm_fault.handle_mm_fault.__do_page_fault
      2.27 ± 50%     +0.7        2.97 ±  3%  perf-profile.calltrace.cycles-pp.syscall_return_via_sysret
     43.67           +2.4       46.10        perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     39.41 ±  2%     +2.7       42.07        perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     18.28 ±  2%     +3.7       21.95        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     17.43 ±  2%     +3.7       21.12        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
     35.89 ± 50%    +11.0       46.92        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     36.13 ± 50%    +11.1       47.22        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     51.68 ± 50%    +14.5       66.17        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     51.90 ± 50%    +14.5       66.42        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     17.89           -0.7       17.20        perf-profile.children.cycles-pp.handle_mm_fault
     20.13           -0.7       19.45        perf-profile.children.cycles-pp.__do_page_fault
      5.25 ±  2%     -0.6        4.62 ±  8%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      5.50 ±  2%     -0.6        4.95 ±  4%  perf-profile.children.cycles-pp.pagevec_lru_move_fn
      5.93 ±  2%     -0.5        5.39 ±  4%  perf-profile.children.cycles-pp.lru_add_drain
      5.86 ±  2%     -0.5        5.33 ±  4%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      2.80 ±  2%     -0.3        2.55 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64
      2.86 ±  2%     -0.3        2.60 ±  2%  perf-profile.children.cycles-pp.__anon_vma_prepare
      1.92 ±  3%     -0.2        1.75        perf-profile.children.cycles-pp.unlink_anon_vmas
      1.88 ±  4%     -0.2        1.72 ±  2%  perf-profile.children.cycles-pp.percpu_counter_add_batch
      2.03 ±  3%     -0.2        1.88        perf-profile.children.cycles-pp.free_pgtables
      3.06 ±  2%     -0.1        2.92        perf-profile.children.cycles-pp.flush_tlb_mm_range
      0.89 ±  5%     -0.1        0.76 ±  6%  perf-profile.children.cycles-pp.__might_sleep
      1.58 ±  2%     -0.1        1.45        perf-profile.children.cycles-pp.native_flush_tlb
      1.97           -0.1        1.85 ±  2%  perf-profile.children.cycles-pp.flush_tlb_func_common
      0.41 ±  8%     -0.1        0.32 ±  8%  perf-profile.children.cycles-pp.___pte_free_tlb
      0.10 ± 14%     -0.1        0.03 ±100%  perf-profile.children.cycles-pp.should_fail_alloc_page
      0.55 ±  3%     -0.1        0.49 ±  4%  perf-profile.children.cycles-pp.down_write
      0.10 ± 19%     -0.1        0.05 ± 58%  perf-profile.children.cycles-pp.should_failslab
      0.28 ± 10%     -0.0        0.23        perf-profile.children.cycles-pp.anon_vma_interval_tree_remove
      0.11 ± 19%     -0.0        0.07 ±  7%  perf-profile.children.cycles-pp.policy_nodemask
      0.10 ± 11%     -0.0        0.06 ± 14%  perf-profile.children.cycles-pp.__vma_link_file
      0.11 ±  9%     -0.0        0.08 ±  6%  perf-profile.children.cycles-pp.anon_vma_chain_link
      0.13 ±  8%     -0.0        0.11 ±  4%  perf-profile.children.cycles-pp.try_charge
      0.18 ±  6%     -0.0        0.16 ±  5%  perf-profile.children.cycles-pp.inc_zone_page_state
      0.14 ±  2%     -0.0        0.12 ±  3%  perf-profile.children.cycles-pp.anon_vma_interval_tree_insert
      0.10 ± 17%     +0.0        0.14 ±  7%  perf-profile.children.cycles-pp.strlen
      0.52 ±  2%     +0.0        0.56 ±  3%  perf-profile.children.cycles-pp.mem_cgroup_commit_charge
      0.17 ± 16%     +0.0        0.21 ±  6%  perf-profile.children.cycles-pp.uncharge_page
      0.08 ± 16%     +0.0        0.13 ±  7%  perf-profile.children.cycles-pp.__vma_link_list
      0.26 ±  6%     +0.1        0.31 ±  6%  perf-profile.children.cycles-pp.mem_cgroup_charge_statistics
      0.00           +0.1        0.06 ± 22%  perf-profile.children.cycles-pp.__get_vma_policy
      0.13 ±  9%     +0.1        0.19 ±  9%  perf-profile.children.cycles-pp.vma_merge
      0.02 ±122%     +0.1        0.09 ± 11%  perf-profile.children.cycles-pp.kthread_blkcg
      0.25 ± 11%     +0.1        0.33 ±  6%  perf-profile.children.cycles-pp.get_task_policy
      0.00           +0.1        0.08 ±  5%  perf-profile.children.cycles-pp.memcpy
      0.25 ±  9%     +0.1        0.35 ±  2%  perf-profile.children.cycles-pp.memcpy_erms
      1.97 ±  2%     +0.1        2.09        perf-profile.children.cycles-pp.get_unmapped_area
      1.34 ±  7%     +0.1        1.47 ±  2%  perf-profile.children.cycles-pp.arch_get_unmapped_area_topdown
      0.38 ±  5%     +0.1        0.52 ±  5%  perf-profile.children.cycles-pp.alloc_pages_current
      3.08 ±  2%     +0.2        3.24 ±  2%  perf-profile.children.cycles-pp.syscall_return_via_sysret
     64.46           +2.0       66.45        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     64.19           +2.0       66.19        perf-profile.children.cycles-pp.do_syscall_64
     43.77           +2.4       46.18        perf-profile.children.cycles-pp.__do_munmap
     44.49           +2.5       46.95        perf-profile.children.cycles-pp.__vm_munmap
     44.77           +2.5       47.24        perf-profile.children.cycles-pp.__x64_sys_munmap
     39.43 ±  2%     +2.7       42.10        perf-profile.children.cycles-pp.unmap_region
     18.07 ±  2%     +3.7       21.73        perf-profile.children.cycles-pp.unmap_page_range
     18.29 ±  2%     +3.7       21.97        perf-profile.children.cycles-pp.unmap_vmas
      6.02 ±  3%     -0.5        5.57 ±  3%  perf-profile.self.cycles-pp.do_syscall_64
      1.73           -0.1        1.59        perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      1.56 ±  2%     -0.1        1.44        perf-profile.self.cycles-pp.native_flush_tlb
      0.34 ± 11%     -0.1        0.24 ±  7%  perf-profile.self.cycles-pp.strlcpy
      0.57 ±  5%     -0.1        0.49 ±  6%  perf-profile.self.cycles-pp.unlink_anon_vmas
      0.68 ±  4%     -0.1        0.60 ±  8%  perf-profile.self.cycles-pp._raw_spin_lock
      0.37 ±  5%     -0.1        0.31 ±  6%  perf-profile.self.cycles-pp.cpumask_any_but
      0.42 ±  7%     -0.1        0.36 ±  6%  perf-profile.self.cycles-pp.handle_mm_fault
      0.23 ±  7%     -0.1        0.18 ±  4%  perf-profile.self.cycles-pp.__perf_sw_event
      0.10 ± 23%     -0.0        0.06 ±  9%  perf-profile.self.cycles-pp.policy_nodemask
      0.09 ± 11%     -0.0        0.04 ± 59%  perf-profile.self.cycles-pp.__vma_link_file
      0.13 ±  6%     -0.0        0.10 ±  8%  perf-profile.self.cycles-pp.try_charge
      0.14 ±  2%     -0.0        0.12 ±  3%  perf-profile.self.cycles-pp.anon_vma_interval_tree_insert
      0.10 ± 15%     +0.0        0.11 ±  4%  perf-profile.self.cycles-pp.strlen
      0.09 ± 17%     +0.0        0.12 ±  5%  perf-profile.self.cycles-pp.memcg_check_events
      0.07 ± 19%     +0.0        0.10 ±  7%  perf-profile.self.cycles-pp.__vma_link_list
      0.16 ± 16%     +0.0        0.20 ±  5%  perf-profile.self.cycles-pp.uncharge_page
      0.24 ±  7%     +0.0        0.28 ±  2%  perf-profile.self.cycles-pp.memcpy_erms
      0.04 ± 53%     +0.0        0.09 ±  8%  perf-profile.self.cycles-pp.do_page_fault
      0.42 ±  9%     +0.1        0.48 ±  7%  perf-profile.self.cycles-pp.find_next_bit
      0.13 ± 10%     +0.1        0.19 ±  8%  perf-profile.self.cycles-pp.vma_merge
      0.02 ±122%     +0.1        0.09 ± 11%  perf-profile.self.cycles-pp.kthread_blkcg
      0.25 ± 10%     +0.1        0.32 ±  7%  perf-profile.self.cycles-pp.get_task_policy
      0.00           +0.1        0.08 ±  6%  perf-profile.self.cycles-pp.memcpy
      0.14 ±  5%     +0.1        0.25 ± 15%  perf-profile.self.cycles-pp.alloc_pages_current
      3.08 ±  2%     +0.2        3.23 ±  2%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.43 ± 10%     +0.2        0.58 ±  6%  perf-profile.self.cycles-pp.arch_get_unmapped_area_topdown
     11.00 ±  2%     +3.6       14.56        perf-profile.self.cycles-pp.unmap_page_range



                            will-it-scale.per_process_ops

  [plot: will-it-scale.per_process_ops per sample, y-axis 112000 to 120000]


[*] bisect-good sample
[O] bisect-bad  sample


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Oliver Sang