Greetings,

FYI, we noticed a -2.5% regression of will-it-scale.per_process_ops due to commit:

commit: 2eca680594818153ac6a1be3ad8e964184169bf2 ("[PATCH v11 2/6] mm: Use zone and order instead of free area in free_list manipulators")
url: https://github.com/0day-ci/linux/commits/Alexander-Duyck/mm-virtio-Provide-support-for-unused-page-reporting/20191002-024207

in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) CPU E5-2699 v4 @ 2.20GHz with 128G memory
with the following parameters:

	nr_task: 100%
	mode: process
	test: page_fault2
	cpufreq_governor: performance
	ucode: 0xb000038

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a threads-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add the following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml   # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/process/100%/debian-x86_64-2019-09-23.cgz/lkp-bdw-ep6/page_fault2/will-it-scale/0xb000038

commit:
  2f16feee6a ("mm: Adjust shuffle code to allow for future coalescing")
  2eca680594 ("mm: Use zone and order instead of free area in free_list manipulators")
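For orientation: the second commit is the bisected one. Judging from its title, it converts the free-list manipulators from taking a precomputed struct free_area to taking the zone and order and deriving the area themselves. A sketch of the shape of that change, reconstructed here purely for illustration from the commit title and the mm/page_alloc.c conventions of that era (the authoritative diff is in the patch at the url above):

/* Before: callers pass in the free_area they already looked up. */
static inline void add_to_free_area(struct page *page, struct free_area *area,
				    int migratetype)
{
	list_add(&page->lru, &area->free_list[migratetype]);
	area->nr_free++;
}

/* After: the helper derives the area from (zone, order), so every
 * add/move/delete on a free list carries the zone and order context
 * that the later page-reporting patches in this series make use of. */
static inline void add_to_free_list(struct page *page, struct zone *zone,
				    unsigned int order, int migratetype)
{
	struct free_area *area = &zone->free_area[order];

	list_add(&page->lru, &area->free_list[migratetype]);
	area->nr_free++;
}

The comparison data below shows where cycles moved under this workload.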
2f16feee6a912d6b  2eca680594818153ac6a1be3ad8
----------------  ---------------------------
       fail:runs        %reproduction    fail:runs
           |                  |              |
             3:4             -2%           2:4  perf-profile.self.cycles-pp.error_entry

         %stddev      %change      %stddev
             \            |            \
     84981             -2.5%      82888        will-it-scale.per_process_ops
   7478397             -2.5%    7294217        will-it-scale.workload

    614224 ±  3%       -8.7%     560976 ±  4%  meminfo.DirectMap4k
      0.00 ± 86%       +0.0        0.00 ± 27%  mpstat.cpu.all.soft%
      8560 ± 99%      -99.4%      51.25 ± 13%  numa-numastat.node1.other_node
      1331 ± 45%     +791.3%      11867 ± 87%  turbostat.C1
     43.50             +3.1%      44.85        boot-time.boot
      3387             +4.0%       3523        boot-time.idle
    109720 ± 11%     +237.3%     370072 ± 95%  cpuidle.C1.time
      5131 ± 10%     +178.3%      14281 ± 76%  cpuidle.C1.usage

     13736 ±  3%       -7.7%      12672        numa-vmstat.node0.nr_slab_reclaimable
     10240 ±  5%      +10.2%      11280        numa-vmstat.node1.nr_slab_reclaimable
     54947 ±  3%       -7.8%      50687 ±  2%  numa-meminfo.node0.KReclaimable
     54947 ±  3%       -7.8%      50687 ±  2%  numa-meminfo.node0.SReclaimable
     40956 ±  5%      +10.2%      45122        numa-meminfo.node1.KReclaimable
     40956 ±  5%      +10.2%      45122        numa-meminfo.node1.SReclaimable

 2.256e+09             -2.4%  2.202e+09        proc-vmstat.numa_hit
 2.256e+09             -2.4%  2.202e+09        proc-vmstat.numa_local
 2.258e+09             -2.4%  2.203e+09        proc-vmstat.pgalloc_normal
 2.249e+09             -2.4%  2.195e+09        proc-vmstat.pgfault
 2.255e+09             -2.4%  2.202e+09        proc-vmstat.pgfree

    148.70 ±  8%      -23.7%     113.47 ± 12%  sched_debug.cfs_rq:/.nr_spread_over.stddev
    -62259           -195.9%      59734 ± 38%  sched_debug.cfs_rq:/.spread0.avg
     68724 ± 31%     +174.2%     188414 ± 12%  sched_debug.cfs_rq:/.spread0.max
    650.62 ± 13%      +21.0%     787.54 ±  6%  sched_debug.cfs_rq:/.util_avg.min
     77.78 ± 21%      -31.8%      53.07 ±  9%  sched_debug.cfs_rq:/.util_avg.stddev
     40.08 ± 36%      -66.7%      13.33 ±107%  sched_debug.cfs_rq:/.util_est_enqueued.min
    266102 ± 49%      -60.6%     104930 ±  2%  sched_debug.cpu.avg_idle.stddev
     22597 ±  8%      -27.9%      16297 ± 10%  sched_debug.cpu.nr_switches.max
      3715 ±  2%      -19.5%       2992 ±  7%  sched_debug.cpu.nr_switches.stddev
     19360 ± 10%      -31.3%      13306 ±  9%  sched_debug.cpu.sched_count.max
      3208 ±  4%      -24.6%       2420 ±  9%  sched_debug.cpu.sched_count.stddev
      2.21 ± 84%     +117.0%       4.79 ± 14%  sched_debug.cpu.sched_goidle.min
      9763 ± 13%      -37.4%       6112 ± 13%  sched_debug.cpu.ttwu_count.max
      1549 ±  5%      -27.3%       1126 ± 10%  sched_debug.cpu.ttwu_count.stddev
      9112 ± 10%      -37.9%       5657 ± 14%  sched_debug.cpu.ttwu_local.max
      1443 ±  3%      -29.6%       1015 ± 12%  sched_debug.cpu.ttwu_local.stddev

    199.25 ± 22%      +34.6%     268.25 ± 36%  interrupts.36:IR-PCI-MSI.1572867-edge.eth0-TxRx-2
    199.25 ± 22%      +34.6%     268.25 ± 36%  interrupts.CPU15.36:IR-PCI-MSI.1572867-edge.eth0-TxRx-2
     47.25 ± 59%     +475.1%     271.75 ± 48%  interrupts.CPU17.RES:Rescheduling_interrupts
     59.75 ± 61%     +148.1%     148.25 ± 31%  interrupts.CPU18.RES:Rescheduling_interrupts
     35.00 ± 93%     +406.4%     177.25 ± 47%  interrupts.CPU19.RES:Rescheduling_interrupts
      2910 ±  3%       +9.7%       3192 ±  8%  interrupts.CPU2.CAL:Function_call_interrupts
     33.50 ±115%     +410.4%     171.00 ± 58%  interrupts.CPU21.RES:Rescheduling_interrupts
      3033 ±  4%      +17.3%       3557 ±  8%  interrupts.CPU22.CAL:Function_call_interrupts
      2965 ±  6%      +13.9%       3379 ±  5%  interrupts.CPU27.CAL:Function_call_interrupts
    202.75 ± 34%      -50.8%      99.75 ± 49%  interrupts.CPU28.RES:Rescheduling_interrupts
    134.00 ± 32%     +243.8%     460.75 ± 92%  interrupts.CPU31.RES:Rescheduling_interrupts
     90.25 ±108%     +467.6%     512.25 ± 91%  interrupts.CPU44.RES:Rescheduling_interrupts
    454.75 ± 74%      -78.4%      98.25 ± 82%  interrupts.CPU49.RES:Rescheduling_interrupts
      4916 ± 34%      +60.4%       7885        interrupts.CPU55.NMI:Non-maskable_interrupts
      4916 ± 34%      +60.4%       7885        interrupts.CPU55.PMI:Performance_monitoring_interrupts
     33.25 ±110%     +273.7%     124.25 ± 27%  interrupts.CPU61.RES:Rescheduling_interrupts
      8.00 ± 81%    +2500.0%     208.00 ± 97%  interrupts.CPU65.RES:Rescheduling_interrupts
    105.25 ±114%     +368.2%     492.75 ± 64%  interrupts.CPU69.RES:Rescheduling_interrupts
    224.00 ± 50%      -76.2%      53.25 ±121%  interrupts.CPU70.RES:Rescheduling_interrupts

  41976580             -4.3%   40191219        perf-stat.i.branch-misses
 4.657e+08             -3.1%  4.511e+08 ±  2%  perf-stat.i.cache-misses
 1.446e+09             -3.3%  1.398e+09        perf-stat.i.cache-references
    540.00           +203.9%       1641 ±114%  perf-stat.i.cycles-between-cache-misses
  72449681             -3.7%   69791647 ±  2%  perf-stat.i.dTLB-store-misses
 6.748e+09             -3.7%  6.499e+09        perf-stat.i.dTLB-stores
  15000441             -3.3%   14499685        perf-stat.i.iTLB-load-misses
     48416 ±  8%      -30.9%      33446 ± 36%  perf-stat.i.iTLB-loads
   7390548             -3.4%    7136366 ±  2%  perf-stat.i.minor-faults
  1.31e+08             -4.1%  1.256e+08        perf-stat.i.node-loads
    866429             -5.4%     819527        perf-stat.i.node-store-misses
  32410281             -5.1%   30770659        perf-stat.i.node-stores
   7390641             -3.4%    7136212 ±  2%  perf-stat.i.page-faults
     21.34             -2.1%      20.90        perf-stat.overall.MPKI
      0.28             -0.0        0.27        perf-stat.overall.branch-miss-rate%
    521.95             +2.1%     532.97        perf-stat.overall.cycles-between-cache-misses
      4515             +2.1%       4612        perf-stat.overall.instructions-per-iTLB-miss
   2752428             +2.3%    2816574        perf-stat.overall.path-length
  41829479             -4.2%   40062021        perf-stat.ps.branch-misses
 4.641e+08             -3.1%  4.498e+08        perf-stat.ps.cache-misses
 1.441e+09             -3.3%  1.394e+09        perf-stat.ps.cache-references
  72196133             -3.6%   69580505 ±  2%  perf-stat.ps.dTLB-store-misses
 6.724e+09             -3.6%  6.479e+09        perf-stat.ps.dTLB-stores
  14947871             -3.3%   14455433        perf-stat.ps.iTLB-load-misses
     48333 ±  7%      -31.0%      33349 ± 36%  perf-stat.ps.iTLB-loads
   7365449             -3.4%    7115485        perf-stat.ps.minor-faults
 1.305e+08             -4.0%  1.253e+08        perf-stat.ps.node-loads
    863374             -5.4%     817033        perf-stat.ps.node-store-misses
  32297524             -5.0%   30677371        perf-stat.ps.node-stores
   7365171             -3.4%    7115053        perf-stat.ps.page-faults
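All of the perf-profile callchain deltas below hang off the two halves of the page_fault2 inner loop: write-faulting a private file mapping in, and unmapping it again. For orientation, each worker process does roughly the following (a simplified sketch, not the actual will-it-scale source; the real testcase at the test-url above also publishes an iteration count to the harness):

/* page_fault2-style worker: fault pages in, tear them down, repeat. */
#include <stdlib.h>
#include <unistd.h>
#include <sys/mman.h>

#define MAPLEN (128UL * 1024 * 1024)	/* size of each throwaway mapping */

int main(void)
{
	char path[] = "/tmp/wis-XXXXXX";
	int fd = mkstemp(path);		/* unlinked temp file backs the map */

	if (fd < 0 || unlink(path) < 0 || ftruncate(fd, MAPLEN) < 0)
		return 1;

	for (;;) {
		char *p = mmap(NULL, MAPLEN, PROT_READ | PROT_WRITE,
			       MAP_PRIVATE, fd, 0);
		if (p == MAP_FAILED)
			return 1;

		/* One store per page: each takes a write fault that
		 * allocates a fresh page (get_page_from_freelist) and,
		 * for a private file mapping, copies into it
		 * (copy_user_highpage) -- the fault side of the profile. */
		for (unsigned long off = 0; off < MAPLEN; off += 4096)
			p[off] = 1;

		/* Teardown frees every page back through munmap ->
		 * release_pages -> free_pcppages_bulk -- the unmap side. */
		munmap(p, MAPLEN);
	}
}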
      8.09             -1.3        6.82        perf-profile.calltrace.cycles-pp.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
      7.99             -1.3        6.73        perf-profile.calltrace.cycles-pp.alloc_set_pte.finish_fault.__handle_mm_fault.handle_mm_fault.__do_page_fault
      5.97             -1.2        4.74        perf-profile.calltrace.cycles-pp.__lru_cache_add.alloc_set_pte.finish_fault.__handle_mm_fault.handle_mm_fault
      5.87             -1.2        4.64        perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte.finish_fault.__handle_mm_fault
      4.59             -1.2        3.40 ±  2%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte
      4.62             -1.2        3.44        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte.finish_fault
     56.80             -0.7       56.09        perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
     55.72             -0.7       55.02        perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
     57.16             -0.7       56.45        perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
     57.26             -0.7       56.56        perf-profile.calltrace.cycles-pp.page_fault
     55.27             -0.7       54.57        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      2.15 ±  2%       -0.5        1.65 ±  3%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.unmap_page_range
      2.16 ±  2%       -0.5        1.67 ±  3%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.unmap_page_range.unmap_vmas
      8.78             -0.2        8.59        perf-profile.calltrace.cycles-pp.copy_user_highpage.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
      8.66             -0.2        8.48        perf-profile.calltrace.cycles-pp.copy_page.copy_user_highpage.__handle_mm_fault.handle_mm_fault.__do_page_fault
      0.93             -0.0        0.90        perf-profile.calltrace.cycles-pp.__pagevec_lru_add_fn.pagevec_lru_move_fn.__lru_cache_add.alloc_set_pte.finish_fault
      4.11             +0.1        4.17        perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
      4.15             +0.1        4.22        perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
      4.13             +0.1        4.20        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
      3.72             +0.1        3.85        perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
      3.66             +0.1        3.79        perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu.tlb_finish_mmu
     33.34             +0.7       34.03        perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     33.34             +0.7       34.02        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
     32.04             +0.7       32.76        perf-profile.calltrace.cycles-pp.tlb_flush_mmu.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
     31.80             +0.7       32.52        perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.unmap_page_range.unmap_vmas.unmap_region
     37.53             +0.8       38.28        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     37.53             +0.8       38.28        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     37.50             +0.8       38.26        perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     37.50             +0.8       38.26        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     37.50             +0.8       38.26        perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     37.50             +0.8       38.26        perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     35.45             +0.8       36.22        perf-profile.calltrace.cycles-pp.alloc_pages_vma.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
     34.73             +0.8       35.50        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.__handle_mm_fault.handle_mm_fault
     35.13             +0.8       35.91        perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.alloc_pages_vma.__handle_mm_fault.handle_mm_fault.__do_page_fault
     32.89             +0.8       33.70        perf-profile.calltrace.cycles-pp._raw_spin_lock.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.__handle_mm_fault
     32.80             +0.8       33.61        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma
     28.87             +1.2       30.10        perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.tlb_flush_mmu.unmap_page_range.unmap_vmas
     28.36             +1.2       29.61        perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu.unmap_page_range
     30.85             +1.4       32.26        perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu
     30.77             +1.4       32.18        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page_list.release_pages

      7.09             -1.8        5.33 ±  2%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
      8.09             -1.3        6.83        perf-profile.children.cycles-pp.finish_fault
      8.03             -1.3        6.77        perf-profile.children.cycles-pp.alloc_set_pte
      5.89             -1.2        4.65        perf-profile.children.cycles-pp.pagevec_lru_move_fn
      5.98             -1.2        4.75        perf-profile.children.cycles-pp.__lru_cache_add
     56.83             -0.7       56.11        perf-profile.children.cycles-pp.__do_page_fault
     55.76             -0.7       55.05        perf-profile.children.cycles-pp.handle_mm_fault
     57.16             -0.7       56.46        perf-profile.children.cycles-pp.do_page_fault
     57.30             -0.7       56.60        perf-profile.children.cycles-pp.page_fault
     55.30             -0.7       54.60        perf-profile.children.cycles-pp.__handle_mm_fault
      8.78             -0.2        8.59        perf-profile.children.cycles-pp.copy_user_highpage
      8.69             -0.2        8.51        perf-profile.children.cycles-pp.copy_page
      0.41             -0.0        0.39        perf-profile.children.cycles-pp.__mod_lruvec_state
      4.16             +0.1        4.22        perf-profile.children.cycles-pp.tlb_finish_mmu
     70.71             +0.5       71.19        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     33.34             +0.7       34.03        perf-profile.children.cycles-pp.unmap_vmas
     33.34             +0.7       34.03        perf-profile.children.cycles-pp.unmap_page_range
     37.61             +0.7       38.36        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     37.61             +0.7       38.36        perf-profile.children.cycles-pp.do_syscall_64
     37.50             +0.8       38.26        perf-profile.children.cycles-pp.__do_munmap
     37.50             +0.8       38.26        perf-profile.children.cycles-pp.__vm_munmap
     37.50             +0.8       38.26        perf-profile.children.cycles-pp.unmap_region
     37.50             +0.8       38.26        perf-profile.children.cycles-pp.__x64_sys_munmap
     35.47             +0.8       36.25        perf-profile.children.cycles-pp.alloc_pages_vma
     34.87             +0.8       35.65        perf-profile.children.cycles-pp.get_page_from_freelist
     36.18             +0.8       36.96        perf-profile.children.cycles-pp.tlb_flush_mmu
     36.03             +0.8       36.82        perf-profile.children.cycles-pp.release_pages
     35.25             +0.8       36.04        perf-profile.children.cycles-pp.__alloc_pages_nodemask
     32.62             +1.4       34.00        perf-profile.children.cycles-pp.free_unref_page_list
     32.04             +1.4       33.44        perf-profile.children.cycles-pp.free_pcppages_bulk
     64.81             +2.2       67.01        perf-profile.children.cycles-pp._raw_spin_lock

      8.65             -0.2        8.46        perf-profile.self.cycles-pp.copy_page
      0.94             -0.0        0.89        perf-profile.self.cycles-pp.get_page_from_freelist
      1.13             -0.0        1.10        perf-profile.self.cycles-pp._raw_spin_lock
     70.71             +0.5       71.19        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
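The profile shows time moving between two users of the same zone->lock: the allocation side (get_page_from_freelist -> _raw_spin_lock) and the bulk-free side (free_pcppages_bulk -> _raw_spin_lock). For reference, the bulk-free critical section looks roughly like this in kernels of that period (heavily abridged, not a verbatim copy of mm/page_alloc.c):

static void free_pcppages_bulk(struct zone *zone, int count,
			       struct per_cpu_pages *pcp)
{
	struct page *page, *tmp;
	LIST_HEAD(head);

	/* ... pages are first moved from the pcp lists onto 'head' ... */

	spin_lock(&zone->lock);
	list_for_each_entry_safe(page, tmp, &head, lru) {
		/* __free_one_page() merges buddies and manipulates the
		 * free lists via the helpers the bisected commit changed,
		 * all while holding the contended zone->lock. */
		__free_one_page(page, page_to_pfn(page), zone, 0,
				get_pcppage_migratetype(page));
	}
	spin_unlock(&zone->lock);
}

This is consistent with the extra cycles above landing in native_queued_spin_lock_slowpath under free_pcppages_bulk, though the report itself does not attribute a root cause.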
will-it-scale.per_process_ops

    [ASCII run chart omitted: bisect-bad (O) samples track along the top of a 0 to 90000 scale across the sampled runs, with gaps for incomplete runs]

will-it-scale.workload

    [ASCII run chart omitted: bisect-bad (O) samples track along the top of a 0 to 8e+06 scale across the sampled runs, with gaps for incomplete runs]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen