Greeting,

FYI, we noticed a -22.7% regression of will-it-scale.per_process_ops due to commit:

commit: bd0b230fe14554bfffbae54e19038716f96f5a41 ("mm/memcg: unify swap and memsw page counters")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with following parameters:

	nr_task: 50%
	mode: process
	test: page_fault2
	cpufreq_governor: performance
	ucode: 0x16

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add following tag
Reported-by: kernel test robot

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/50%/debian-10.4-x86_64-20200603.cgz/lkp-hsw-4ex1/page_fault2/will-it-scale/0x16

commit:
  8d387a5f17 ("mm/memcg: simplify mem_cgroup_get_max()")
  bd0b230fe1 ("mm/memcg: unify swap and memsw page counters")

8d387a5f172f26ff bd0b230fe14554bfffbae54e190
---------------- ---------------------------
       fail:runs  %reproduction  fail:runs
           |            |            |
          5:5         -23%         4:5  perf-profile.calltrace.cycles-pp.error_entry.testcase
          5:5         -23%         4:5  perf-profile.children.cycles-pp.error_entry
          5:5         -19%         4:5  perf-profile.self.cycles-pp.error_entry
        %stddev      %change      %stddev
            \           |             \
    187666           -22.7%     145157  will-it-scale.per_process_ops
  13511995           -22.7%   10451324  will-it-scale.workload
   1895023 ±196%  -98.7%      24188 ±  4%  cpuidle.POLL.time
    280275 ±192%  -97.3%       7540 ±  3%  cpuidle.POLL.usage
     34107 ± 26%  -27.5%      24740 ± 10%  numa-meminfo.node1.KReclaimable
      4234 ± 11%  -15.6%       3573 ±  2%  numa-meminfo.node1.PageTables
     34107 ± 26%  -27.5%      24740 ± 10%  numa-meminfo.node1.SReclaimable
   1195173         +1.2%    1209077        proc-vmstat.nr_anon_pages
      4325         -2.3%       4223        proc-vmstat.nr_page_table_pages
  4.07e+09        -22.6%  3.149e+09        proc-vmstat.numa_hit
 4.069e+09        -22.6%  3.149e+09        proc-vmstat.numa_local
 4.072e+09        -22.6%  3.151e+09        proc-vmstat.pgalloc_normal
 4.059e+09        -22.6%  3.141e+09        proc-vmstat.pgfault
 4.069e+09        -22.6%  3.147e+09        proc-vmstat.pgfree
 1.395e+09       -9.9e+07  1.296e+09 ±  5%  syscalls.sys_mmap.noise.75%
 2.512e+09 ± 17%  +1.2e+09  3.667e+09 ± 13%  syscalls.sys_write.noise.100%
 2.526e+09 ± 17%  +1.2e+09  3.684e+09 ± 13%  syscalls.sys_write.noise.2%
 2.523e+09 ± 17%  +1.2e+09  3.681e+09 ± 13%  syscalls.sys_write.noise.25%
 2.525e+09 ± 17%  +1.2e+09  3.684e+09 ± 13%  syscalls.sys_write.noise.5%
  2.52e+09 ± 17%  +1.2e+09  3.677e+09 ± 13%  syscalls.sys_write.noise.50%
 2.516e+09 ± 17%  +1.2e+09  3.672e+09 ± 13%  syscalls.sys_write.noise.75%
 1.029e+09        -21.7%  8.052e+08 ±  2%  numa-numastat.node0.local_node
 1.029e+09        -21.7%  8.053e+08 ±  2%  numa-numastat.node0.numa_hit
  1.02e+09        -23.0%  7.853e+08        numa-numastat.node1.local_node
  1.02e+09        -23.0%  7.853e+08        numa-numastat.node1.numa_hit
 1.013e+09        -22.8%  7.817e+08        numa-numastat.node2.local_node
 1.013e+09        -22.8%  7.818e+08        numa-numastat.node2.numa_hit
 1.011e+09        -23.1%  7.771e+08 ±  2%  numa-numastat.node3.local_node
 1.011e+09        -23.1%  7.772e+08 ±  2%  numa-numastat.node3.numa_hit
      9210 ±  8%  +12.5%      10362 ±  8%  softirqs.CPU13.RCU
     20302 ±  8%  +31.3%      26656 ± 10%  softirqs.CPU142.SCHED
     16688 ± 16%  +52.8%      25498 ± 29%  softirqs.CPU15.SCHED
     17137 ± 23%  +62.3%      27811 ± 12%  softirqs.CPU20.SCHED
     23421 ± 21%  -40.4%      13969 ± 33%  softirqs.CPU36.SCHED
     23782 ±  7%  -29.0%      16876 ± 17%  softirqs.CPU70.SCHED
     27401 ±  9%  -34.4%      17978 ± 42%  softirqs.CPU87.SCHED
     25692 ± 13%  -44.6%      14223 ± 20%  softirqs.CPU92.SCHED
 5.114e+08        -21.6%  4.012e+08 ±  2%  numa-vmstat.node0.numa_hit
 5.114e+08        -21.6%  4.012e+08 ±  2%  numa-vmstat.node0.numa_local
      1058 ± 11%  -15.6%     893.00 ±  2%  numa-vmstat.node1.nr_page_table_pages
      8526 ± 26%  -27.5%       6184 ± 10%  numa-vmstat.node1.nr_slab_reclaimable
 5.074e+08        -22.9%   3.91e+08        numa-vmstat.node1.numa_hit
 5.073e+08        -22.9%  3.909e+08        numa-vmstat.node1.numa_local
  5.04e+08        -22.7%  3.895e+08        numa-vmstat.node2.numa_hit
 5.039e+08        -22.7%  3.894e+08        numa-vmstat.node2.numa_local
 5.029e+08        -23.0%   3.87e+08 ±  2%  numa-vmstat.node3.numa_hit
 5.028e+08        -23.1%  3.869e+08 ±  2%  numa-vmstat.node3.numa_local
      6778 ± 54%  -98.0%     135.96 ± 62%  sched_debug.cfs_rq:/.exec_clock.min
     29780 ±  6%  +52.2%      45320 ±  6%  sched_debug.cfs_rq:/.exec_clock.stddev
    528699 ± 51%  -94.3%      30214 ± 19%  sched_debug.cfs_rq:/.min_vruntime.min
   2205697 ±  6%  +52.3%    3359526 ±  6%  sched_debug.cfs_rq:/.min_vruntime.stddev
     17.95 ±  5%  -37.5%      11.22 ±  2%  sched_debug.cfs_rq:/.nr_spread_over.avg
     59.80 ± 28%  -40.5%      35.60 ± 24%  sched_debug.cfs_rq:/.nr_spread_over.max
      1.80 ± 38%  -100.0%      0.00        sched_debug.cfs_rq:/.nr_spread_over.min
      8.78 ±  5%  -16.2%       7.36 ±  7%  sched_debug.cfs_rq:/.nr_spread_over.stddev
   2205731 ±  6%  +52.3%    3359553 ±  6%  sched_debug.cfs_rq:/.spread0.stddev
     23138 ±  2%  +39.2%      32199 ± 18%  sched_debug.cpu.nr_switches.max
      3068 ±  3%  +19.9%       3680 ± 10%  sched_debug.cpu.nr_switches.stddev
     19891 ±  4%  +47.3%      29292 ± 20%  sched_debug.cpu.sched_count.max
    694.57 ±  4%  -12.4%     608.67 ±  3%  sched_debug.cpu.sched_count.min
      2602 ±  4%  +28.2%       3335 ± 10%  sched_debug.cpu.sched_count.stddev
      9769 ±  4%  +48.7%      14531 ± 21%  sched_debug.cpu.sched_goidle.max
     25.50 ± 22%  -51.2%      12.43 ± 10%  sched_debug.cpu.sched_goidle.min
      1315 ±  4%  +29.2%       1699 ±  9%  sched_debug.cpu.sched_goidle.stddev
    259.50 ±  2%  -16.7%     216.20 ±  5%  sched_debug.cpu.ttwu_count.min
    233.47        -12.9%     203.27 ±  3%  sched_debug.cpu.ttwu_local.min
    136.40 ± 22%  -41.3%      80.00 ± 35%  interrupts.CPU1.RES:Rescheduling_interrupts
    445.80 ± 49%  +54.3%     688.00        interrupts.CPU116.CAL:Function_call_interrupts
      2384 ± 39%  +77.2%       4225 ± 26%  interrupts.CPU116.NMI:Non-maskable_interrupts
      2384 ± 39%  +77.2%       4225 ± 26%  interrupts.CPU116.PMI:Performance_monitoring_interrupts
      3140 ± 25%  +82.5%       5732 ± 36%  interrupts.CPU12.NMI:Non-maskable_interrupts
      3140 ± 25%  +82.5%       5732 ± 36%  interrupts.CPU12.PMI:Performance_monitoring_interrupts
      6641 ± 17%  -48.0%       3452 ± 40%  interrupts.CPU128.NMI:Non-maskable_interrupts
      6641 ± 17%  -48.0%       3452 ± 40%  interrupts.CPU128.PMI:Performance_monitoring_interrupts
      6211 ± 25%  -41.3%       3643 ± 40%  interrupts.CPU14.NMI:Non-maskable_interrupts
      6211 ± 25%  -41.3%       3643 ± 40%  interrupts.CPU14.PMI:Performance_monitoring_interrupts
    156.20 ± 19%  -60.4%      61.80 ± 45%  interrupts.CPU15.RES:Rescheduling_interrupts
    450.60 ± 61%  +314.9%      1869 ±110%  interrupts.CPU17.CAL:Function_call_interrupts
    401.20 ± 62%  +105.1%    823.00 ± 28%  interrupts.CPU2.CAL:Function_call_interrupts
      3781 ± 29%  +91.4%       7239 ± 22%  interrupts.CPU23.NMI:Non-maskable_interrupts
      3781 ± 29%  +91.4%       7239 ± 22%  interrupts.CPU23.PMI:Performance_monitoring_interrupts
    131.20 ± 16%  -38.3%      81.00 ± 37%  interrupts.CPU23.RES:Rescheduling_interrupts
      6565 ± 27%  -47.8%       3430 ± 57%  interrupts.CPU30.NMI:Non-maskable_interrupts
      6565 ± 27%  -47.8%       3430 ± 57%  interrupts.CPU30.PMI:Performance_monitoring_interrupts
      2524 ± 33%  +53.1%       3866 ± 18%  interrupts.CPU39.NMI:Non-maskable_interrupts
      2524 ± 33%  +53.1%       3866 ± 18%  interrupts.CPU39.PMI:Performance_monitoring_interrupts
      5453 ± 26%  -34.5%       3569 ± 27%  interrupts.CPU43.NMI:Non-maskable_interrupts
      5453 ± 26%  -34.5%       3569 ± 27%  interrupts.CPU43.PMI:Performance_monitoring_interrupts
      2524 ± 36%  +65.3%       4172 ± 28%  interrupts.CPU48.NMI:Non-maskable_interrupts
      2524 ± 36%  +65.3%       4172 ± 28%  interrupts.CPU48.PMI:Performance_monitoring_interrupts
    487.40 ± 58%  +124.0%      1092 ± 55%  interrupts.CPU79.CAL:Function_call_interrupts
    585.20 ± 40%  +61.9%     947.20 ± 23%  interrupts.CPU80.CAL:Function_call_interrupts
    487.60 ± 54%  +67.9%     818.80 ± 16%  interrupts.CPU81.CAL:Function_call_interrupts
     15.97        +11.6%      17.83        perf-stat.i.MPKI
 1.333e+10        -27.3%   9.69e+09        perf-stat.i.branch-instructions
      0.48         +0.1        0.53        perf-stat.i.branch-miss-rate%
  62963150        -20.1%   50310834        perf-stat.i.branch-misses
     37.28         -1.9       35.40        perf-stat.i.cache-miss-rate%
  4.08e+08        -21.8%  3.189e+08        perf-stat.i.cache-misses
 1.092e+09        -17.7%  8.983e+08        perf-stat.i.cache-references
      3.06        +35.8%        4.16       perf-stat.i.cpi
    143.56         -1.4%      141.50       perf-stat.i.cpu-migrations
    523.77        +28.4%      672.47       perf-stat.i.cycles-between-cache-misses
 2.068e+10        -25.8%  1.534e+10        perf-stat.i.dTLB-loads
  95643337        -22.7%   73905709        perf-stat.i.dTLB-store-misses
 1.225e+10        -22.1%  9.544e+09        perf-stat.i.dTLB-stores
     94.47         -2.1       92.36        perf-stat.i.iTLB-load-miss-rate%
  40594103        -23.3%   31146350        perf-stat.i.iTLB-load-misses
   2304247 ± 11%   +9.1%    2513651        perf-stat.i.iTLB-loads
 6.823e+10        -26.2%  5.033e+10        perf-stat.i.instructions
      1692         -3.6%       1632        perf-stat.i.instructions-per-iTLB-miss
      0.33        -26.3%        0.24       perf-stat.i.ipc
    332.55        -25.1%      249.20       perf-stat.i.metric.M/sec
  13449087        -22.7%   10398888        perf-stat.i.minor-faults
      4.30         +1.7        6.01        perf-stat.i.node-load-miss-rate%
  13864893        +11.3%   15438022        perf-stat.i.node-load-misses
 3.294e+08        -22.9%  2.539e+08        perf-stat.i.node-loads
     18.37         +0.4       18.79        perf-stat.i.node-store-miss-rate%
  11546036        -22.7%    8923297        perf-stat.i.node-store-misses
  51372725        -24.8%   38626751        perf-stat.i.node-stores
  13449089        -22.7%   10398890        perf-stat.i.page-faults
     16.00        +11.5%       17.85       perf-stat.overall.MPKI
      0.47         +0.0        0.52        perf-stat.overall.branch-miss-rate%
     37.36         -1.9       35.50        perf-stat.overall.cache-miss-rate%
      3.06        +35.9%        4.16       perf-stat.overall.cpi
    512.20        +28.3%      656.96       perf-stat.overall.cycles-between-cache-misses
     94.64         -2.1       92.54        perf-stat.overall.iTLB-load-miss-rate%
      1680         -3.9%       1616        perf-stat.overall.instructions-per-iTLB-miss
      0.33        -26.4%        0.24       perf-stat.overall.ipc
      4.04         +1.7        5.73        perf-stat.overall.node-load-miss-rate%
     18.35         +0.4       18.77        perf-stat.overall.node-store-miss-rate%
   1519587         -4.6%    1449452        perf-stat.overall.path-length
 1.329e+10        -27.3%  9.658e+09        perf-stat.ps.branch-instructions
  62761890        -20.1%   50132370        perf-stat.ps.branch-misses
 4.065e+08        -21.8%  3.178e+08        perf-stat.ps.cache-misses
 1.088e+09        -17.7%  8.953e+08        perf-stat.ps.cache-references
    142.33         -1.6%      140.11       perf-stat.ps.cpu-migrations
 2.061e+10        -25.8%  1.529e+10        perf-stat.ps.dTLB-loads
  95301115        -22.7%   73646080        perf-stat.ps.dTLB-store-misses
 1.221e+10        -22.1%  9.511e+09        perf-stat.ps.dTLB-stores
  40451925        -23.3%   31039602        perf-stat.ps.iTLB-load-misses
   2295342 ± 11%   +9.1%    2503862        perf-stat.ps.iTLB-loads
 6.799e+10        -26.2%  5.016e+10        perf-stat.ps.instructions
  13401817        -22.7%   10363171        perf-stat.ps.minor-faults
  13816941        +11.3%   15384985        perf-stat.ps.node-load-misses
 3.282e+08        -22.9%   2.53e+08        perf-stat.ps.node-loads
  11506624        -22.7%    8893312        perf-stat.ps.node-store-misses
  51195452        -24.8%   38495647        perf-stat.ps.node-stores
  13401819        -22.7%   10363173        perf-stat.ps.page-faults
 2.053e+13        -26.2%  1.515e+13        perf-stat.total.instructions
     11.81 ± 11%   -3.5        8.26 ± 10%  perf-profile.calltrace.cycles-pp.__munmap
     11.81 ± 11%   -3.5        8.26 ± 10%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     11.80 ± 11%   -3.5        8.26 ± 10%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     11.80 ± 11%   -3.5        8.26 ± 10%  perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     11.80 ± 11%   -3.5        8.26 ± 10%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     11.80 ± 11%   -3.5        8.26 ± 10%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     11.80 ± 11%   -3.5        8.25 ± 10%  perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     12.30 ± 10%   -3.4        8.86 ± 10%  perf-profile.calltrace.cycles-pp.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     12.25 ± 10%   -3.4        8.83 ± 10%  perf-profile.calltrace.cycles-pp.alloc_set_pte.finish_fault.do_fault.__handle_mm_fault.handle_mm_fault
      8.17 ± 11%   -3.3        4.90 ± 10%  perf-profile.calltrace.cycles-pp.lru_cache_add.alloc_set_pte.finish_fault.do_fault.__handle_mm_fault
      8.04 ± 11%   -3.2        4.79 ± 10%  perf-profile.calltrace.cycles-pp.pagevec_lru_move_fn.lru_cache_add.alloc_set_pte.finish_fault.do_fault
     10.87 ± 11%   -3.2        7.63 ± 10%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     10.87 ± 11%   -3.2        7.63 ± 10%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
     10.86 ± 11%   -3.2        7.62 ± 10%  perf-profile.calltrace.cycles-pp.zap_pte_range.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
      6.14 ± 11%   -3.0        3.18 ±  9%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_cache_add.alloc_set_pte.finish_fault
      6.08 ± 11%   -2.9        3.14 ±  9%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.pagevec_lru_move_fn.lru_cache_add.alloc_set_pte
      7.57 ± 12%   -2.7        4.88 ±  9%  perf-profile.calltrace.cycles-pp.tlb_flush_mmu.zap_pte_range.unmap_page_range.unmap_vmas.unmap_region
      7.14 ± 13%   -2.6        4.56 ±  9%  perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.zap_pte_range.unmap_page_range.unmap_vmas
      3.95 ± 16%   -2.1        1.86 ±  7%  perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.zap_pte_range.unmap_page_range
      3.93 ± 17%   -2.1        1.85 ±  7%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.release_pages.tlb_flush_mmu.zap_pte_range
      3.25 ± 10%   -0.8        2.44 ± 10%  perf-profile.calltrace.cycles-pp.alloc_pages_vma.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
      2.86 ± 10%   -0.7        2.13 ±  9%  perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.alloc_pages_vma.do_fault.__handle_mm_fault.handle_mm_fault
      2.62 ± 10%   -0.7        1.93 ± 10%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.do_fault.__handle_mm_fault
      2.36 ± 11%   -0.7        1.71 ±  9%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.do_fault
      1.83 ± 11%   -0.6        1.23 ± 11%  perf-profile.calltrace.cycles-pp.try_charge.mem_cgroup_charge.do_fault.__handle_mm_fault.handle_mm_fault
      1.89 ± 11%   -0.6        1.32 ±  9%  perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma
      1.76 ±  9%   -0.5        1.30 ± 10%  perf-profile.calltrace.cycles-pp.free_unref_page_list.release_pages.tlb_flush_mmu.zap_pte_range.unmap_page_range
      1.30 ± 12%   -0.4        0.86 ± 11%  perf-profile.calltrace.cycles-pp.page_counter_try_charge.try_charge.mem_cgroup_charge.do_fault.__handle_mm_fault
      1.53 ±  9%   -0.4        1.11 ± 10%  perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page_list.release_pages.tlb_flush_mmu.zap_pte_range
      0.92 ± 13%   -0.3        0.62 ± 10%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
      0.92 ± 13%   -0.3        0.62 ± 10%  perf-profile.calltrace.cycles-pp.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
      0.89 ± 13%   -0.3        0.60 ± 10%  perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
      1.02 ± 11%   -0.2        0.78 ±  9%  perf-profile.calltrace.cycles-pp.__list_del_entry_valid.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages_nodemask
      1.03 ±  9%   -0.2        0.79 ± 10%  perf-profile.calltrace.cycles-pp.__irqentry_text_end.testcase
      0.98 ± 12%   +0.3        1.27 ± 13%  perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.page_add_new_anon_rmap.alloc_set_pte.finish_fault.do_fault
      1.42 ±  9%   +0.3        1.75 ±  9%  perf-profile.calltrace.cycles-pp.shmem_fault.__do_fault.do_fault.__handle_mm_fault.handle_mm_fault
      0.70 ± 13%   +0.4        1.10 ± 14%  perf-profile.calltrace.cycles-pp.__mod_memcg_state.__mod_memcg_lruvec_state.page_add_new_anon_rmap.alloc_set_pte.finish_fault
      1.23 ±  9%   +0.4        1.64 ± 10%  perf-profile.calltrace.cycles-pp.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault.__handle_mm_fault
      1.05 ±  9%   +0.5        1.51 ±  9%  perf-profile.calltrace.cycles-pp.find_lock_entry.shmem_getpage_gfp.shmem_fault.__do_fault.do_fault
      0.71 ± 12%   +0.5        1.18 ± 13%  perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
      0.83 ±  9%   +0.5        1.34 ± 10%  perf-profile.calltrace.cycles-pp.find_get_entry.find_lock_entry.shmem_getpage_gfp.shmem_fault.__do_fault
      0.37 ± 81%   +0.6        0.93 ± 15%  perf-profile.calltrace.cycles-pp.mem_cgroup_charge_statistics.mem_cgroup_charge.do_fault.__handle_mm_fault.handle_mm_fault
      0.36 ± 81%   +0.6        0.91 ± 15%  perf-profile.calltrace.cycles-pp.__count_memcg_events.mem_cgroup_charge_statistics.mem_cgroup_charge.do_fault.__handle_mm_fault
      2.69 ±  9%   +4.3        6.95 ± 14%  perf-profile.calltrace.cycles-pp.get_mem_cgroup_from_mm.mem_cgroup_charge.do_fault.__handle_mm_fault.handle_mm_fault
      7.04 ±  9%  +10.2       17.26 ± 13%  perf-profile.calltrace.cycles-pp.mem_cgroup_charge.do_fault.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     11.69 ± 13%   -5.8        5.91 ±  8%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     10.58 ± 13%   -5.3        5.29 ±  8%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     11.81 ± 11%   -3.5        8.26 ± 10%  perf-profile.children.cycles-pp.__munmap
     11.80 ± 11%   -3.5        8.26 ± 10%  perf-profile.children.cycles-pp.__do_munmap
     11.80 ± 11%   -3.5        8.26 ± 10%  perf-profile.children.cycles-pp.__x64_sys_munmap
     11.80 ± 11%   -3.5        8.26 ± 10%  perf-profile.children.cycles-pp.__vm_munmap
     11.80 ± 11%   -3.5        8.26 ± 10%  perf-profile.children.cycles-pp.unmap_region
     11.88 ± 11%   -3.5        8.34 ± 10%  perf-profile.children.cycles-pp.do_syscall_64
     11.89 ± 11%   -3.5        8.35 ± 10%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     12.30 ± 10%   -3.4        8.86 ± 10%  perf-profile.children.cycles-pp.finish_fault
     12.26 ± 10%   -3.4        8.83 ± 10%  perf-profile.children.cycles-pp.alloc_set_pte
      8.18 ± 11%   -3.3        4.90 ± 10%  perf-profile.children.cycles-pp.lru_cache_add
      8.05 ± 11%   -3.2        4.80 ±  9%  perf-profile.children.cycles-pp.pagevec_lru_move_fn
     10.87 ± 11%   -3.2        7.63 ± 10%  perf-profile.children.cycles-pp.unmap_vmas
     10.87 ± 11%   -3.2        7.63 ± 10%  perf-profile.children.cycles-pp.unmap_page_range
     10.87 ± 11%   -3.2        7.63 ± 10%  perf-profile.children.cycles-pp.zap_pte_range
      8.49 ± 12%   -3.0        5.51 ±  9%  perf-profile.children.cycles-pp.tlb_flush_mmu
      8.20 ± 13%   -2.9        5.29 ±  9%  perf-profile.children.cycles-pp.release_pages
      3.91 ± 10%   -0.9        3.06 ±  9%  perf-profile.children.cycles-pp._raw_spin_lock
      3.27 ± 10%   -0.8        2.45 ± 10%  perf-profile.children.cycles-pp.alloc_pages_vma
      2.95 ± 10%   -0.7        2.20 ± 10%  perf-profile.children.cycles-pp.__alloc_pages_nodemask
      2.67 ± 10%   -0.7        1.98 ± 10%  perf-profile.children.cycles-pp.get_page_from_freelist
      2.39 ± 11%   -0.7        1.73 ±  9%  perf-profile.children.cycles-pp.rmqueue
      1.84 ± 11%   -0.6        1.23 ± 11%  perf-profile.children.cycles-pp.try_charge
      1.90 ± 11%   -0.6        1.33 ±  9%  perf-profile.children.cycles-pp.rmqueue_bulk
      2.01 ±  9%   -0.5        1.49 ± 10%  perf-profile.children.cycles-pp.free_unref_page_list
      1.74 ±  9%   -0.5        1.27 ± 10%  perf-profile.children.cycles-pp.free_pcppages_bulk
      1.32 ± 12%   -0.4        0.87 ± 12%  perf-profile.children.cycles-pp.page_counter_try_charge
      1.54 ± 10%   -0.3        1.22 ±  9%  perf-profile.children.cycles-pp.__list_del_entry_valid
      0.93 ± 13%   -0.3        0.62 ± 10%  perf-profile.children.cycles-pp.tlb_finish_mmu
      1.03 ±  9%   -0.2        0.79 ± 10%  perf-profile.children.cycles-pp.__irqentry_text_end
      0.43 ±  7%   -0.1        0.33 ± 10%  perf-profile.children.cycles-pp.free_pages_and_swap_cache
      0.41 ±  7%   -0.1        0.33 ± 10%  perf-profile.children.cycles-pp.__perf_sw_event
      0.41 ±  9%   -0.1        0.33 ±  8%  perf-profile.children.cycles-pp.xas_load
      0.31 ±  9%   -0.1        0.23 ±  6%  perf-profile.children.cycles-pp.__mod_lruvec_state
      0.25 ±  8%   -0.1        0.19 ± 10%  perf-profile.children.cycles-pp.___perf_sw_event
      0.22 ± 10%   -0.1        0.16 ±  6%  perf-profile.children.cycles-pp.__mod_node_page_state
      0.09 ± 21%   -0.0        0.05 ± 50%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.22 ±  8%   -0.0        0.17 ±  9%  perf-profile.children.cycles-pp.sync_regs
      0.15 ± 10%   -0.0        0.11 ±  8%  perf-profile.children.cycles-pp.__list_add_valid
      0.13 ± 12%   -0.0        0.10 ±  8%  perf-profile.children.cycles-pp.memcg_check_events
      0.09 ±  8%   -0.0        0.06 ± 12%  perf-profile.children.cycles-pp.mem_cgroup_page_lruvec
      0.10 ±  9%   -0.0        0.08 ± 14%  perf-profile.children.cycles-pp.up_read
      0.11 ± 10%   -0.0        0.09 ± 12%  perf-profile.children.cycles-pp.shmem_get_policy
      0.10 ±  5%   -0.0        0.08 ± 13%  perf-profile.children.cycles-pp.unlock_page
      0.09 ± 11%   -0.0        0.07 ± 14%  perf-profile.children.cycles-pp._cond_resched
      0.09 ±  5%   -0.0        0.07 ± 17%  perf-profile.children.cycles-pp.find_vma
      0.25 ±  8%   +0.2        0.42 ± 14%  perf-profile.children.cycles-pp.mem_cgroup_uncharge_list
      0.11 ±  8%   +0.2        0.31 ± 15%  perf-profile.children.cycles-pp.uncharge_page
      0.15 ± 10%   +0.2        0.38 ± 12%  perf-profile.children.cycles-pp.lock_page_memcg
      1.42 ±  9%   +0.3        1.75 ±  9%  perf-profile.children.cycles-pp.shmem_fault
      0.57 ± 11%   +0.4        0.93 ± 15%  perf-profile.children.cycles-pp.mem_cgroup_charge_statistics
      1.24 ±  9%   +0.4        1.65 ±  9%  perf-profile.children.cycles-pp.shmem_getpage_gfp
      1.07 ±  9%   +0.5        1.52 ± 10%  perf-profile.children.cycles-pp.find_lock_entry
      0.84 ±  9%   +0.5        1.35 ± 10%  perf-profile.children.cycles-pp.find_get_entry
      1.70 ± 11%   +0.7        2.39 ±  9%  perf-profile.children.cycles-pp.native_irq_return_iret
      1.28 ± 11%   +0.8        2.11 ± 14%  perf-profile.children.cycles-pp.__count_memcg_events
      2.69 ±  9%   +4.3        6.98 ± 14%  perf-profile.children.cycles-pp.get_mem_cgroup_from_mm
      7.06 ±  9%  +10.2       17.28 ± 13%  perf-profile.children.cycles-pp.mem_cgroup_charge
     11.69 ± 13%   -5.8        5.91 ±  8%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      2.90 ±  9%   -0.6        2.29 ± 10%  perf-profile.self.cycles-pp.testcase
      2.03 ±  8%   -0.5        1.55 ± 11%  perf-profile.self.cycles-pp.zap_pte_range
      1.07 ± 11%   -0.4        0.63 ± 11%  perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
      1.21 ± 12%   -0.4        0.79 ± 11%  perf-profile.self.cycles-pp.page_counter_try_charge
      1.53 ± 10%   -0.3        1.21 ±  9%  perf-profile.self.cycles-pp.__list_del_entry_valid
      1.03 ±  9%   -0.2        0.79 ± 10%  perf-profile.self.cycles-pp.__irqentry_text_end
      1.05 ±  8%   -0.2        0.82 ± 10%  perf-profile.self.cycles-pp.free_pcppages_bulk
      0.53 ± 10%   -0.2        0.37 ± 12%  perf-profile.self.cycles-pp.try_charge
      0.43 ±  8%   -0.1        0.33 ± 10%  perf-profile.self.cycles-pp.free_pages_and_swap_cache
      0.46 ± 10%   -0.1        0.36 ± 10%  perf-profile.self.cycles-pp.release_pages
      0.46 ± 11%   -0.1        0.36 ±  8%  perf-profile.self.cycles-pp.__handle_mm_fault
      0.36 ±  8%   -0.1        0.29 ±  9%  perf-profile.self.cycles-pp.xas_load
      0.18 ± 10%   -0.1        0.11 ± 11%  perf-profile.self.cycles-pp.shmem_fault
      0.27 ±  9%   -0.1        0.20 ± 11%  perf-profile.self.cycles-pp.page_remove_rmap
      0.29 ±  9%   -0.1        0.23 ±  9%  perf-profile.self.cycles-pp.handle_mm_fault
      0.14 ± 10%   -0.1        0.09 ± 11%  perf-profile.self.cycles-pp.page_add_new_anon_rmap
      0.21 ±  8%   -0.1        0.16 ±  4%  perf-profile.self.cycles-pp.__mod_node_page_state
      0.08 ± 17%   -0.0        0.03 ± 82%  perf-profile.self.cycles-pp.memcg_check_events
      0.16 ±  9%   -0.0        0.12 ±  9%  perf-profile.self.cycles-pp.shmem_getpage_gfp
      0.19 ±  8%   -0.0        0.15 ±  9%  perf-profile.self.cycles-pp.sync_regs
      0.19 ±  8%   -0.0        0.14 ± 11%  perf-profile.self.cycles-pp.___perf_sw_event
      0.17 ±  9%   -0.0        0.13 ± 10%  perf-profile.self.cycles-pp.do_user_addr_fault
      0.12 ±  9%   -0.0        0.09 ± 10%  perf-profile.self.cycles-pp.find_lock_entry
      0.08 ± 11%   -0.0        0.05 ±  7%  perf-profile.self.cycles-pp.mem_cgroup_page_lruvec
      0.09 ± 25%   -0.0        0.06 ± 12%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.08 ±  6%   -0.0        0.06 ± 12%  perf-profile.self.cycles-pp.get_task_policy
      0.11 ±  8%   +0.2        0.31 ± 15%  perf-profile.self.cycles-pp.uncharge_page
      0.15 ± 10%   +0.2        0.38 ± 13%  perf-profile.self.cycles-pp.lock_page_memcg
      0.43 ±  9%   +0.6        1.01 ± 10%  perf-profile.self.cycles-pp.find_get_entry
      1.69 ± 11%   +0.7        2.39 ±  9%  perf-profile.self.cycles-pp.native_irq_return_iret
      1.28 ± 11%   +0.8        2.11 ± 14%  perf-profile.self.cycles-pp.__count_memcg_events
      2.68 ±  9%   +4.3        6.93 ± 14%  perf-profile.self.cycles-pp.get_mem_cgroup_from_mm
      1.84 ± 10%   +6.2        8.01 ± 13%  perf-profile.self.cycles-pp.mem_cgroup_charge
will-it-scale.per_process_ops

[ASCII trend plot omitted: bisect-good samples fluctuate between roughly 165000 and 192000 ops, while every bisect-bad sample sits in the 140000-150000 range]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

Thanks,
Rong Chen