Hi Rik,

Please note that we reported "[mm] 96db82a66d: will-it-scale.per_process_ops -95.3% regression" at
https://lore.kernel.org/all/YwIoiIYo4qsYBcgd@xsang-OptiPlex-9020/
when this commit was on the master branch of
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git

Now that the commit has been merged to mainline, we still observe a similar test result. We are not sure whether this is an expected performance change from switching to huge pages, or whether it could benefit other use cases, so we are reporting it again FYI. Please feel free to ignore the report if the test result is as expected. Thanks.

Greeting,

FYI, we noticed a -95.5% regression of will-it-scale.per_process_ops due to commit:

commit: f35b5d7d676e59e401690b678cd3cfec5e785c23 ("mm: align larger anonymous mappings on THP boundaries")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Gold 6338 CPU @ 2.00GHz (Ice Lake) with 256G memory
with following parameters:

	nr_task: 100%
	mode: process
	test: malloc1
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and a threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

In addition to that, the commit also has significant impact on the following tests:

+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | stress-ng: stress-ng.numa.ops_per_sec -52.9% regression                                            |
| test machine     | 192 threads 4 sockets Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz (Cascade Lake) with 192G memory |
| test parameters  | class=cpu                                                                                          |
|                  | cpufreq_governor=performance                                                                       |
|                  | nr_threads=100%                                                                                    |
|                  | test=numa                                                                                          |
|                  | testtime=60s                                                                                       |
+------------------+----------------------------------------------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -92.9% regression                                     |
| test machine     | 88 threads 2 sockets Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz (Cascade Lake) with 128G memory     |
| test parameters  | cpufreq_governor=performance                                                                       |
|                  | mode=process                                                                                       |
|                  | nr_task=16                                                                                         |
|                  | test=malloc1                                                                                       |
+------------------+----------------------------------------------------------------------------------------------------+

Details are as below:

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/process/100%/debian-11.1-x86_64-20220510.cgz/lkp-icl-2sp2/malloc1/will-it-scale

commit:
  7b5a0b664e ("mm/page_ext: remove unused variable in offline_page_ext")
  f35b5d7d67 ("mm: align larger anonymous mappings on THP boundaries")

7b5a0b664ebe2625 f35b5d7d676e59e401690b678cd
---------------- ---------------------------
     %stddev          %change          %stddev
         \               |                 \
  2765078             -95.5%      124266 ± 4%   will-it-scale.128.processes
    21601             -95.5%      970.33 ± 4%   will-it-scale.per_process_ops
  2765078             -95.5%      124266 ± 4%   will-it-scale.workload
     1943              -3.5%        1874        vmstat.system.cs
     0.68              +1.6          2.32 ± 4%  mpstat.cpu.all.irq%
     0.00 ± 3%         +0.0          0.03 ± 4%  mpstat.cpu.all.soft%
     0.89 ± 2%         -0.4          0.49 ± 2%  mpstat.cpu.all.usr%
     0.09 ± 4%        -88.7%         0.01       turbostat.IPC
   351.67             +11.2%       391.20       turbostat.PkgWatt
    29.30            +299.5%       117.04       turbostat.RAMWatt
  8.251e+08           -95.5%     37387027       numa-numastat.node0.local_node
  8.252e+08           -95.5%     37467051       numa-numastat.node0.numa_hit
  8.405e+08           -95.5%     38161942 ± 7%  numa-numastat.node1.local_node
  8.406e+08           -95.5%     38196663 ± 7%  numa-numastat.node1.numa_hit
   174409 ± 9%        -21.4%       137126 ± 2%  meminfo.Active
   174071 ± 9%        -21.4%       136806 ± 2%  meminfo.Active(anon)
   311891             +10.9%       346028       meminfo.AnonPages
   343079             +42.0%       487127       meminfo.Inactive
   342068             +42.1%       486072       meminfo.Inactive(anon)
    69414 ± 2%       +221.8%       223379 ± 2%  meminfo.Mapped
   204255 ± 8%        +24.9%       255031       meminfo.Shmem
    32528 ± 48%      +147.6%        80547 ± 38% numa-meminfo.node0.AnonHugePages
    92821 ± 23%       +59.3%       147839 ± 28% numa-meminfo.node0.AnonPages
    99694 ± 17%       +56.9%       156414 ± 26% numa-meminfo.node0.Inactive
    99136 ± 17%       +57.7%       156290 ± 26% numa-meminfo.node0.Inactive(anon)
    30838 ± 53%      +134.0%        72155 ± 21% numa-meminfo.node0.Mapped
   171865 ± 9%        -22.7%       132920 ± 2%  numa-meminfo.node1.Active
   171791 ± 9%        -22.7%       132730 ± 2%  numa-meminfo.node1.Active(anon)
   243260 ± 7%        +36.0%       330799 ± 11% numa-meminfo.node1.Inactive
   242807 ± 7%        +35.9%       329868 ± 11% numa-meminfo.node1.Inactive(anon)
    38681 ± 37%      +291.2%       151319 ± 8%  numa-meminfo.node1.Mapped
   195654 ± 8%        +27.6%       249732       numa-meminfo.node1.Shmem
    23192 ± 23%       +59.3%        36946 ± 28% numa-vmstat.node0.nr_anon_pages
    24771 ± 17%       +57.7%        39074 ± 26% numa-vmstat.node0.nr_inactive_anon
     7625 ± 53%      +136.5%        18031 ± 21% numa-vmstat.node0.nr_mapped
    24771 ± 17%       +57.8%        39085 ± 26% numa-vmstat.node0.nr_zone_inactive_anon
  8.252e+08           -95.5%     37466761       numa-vmstat.node0.numa_hit
  8.251e+08           -95.5%     37386737       numa-vmstat.node0.numa_local
    43036 ± 9%        -23.1%        33107 ± 2%  numa-vmstat.node1.nr_active_anon
    60590 ± 7%        +36.1%        82475 ± 11% numa-vmstat.node1.nr_inactive_anon
     9533 ± 38%      +297.1%        37858 ± 8%  numa-vmstat.node1.nr_mapped
    48889 ± 8%        +27.6%        62403       numa-vmstat.node1.nr_shmem
    43036 ± 9%        -23.1%        33107 ± 2%  numa-vmstat.node1.nr_zone_active_anon
    60589 ± 7%        +36.0%        82430 ± 11% numa-vmstat.node1.nr_zone_inactive_anon
  8.406e+08           -95.5%     38196529 ± 7%  numa-vmstat.node1.numa_hit
  8.405e+08           -95.5%     38161808 ± 7%  numa-vmstat.node1.numa_local
    43513 ± 9%        -21.8%        34042 ± 2%  proc-vmstat.nr_active_anon
    77940             +11.0%        86526       proc-vmstat.nr_anon_pages
   762952              +1.7%       775553       proc-vmstat.nr_file_pages
    85507             +42.1%       121487       proc-vmstat.nr_inactive_anon
    17361 ± 2%       +221.5%        55823 ± 2%  proc-vmstat.nr_mapped
     3300              +4.6%         3452       proc-vmstat.nr_page_table_pages
    51081 ± 8%        +24.6%        63669       proc-vmstat.nr_shmem
    43513 ± 9%        -21.8%        34042 ± 2%  proc-vmstat.nr_zone_active_anon
    85507             +42.1%       121480       proc-vmstat.nr_zone_inactive_anon
    23080 ± 20%       +56.7%        36156 ± 11% proc-vmstat.numa_hint_faults
    16266 ± 13%       +75.2%        28496 ± 5%  proc-vmstat.numa_hint_faults_local
  1.666e+09           -95.5%     75751403 ± 3%  proc-vmstat.numa_hit
    63.17 ± 50%     +2948.3%         1925       proc-vmstat.numa_huge_pte_updates
  1.666e+09           -95.5%     75551968 ± 4%  proc-vmstat.numa_local
   176965 ± 9%       +543.0%      1137910       proc-vmstat.numa_pte_updates
   160517 ± 3%        -14.3%       137522 ± 2%  proc-vmstat.pgactivate
  1.665e+09           -95.5%     75663487 ± 4%  proc-vmstat.pgalloc_normal
  8.332e+08           -95.4%     38289978 ± 4%  proc-vmstat.pgfault
  1.665e+09           -95.5%     75646557 ± 4%  proc-vmstat.pgfree
    18.00           +2.1e+08%    37369911 ± 4%  proc-vmstat.thp_fault_alloc
     1.46 ±223%     +1.3e+06%       19552       sched_debug.cfs_rq:/.MIN_vruntime.avg
   187.51 ±223%     +1.3e+06%     2502777       sched_debug.cfs_rq:/.MIN_vruntime.max
    16.51 ±223%     +1.3e+06%      220350       sched_debug.cfs_rq:/.MIN_vruntime.stddev
   233.78 ± 28%       +45.4%       339.81 ± 29% sched_debug.cfs_rq:/.load_avg.max
     1.46 ±223%     +1.3e+06%       19552       sched_debug.cfs_rq:/.max_vruntime.avg
   187.51 ±223%     +1.3e+06%     2502777       sched_debug.cfs_rq:/.max_vruntime.max
    16.51 ±223%     +1.3e+06%      220350       sched_debug.cfs_rq:/.max_vruntime.stddev
 20463200             -12.7%     17863314       sched_debug.cfs_rq:/.min_vruntime.min
   227934 ± 6%        +73.5%       395381 ± 6%  sched_debug.cfs_rq:/.min_vruntime.stddev
   557786 ± 6%        +22.1%       680843 ± 7%  sched_debug.cfs_rq:/.spread0.max
  -668417            +343.1%     -2961726       sched_debug.cfs_rq:/.spread0.min
   227979 ± 6%        +73.4%       395300 ± 6%  sched_debug.cfs_rq:/.spread0.stddev
   793.86 ± 3%        -31.1%       546.72 ± 9%  sched_debug.cfs_rq:/.util_avg.min
    57.90 ± 8%        +47.0%        85.09 ± 11% sched_debug.cfs_rq:/.util_avg.stddev
   535.54 ± 3%        -18.1%       438.80       sched_debug.cfs_rq:/.util_est_enqueued.avg
   224.57 ± 5%        -38.5%       138.22 ± 8%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
   957251             +12.9%      1080580 ± 11% sched_debug.cpu.avg_idle.avg
    10.65 ± 5%       +303.8%        43.02 ± 21% sched_debug.cpu.clock.stddev
   644.97             +46.0%       941.70 ± 5%  sched_debug.cpu.clock_task.stddev
     0.00 ± 13%      +107.1%         0.00 ± 18% sched_debug.cpu.next_balance.stddev
     1802 ± 10%       -19.3%         1454 ± 13% sched_debug.cpu.nr_switches.min
     3.28 ± 13%     +8839.1%       293.65 ± 3%  perf-stat.i.MPKI
  1.936e+10           -81.9%    3.503e+09       perf-stat.i.branch-instructions
     0.14              -0.0          0.13 ± 3%  perf-stat.i.branch-miss-rate%
 26013221             -82.5%      4556180 ± 2%  perf-stat.i.branch-misses
     5.62 ± 5%        +46.1         51.68 ± 3%  perf-stat.i.cache-miss-rate%
 15513099 ± 8%     +14150.9%    2.211e+09       perf-stat.i.cache-misses
 2.787e+08 ± 13%    +1437.2%    4.284e+09 ± 4%  perf-stat.i.cache-references
     1870              -3.6%         1803       perf-stat.i.context-switches
     3.87            +474.5%        22.23       perf-stat.i.cpi
 3.288e+11             -1.4%    3.242e+11       perf-stat.i.cpu-cycles
   174.70             -10.3%       156.69       perf-stat.i.cpu-migrations
    21352 ± 8%        -99.3%       159.27 ± 17% perf-stat.i.cycles-between-cache-misses
     0.01              -0.0          0.00 ± 11% perf-stat.i.dTLB-load-miss-rate%
  2874326 ± 2%        -94.7%       152528 ± 10% perf-stat.i.dTLB-load-misses
 2.047e+10             -81.3%   3.825e+09       perf-stat.i.dTLB-loads
     0.25              -0.2          0.06       perf-stat.i.dTLB-store-miss-rate%
 19343669             -95.4%       891050 ± 4%  perf-stat.i.dTLB-store-misses
 7.829e+09             -80.1%   1.561e+09 ± 3%  perf-stat.i.dTLB-stores
  8.49e+10             -82.8%   1.463e+10       perf-stat.i.instructions
     0.26              -82.2%        0.05       perf-stat.i.ipc
     0.14 ± 38%        +60.3%        0.22 ± 12% perf-stat.i.major-faults
     2.57               -1.4%        2.53       perf-stat.i.metric.GHz
   265.58             +203.4%      805.78 ± 15% perf-stat.i.metric.K/sec
   374.50              -68.2%      119.04       perf-stat.i.metric.M/sec
  2757302              -95.4%      126231 ± 4%  perf-stat.i.minor-faults
    92.63               +3.9
    96.51       perf-stat.i.node-load-miss-rate%
  3007607 ± 4%       +285.4%     11591077 ± 6%  perf-stat.i.node-load-misses
   240194 ± 17%       +71.9%       412981 ± 6%  perf-stat.i.node-loads
    97.87             -92.4          5.47 ± 7%  perf-stat.i.node-store-miss-rate%
  5503394           +2009.9%    1.161e+08 ± 7%  perf-stat.i.node-store-misses
   119412 ± 6%     +1.7e+06%    2.041e+09       perf-stat.i.node-stores
  2757302             -95.4%       126231 ± 4%  perf-stat.i.page-faults
     3.28 ± 13%     +8826.2%       293.21 ± 3%  perf-stat.overall.MPKI
     0.13              -0.0          0.13 ± 2%  perf-stat.overall.branch-miss-rate%
     5.61 ± 5%        +46.1         51.70 ± 3%  perf-stat.overall.cache-miss-rate%
     3.87            +473.3%        22.20       perf-stat.overall.cpi
    21335 ± 8%        -99.3%       146.65       perf-stat.overall.cycles-between-cache-misses
     0.01              -0.0          0.00 ± 9%  perf-stat.overall.dTLB-load-miss-rate%
     0.25              -0.2          0.06       perf-stat.overall.dTLB-store-miss-rate%
     0.26             -82.6%         0.05       perf-stat.overall.ipc
    92.65              +4.0         96.63       perf-stat.overall.node-load-miss-rate%
    97.88             -92.5          5.38 ± 8%  perf-stat.overall.node-store-miss-rate%
  9272709            +283.9%     35600802 ± 3%  perf-stat.overall.path-length
 1.929e+10            -81.9%    3.487e+09       perf-stat.ps.branch-instructions
 25928796             -82.7%      4477103 ± 2%  perf-stat.ps.branch-misses
 15464091 ± 8%     +14157.2%    2.205e+09       perf-stat.ps.cache-misses
 2.778e+08 ± 13%    +1437.1%     4.27e+09 ± 4%  perf-stat.ps.cache-references
     1865              -3.9%         1791       perf-stat.ps.context-switches
 3.277e+11             -1.4%    3.233e+11       perf-stat.ps.cpu-cycles
   174.25             -11.7%       153.93       perf-stat.ps.cpu-migrations
  2866660 ± 2%        -94.7%       151686 ± 10% perf-stat.ps.dTLB-load-misses
 2.041e+10            -81.3%    3.808e+09       perf-stat.ps.dTLB-loads
 19279774             -95.4%       888826 ± 4%  perf-stat.ps.dTLB-store-misses
 7.803e+09            -80.1%    1.555e+09 ± 3%  perf-stat.ps.dTLB-stores
 8.462e+10            -82.8%    1.456e+10       perf-stat.ps.instructions
     0.14 ± 38%       +56.7%         0.21 ± 14% perf-stat.ps.major-faults
  2748185             -95.4%       125830 ± 4%  perf-stat.ps.minor-faults
  2998556 ± 4%       +291.3%     11734146 ± 6%  perf-stat.ps.node-load-misses
   239400 ± 17%       +70.8%       408868 ± 7%  perf-stat.ps.node-loads
  5485289           +2007.4%    1.156e+08 ± 7%  perf-stat.ps.node-store-misses
   119090 ± 6%     +1.7e+06%    2.035e+09       perf-stat.ps.node-stores
  2748185             -95.4%       125831 ± 4%  perf-stat.ps.page-faults
 2.564e+13            -82.8%    4.417e+12       perf-stat.total.instructions
    95.23             -79.8         15.41 ± 6%  perf-profile.calltrace.cycles-pp.__munmap
    95.08             -79.7         15.40 ± 6%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
    95.02             -79.6         15.39 ± 6%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
    94.96             -79.6         15.37 ± 6%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
    94.95             -79.6         15.37 ± 6%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
    94.86             -79.5         15.35 ± 6%  perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
    94.38             -79.2         15.22 ± 6%  perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
    42.74             -42.7          0.00       perf-profile.calltrace.cycles-pp.lru_add_drain.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
    42.74             -42.7          0.00       perf-profile.calltrace.cycles-pp.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap.__vm_munmap
    42.72             -42.7          0.00       perf-profile.calltrace.cycles-pp.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region.__do_munmap
    41.84             -41.8          0.00       perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain.unmap_region
    41.70             -41.7          0.00       perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu.lru_add_drain
    41.62             -41.6          0.00       perf-profile.calltrace.cycles-pp.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
    41.55             -41.6          0.00       perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.folio_batch_move_lru.lru_add_drain_cpu
    41.52             -41.5          0.00       perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
    41.28             -41.3          0.00       perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.folio_lruvec_lock_irqsave.release_pages.tlb_batch_pages_flush
    46.93             -37.0          9.94 ± 6%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
    43.64             -33.8          9.84 ± 6%  perf-profile.calltrace.cycles-pp.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
    43.40             -33.6          9.81 ± 6%  perf-profile.calltrace.cycles-pp.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region.__do_munmap
     0.00              +0.6          0.56 ± 6%  perf-profile.calltrace.cycles-pp.update_process_times.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt
     0.00              +0.6          0.57 ± 6%  perf-profile.calltrace.cycles-pp.tick_sched_handle.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt
     0.00              +0.6          0.60 ± 6%  perf-profile.calltrace.cycles-pp.tick_sched_timer.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt
     0.00              +0.7          0.67 ± 5%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
     0.00              +0.7          0.74 ± 5%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.clear_page_erms
     0.00              +0.8          0.77 ± 4%  perf-profile.calltrace.cycles-pp.__sysvec_apic_timer_interrupt.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page
     0.00              +0.8          0.81 ± 4%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page
     0.00              +1.1          1.09 ± 3%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault
     0.00              +1.1          1.11 ± 3%  perf-profile.calltrace.cycles-pp.free_pcp_prepare.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
     3.60 ± 3%         +1.5          5.08 ± 7%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     3.51 ± 3%         +1.6          5.06 ± 7%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
     3.29 ± 3%         +1.7          5.04 ± 7%  perf-profile.calltrace.cycles-pp.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
     0.00              +2.8          2.78 ± 2%  perf-profile.calltrace.cycles-pp.__cond_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
     0.00              +3.3          3.28 ± 2%  perf-profile.calltrace.cycles-pp.__might_resched.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
     0.00              +4.2          4.21 ± 8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page.zap_huge_pmd.zap_pmd_range
     0.00              +4.2          4.21 ± 8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page.zap_huge_pmd
     0.00              +4.3          4.34 ± 8%  perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages.pte_alloc_one
     0.00              +4.4          4.40 ± 8%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.pte_alloc_one.__do_huge_pmd_anonymous_page
     0.00              +4.4          4.43 ± 8%  perf-profile.calltrace.cycles-pp.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages.__folio_alloc
     0.00              +4.5          4.49 ± 8%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.pte_alloc_one.__do_huge_pmd_anonymous_page.__handle_mm_fault
     0.00              +4.5          4.51 ± 8%  perf-profile.calltrace.cycles-pp.rmqueue.get_page_from_freelist.__alloc_pages.__folio_alloc.vma_alloc_folio
     0.00              +4.6          4.59 ± 8%
perf-profile.calltrace.cycles-pp.__alloc_pages.pte_alloc_one.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
     0.00              +4.6          4.62 ± 8%  perf-profile.calltrace.cycles-pp.pte_alloc_one.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     0.00              +4.6          4.63 ± 8%  perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.zap_huge_pmd.zap_pmd_range.unmap_page_range
     0.00              +4.7          4.70 ± 7%  perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages.__folio_alloc.vma_alloc_folio.do_huge_pmd_anonymous_page
     0.00              +4.7          4.72 ± 7%  perf-profile.calltrace.cycles-pp.__alloc_pages.__folio_alloc.vma_alloc_folio.do_huge_pmd_anonymous_page.__handle_mm_fault
     0.00              +4.7          4.73 ± 7%  perf-profile.calltrace.cycles-pp.__folio_alloc.vma_alloc_folio.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
     0.00              +4.8          4.75 ± 7%  perf-profile.calltrace.cycles-pp.vma_alloc_folio.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     0.00              +4.8          4.76 ± 8%  perf-profile.calltrace.cycles-pp.free_unref_page.zap_huge_pmd.zap_pmd_range.unmap_page_range.unmap_vmas
     0.00              +4.8          4.82 ± 7%  perf-profile.calltrace.cycles-pp.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     0.00              +4.9          4.88 ± 7%  perf-profile.calltrace.cycles-pp.zap_huge_pmd.zap_pmd_range.unmap_page_range.unmap_vmas.unmap_region
     0.00              +8.2          8.22 ± 8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist
     0.00              +8.2          8.23 ± 8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.rmqueue_bulk.rmqueue.get_page_from_freelist.__alloc_pages
     0.00              +8.3          8.35 ± 8%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages
     0.00              +8.3          8.35 ± 8%  perf-profile.calltrace.cycles-pp._raw_spin_lock.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush
     0.00              +8.4          8.37 ± 8%  perf-profile.calltrace.cycles-pp.free_pcppages_bulk.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu
     0.00              +9.6          9.60 ± 6%  perf-profile.calltrace.cycles-pp.free_unref_page.release_pages.tlb_batch_pages_flush.tlb_finish_mmu.unmap_region
     0.00             +65.5         65.48 ± 2%  perf-profile.calltrace.cycles-pp.clear_page_erms.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
     0.00             +72.5         72.51 ± 2%  perf-profile.calltrace.cycles-pp.clear_huge_page.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault
     0.00             +78.6         78.58       perf-profile.calltrace.cycles-pp.__do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault
     2.62 ± 3%        +80.9         83.56       perf-profile.calltrace.cycles-pp.asm_exc_page_fault
     2.60 ± 3%        +81.0         83.56       perf-profile.calltrace.cycles-pp.exc_page_fault.asm_exc_page_fault
     2.58 ± 3%        +81.0         83.57       perf-profile.calltrace.cycles-pp.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     2.38 ± 3%        +81.1         83.52       perf-profile.calltrace.cycles-pp.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
     2.26 ± 3%        +81.2         83.45       perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_user_addr_fault.exc_page_fault.asm_exc_page_fault
    83.48             -83.4          0.06 ± 9%  perf-profile.children.cycles-pp.folio_lruvec_lock_irqsave
    83.28             -83.2          0.08 ± 8%  perf-profile.children.cycles-pp._raw_spin_lock_irqsave
    96.34             -80.3         16.09 ± 6%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
    96.27             -80.2         16.08 ± 6%  perf-profile.children.cycles-pp.do_syscall_64
    95.28             -79.9         15.41 ± 6%  perf-profile.children.cycles-pp.__munmap
    94.96             -79.6         15.37 ± 6%  perf-profile.children.cycles-pp.__x64_sys_munmap
    94.96             -79.6         15.37 ± 6%  perf-profile.children.cycles-pp.__vm_munmap
    94.87             -79.5         15.36 ± 6%  perf-profile.children.cycles-pp.__do_munmap
    94.39             -79.2         15.22 ± 6%  perf-profile.children.cycles-pp.unmap_region
    82.88             -62.0         20.90 ± 8%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
    42.78             -42.8          0.00       perf-profile.children.cycles-pp.lru_add_drain
    42.76             -42.8          0.00       perf-profile.children.cycles-pp.lru_add_drain_cpu
    42.75             -42.6          0.10       perf-profile.children.cycles-pp.folio_batch_move_lru
    46.94             -37.0          9.94 ± 6%  perf-profile.children.cycles-pp.tlb_finish_mmu
    43.64             -33.8          9.84 ± 6%  perf-profile.children.cycles-pp.tlb_batch_pages_flush
    43.62             -33.8          9.82 ± 6%  perf-profile.children.cycles-pp.release_pages
     3.21 ± 4%         -3.1          0.09 ± 5%  perf-profile.children.cycles-pp.flush_tlb_mm_range
     3.11 ± 4%         -3.1          0.06 ± 8%  perf-profile.children.cycles-pp.flush_tlb_func
     3.00 ± 4%         -3.0          0.03 ± 70% perf-profile.children.cycles-pp.native_flush_tlb_one_user
     1.33 ± 3%         -0.9          0.42 ± 5%  perf-profile.children.cycles-pp.__mmap
     1.10 ± 4%         -0.7          0.39 ± 5%  perf-profile.children.cycles-pp.vm_mmap_pgoff
     0.79 ± 4%         -0.7          0.09 ± 7%  perf-profile.children.cycles-pp.uncharge_batch
     0.97 ± 4%         -0.6          0.36 ± 5%  perf-profile.children.cycles-pp.do_mmap
     0.65 ± 5%         -0.6          0.06 ± 8%  perf-profile.children.cycles-pp.page_counter_uncharge
     0.64 ± 4%         -0.6          0.05 ± 8%  perf-profile.children.cycles-pp.free_pgd_range
     0.62 ± 4%         -0.6          0.05       perf-profile.children.cycles-pp.free_p4d_range
     0.81 ± 3%         -0.5          0.30 ± 6%  perf-profile.children.cycles-pp.mmap_region
     0.59 ± 4%         -0.5          0.08 ± 6%  perf-profile.children.cycles-pp.kmem_cache_alloc
     0.66 ± 5%         -0.5          0.16 ± 4%  perf-profile.children.cycles-pp.__mod_lruvec_page_state
     0.55 ± 3%         -0.5          0.06 ± 9%  perf-profile.children.cycles-pp.__anon_vma_prepare
     0.44 ± 4%         -0.4          0.06 ± 9%  perf-profile.children.cycles-pp.lru_add_fn
     0.42 ± 5%         -0.3          0.12 ± 4%  perf-profile.children.cycles-pp.free_pgtables
     0.40 ± 4%         -0.3          0.10 ± 9%  perf-profile.children.cycles-pp.kmem_cache_free
     0.41 ± 5%         -0.3          0.11 ± 5%  perf-profile.children.cycles-pp.unlink_anon_vmas
     0.25 ± 5%         -0.2          0.03 ± 70% perf-profile.children.cycles-pp.vm_area_alloc
     0.42 ± 6%         -0.2          0.26 ± 3%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
     0.16 ± 5%         -0.1          0.02 ± 99% perf-profile.children.cycles-pp.__put_anon_vma
     0.27 ± 4%         -0.1          0.13 ± 3%  perf-profile.children.cycles-pp.native_irq_return_iret
     0.28 ± 5%         -0.1          0.14 ± 7%  perf-profile.children.cycles-pp.perf_event_mmap
     0.18 ± 4%         -0.1          0.07 ± 7%  perf-profile.children.cycles-pp.page_add_new_anon_rmap
     0.23 ± 3%         -0.1          0.14 ± 7%  perf-profile.children.cycles-pp.perf_event_mmap_event
     0.18 ± 4%         -0.1          0.08 ± 5%  perf-profile.children.cycles-pp.__memcg_kmem_charge_page
     0.16 ± 5%         -0.1          0.07       perf-profile.children.cycles-pp.page_remove_rmap
     0.13 ± 2%         -0.1          0.05 ± 7%  perf-profile.children.cycles-pp.get_unmapped_area
     0.15 ± 5%         -0.0          0.10 ± 9%  perf-profile.children.cycles-pp.perf_iterate_sb
     0.10 ± 5%         -0.0          0.06 ± 9%  perf-profile.children.cycles-pp.find_vma
     0.39 ± 3%         -0.0          0.35 ± 4%  perf-profile.children.cycles-pp.rcu_all_qs
     0.09 ± 5%         -0.0          0.06 ± 9%  perf-profile.children.cycles-pp.__perf_sw_event
     0.15 ± 4%         -0.0          0.13 ± 7%  perf-profile.children.cycles-pp.__mem_cgroup_charge
     0.07              -0.0          0.06 ± 8%  perf-profile.children.cycles-pp.___perf_sw_event
     0.12 ± 5%         +0.0          0.14 ± 3%  perf-profile.children.cycles-pp.__mod_lruvec_state
     0.00              +0.1          0.05       perf-profile.children.cycles-pp.perf_output_sample
     0.00              +0.1          0.05       perf-profile.children.cycles-pp.memcg_check_events
     0.08 ± 4%         +0.1          0.13 ± 5%  perf-profile.children.cycles-pp.__mod_node_page_state
     0.00              +0.1          0.06 ± 8%  perf-profile.children.cycles-pp.__get_user_nocheck_8
     0.00              +0.1          0.06 ± 6%  perf-profile.children.cycles-pp.perf_callchain_user
     0.00              +0.1          0.06 ± 7%  perf-profile.children.cycles-pp.update_load_avg
     0.00              +0.1          0.06 ± 7%  perf-profile.children.cycles-pp.__orc_find
     0.19 ± 4%         +0.1          0.26 ± 6%  perf-profile.children.cycles-pp.__list_del_entry_valid
     0.11 ± 4%         +0.1          0.18 ± 7%  perf-profile.children.cycles-pp.unwind_next_frame
     0.00              +0.1          0.08 ± 7%  perf-profile.children.cycles-pp.__page_cache_release
     0.02 ± 99%        +0.1          0.11 ± 4%  perf-profile.children.cycles-pp.folio_add_lru
     0.00              +0.1          0.08 ± 13% perf-profile.children.cycles-pp.shmem_alloc_and_acct_folio
     0.00              +0.1          0.08 ± 13% perf-profile.children.cycles-pp.shmem_alloc_folio
     0.00              +0.1          0.09 ± 10%
perf-profile.children.cycles-pp.__unwind_start
     0.00              +0.1          0.09 ± 11% perf-profile.children.cycles-pp.shmem_write_begin
     0.00              +0.1          0.09 ± 11% perf-profile.children.cycles-pp.shmem_getpage_gfp
     0.00              +0.1          0.10 ± 6%  perf-profile.children.cycles-pp.free_compound_page
     0.00              +0.1          0.10 ± 6%  perf-profile.children.cycles-pp.__mem_cgroup_uncharge
     0.13 ± 4%         +0.1          0.23 ± 7%  perf-profile.children.cycles-pp.perf_callchain_kernel
     0.13 ± 17%        +0.1          0.24 ± 15% perf-profile.children.cycles-pp.cmd_record
     0.13 ± 15%        +0.1          0.24 ± 16% perf-profile.children.cycles-pp.__libc_start_main
     0.13 ± 15%        +0.1          0.24 ± 16% perf-profile.children.cycles-pp.main
     0.13 ± 15%        +0.1          0.24 ± 16% perf-profile.children.cycles-pp.run_builtin
     0.04 ± 47%        +0.1          0.16 ± 11% perf-profile.children.cycles-pp.generic_perform_write
     0.04 ± 47%        +0.1          0.16 ± 12% perf-profile.children.cycles-pp.generic_file_write_iter
     0.04 ± 47%        +0.1          0.16 ± 12% perf-profile.children.cycles-pp.__generic_file_write_iter
     0.04 ± 47%        +0.1          0.16 ± 14% perf-profile.children.cycles-pp.record__pushfn
     0.04 ± 47%        +0.1          0.16 ± 14% perf-profile.children.cycles-pp.__libc_write
     0.04 ± 47%        +0.1          0.16 ± 13% perf-profile.children.cycles-pp.vfs_write
     0.04 ± 47%        +0.1          0.16 ± 13% perf-profile.children.cycles-pp.perf_mmap__push
     0.04 ± 47%        +0.1          0.16 ± 13% perf-profile.children.cycles-pp.ksys_write
     0.04 ± 47%        +0.1          0.17 ± 12% perf-profile.children.cycles-pp.record__mmap_read_evlist
     0.13 ± 17%        +0.1          0.27 ± 17% perf-profile.children.cycles-pp.__cmd_record
     0.15 ± 3%         +0.2          0.30 ± 7%  perf-profile.children.cycles-pp.get_perf_callchain
     0.15 ± 3%         +0.2          0.31 ± 6%  perf-profile.children.cycles-pp.perf_callchain
     0.16 ± 6%         +0.2          0.33 ± 7%  perf-profile.children.cycles-pp.perf_prepare_sample
     0.17 ± 4%         +0.2          0.41 ± 6%  perf-profile.children.cycles-pp.perf_event_output_forward
     0.17 ± 4%         +0.2          0.41 ± 6%  perf-profile.children.cycles-pp.__perf_event_overflow
     0.00              +0.3          0.27 ± 8%  perf-profile.children.cycles-pp.__free_one_page
     0.18 ± 5%         +0.3          0.44 ± 6%  perf-profile.children.cycles-pp.perf_tp_event
     0.18 ± 5%         +0.3          0.46 ± 6%  perf-profile.children.cycles-pp.perf_trace_sched_stat_runtime
     0.19 ± 5%         +0.3          0.53 ± 6%  perf-profile.children.cycles-pp.update_curr
     0.21 ± 3%         +0.4          0.62 ± 5%  perf-profile.children.cycles-pp.task_tick_fair
     0.00              +0.4          0.42 ± 8%  perf-profile.children.cycles-pp.check_new_pages
     0.23 ± 3%         +0.5          0.72 ± 5%  perf-profile.children.cycles-pp.scheduler_tick
     0.25 ± 4%         +0.6          0.83 ± 5%  perf-profile.children.cycles-pp.update_process_times
     0.25 ± 4%         +0.6          0.84 ± 5%  perf-profile.children.cycles-pp.tick_sched_handle
     0.26 ± 4%         +0.6          0.87 ± 5%  perf-profile.children.cycles-pp.tick_sched_timer
     0.29 ± 3%         +0.7          0.98 ± 4%  perf-profile.children.cycles-pp.__hrtimer_run_queues
     0.34 ± 4%         +0.7          1.09 ± 4%  perf-profile.children.cycles-pp.hrtimer_interrupt
     0.36 ± 4%         +0.8          1.12 ± 4%  perf-profile.children.cycles-pp.__sysvec_apic_timer_interrupt
     0.38 ± 4%         +0.8          1.18 ± 4%  perf-profile.children.cycles-pp.sysvec_apic_timer_interrupt
     0.44 ± 5%         +1.1          1.51 ± 3%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     0.00              +1.1          1.14 ± 2%  perf-profile.children.cycles-pp.free_pcp_prepare
     3.60 ± 3%         +1.5          5.08 ± 7%  perf-profile.children.cycles-pp.unmap_vmas
     3.52 ± 3%         +1.6          5.07 ± 7%  perf-profile.children.cycles-pp.unmap_page_range
     3.43 ± 3%         +1.6          5.05 ± 7%  perf-profile.children.cycles-pp.zap_pmd_range
     0.74 ± 3%         +1.7          2.42 ± 2%  perf-profile.children.cycles-pp.__cond_resched
     0.79 ± 4%         +2.9          3.67 ± 2%  perf-profile.children.cycles-pp.__might_resched
     0.73 ± 3%         +3.9          4.63 ± 8%  perf-profile.children.cycles-pp.pte_alloc_one
     0.32 ± 5%         +4.5          4.84 ± 7%  perf-profile.children.cycles-pp.vma_alloc_folio
     0.27 ± 5%         +4.5          4.82 ± 7%  perf-profile.children.cycles-pp.__folio_alloc
     0.00              +4.8          4.82 ± 7%  perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
     0.00              +4.9          4.88 ± 7%  perf-profile.children.cycles-pp.zap_huge_pmd
     0.75 ± 4%         +8.7          9.41 ± 7%  perf-profile.children.cycles-pp.__alloc_pages
     0.16 ± 5%         +8.8          9.00 ± 8%  perf-profile.children.cycles-pp.rmqueue
     0.00              +8.9          8.86 ± 8%  perf-profile.children.cycles-pp.rmqueue_bulk
     0.42 ± 5%         +8.9          9.28 ± 7%  perf-profile.children.cycles-pp.get_page_from_freelist
     0.00             +13.0         13.02 ± 8%  perf-profile.children.cycles-pp.free_pcppages_bulk
     0.00             +14.4         14.36 ± 7%  perf-profile.children.cycles-pp.free_unref_page
     0.12 ± 3%        +20.9         21.00 ± 8%  perf-profile.children.cycles-pp._raw_spin_lock
     0.18 ± 3%        +65.9         66.08 ± 2%  perf-profile.children.cycles-pp.clear_page_erms
     0.00             +73.5         73.51 ± 2%  perf-profile.children.cycles-pp.clear_huge_page
     0.00             +78.6         78.58       perf-profile.children.cycles-pp.__do_huge_pmd_anonymous_page
     2.65 ± 3%        +81.0         83.62       perf-profile.children.cycles-pp.asm_exc_page_fault
     2.61 ± 3%        +81.0         83.59       perf-profile.children.cycles-pp.exc_page_fault
     2.60 ± 3%        +81.0         83.59       perf-profile.children.cycles-pp.do_user_addr_fault
     2.39 ± 3%        +81.1         83.53       perf-profile.children.cycles-pp.handle_mm_fault
     2.27 ± 3%        +81.2         83.46       perf-profile.children.cycles-pp.__handle_mm_fault
    82.87             -62.0         20.90 ± 8%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
     2.98 ± 4%         -2.9          0.03 ± 70% perf-profile.self.cycles-pp.native_flush_tlb_one_user
     0.72 ± 3%         -0.6          0.08 ± 10% perf-profile.self.cycles-pp.zap_pmd_range
     0.50 ± 5%         -0.5          0.03 ± 70% perf-profile.self.cycles-pp.page_counter_uncharge
     0.41 ± 4%         -0.3          0.06 ± 7%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
     0.27 ± 4%         -0.1          0.13 ± 3%  perf-profile.self.cycles-pp.native_irq_return_iret
     0.20 ± 6%         -0.1          0.08 ± 8%  perf-profile.self.cycles-pp.kmem_cache_free
     0.22 ± 3%         -0.1          0.14 ± 3%  perf-profile.self.cycles-pp.rcu_all_qs
     0.08 ± 5%         -0.0          0.07 ± 6%  perf-profile.self.cycles-pp.try_charge_memcg
     0.02 ±141%        +0.1          0.07 ± 5%  perf-profile.self.cycles-pp.unwind_next_frame
     0.08 ± 6%         +0.1          0.13 ± 4%  perf-profile.self.cycles-pp.__mod_node_page_state
     0.00              +0.1          0.06 ± 8%  perf-profile.self.cycles-pp.__do_huge_pmd_anonymous_page
     0.00              +0.1          0.06 ± 8%  perf-profile.self.cycles-pp.page_counter_try_charge
     0.00              +0.1          0.06 ± 9%  perf-profile.self.cycles-pp.__orc_find
     0.19 ± 3%         +0.1          0.26 ± 8%  perf-profile.self.cycles-pp.__list_del_entry_valid
     0.08 ± 7%         +0.1          0.19 ± 8%  perf-profile.self.cycles-pp.get_page_from_freelist
     0.00              +0.3          0.25 ± 8%  perf-profile.self.cycles-pp.__free_one_page
     0.00              +0.4          0.42 ± 8%  perf-profile.self.cycles-pp.check_new_pages
     0.00              +1.1          1.10 ± 2%  perf-profile.self.cycles-pp.free_pcp_prepare
     0.42 ± 4%         +1.2          1.59 ± 2%  perf-profile.self.cycles-pp.__cond_resched
     0.00              +2.5          2.47 ± 2%  perf-profile.self.cycles-pp.clear_huge_page
     0.70 ± 4%         +2.7          3.36 ± 2%  perf-profile.self.cycles-pp.__might_resched
     0.18 ± 4%        +65.0         65.14 ± 2%  perf-profile.self.cycles-pp.clear_page_erms

If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot
| Link: https://lore.kernel.org/r/202210181535.7144dd15-yujie.liu@intel.com

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # If you come across any failure that blocks the test,
        # please remove ~/.lkp and the /lkp dir to run from a clean state.

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

--
0-DAY CI Kernel Test Service
https://01.org/lkp