Greetings,

FYI, we noticed a -2.1% regression of will-it-scale.per_thread_ops due to commit:

commit: 5ae8a9d7c84e7e6fa64ccaa357a1351015f1457c ("[RFC PATCH 4/4] mm: Add PG_zero support")
url: https://github.com/0day-ci/linux/commits/liliangleo/mm-Add-PG_zero-support/20200412-172834

in testcase: will-it-scale
on test machine: 8 threads Intel(R) Core(TM) i7-3770K CPU @ 3.50GHz with 16G memory
with the following parameters:

	nr_task: 100%
	mode: thread
	test: page_fault1
	cpufreq_governor: performance
	ucode: 0x21

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based variant of each test in order to see any differences between the two. (A sketch of the page_fault1 loop appears after the stats table below.)
test-url: https://github.com/antonblanchard/will-it-scale

If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot

Details are as follows:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml   # the job file is attached to this email
	bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-ivb-d01/page_fault1/will-it-scale/0x21

commit:
  0801ffd19f ("mm: add sys fs configuration for page reporting")
  5ae8a9d7c8 ("mm: Add PG_zero support")

0801ffd19fa82207 5ae8a9d7c84e7e6fa64ccaa357a
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          :4           25%           1:4     dmesg.RIP:__mnt_want_write
          :4           25%           1:4     dmesg.RIP:loop
         1:4          -25%            :4     dmesg.RIP:poll_idle
         1:4          -25%            :4     kmsg.b44449d>]usb_hcd_irq
          :4           25%           1:4     kmsg.b5a1a>]usb_hcd_irq
         1:4          -25%            :4     kmsg.c3ed91f>]usb_hcd_irq
          :4           25%           1:4     kmsg.c6fae9f>]usb_hcd_irq
         1:4          -25%            :4     kmsg.c8b7ca>]usb_hcd_irq
         %stddev     %change         %stddev
             \          |                \
    536081            -2.1%     524633        will-it-scale.per_thread_ops
   2523521            -2.1%    2470098        will-it-scale.time.minor_page_faults
    107.85            -4.2%     103.29        will-it-scale.time.user_time
    511850            -3.3%     495188        will-it-scale.time.voluntary_context_switches
   4288652            -2.1%    4197068        will-it-scale.workload
      4.87            +0.8        5.63        mpstat.cpu.all.idle%
      3991           +10.2%       4397 ±  5%  slabinfo.anon_vma.num_objs
    142485 ±  8%     -15.3%     120695 ±  3%  softirqs.CPU7.TIMER
   3005147 ±  5%    +129.0%    6881725 ± 28%  cpuidle.C1.time
     66503 ±  2%     +55.0%     103053 ± 22%  cpuidle.C1.usage
  32502781 ± 14%     +26.6%   41156255        cpuidle.C3.time
  23839328 ± 16%     +45.3%   34642724 ±  8%  cpuidle.C6.time
     41675 ± 15%     +58.6%      66107 ± 10%  cpuidle.C6.usage
    246196            -4.0%     236260        interrupts.CAL:Function_call_interrupts
      8406 ± 32%     +27.0%      10673 ± 27%  interrupts.CPU4.NMI:Non-maskable_interrupts
      8406 ± 32%     +27.0%      10673 ± 27%  interrupts.CPU4.PMI:Performance_monitoring_interrupts
     11617 ± 21%     -41.5%       6801        interrupts.CPU5.NMI:Non-maskable_interrupts
     11617 ± 21%     -41.5%       6801        interrupts.CPU5.PMI:Performance_monitoring_interrupts
    320147 ± 22%     -26.9%     233880 ±  2%  sched_debug.cfs_rq:/.load.max
     18580 ± 24%     -32.6%      12525 ± 28%  sched_debug.cpu.nr_switches.stddev
     18333 ± 27%     -34.9%      11930 ± 26%  sched_debug.cpu.sched_count.stddev
      8807 ± 28%     -31.8%       6009 ± 19%  sched_debug.cpu.ttwu_count.stddev
      8775 ± 26%     -31.9%       5973 ± 22%  sched_debug.cpu.ttwu_local.stddev
   5412715            -2.1%    5298836        proc-vmstat.numa_hit
   5412715            -2.1%    5298836        proc-vmstat.numa_local
 1.291e+09            -2.1%  1.265e+09        proc-vmstat.pgalloc_normal
   2908824            -1.7%    2858835        proc-vmstat.pgfault
 1.291e+09            -2.1%  1.265e+09        proc-vmstat.pgfree
   2516340            -2.1%    2464229        proc-vmstat.thp_fault_alloc
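For context on what the profile below exercises: the page_fault1 testcase repeatedly maps a block of anonymous memory, writes one byte to each page to fault it in, then unmaps it, counting loop iterations as per_thread_ops. A minimal sketch of that loop, paraphrased from the will-it-scale repository linked above (MEMSIZE, the names, and the harness are simplified here, not the exact upstream source):

	#include <assert.h>
	#include <sys/mman.h>
	#include <unistd.h>

	#define MEMSIZE (128 * 1024 * 1024)        /* illustrative size */

	static unsigned long long iterations;      /* sampled as per_thread_ops */

	static void testcase(void)
	{
		while (1) {
			char *c = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
				       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
			assert(c != MAP_FAILED);

			/* First touch of each page faults; with THP this lands in
			 * do_huge_pmd_anonymous_page() -> clear_huge_page(), which
			 * dominates the calltrace below. */
			for (char *m = c; m < c + MEMSIZE; m += getpagesize())
				*m = 0;

			/* The unmap frees the pages via release_pages() ->
			 * __free_pages_ok(), the path that grows after the patch. */
			munmap(c, MEMSIZE);
			iterations++;
		}
	}

	int main(void)
	{
		testcase();   /* the real harness runs 1..n copies and samples iterations */
		return 0;
	}

Because both the fault side and the unmap side sit on this one loop, any extra work in either path shows up directly in per_thread_ops.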
     86.60           -27.4       59.23        perf-profile.calltrace.cycles-pp.clear_page_erms.clear_subpage.clear_huge_page.do_huge_pmd_anonymous_page.__handle_mm_fault
     95.16            -0.6       94.56        perf-profile.calltrace.cycles-pp.page_fault
     94.97            -0.6       94.38        perf-profile.calltrace.cycles-pp.handle_mm_fault.do_page_fault.page_fault
     95.12            -0.6       94.53        perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
     94.88            -0.6       94.29        perf-profile.calltrace.cycles-pp.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
     94.95            -0.6       94.36        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.do_page_fault.page_fault
     90.91            -0.4       90.56        perf-profile.calltrace.cycles-pp.clear_huge_page.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_page_fault
      3.37            -0.2        3.18        perf-profile.calltrace.cycles-pp.get_page_from_freelist.__alloc_pages_nodemask.alloc_pages_vma.do_huge_pmd_anonymous_page.__handle_mm_fault
      3.38            -0.2        3.19        perf-profile.calltrace.cycles-pp.__alloc_pages_nodemask.alloc_pages_vma.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault
      3.38            -0.2        3.20        perf-profile.calltrace.cycles-pp.alloc_pages_vma.do_huge_pmd_anonymous_page.__handle_mm_fault.handle_mm_fault.do_page_fault
      0.87 ±  2%      -0.1        0.79 ±  4%  perf-profile.calltrace.cycles-pp.rcu_all_qs._cond_resched.clear_huge_page.do_huge_pmd_anonymous_page.__handle_mm_fault
      2.49 ±  4%      +0.5        3.03 ±  4%  perf-profile.calltrace.cycles-pp.munmap
      2.48 ±  4%      +0.5        3.02 ±  4%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
      2.48 ±  4%      +0.5        3.02 ±  4%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
      2.48 ±  4%      +0.5        3.03 ±  4%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.munmap
      2.48 ±  4%      +0.5        3.03 ±  4%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.munmap
      2.44 ±  4%      +0.5        2.99 ±  4%  perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
      2.42 ±  4%      +0.6        2.97 ±  4%  perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      2.37 ±  4%      +0.6        2.93 ±  4%  perf-profile.calltrace.cycles-pp.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
      2.37 ±  4%      +0.6        2.93 ±  4%  perf-profile.calltrace.cycles-pp.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap.__vm_munmap
      2.32 ±  4%      +0.6        2.88 ±  4%  perf-profile.calltrace.cycles-pp.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region.__do_munmap
      2.19 ±  4%      +0.6        2.77 ±  4%  perf-profile.calltrace.cycles-pp.__free_pages_ok.release_pages.tlb_flush_mmu.tlb_finish_mmu.unmap_region
      0.00            +2.0        2.04 ±  2%  perf-profile.calltrace.cycles-pp.clear_zero_page_flag.__free_pages_ok.release_pages.tlb_flush_mmu.tlb_finish_mmu
     87.03           -27.5       59.52        perf-profile.children.cycles-pp.clear_page_erms
     95.19            -0.6       94.59        perf-profile.children.cycles-pp.page_fault
     95.14            -0.6       94.55        perf-profile.children.cycles-pp.do_page_fault
     94.97            -0.6       94.38        perf-profile.children.cycles-pp.__handle_mm_fault
     94.88            -0.6       94.29        perf-profile.children.cycles-pp.do_huge_pmd_anonymous_page
     94.99            -0.6       94.41        perf-profile.children.cycles-pp.handle_mm_fault
     90.97            -0.3       90.63        perf-profile.children.cycles-pp.clear_huge_page
     88.69            -0.3       88.41        perf-profile.children.cycles-pp.clear_subpage
      3.60            -0.2        3.40        perf-profile.children.cycles-pp.__alloc_pages_nodemask
      3.58            -0.2        3.38        perf-profile.children.cycles-pp.get_page_from_freelist
      3.40            -0.2        3.21        perf-profile.children.cycles-pp.alloc_pages_vma
      0.71 ± 14%      -0.2        0.53 ±  5%  perf-profile.children.cycles-pp.apic_timer_interrupt
      0.97 ±  3%      -0.1        0.87 ±  3%  perf-profile.children.cycles-pp._cond_resched
      0.89 ±  2%      -0.1        0.80 ±  3%  perf-profile.children.cycles-pp.rcu_all_qs
      0.65 ±  2%      -0.0        0.60 ±  2%  perf-profile.children.cycles-pp.prep_new_page
      0.25 ±  9%      -0.0        0.21 ±  7%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      0.50            -0.0        0.47 ±  2%  perf-profile.children.cycles-pp.prep_compound_page
      2.49 ±  4%      +0.5        3.03 ±  4%  perf-profile.children.cycles-pp.munmap
      2.48 ±  4%      +0.5        3.03 ±  4%  perf-profile.children.cycles-pp.__vm_munmap
      2.48 ±  4%      +0.5        3.03 ±  4%  perf-profile.children.cycles-pp.__x64_sys_munmap
      2.38 ±  4%      +0.5        2.93 ±  4%  perf-profile.children.cycles-pp.tlb_flush_mmu
      2.45 ±  4%      +0.6        3.00 ±  4%  perf-profile.children.cycles-pp.__do_munmap
      2.42 ±  4%      +0.6        2.98 ±  4%  perf-profile.children.cycles-pp.unmap_region
      2.38 ±  4%      +0.6        2.94 ±  4%  perf-profile.children.cycles-pp.tlb_finish_mmu
      2.33 ±  4%      +0.6        2.90 ±  4%  perf-profile.children.cycles-pp.release_pages
      2.78 ±  3%      +0.6        3.35 ±  3%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      2.78 ±  3%      +0.6        3.35 ±  3%  perf-profile.children.cycles-pp.do_syscall_64
      2.20 ±  4%      +0.6        2.77 ±  3%  perf-profile.children.cycles-pp.__free_pages_ok
      0.00            +2.0        2.04 ±  3%  perf-profile.children.cycles-pp.clear_zero_page_flag
     86.48           -27.2       59.24        perf-profile.self.cycles-pp.clear_page_erms
      2.09 ±  4%      -1.5        0.63 ±  7%  perf-profile.self.cycles-pp.__free_pages_ok
      0.58 ±  2%      -0.1        0.43 ±  7%  perf-profile.self.cycles-pp.rcu_all_qs
      0.50 ±  2%      -0.0        0.47 ±  3%  perf-profile.self.cycles-pp.prep_compound_page
      0.37 ±  5%      +0.0        0.41 ±  2%  perf-profile.self.cycles-pp._cond_resched
      0.00            +2.0        2.03 ±  3%  perf-profile.self.cycles-pp.clear_zero_page_flag
      1.87 ±  2%     +27.0       28.87        perf-profile.self.cycles-pp.clear_subpage
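As a reading aid for the two new symbols above (clear_zero_page_flag, and the grown clear_subpage): the RFC's PG_zero flag marks a free page whose contents are already zero, so the fault path can skip the clear, while the free path must revoke the flag before the page is reused. Below is a self-contained toy model of that protocol; the struct and the fault-side helper are stand-ins invented for illustration, only PG_zero and clear_zero_page_flag echo names visible in this report, and nothing here is copied from the patch:

	#include <stdio.h>
	#include <string.h>

	#define PAGE_SIZE 4096
	#define PG_zero   (1UL << 0)   /* "contents are already zero" */

	struct page {                  /* toy stand-in for the kernel's struct page */
		unsigned long flags;
		unsigned char data[PAGE_SIZE];
	};

	/* Fault side: a pre-zeroed page skips the expensive clear
	 * (clear_page_erms in the profile above). */
	static void clear_page_if_needed(struct page *page)
	{
		if (page->flags & PG_zero)
			return;                        /* skip: already zero */
		memset(page->data, 0, sizeof(page->data));
	}

	/* Free side: contents become stale on free, so the guarantee is
	 * revoked -- the role played by the new clear_zero_page_flag()
	 * seen under __free_pages_ok() above. */
	static void clear_zero_page_flag(struct page *page)
	{
		page->flags &= ~PG_zero;
	}

	int main(void)
	{
		static struct page page = { .flags = PG_zero };

		clear_page_if_needed(&page);    /* skipped: flagged as zero  */
		clear_zero_page_flag(&page);    /* freed: flag revoked       */
		clear_page_if_needed(&page);    /* pays the full clear again */

		printf("flags after free: %#lx\n", page.flags);
		return 0;
	}

In this run the skip does not appear to pay off: self cycles shift from clear_page_erms into clear_subpage almost one-for-one (-27.2 vs. +27.0), and the free path gains about 2% in clear_zero_page_flag, in line with the 2.1% drop in per_thread_ops.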
will-it-scale.per_thread_ops

  [ASCII time-series plot: bisect-good (+) samples hold near 536000 ops; bisect-bad (O) samples sit near 524000]

will-it-scale.workload

  [ASCII time-series plot: bisect-good samples near 4.30e+06; bisect-bad samples near 4.19e+06]

will-it-scale.time.minor_page_faults

  [ASCII time-series plot: bisect-good samples near 2.52e+06; bisect-bad samples near 2.46e+06]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for
informational purposes only. Any difference in system hardware or software design
or configuration may affect actual performance.

Thanks,
Rong Chen