Greetings,

FYI, we noticed a 15.9% improvement of will-it-scale.per_process_ops due to commit:

commit: c77c0a8ac4c522638a8242fcb9de9496e3cdbb2d ("mm/hugetlb: defer freeing of huge pages if in non-task context")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 192 threads Intel(R) Xeon(R) Platinum 9242 CPU @ 2.30GHz with 192G memory
with following parameters:

	nr_task: 50%
	mode: process
	test: page_fault3
	cpufreq_governor: performance
	ucode: 0x500002c

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml   # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/process/50%/debian-x86_64-2019-11-14.cgz/lkp-csl-2ap3/page_fault3/will-it-scale/0x500002c

commit:
  a7c46c0c0e ("mm/gup: fix memory leak in __gup_benchmark_ioctl")
  c77c0a8ac4 ("mm/hugetlb: defer freeing of huge pages if in non-task context")

a7c46c0c0e3d62f2 c77c0a8ac4c522638a8242fcb9d
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          :4           25%           1:4     dmesg.WARNING:at_ip___perf_sw_event/0x
         21:4          82%          25:4     perf-profile.calltrace.cycles-pp.sync_regs.error_entry
         24:4          92%          28:4     perf-profile.calltrace.cycles-pp.error_entry
          0:4           1%           0:4     perf-profile.children.cycles-pp.error_exit
         25:4          97%          29:4     perf-profile.children.cycles-pp.error_entry
          0:4           1%           0:4     perf-profile.self.cycles-pp.error_exit
          2:4          11%           3:4     perf-profile.self.cycles-pp.error_entry
         %stddev     %change         %stddev
             \          |                \
      5.86 ± 12%      -2.9        2.97 ± 10%  perf-profile.calltrace.cycles-pp.__count_memcg_events.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      2.00 ± 12%      -1.1        0.95 ± 12%  perf-profile.calltrace.cycles-pp.lock_page_memcg.page_add_file_rmap.alloc_set_pte.finish_fault.handle_pte_fault
      0.83 ± 12%      +0.2        1.01 ±  9%  perf-profile.calltrace.cycles-pp.file_update_time.fault_dirty_shared_page.handle_pte_fault.__handle_mm_fault.handle_mm_fault
      0.40 ± 57%      +0.2        0.62 ± 10%  perf-profile.calltrace.cycles-pp.current_time.file_update_time.fault_dirty_shared_page.handle_pte_fault.__handle_mm_fault
     14.61 ± 11%      -4.1       10.54 ± 10%  perf-profile.children.cycles-pp.native_irq_return_iret
      5.86 ± 12%      -2.9        2.97 ± 10%  perf-profile.children.cycles-pp.__count_memcg_events
      2.54 ± 12%      -1.2        1.29 ± 12%  perf-profile.children.cycles-pp.lock_page_memcg
      0.50 ± 12%      +0.1        0.62 ± 10%  perf-profile.children.cycles-pp.current_time
      0.83 ± 12%      +0.2        1.01 ±  9%  perf-profile.children.cycles-pp.file_update_time
     14.60 ± 11%      -4.1       10.54 ± 10%  perf-profile.self.cycles-pp.native_irq_return_iret
      5.83 ± 12%      -2.9        2.95 ± 10%  perf-profile.self.cycles-pp.__count_memcg_events
      2.50 ± 12%      -1.2        1.25 ± 12%  perf-profile.self.cycles-pp.lock_page_memcg
      0.23 ± 13%      -0.1        0.16 ±  9%  perf-profile.self.cycles-pp.__unlock_page_memcg
    824554           +15.9%     955434        will-it-scale.per_process_ops
  79157305           +15.9%   91721706        will-it-scale.workload
     41420 ± 95%     -80.4%       8122 ± 58%  numa-meminfo.node3.AnonHugePages
    308.61            +4.0%     321.03        turbostat.PkgWatt
  51328483           +14.5%   58776435        proc-vmstat.numa_hit
  51272762           +14.5%   58720733        proc-vmstat.numa_local
  51446016           +14.5%   58921941        proc-vmstat.pgalloc_normal
 2.381e+10           +15.8%  2.758e+10        proc-vmstat.pgfault
  50812338 ±  2%     +13.0%   57413676 ±  2%  proc-vmstat.pgfree
   7179986           +15.6%    8300189        numa-vmstat.node0.numa_hit
   7170547           +15.7%    8295341        numa-vmstat.node0.numa_local
   7267631           +10.9%    8059572        numa-vmstat.node1.numa_hit
   7166107           +11.1%    7958194        numa-vmstat.node1.numa_local
   7161204           +11.8%    8007798        numa-vmstat.node2.numa_hit
   7056803           +12.0%    7901667        numa-vmstat.node2.numa_local
  12704221           +17.8%   14964315        numa-numastat.node0.local_node
  12713482           +17.7%   14968938        numa-numastat.node0.numa_hit
  12946189           +13.5%   14695756        numa-numastat.node1.local_node
  12960130           +13.5%   14709699        numa-numastat.node1.numa_hit
  12816622           +13.3%   14527218        numa-numastat.node2.local_node
  12833628           +13.3%   14545778        numa-numastat.node2.numa_hit
  12814554           +13.7%   14572722        numa-numastat.node3.local_node
  12830035           +13.7%   14591272        numa-numastat.node3.numa_hit
      9311 ± 88%     -62.1%       3529 ± 12%  softirqs.CPU116.SCHED
     20911 ± 80%     -83.1%       3531 ± 14%  softirqs.CPU117.SCHED
      9130 ± 96%     -61.6%       3503 ± 14%  softirqs.CPU118.SCHED
     21250 ± 79%     -82.4%       3729 ±  7%  softirqs.CPU131.SCHED
    119649 ± 24%     -24.0%      90953        softirqs.CPU131.TIMER
     12060 ±113%     -69.7%       3651 ± 14%  softirqs.CPU153.SCHED
     12095 ±112%     -70.6%       3552 ± 14%  softirqs.CPU159.SCHED
     20918 ± 79%     -83.1%       3532 ± 13%  softirqs.CPU169.SCHED
     21337 ± 81%     -83.0%       3634 ± 15%  softirqs.CPU180.SCHED
     12102 ±113%     -70.4%       3577 ± 13%  softirqs.CPU185.SCHED
     41.50 ±121%     -95.8%       1.75 ± 74%  interrupts.CPU115.RES:Rescheduling_interrupts
      5306 ± 41%     -39.9%       3191 ± 47%  interrupts.CPU12.NMI:Non-maskable_interrupts
      5306 ± 41%     -39.9%       3191 ± 47%  interrupts.CPU12.PMI:Performance_monitoring_interrupts
      7979 ± 15%     -29.6%       5614 ± 34%  interrupts.CPU126.NMI:Non-maskable_interrupts
      7979 ± 15%     -29.6%       5614 ± 34%  interrupts.CPU126.PMI:Performance_monitoring_interrupts
      5197 ± 39%     +68.2%       8741        interrupts.CPU138.NMI:Non-maskable_interrupts
      5197 ± 39%     +68.2%       8741        interrupts.CPU138.PMI:Performance_monitoring_interrupts
      4289 ±  5%     -19.2%       3466 ± 14%  interrupts.CPU144.CAL:Function_call_interrupts
      5154 ± 40%     +54.6%       7969 ± 16%  interrupts.CPU150.NMI:Non-maskable_interrupts
      5154 ± 40%     +54.6%       7969 ± 16%  interrupts.CPU150.PMI:Performance_monitoring_interrupts
      4269 ±  5%     -14.8%       3635 ±  9%  interrupts.CPU156.CAL:Function_call_interrupts
      4478 ± 14%     -17.9%       3677 ±  7%  interrupts.CPU49.CAL:Function_call_interrupts
      3413 ±  6%     +23.8%       4225 ±  4%  interrupts.CPU5.CAL:Function_call_interrupts
      4195 ±  5%     -13.4%       3632 ±  4%  interrupts.CPU50.CAL:Function_call_interrupts
      2955 ± 48%     +50.8%       4458 ± 16%  interrupts.CPU75.CAL:Function_call_interrupts
     50672 ±152%     -92.8%       3670 ± 30%  interrupts.RES:Rescheduling_interrupts
      2.25 ±  3%     -16.4%       1.88        perf-stat.i.MPKI
 4.231e+10           +16.7%  4.937e+10        perf-stat.i.branch-instructions
 1.021e+08           +16.7%  1.191e+08        perf-stat.i.branch-misses
     41.81            +0.9       42.67        perf-stat.i.cache-miss-rate%
 1.859e+08            +3.1%  1.918e+08        perf-stat.i.cache-misses
 4.423e+08            +1.4%  4.485e+08        perf-stat.i.cache-references
      1.45           -14.9%       1.23        perf-stat.i.cpi
      1587            -2.2%       1553        perf-stat.i.cycles-between-cache-misses
      0.00 ±  6%      -0.0        0.00 ±  7%  perf-stat.i.dTLB-load-miss-rate%
 5.921e+10 ±  2%     +13.4%  6.713e+10 ±  3%  perf-stat.i.dTLB-loads
  2.33e+09 ±  2%     +18.6%  2.763e+09        perf-stat.i.dTLB-store-misses
 3.018e+10 ±  2%     +18.2%  3.568e+10        perf-stat.i.dTLB-stores
 2.075e+11           +16.7%  2.421e+11        perf-stat.i.instructions
      0.70           +16.3%       0.81        perf-stat.i.ipc
  78135721           +16.8%   91283662        perf-stat.i.minor-faults
     33.70 ±  4%     -16.9       16.76 ±  3%  perf-stat.i.node-load-miss-rate%
   2303552 ±  2%     -52.6%    1090928 ±  3%  perf-stat.i.node-load-misses
   4735768 ±  4%     +17.7%    5575862 ±  3%  perf-stat.i.node-loads
      7.74 ±  7%      -0.7        7.08        perf-stat.i.node-store-miss-rate%
   5836219 ±  3%     +24.5%    7264951 ±  3%  perf-stat.i.node-store-misses
  77683656 ±  3%     +25.5%   97502265 ±  4%  perf-stat.i.node-stores
  78135605           +16.8%   91283300        perf-stat.i.page-faults
      2.13           -13.1%       1.85        perf-stat.overall.MPKI
     42.04            +0.7       42.75        perf-stat.overall.cache-miss-rate%
      1.42           -13.6%       1.23        perf-stat.overall.cpi
      1588            -2.3%       1552        perf-stat.overall.cycles-between-cache-misses
      0.70           +15.7%       0.81        perf-stat.overall.ipc
     32.75 ±  2%     -16.4       16.37 ±  3%  perf-stat.overall.node-load-miss-rate%
 4.216e+10           +16.7%  4.919e+10        perf-stat.ps.branch-instructions
 1.017e+08           +16.7%  1.187e+08        perf-stat.ps.branch-misses
 1.853e+08            +3.1%  1.911e+08        perf-stat.ps.cache-misses
 4.407e+08            +1.4%  4.469e+08        perf-stat.ps.cache-references
   5.9e+10 ±  2%     +13.4%   6.69e+10 ±  3%  perf-stat.ps.dTLB-loads
 2.321e+09 ±  2%     +18.6%  2.753e+09        perf-stat.ps.dTLB-store-misses
 3.008e+10 ±  2%     +18.2%  3.556e+10        perf-stat.ps.dTLB-stores
 2.068e+11           +16.7%  2.412e+11        perf-stat.ps.instructions
  77856103           +16.8%   90958076        perf-stat.ps.minor-faults
   2295616 ±  2%     -52.6%    1087383 ±  3%  perf-stat.ps.node-load-misses
   4718877 ±  4%     +17.7%    5556031 ±  3%  perf-stat.ps.node-loads
   5815493 ±  3%     +24.5%    7239177 ±  3%  perf-stat.ps.node-store-misses
  77405910 ±  3%     +25.5%   97154591 ±  4%  perf-stat.ps.node-stores
  77855962           +16.8%   90957705        perf-stat.ps.page-faults
 6.318e+13           +15.6%  7.304e+13        perf-stat.total.instructions

                          will-it-scale.per_process_ops

  [scatter plot over runs: bisect-good samples cluster around 820000-840000
   ops; bisect-bad samples cluster around 940000-955000 ops]

                             will-it-scale.workload

  [scatter plot over runs: bisect-good samples cluster around 7.9e+07-8e+07;
   bisect-bad samples cluster around 9.2e+07]

[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

Thanks,
Rong Chen