Greetings,

FYI, we noticed a 3.8% improvement of will-it-scale.per_process_ops due to commit:


commit: 180c117d46be304a08b14fb080010773faf50788 ("[PATCH] maple_tree: Remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()")
url: https://github.com/intel-lab-lkp/linux/commits/Liam-Howlett/maple_tree-Remove-GFP_ZERO-from-kmem_cache_alloc-and-kmem_cache_alloc_bulk/20230106-000849
base: https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git 41c03ba9beea760bd2d2ac9250b09a2e192da2dc
patch link: https://lore.kernel.org/all/20230105160427.2988454-1-Liam.Howlett@oracle.com/
patch subject: [PATCH] maple_tree: Remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()

in testcase: will-it-scale
on test machine: 104 threads 2 sockets (Skylake) with 192G memory
with following parameters:

	nr_task: 16
	mode: process
	test: mmap1
	cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

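As a quick sanity check on the headline figure, the sketch below recomputes the relative change from the two will-it-scale.per_process_ops values reported in the comparison table of this report (162774 for the base commit, 168881 for the patched one). The helper name percent_change is made up for illustration and is not part of lkp-tests.

```python
def percent_change(base: float, patched: float) -> float:
    """Relative change of `patched` versus `base`, in percent."""
    return (patched - base) / base * 100.0


# Values copied from the will-it-scale.per_process_ops row of this report.
BASE_OPS = 162774      # 41c03ba9be (base)
PATCHED_OPS = 168881   # 180c117d46 (patched)

change = percent_change(BASE_OPS, PATCHED_OPS)
print(f"{change:+.1f}%")  # prints +3.8%
```

The same formula applied to will-it-scale.workload (2604393 vs 2702107) also gives +3.8%.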
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-11/performance/x86_64-rhel-8.3/process/16/debian-11.1-x86_64-20220510.cgz/lkp-skl-fpga01/mmap1/will-it-scale

commit:
  41c03ba9be ("Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost")
  180c117d46 ("maple_tree: Remove GFP_ZERO from kmem_cache_alloc() and kmem_cache_alloc_bulk()")

41c03ba9beea760b 180c117d46be304a08b14fb0800
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   2604393            +3.8%     2702107        will-it-scale.16.processes
    162774            +3.8%      168881        will-it-scale.per_process_ops
   2604393            +3.8%     2702107        will-it-scale.workload
 1.254e+10            +2.6%   1.286e+10        perf-stat.i.branch-instructions
 1.251e+10            +2.2%   1.279e+10        perf-stat.i.dTLB-loads
 6.283e+09            +1.3%   6.367e+09        perf-stat.i.dTLB-stores
 5.838e+10            +2.5%   5.987e+10        perf-stat.i.instructions
    301.29            +2.2%      307.85        perf-stat.i.metric.M/sec
      0.81            -2.1%        0.79        perf-stat.overall.cpi
   6756492            -1.0%     6686622        perf-stat.overall.path-length
  1.25e+10            +2.6%   1.282e+10        perf-stat.ps.branch-instructions
 1.247e+10            +2.2%   1.275e+10        perf-stat.ps.dTLB-loads
 6.263e+09            +1.3%   6.346e+09        perf-stat.ps.dTLB-stores
 5.819e+10            +2.5%   5.967e+10        perf-stat.ps.instructions
  1.76e+13            +2.7%   1.807e+13        perf-stat.total.instructions
      2.31 ± 10%       -1.0        1.29 ±  7%  perf-profile.calltrace.cycles-pp.mas_preallocate.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      2.24 ± 10%       -1.0        1.23 ±  7%  perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.do_mas_align_munmap.__vm_munmap.__x64_sys_munmap
      2.28 ± 11%       -1.0        1.27 ±  8%  perf-profile.calltrace.cycles-pp.mas_preallocate.mmap_region.do_mmap.vm_mmap_pgoff.do_syscall_64
      2.24 ± 10%       -1.0        1.23 ±  8%  perf-profile.calltrace.cycles-pp.mas_alloc_nodes.mas_preallocate.mmap_region.do_mmap.vm_mmap_pgoff
      1.52 ± 11%       -0.8        0.67 ±  8%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_bulk.mas_alloc_nodes.mas_preallocate.mmap_region.do_mmap
      1.49 ± 11%       -0.8        0.66 ±  7%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc_bulk.mas_alloc_nodes.mas_preallocate.do_mas_align_munmap.__vm_munmap
      4.94 ± 10%       -2.1        2.83 ±  8%  perf-profile.children.cycles-pp.mas_alloc_nodes
      4.61 ± 10%       -2.0        2.57 ±  8%  perf-profile.children.cycles-pp.mas_preallocate
      1.96 ± 10%       -1.8        0.21 ±  7%  perf-profile.children.cycles-pp.memset_erms
      3.09 ± 10%       -1.7        1.36 ±  7%  perf-profile.children.cycles-pp.kmem_cache_alloc_bulk
      0.09 ± 13%       -0.1        0.03 ±100%  perf-profile.children.cycles-pp.vm_area_free
      0.07 ± 10%       +0.0        0.09 ± 11%  perf-profile.children.cycles-pp.mas_wr_modify
      0.22 ± 10%       +0.1        0.28 ±  7%  perf-profile.children.cycles-pp.__might_sleep
      0.00             +0.2        0.22 ± 10%  perf-profile.children.cycles-pp.mas_pop_node
      1.89 ± 11%       -1.7        0.21 ±  8%  perf-profile.self.cycles-pp.memset_erms
      0.72 ± 11%       -0.3        0.37 ±  7%  perf-profile.self.cycles-pp.kmem_cache_alloc_bulk
      0.09 ± 13%       -0.1        0.03 ±100%  perf-profile.self.cycles-pp.vm_area_free
      0.11 ±  7%       +0.1        0.17 ± 11%  perf-profile.self.cycles-pp.do_mas_munmap
      0.00             +0.2        0.21 ±  9%  perf-profile.self.cycles-pp.mas_pop_node


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests