Greetings,

FYI, we noticed a -4.0% regression of will-it-scale.per_process_ops due to commit:

commit: fd978bf7fd312581a7ca454a991f0ffb34c4204b ("bpf: Add reference tracking to verifier")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master

in testcase: will-it-scale
on test machine: 80 threads Skylake with 64G memory
with following parameters:

        nr_task: 100%
        mode: process
        test: mmap1
        cpufreq_governor: performance

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale
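For orientation, the mmap1 testcase is essentially a tight map/unmap loop run
in nr_task parallel processes. The sketch below is an illustrative reduction,
not the benchmark's actual source (that lives in the will-it-scale repository
above); in particular the 128MB MEMSIZE and the fixed iteration count are
assumptions for this sketch:

    /*
     * Illustrative mmap1-style worker; not the actual will-it-scale
     * source. MEMSIZE and the iteration count are assumptions.
     */
    #define _GNU_SOURCE
    #include <stdio.h>
    #include <sys/mman.h>

    #define MEMSIZE (128UL * 1024 * 1024)   /* assumed mapping size */

    int main(void)
    {
            unsigned long ops;

            for (ops = 0; ops < 100000; ops++) {
                    /* A private anonymous mapping gets charged to the
                     * vm_committed_as percpu counter on the way in --
                     * the __vm_enough_memory() leg of the profiles
                     * below ... */
                    void *map = mmap(NULL, MEMSIZE, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);

                    if (map == MAP_FAILED) {
                            perror("mmap");
                            return 1;
                    }
                    /* ... and uncharged again on the do_munmap() leg. */
                    munmap(map, MEMSIZE);
            }
            printf("%lu ops\n", ops);  /* roughly what per_process_ops counts */
            return 0;
    }

Each iteration is two system calls that both touch the same global accounting
state, which makes this testcase more of a lock-scalability probe than a
memory benchmark.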
In addition to that, the commit also has significant impact on the following tests:

+------------------+---------------------------------------------------------------+
| testcase: change | will-it-scale: will-it-scale.per_process_ops -3.8% regression |
| test machine     | 80 threads Skylake with 64G memory                            |
| test parameters  | cpufreq_governor=performance                                  |
|                  | mode=process                                                  |
|                  | nr_task=100%                                                  |
|                  | test=mmap2                                                    |
+------------------+---------------------------------------------------------------+

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-7/performance/x86_64-rhel-7.2/process/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-2sp2/mmap1/will-it-scale

commit:
  84dbf35073 ("bpf: Macrofy stack state copy")
  fd978bf7fd ("bpf: Add reference tracking to verifier")

84dbf3507349696b fd978bf7fd312581a7ca454a99
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
     16811            -4.0%      16140        will-it-scale.per_process_ops
   1344946            -4.0%    1291230        will-it-scale.workload
    107.75 ± 38%    +252.4%     379.75 ± 93%  cpuidle.POLL.usage
    121.70 ± 18%     +18.9%     144.70 ±  4%  sched_debug.cfs_rq:/.exec_clock.stddev
      4933            +2.0%       5031        proc-vmstat.nr_inactive_anon
      4933            +2.0%       5031        proc-vmstat.nr_zone_inactive_anon
      9874            +9.0%      10765 ±  7%  slabinfo.proc_inode_cache.active_objs
      9874            +9.0%      10765 ±  7%  slabinfo.proc_inode_cache.num_objs
     12248 ± 50%     +52.2%      18640 ±  2%  numa-meminfo.node0.Inactive
     33943 ±  8%     +16.2%      39453 ±  6%  numa-meminfo.node0.SReclaimable
     91549 ±  9%      -9.9%      82530 ±  7%  numa-meminfo.node1.Slab
     18091 ± 15%     +29.2%      23382 ± 17%  numa-vmstat.node0
      3027 ± 52%     +52.6%       4620 ±  3%  numa-vmstat.node0.nr_inactive_anon
      8485 ±  8%     +16.2%       9862 ±  6%  numa-vmstat.node0.nr_slab_reclaimable
      3027 ± 52%     +52.6%       4620 ±  3%  numa-vmstat.node0.nr_zone_inactive_anon
   1.4e+12            -2.5%  1.364e+12        perf-stat.branch-instructions
     41.42            +0.7       42.15        perf-stat.cache-miss-rate%
 2.166e+10            -2.1%   2.12e+10        perf-stat.cache-references
     12.16            +2.7%      12.49        perf-stat.cpi
 1.741e+12            -2.8%  1.692e+12        perf-stat.dTLB-loads
      0.00 ±  3%      +0.0        0.00 ±  9%  perf-stat.dTLB-store-miss-rate%
 5.713e+11            -3.9%   5.49e+11        perf-stat.dTLB-stores
 6.103e+12            -2.6%  5.943e+12        perf-stat.instructions
      0.08            -2.6%       0.08        perf-stat.ipc
 1.954e+09            -1.8%  1.919e+09        perf-stat.node-load-misses
   4538060            +1.4%    4602862        perf-stat.path-length
     49.62            -0.5       49.14        perf-profile.calltrace.cycles-pp.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     47.64            -0.5       47.17        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.do_munmap.vm_munmap.__x64_sys_munmap
     47.49            -0.5       47.02        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.do_munmap.vm_munmap
     49.99            -0.5       49.53        perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.96            -0.5       49.51        perf-profile.calltrace.cycles-pp.vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     48.02            -0.4       47.58        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
      1.41            -0.0        1.37        perf-profile.calltrace.cycles-pp.unmap_region.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
     47.73            +0.4       48.11        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region
     47.85            +0.4       48.25        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap
     48.28            +0.4       48.68        perf-profile.calltrace.cycles-pp.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff
     48.23            +0.4       48.63        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.__vm_enough_memory.mmap_region.do_mmap.vm_mmap_pgoff
     48.96            +0.4       49.41        perf-profile.calltrace.cycles-pp.mmap_region.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
     49.11            +0.5       49.56        perf-profile.calltrace.cycles-pp.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.24            +0.5       49.70        perf-profile.calltrace.cycles-pp.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.25            +0.5       49.72        perf-profile.calltrace.cycles-pp.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
     49.62            -0.5       49.15        perf-profile.children.cycles-pp.do_munmap
     49.99            -0.5       49.53        perf-profile.children.cycles-pp.__x64_sys_munmap
     49.97            -0.5       49.51        perf-profile.children.cycles-pp.vm_munmap
      0.51 ±  2%      -0.0        0.46        perf-profile.children.cycles-pp.___might_sleep
      1.16            -0.0        1.11        perf-profile.children.cycles-pp.unmap_vmas
      1.15            -0.0        1.10        perf-profile.children.cycles-pp.unmap_page_range
      1.41            -0.0        1.37        perf-profile.children.cycles-pp.unmap_region
      0.32 ±  2%      +0.0        0.34 ±  2%  perf-profile.children.cycles-pp.up_write
      0.32 ±  2%      +0.0        0.34        perf-profile.children.cycles-pp.vm_area_alloc
      0.29            +0.0        0.32        perf-profile.children.cycles-pp.kmem_cache_alloc
     48.28            +0.4       48.68        perf-profile.children.cycles-pp.__vm_enough_memory
     48.96            +0.4       49.41        perf-profile.children.cycles-pp.mmap_region
     49.11            +0.5       49.56        perf-profile.children.cycles-pp.do_mmap
     49.25            +0.5       49.71        perf-profile.children.cycles-pp.vm_mmap_pgoff
     49.25            +0.5       49.72        perf-profile.children.cycles-pp.ksys_mmap_pgoff
      0.47 ±  3%      -0.0        0.43        perf-profile.self.cycles-pp.___might_sleep
      0.32 ±  3%      +0.0        0.34 ±  2%  perf-profile.self.cycles-pp.up_write
      0.27            +0.0        0.30        perf-profile.self.cycles-pp.kmem_cache_alloc
      0.49            +0.0        0.53        perf-profile.self.cycles-pp.percpu_counter_add_batch

                          will-it-scale.per_process_ops

  [ASCII run-to-run plot, y-axis 15000-18000: parent-commit samples (*)
   fluctuate around ~16800 ops, fd978bf7fd samples (O) around ~16100 ops]

                             will-it-scale.workload

  [ASCII run-to-run plot, y-axis 1.22e+06-1.42e+06: parent-commit samples (*)
   around ~1.345e+06, fd978bf7fd samples (O) around ~1.29e+06]

 [*] bisect-good sample
 [O] bisect-bad sample
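Both halves of the profile meet at one lock: roughly 95% of cycles sit in
native_queued_spin_lock_slowpath, reached through percpu_counter_add_batch()
from __vm_enough_memory() on the mmap side and from do_munmap() on the munmap
side -- i.e. the vm_committed_as accounting. percpu_counter_add_batch() only
takes the shared spinlock once a CPU's accumulated delta reaches the batch
threshold, but each op here charges and uncharges tens of thousands of pages
at once (assuming the 128MB mapping from the sketch above), far above any
plausible batch, so with 80 processes every mmap/munmap likely takes the slow
path. Below is a simplified userspace rendering of the batching scheme -- an
illustration of the idea, not the kernel's code (that is lib/percpu_counter.c);
the pthread spinlock, per-thread slots and BATCH value are stand-ins:

    #include <pthread.h>

    #define BATCH       32      /* example threshold, not the kernel default */
    #define MAX_THREADS 128

    struct batched_counter {
            pthread_spinlock_t lock;  /* protects the shared total */
            long long total;          /* global count, updated under lock */
            long local[MAX_THREADS];  /* stand-ins for per-CPU deltas */
    };

    static void counter_add_batch(struct batched_counter *c, int self,
                                  long amount)
    {
            long count = c->local[self] + amount;

            if (count >= BATCH || count <= -BATCH) {
                    /* Slow path: fold the local delta into the shared
                     * total. In the kernel, this lock is where the
                     * profiled native_queued_spin_lock_slowpath cycles
                     * are spent. */
                    pthread_spin_lock(&c->lock);
                    c->total += count;
                    pthread_spin_unlock(&c->lock);
                    c->local[self] = 0;
            } else {
                    /* Fast path: only the caller's own slot is touched. */
                    c->local[self] = count;
            }
    }

    int main(void)
    {
            struct batched_counter c = { .total = 0 };

            pthread_spin_init(&c.lock, PTHREAD_PROCESS_PRIVATE);
            counter_add_batch(&c, 0, 32768);    /* charge a large mapping */
            counter_add_batch(&c, 0, -32768);   /* ... and uncharge it */
            pthread_spin_destroy(&c.lock);
            return 0;
    }

With per-op deltas that large the batching degenerates into a single contended
lock, which is consistent with the mmap and munmap call chains each pinning
~47-48% of cycles in the same slow path above.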
***************************************************************************************************
lkp-skl-2sp2: 80 threads Skylake with 64G memory
=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase:
  gcc-7/performance/x86_64-rhel-7.2/process/100%/debian-x86_64-2018-04-03.cgz/lkp-skl-2sp2/mmap2/will-it-scale

commit:
  84dbf35073 ("bpf: Macrofy stack state copy")
  fd978bf7fd ("bpf: Add reference tracking to verifier")

84dbf3507349696b fd978bf7fd312581a7ca454a99
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
     16832            -3.8%      16186        will-it-scale.per_process_ops
   1346634            -3.8%    1294984        will-it-scale.workload
    390809 ± 21%     +51.6%     592424 ± 27%  cpuidle.C1.time
      6897            +2.7%       7085        proc-vmstat.nr_mapped
    936.00 ±  7%     +15.6%       1082 ±  5%  slabinfo.Acpi-ParseExt.active_objs
    936.00 ±  7%     +15.6%       1082 ±  5%  slabinfo.Acpi-ParseExt.num_objs
    968.00 ±  9%     +27.5%       1233 ± 16%  slabinfo.pool_workqueue.active_objs
    968.00 ±  9%     +29.7%       1255 ± 16%  slabinfo.pool_workqueue.num_objs
      8430           -14.1%       7244 ±  2%  numa-meminfo.node0.KernelStack
      4283 ± 14%     -22.4%       3325 ± 10%  numa-meminfo.node0.PageTables
     73929 ±  3%     -10.6%      66061 ±  6%  numa-meminfo.node0.SUnreclaim
      5569 ±  2%     +21.0%       6738 ±  3%  numa-meminfo.node1.KernelStack
     55085 ±  5%     +17.5%      64739 ±  5%  numa-meminfo.node1.SUnreclaim
     81155 ±  6%     +16.2%      94292 ±  7%  numa-meminfo.node1.Slab
    230.00          -100.0%       0.00        numa-vmstat.node0.nr_active_file
    100.25 ±  3%     -88.8%      11.25 ±173%  numa-vmstat.node0.nr_inactive_file
      8431           -14.1%       7245 ±  2%  numa-vmstat.node0.nr_kernel_stack
      1071 ± 14%     -22.4%     831.25 ± 10%  numa-vmstat.node0.nr_page_table_pages
     18482 ±  3%     -10.6%      16515 ±  6%  numa-vmstat.node0.nr_slab_unreclaimable
    230.00          -100.0%       0.00        numa-vmstat.node0.nr_zone_active_file
    100.25 ±  3%     -88.8%      11.25 ±173%  numa-vmstat.node0.nr_zone_inactive_file
      5569 ±  2%     +21.0%       6738 ±  3%  numa-vmstat.node1.nr_kernel_stack
      2778 ±  3%     +28.4%       3567 ± 16%  numa-vmstat.node1.nr_mapped
     13771 ±  5%     +17.5%      16184 ±  5%  numa-vmstat.node1.nr_slab_unreclaimable
 1.506e+12            -2.5%  1.468e+12        perf-stat.branch-instructions
     41.41            +0.8       42.20        perf-stat.cache-miss-rate%
 2.165e+10            -1.7%  2.129e+10        perf-stat.cache-references
     11.25            +2.8%      11.57        perf-stat.cpi
 1.891e+12            -2.8%  1.838e+12        perf-stat.dTLB-loads
 6.543e+11            -3.7%    6.3e+11        perf-stat.dTLB-stores
 6.591e+12            -2.6%  6.419e+12        perf-stat.instructions
      0.09            -2.7%       0.09        perf-stat.ipc
 1.967e+09            -1.3%  1.941e+09        perf-stat.node-load-misses
   4894750            +1.3%    4956596        perf-stat.path-length
     40.37 ± 12%     -16.2%      33.81 ±  7%  sched_debug.cfs_rq:/.load_avg.stddev
      0.05 ±  2%     +18.7%       0.06 ±  3%  sched_debug.cfs_rq:/.nr_running.stddev
      6.37 ± 40%     -50.2%       3.17 ± 32%  sched_debug.cfs_rq:/.removed.load_avg.avg
     31.64 ± 18%     -28.5%      22.63 ± 16%  sched_debug.cfs_rq:/.removed.load_avg.stddev
    293.89 ± 40%     -50.1%     146.61 ± 32%  sched_debug.cfs_rq:/.removed.runnable_sum.avg
      1459 ± 18%     -28.3%       1045 ± 16%  sched_debug.cfs_rq:/.removed.runnable_sum.stddev
      2.46 ± 43%     -60.9%       0.96 ± 66%  sched_debug.cfs_rq:/.removed.util_avg.avg
     12.42 ± 26%     -46.5%       6.64 ± 59%  sched_debug.cfs_rq:/.removed.util_avg.stddev
    385.92 ±  6%     +12.8%     435.46 ±  2%  sched_debug.cpu.nr_switches.min
    -14.21           -31.4%      -9.75        sched_debug.cpu.nr_uninterruptible.min
     47.54            -0.2       47.31        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock_irqsave.percpu_counter_add_batch.do_munmap.vm_munmap
     47.67            -0.2       47.45        perf-profile.calltrace.cycles-pp._raw_spin_lock_irqsave.percpu_counter_add_batch.do_munmap.vm_munmap.__x64_sys_munmap
     48.04            -0.2       47.86        perf-profile.calltrace.cycles-pp.percpu_counter_add_batch.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
     99.36            -0.0       99.34        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
      1.47            +0.0        1.51        perf-profile.calltrace.cycles-pp.unmap_region.do_munmap.vm_munmap.__x64_sys_munmap.do_syscall_64
     94.77            -0.3       94.52        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
     95.04            -0.2       94.81        perf-profile.children.cycles-pp._raw_spin_lock_irqsave
     95.77            -0.2       95.60        perf-profile.children.cycles-pp.percpu_counter_add_batch
     49.72            -0.1       49.58        perf-profile.children.cycles-pp.do_munmap
      0.53 ±  2%      -0.1        0.47        perf-profile.children.cycles-pp.___might_sleep
      0.30 ±  2%      +0.0        0.33        perf-profile.children.cycles-pp.perf_event_mmap
      0.30 ±  3%      +0.0        0.33 ±  2%  perf-profile.children.cycles-pp.vm_area_alloc
      0.33 ±  2%      +0.0        0.36 ±  2%  perf-profile.children.cycles-pp.up_write
      1.48            +0.0        1.51        perf-profile.children.cycles-pp.unmap_region
     94.77            -0.3       94.52        perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.48 ±  2%      -0.0        0.44        perf-profile.self.cycles-pp.___might_sleep
      0.33 ±  2%      +0.0        0.36 ±  2%  perf-profile.self.cycles-pp.up_write
      0.53            +0.0        0.57        perf-profile.self.cycles-pp.unmap_page_range
      0.47            +0.0        0.52 ±  2%  perf-profile.self.cycles-pp.percpu_counter_add_batch
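As a consistency check on the derived metrics: perf-stat.path-length in these
tables behaves as instructions per workload operation. For mmap1,
6.103e+12 instructions / 1344946 ops ≈ 4.54e+06, in line with the reported
4538060; for mmap2, 6.591e+12 / 1346634 ≈ 4.89e+06, in line with 4894750.
Path length growing by +1.4%/+1.3% while total instructions drop ~2.6% simply
reflects operations falling faster (-4.0%/-3.8%) than instructions do.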
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
rongc