Greetings,

FYI, we noticed a 3.5% improvement in will-it-scale.per_process_ops due to commit:


commit: a10787e6d58c24b51e91c19c6d16c5da89fcaa4b ("bpf: Enable task local storage for tracing programs")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master


in testcase: will-it-scale
on test machine: 88 threads Intel(R) Xeon(R) Gold 6238M CPU @ 2.10GHz with 128G memory
with the following parameters:

        nr_task: 16
        mode: process
        test: mmap2
        cpufreq_governor: performance
        ucode: 0x5003006

test-description: Will It Scale takes a testcase and runs it from 1 through n parallel copies to see if the testcase will scale. It builds both process-based and thread-based tests in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



Details are below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached to this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/process/16/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp9/mmap2/will-it-scale/0x5003006

commit:
  9c8f21e6f8 ("xsk: Build skb by page (aka generic zerocopy xmit)")
  a10787e6d5 ("bpf: Enable task local storage for tracing programs")

9c8f21e6f8856a96 a10787e6d58c24b51e91c19c6d1
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
   8990002            +3.5%    9304107        will-it-scale.16.processes
    561874            +3.5%     581506        will-it-scale.per_process_ops
   8990002            +3.5%    9304107        will-it-scale.workload
    112185 ± 23%     +46.6%     164508 ± 22%  numa-numastat.node0.local_node
     63.33 ± 93%     -80.8%      12.17 ±130%  numa-vmstat.node0.nr_inactive_file
     63.33 ± 93%     -80.8%      12.17 ±130%  numa-vmstat.node0.nr_zone_inactive_file
     14212 ± 23%     +41.7%      20144 ± 14%  softirqs.CPU15.SCHED
     30141 ± 13%     -22.5%      23370 ± 14%  softirqs.CPU59.SCHED
     66.17 ± 88%     -90.7%       6.17 ± 48%  interrupts.CPU60.RES:Rescheduling_interrupts
    500.00           +86.1%     930.33 ± 60%  interrupts.CPU69.CAL:Function_call_interrupts
    396.17 ±  6%     -18.8%     321.50 ± 21%  interrupts.CPU87.NMI:Non-maskable_interrupts
    396.17 ±  6%     -18.8%     321.50 ± 21%  interrupts.CPU87.PMI:Performance_monitoring_interrupts
      5.45 ± 46%     -98.5%       0.08 ± 73%  perf-sched.wait_and_delay.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
    176.51 ± 36%     -61.2%      68.51 ± 77%  perf-sched.wait_and_delay.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
      5.45 ± 46%     -98.5%       0.08 ± 73%  perf-sched.wait_time.max.ms.exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.[unknown]
    176.50 ± 36%     -61.2%      68.50 ± 77%  perf-sched.wait_time.max.ms.schedule_timeout.rcu_gp_kthread.kthread.ret_from_fork
 2.304e+10            +3.4%  2.383e+10        perf-stat.i.branch-instructions
  72536156            +4.1%   75492267        perf-stat.i.branch-misses
      0.48            -3.3%       0.47        perf-stat.i.cpi
      0.00 ± 15%       -0.0       0.00 ±  9%  perf-stat.i.dTLB-load-miss-rate%
 2.404e+10            +3.4%  2.487e+10        perf-stat.i.dTLB-loads
 1.096e+10            +3.4%  1.133e+10        perf-stat.i.dTLB-stores
  47654226           +12.8%   53744349        perf-stat.i.iTLB-load-misses
 9.562e+10            +3.4%  9.889e+10        perf-stat.i.instructions
      2015            -8.4%       1847        perf-stat.i.instructions-per-iTLB-miss
      2.06            +3.5%       2.14        perf-stat.i.ipc
    659.67            +3.4%     682.32        perf-stat.i.metric.M/sec
      0.48            -3.4%       0.47        perf-stat.overall.cpi
      0.00 ± 18%       -0.0       0.00 ± 14%  perf-stat.overall.dTLB-load-miss-rate%
      2006            -8.3%       1840        perf-stat.overall.instructions-per-iTLB-miss
      2.07            +3.5%       2.14        perf-stat.overall.ipc
 2.297e+10            +3.4%  2.375e+10        perf-stat.ps.branch-instructions
  72285805            +4.1%   75236431        perf-stat.ps.branch-misses
 2.396e+10            +3.4%  2.479e+10        perf-stat.ps.dTLB-loads
 1.092e+10            +3.4%   1.13e+10        perf-stat.ps.dTLB-stores
  47489125           +12.8%   53563329        perf-stat.ps.iTLB-load-misses
 9.529e+10            +3.4%  9.856e+10        perf-stat.ps.instructions
 2.876e+13            +3.5%  2.976e+13        perf-stat.total.instructions
     44.75             -7.7      37.01 ± 11%  perf-profile.calltrace.cycles-pp.__munmap
     42.13             -7.2      34.95 ± 11%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__munmap
     41.64             -7.1      34.53 ± 11%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     41.21             -7.1      34.11 ± 11%  perf-profile.calltrace.cycles-pp.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     41.45             -7.1      34.36 ± 11%  perf-profile.calltrace.cycles-pp.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe.__munmap
     39.74             -6.9      32.83 ± 11%  perf-profile.calltrace.cycles-pp.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64.entry_SYSCALL_64_after_hwframe
     33.92             -6.2      27.75 ± 11%  perf-profile.calltrace.cycles-pp.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
     25.32             -5.7      19.64 ± 11%  perf-profile.calltrace.cycles-pp.unmap_vmas.unmap_region.__do_munmap.__vm_munmap.__x64_sys_munmap
     24.74             -5.7      19.08 ± 11%  perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.unmap_region.__do_munmap.__vm_munmap
     10.59             -3.7       6.89 ± 11%  perf-profile.calltrace.cycles-pp.___might_sleep.unmap_page_range.unmap_vmas.unmap_region.__do_munmap
      1.60             -0.5       1.06 ± 32%  perf-profile.calltrace.cycles-pp.__entry_text_start.__mmap
      2.94             -0.4       2.56 ± 10%  perf-profile.calltrace.cycles-pp.d_path.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      2.85 ±  2%       -0.4       2.47 ± 11%  perf-profile.calltrace.cycles-pp.get_unmapped_area.do_mmap.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64
      0.66 ±  6%       -0.4       0.29 ±101%  perf-profile.calltrace.cycles-pp.strlen.perf_event_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      2.39 ±  3%       -0.3       2.10 ± 11%  perf-profile.calltrace.cycles-pp.kmem_cache_alloc.vm_area_alloc.mmap_region.do_mmap.vm_mmap_pgoff
      1.30 ±  3%       -0.2       1.08 ± 11%  perf-profile.calltrace.cycles-pp.security_mmap_file.vm_mmap_pgoff.ksys_mmap_pgoff.do_syscall_64.entry_SYSCALL_64_after_hwframe
      0.97 ±  2%       -0.2       0.78 ± 11%  perf-profile.calltrace.cycles-pp.find_vma.__do_munmap.__vm_munmap.__x64_sys_munmap.do_syscall_64
      0.67 ±  3%       -0.2       0.49 ± 45%  perf-profile.calltrace.cycles-pp.touch_atime.shmem_mmap.mmap_region.do_mmap.vm_mmap_pgoff
      0.90 ±  5%       -0.2       0.73 ±  8%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
      0.78 ±  5%       -0.1       0.63 ±  8%  perf-profile.calltrace.cycles-pp.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.do_idle
     26.40 ±  4%      +10.3      36.72 ± 17%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.40 ±  4%      +10.3      36.72 ± 17%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     26.40 ±  4%      +10.3      36.72 ± 17%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.11 ±  5%      +10.4      36.49 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     26.00 ±  5%      +10.4      36.40 ± 18%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
     27.39 ±  4%      +11.1      38.45 ± 18%  perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
     25.93 ±  4%      +11.4      37.32 ± 18%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     67.97            -10.6      57.41 ± 11%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     66.99            -10.4      56.56 ± 11%  perf-profile.children.cycles-pp.do_syscall_64
     44.75             -7.4      37.31 ± 11%  perf-profile.children.cycles-pp.__munmap
     41.23             -7.1      34.12 ± 11%  perf-profile.children.cycles-pp.__vm_munmap
     41.47             -7.1      34.38 ± 11%  perf-profile.children.cycles-pp.__x64_sys_munmap
     39.79             -6.9      32.88 ± 11%  perf-profile.children.cycles-pp.__do_munmap
     33.98             -6.2      27.81 ± 11%  perf-profile.children.cycles-pp.unmap_region
     25.35             -5.7      19.67 ± 11%  perf-profile.children.cycles-pp.unmap_vmas
     24.73             -5.6      19.12 ± 11%  perf-profile.children.cycles-pp.unmap_page_range
     11.68             -3.9       7.83 ± 11%  perf-profile.children.cycles-pp.___might_sleep
      2.98             -0.4       2.59 ± 10%  perf-profile.children.cycles-pp.d_path
      2.87 ±  2%       -0.4       2.49 ± 11%  perf-profile.children.cycles-pp.get_unmapped_area
      2.49 ±  2%       -0.3       2.18 ± 11%  perf-profile.children.cycles-pp.kmem_cache_alloc
      2.09             -0.3       1.80 ± 11%  perf-profile.children.cycles-pp.__entry_text_start
      2.31             -0.3       2.02 ± 10%  perf-profile.children.cycles-pp.zap_pte_range
      1.31 ±  3%       -0.2       1.09 ± 11%  perf-profile.children.cycles-pp.security_mmap_file
      1.24 ±  2%       -0.2       1.05 ± 10%  perf-profile.children.cycles-pp.down_write
      1.00             -0.2       0.81 ± 10%  perf-profile.children.cycles-pp.find_vma
      0.66 ±  6%       -0.1       0.52 ± 15%  perf-profile.children.cycles-pp.strlen
      0.66 ±  3%       -0.1       0.53 ± 12%  perf-profile.children.cycles-pp.common_file_perm
      0.69 ±  3%       -0.1       0.58 ± 10%  perf-profile.children.cycles-pp.touch_atime
      0.36 ±  4%       -0.1       0.29 ±  8%  perf-profile.children.cycles-pp.sync_mm_rss
      0.40 ±  3%       -0.1       0.34 ±  8%  perf-profile.children.cycles-pp.downgrade_write
      0.19 ± 12%       -0.1       0.13 ± 21%  perf-profile.children.cycles-pp.cap_capable
      0.25 ±  4%       -0.1       0.20 ± 10%  perf-profile.children.cycles-pp.vmacache_find
      0.18 ±  7%       -0.0       0.14 ± 10%  perf-profile.children.cycles-pp.tlb_flush_mmu
      0.19 ±  7%       -0.0       0.15 ± 13%  perf-profile.children.cycles-pp.lru_add_drain_cpu
      0.13 ± 11%       -0.0       0.10 ± 15%  perf-profile.children.cycles-pp.__libc_start_main
      0.13 ± 11%       -0.0       0.10 ± 15%  perf-profile.children.cycles-pp.main
      0.13 ± 11%       -0.0       0.10 ± 15%  perf-profile.children.cycles-pp.run_builtin
      0.12 ± 10%       -0.0       0.09 ±  7%  perf-profile.children.cycles-pp.timestamp_truncate
      0.09 ±  5%       -0.0       0.06 ± 20%  perf-profile.children.cycles-pp.common_mmap
      0.19 ±  9%       -0.0       0.16 ±  5%  perf-profile.children.cycles-pp.may_expand_vm
      0.19 ±  6%       -0.0       0.16 ±  5%  perf-profile.children.cycles-pp.userfaultfd_unmap_complete
      0.09 ± 12%       -0.0       0.07 ± 11%  perf-profile.children.cycles-pp.vm_pgprot_modify
      0.08 ±  6%       -0.0       0.06 ± 11%  perf-profile.children.cycles-pp.get_align_mask
      0.10 ±  7%       +0.0       0.13 ± 14%  perf-profile.children.cycles-pp.blocking_notifier_call_chain
      0.08 ± 22%       +0.0       0.13 ± 12%  perf-profile.children.cycles-pp.munmap@plt
     26.40 ±  4%      +10.3      36.72 ± 17%  perf-profile.children.cycles-pp.start_secondary
     27.39 ±  4%      +11.1      38.45 ± 18%  perf-profile.children.cycles-pp.secondary_startup_64_no_verify
     27.39 ±  4%      +11.1      38.45 ± 18%  perf-profile.children.cycles-pp.cpu_startup_entry
     27.39 ±  4%      +11.1      38.45 ± 18%  perf-profile.children.cycles-pp.do_idle
     27.10 ±  4%      +11.1      38.21 ± 18%  perf-profile.children.cycles-pp.cpuidle_enter
     27.09 ±  4%      +11.1      38.21 ± 18%  perf-profile.children.cycles-pp.cpuidle_enter_state
     26.00 ±  4%      +11.3      37.32 ± 18%  perf-profile.children.cycles-pp.intel_idle
     11.56             -3.8       7.71 ± 11%  perf-profile.self.cycles-pp.___might_sleep
      1.28 ±  4%       -0.2       1.07 ± 10%  perf-profile.self.cycles-pp.perf_event_mmap
      1.01             -0.2       0.84 ± 11%  perf-profile.self.cycles-pp.__entry_text_start
      1.08 ±  4%       -0.2       0.92 ±  9%  perf-profile.self.cycles-pp.kmem_cache_alloc
      0.66 ±  6%       -0.1       0.51 ± 14%  perf-profile.self.cycles-pp.strlen
      0.67             -0.1       0.54 ± 11%  perf-profile.self.cycles-pp.find_vma
      0.50 ±  4%       -0.1       0.40 ± 12%  perf-profile.self.cycles-pp.common_file_perm
      0.50 ±  6%       -0.1       0.41 ± 11%  perf-profile.self.cycles-pp.get_obj_cgroup_from_current
      0.34 ±  4%       -0.1       0.28 ±  9%  perf-profile.self.cycles-pp.sync_mm_rss
      0.39 ±  3%       -0.1       0.33 ±  8%  perf-profile.self.cycles-pp.downgrade_write
      0.17 ± 13%       -0.1       0.11 ± 21%  perf-profile.self.cycles-pp.cap_capable
      0.24 ±  3%       -0.0       0.20 ± 10%  perf-profile.self.cycles-pp.vmacache_find
      0.15 ±  7%       -0.0       0.11 ± 25%  perf-profile.self.cycles-pp.menu_select
      0.39 ±  3%       -0.0       0.34 ±  7%  perf-profile.self.cycles-pp.__vm_munmap
      0.08 ±  8%       -0.0       0.04 ± 73%  perf-profile.self.cycles-pp.common_mmap
      0.13 ± 11%       -0.0       0.09 ±  6%  perf-profile.self.cycles-pp.tlb_flush_mmu
      0.15 ±  6%       -0.0       0.12 ± 12%  perf-profile.self.cycles-pp.touch_atime
      0.13 ± 10%       -0.0       0.10 ± 10%  perf-profile.self.cycles-pp.remove_vma
      0.11 ± 11%       -0.0       0.08 ±  6%  perf-profile.self.cycles-pp.timestamp_truncate
      0.18 ± 10%       -0.0       0.15 ±  8%  perf-profile.self.cycles-pp.may_expand_vm
      0.16 ±  4%       -0.0       0.13 ±  6%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode
      0.18 ±  5%       -0.0       0.15 ± 11%  perf-profile.self.cycles-pp.get_unmapped_area
      0.19 ±  6%       -0.0       0.16 ±  5%  perf-profile.self.cycles-pp.userfaultfd_unmap_complete
      0.13 ±  5%       -0.0       0.11 ± 10%  perf-profile.self.cycles-pp.prepend
      0.10 ±  7%       +0.0       0.13 ± 14%  perf-profile.self.cycles-pp.blocking_notifier_call_chain
     26.00 ±  4%      +11.3      37.32 ± 18%  perf-profile.self.cycles-pp.intel_idle



[Plot: will-it-scale.per_process_ops per sample, y-axis 550000 to 585000]

[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang
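A note on reading the %change column of the comparison table: for plain counters it is a relative change, (new - old) / old, so per_process_ops going from 561874 to 581506 is +3.5%. For rows that are themselves percentages (the perf-profile cycle shares and dTLB-load-miss-rate%), the column appears to hold an absolute difference in percentage points instead: the __munmap call-trace share drops from 44.75 to 37.01, shown as -7.7. The shell sketch below recomputes a few rows under that reading. The helper names pct_change and pp_delta are hypothetical (they are not part of lkp-tests), and rounding to one decimal is an assumption chosen to match the table.

```shell
#!/bin/sh
# Hypothetical helpers (not part of lkp-tests) that recompute the
# "%change" column of the comparison table from its two value columns.

# Relative change in percent, for counters such as per_process_ops.
# Prints one decimal with an explicit sign, e.g. "+3.5%".
pct_change() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

# Absolute difference in percentage points, for rows that are already
# percentages (perf-profile cycle shares). Prints e.g. "-7.7".
pp_delta() {
    awk -v old="$1" -v new="$2" 'BEGIN { printf "%+.1f\n", new - old }'
}

echo "will-it-scale.per_process_ops:  $(pct_change 561874 581506)"
echo "perf-stat.ps.iTLB-load-misses:  $(pct_change 47489125 53563329)"
# iTLB misses grew faster than instructions (+12.8% vs +3.4%), so
# instructions per iTLB miss fell: 1.034 / 1.128 is about 0.917.
echo "instructions-per-iTLB-miss:     $(pct_change 2006 1840)"
echo "calltrace.cycles-pp.__munmap:   $(pp_delta 44.75 37.01)"
echo "children.cycles-pp.intel_idle:  $(pp_delta 26.00 37.32)"
```

Run as-is, it prints +3.5%, +12.8%, -8.3%, -7.7 and +11.3, matching the corresponding rows of the table.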