Greetings,

FYI, we noticed a 9.1% improvement of will-it-scale.per_thread_ops due to commit:


commit: 7c30f36a98ae488741178d69662e4f2baa53e7f6 ("io_uring: run __io_sq_thread() with the initial creds from io_uring_setup()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master


in testcase: will-it-scale
on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with the following parameters:

        nr_task: 50%
        mode: thread
        test: unix1
        cpufreq_governor: performance
        ucode: 0x16

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process and threads based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale



Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp split-job --compatible job.yaml
        bin/lkp run compatible-job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-9/performance/x86_64-rhel-8.3/thread/50%/debian-10.4-x86_64-20200603.cgz/lkp-hsw-4ex1/unix1/will-it-scale/0x16

commit:
  678eeba481 ("io-wq: warn on creating manager while exiting")
  7c30f36a98 ("io_uring: run __io_sq_thread() with the initial creds from io_uring_setup()")

678eeba481d8c161 7c30f36a98ae488741178d69662
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  30824092            +9.1%   33623774        will-it-scale.72.threads
    428111            +9.1%     466996        will-it-scale.per_thread_ops
  30824092            +9.1%   33623774        will-it-scale.workload
    314351 ±  4%      -8.6%     287222        numa-meminfo.node0.Unevictable
      0.04 ±116%  +27922.2%       9.90 ±123%  perf-sched.sch_delay.max.ms.__traceiter_sched_switch.__traceiter_sched_switch.schedule_timeout.rcu_gp_kthread.kthread
     15.00            +6.7%      16.00        vmstat.cpu.us
     78587 ±  4%      -8.6%      71805        numa-vmstat.node0.nr_unevictable
     78587 ±  4%      -8.6%      71805        numa-vmstat.node0.nr_zone_unevictable
      1769           -12.7%       1544        syscalls.sys_read.med
      1780            -9.4%       1613        syscalls.sys_write.med
     19842 ±  3%     -20.6%      15756 ± 12%  softirqs.CPU11.RCU
     12942 ±  8%     +17.0%      15137 ± 10%  softirqs.CPU134.RCU
     13720 ± 11%     +19.2%      16356 ± 10%  softirqs.CPU55.RCU
     36667 ±  8%     -41.0%      21647 ± 38%  softirqs.CPU83.SCHED
    266.33 ±  8%     -47.4%     140.00 ± 58%  interrupts.CPU11.RES:Rescheduling_interrupts
      1118 ± 19%     -47.1%     592.00 ± 50%  interrupts.CPU11.TLB:TLB_shootdowns
    992.50 ± 14%     -35.1%     643.67 ± 39%  interrupts.CPU120.TLB:TLB_shootdowns
      1914 ± 35%    +136.0%       4518 ± 43%  interrupts.CPU129.NMI:Non-maskable_interrupts
      1914 ± 35%    +136.0%       4518 ± 43%  interrupts.CPU129.PMI:Performance_monitoring_interrupts
     36.17 ± 71%    +206.9%     111.00 ± 44%  interrupts.CPU131.RES:Rescheduling_interrupts
      1159 ± 18%     +72.2%       1996 ± 32%  interrupts.CPU134.CAL:Function_call_interrupts
    374.83 ± 61%    +139.0%     895.67 ± 40%  interrupts.CPU134.TLB:TLB_shootdowns
      2810 ± 37%    +134.0%       6578 ± 33%  interrupts.CPU45.NMI:Non-maskable_interrupts
      2810 ± 37%    +134.0%       6578 ± 33%  interrupts.CPU45.PMI:Performance_monitoring_interrupts
      1605 ± 19%     +76.6%       2836 ± 48%  interrupts.CPU52.CAL:Function_call_interrupts
      2231 ± 27%     -37.2%       1400 ± 22%  interrupts.CPU62.CAL:Function_call_interrupts
      6880 ± 25%     -46.7%       3669 ± 57%  interrupts.CPU62.NMI:Non-maskable_interrupts
      6880 ± 25%     -46.7%       3669 ± 57%  interrupts.CPU62.PMI:Performance_monitoring_interrupts
    226.50 ± 18%     -47.5%     119.00 ± 63%  interrupts.CPU62.RES:Rescheduling_interrupts
      1169 ± 18%     -44.4%     650.83 ± 52%  interrupts.CPU62.TLB:TLB_shootdowns
    235.00 ± 13%     -59.4%      95.33 ± 65%  interrupts.CPU63.RES:Rescheduling_interrupts
    384.17 ± 64%    +120.0%     845.33 ± 30%  interrupts.CPU84.TLB:TLB_shootdowns
      1870 ±  8%     -26.7%       1370 ± 29%  interrupts.CPU93.CAL:Function_call_interrupts
      1092 ± 16%     -45.3%     597.33 ± 66%  interrupts.CPU93.TLB:TLB_shootdowns
 3.702e+10            +9.1%  4.038e+10        perf-stat.i.branch-instructions
 4.711e+08            +9.0%  5.134e+08        perf-stat.i.branch-misses
      1.13            -8.4%       1.04        perf-stat.i.cpi
 5.421e+10            +9.0%  5.909e+10        perf-stat.i.dTLB-loads
  61939091            +8.9%   67468763        perf-stat.i.dTLB-store-misses
 3.777e+10            +8.9%  4.112e+10        perf-stat.i.dTLB-stores
  64979413 ±  2%      +9.4%   71098260 ±  2%  perf-stat.i.iTLB-load-misses
 1.703e+08 ±  3%     +14.3%  1.947e+08 ± 15%  perf-stat.i.iTLB-loads
 1.857e+11            +9.1%  2.026e+11        perf-stat.i.instructions
      0.89            +9.2%       0.97        perf-stat.i.ipc
    896.93            +9.0%     978.06        perf-stat.i.metric.M/sec
     22535 ±  7%     +13.9%      25662 ±  5%  perf-stat.i.node-loads
      0.07            -9.1%       0.06 ±  2%  perf-stat.overall.MPKI
      1.13            -8.4%       1.03        perf-stat.overall.cpi
      0.89            +9.1%       0.97        perf-stat.overall.ipc
 3.687e+10            +9.1%  4.023e+10        perf-stat.ps.branch-instructions
 4.693e+08            +9.0%  5.116e+08        perf-stat.ps.branch-misses
 5.399e+10            +9.1%  5.888e+10        perf-stat.ps.dTLB-loads
  61667162            +9.0%   67198925        perf-stat.ps.dTLB-store-misses
 3.761e+10            +8.9%  4.097e+10        perf-stat.ps.dTLB-stores
  64692296 ±  2%      +9.5%   70834658 ±  2%  perf-stat.ps.iTLB-load-misses
 1.695e+08 ±  3%     +14.4%  1.939e+08 ± 15%  perf-stat.ps.iTLB-loads
 1.849e+11            +9.1%  2.018e+11        perf-stat.ps.instructions
     23463 ±  8%     +13.9%      26730 ±  8%  perf-stat.ps.node-loads
 5.594e+13            +9.1%  6.104e+13        perf-stat.total.instructions
     31.07             -2.3      28.80 ±  9%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.__libc_read
     37.16             -2.2      35.01 ±  9%  perf-profile.calltrace.cycles-pp.__libc_read
     20.02             -1.5      18.51 ±  9%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     17.67 ±  2%       -1.5      16.19 ±  9%  perf-profile.calltrace.cycles-pp.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
     20.78 ±  2%       -1.4      19.34 ±  9%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
     16.56             -1.4      15.14 ±  9%  perf-profile.calltrace.cycles-pp.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_write
     15.50             -1.3      14.21 ±  9%  perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     14.88 ±  2%       -1.3      13.62 ±  9%  perf-profile.calltrace.cycles-pp.new_sync_write.vfs_write.ksys_write.do_syscall_64.entry_SYSCALL_64_after_hwframe
     16.59             -1.2      15.38 ±  9%  perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe.__libc_read
     13.54             -1.2      12.38 ±  9%  perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe
     13.18             -1.2      12.02 ±  9%  perf-profile.calltrace.cycles-pp.sock_read_iter.new_sync_read.vfs_read.ksys_read.do_syscall_64
     14.42 ±  2%       -1.1      13.28 ±  9%  perf-profile.calltrace.cycles-pp.sock_write_iter.new_sync_write.vfs_write.ksys_write.do_syscall_64
     13.50 ±  2%       -1.0      12.52 ±  9%  perf-profile.calltrace.cycles-pp.sock_sendmsg.sock_write_iter.new_sync_write.vfs_write.ksys_write
     12.25 ±  2%       -0.9      11.36 ±  9%  perf-profile.calltrace.cycles-pp.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write.vfs_write
      3.14 ±  2%       -0.8       2.31 ±  9%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode_prepare.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.__libc_read
     10.57             -0.8       9.81 ±  9%  perf-profile.calltrace.cycles-pp.syscall_exit_to_user_mode.entry_SYSCALL_64_after_hwframe.__libc_read
      1.70 ±  2%       -0.5       1.17 ±  9%  perf-profile.calltrace.cycles-pp.sock_recvmsg.sock_read_iter.new_sync_read.vfs_read.ksys_read
      2.15 ±  2%       -0.4       1.72 ±  9%  perf-profile.calltrace.cycles-pp.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.new_sync_write
      1.05 ±  2%       -0.2       0.81 ±  9%  perf-profile.calltrace.cycles-pp.__check_object_size.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
      0.60             -0.2       0.44 ± 44%  perf-profile.calltrace.cycles-pp.unix_write_space.sock_wfree.unix_destruct_scm.skb_release_head_state.skb_release_all
      1.52 ±  2%       -0.2       1.36 ±  8%  perf-profile.calltrace.cycles-pp.unix_destruct_scm.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic
      1.61             -0.2       1.46 ±  8%  perf-profile.calltrace.cycles-pp.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
      1.64             -0.2       1.49 ±  9%  perf-profile.calltrace.cycles-pp.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_read_iter
     40.84             -2.9      37.89 ±  9%  perf-profile.children.cycles-pp.do_syscall_64
     37.53             -2.1      35.38 ±  9%  perf-profile.children.cycles-pp.__libc_read
     17.70 ±  2%       -1.5      16.22 ±  9%  perf-profile.children.cycles-pp.ksys_write
     16.60             -1.4      15.19 ±  9%  perf-profile.children.cycles-pp.vfs_write
     15.55             -1.3      14.26 ±  9%  perf-profile.children.cycles-pp.vfs_read
     14.91 ±  2%       -1.3      13.65 ±  9%  perf-profile.children.cycles-pp.new_sync_write
     16.62             -1.2      15.41 ±  9%  perf-profile.children.cycles-pp.ksys_read
     13.22             -1.2      12.06 ±  9%  perf-profile.children.cycles-pp.sock_read_iter
     13.58             -1.2      12.42 ±  9%  perf-profile.children.cycles-pp.new_sync_read
     14.47 ±  2%       -1.1      13.33 ±  9%  perf-profile.children.cycles-pp.sock_write_iter
     13.52 ±  2%       -1.0      12.55 ±  9%  perf-profile.children.cycles-pp.sock_sendmsg
     12.36 ±  2%       -0.9      11.42 ±  9%  perf-profile.children.cycles-pp.unix_stream_sendmsg
      5.51 ±  2%       -0.8       4.72 ±  8%  perf-profile.children.cycles-pp.syscall_exit_to_user_mode_prepare
      1.71 ±  2%       -0.5       1.20 ±  9%  perf-profile.children.cycles-pp.sock_recvmsg
      2.18 ±  2%       -0.4       1.74 ±  9%  perf-profile.children.cycles-pp.skb_copy_datagram_from_iter
      1.22 ±  2%       -0.3       0.88 ±  8%  perf-profile.children.cycles-pp.__x86_retpoline_rax
      0.52             -0.2       0.31 ±  9%  perf-profile.children.cycles-pp.exit_to_user_mode_prepare
      0.93             -0.2       0.72 ±  9%  perf-profile.children.cycles-pp.fsnotify
      2.25 ±  2%       -0.2       2.06 ±  9%  perf-profile.children.cycles-pp.__check_object_size
      1.55 ±  2%       -0.2       1.40 ±  8%  perf-profile.children.cycles-pp.unix_destruct_scm
      1.64             -0.2       1.49 ±  9%  perf-profile.children.cycles-pp.skb_release_all
      1.62             -0.2       1.47 ±  8%  perf-profile.children.cycles-pp.skb_release_head_state
      0.58             -0.1       0.44 ±  8%  perf-profile.children.cycles-pp.__virt_addr_valid
      0.47 ±  4%       -0.1       0.35 ±  9%  perf-profile.children.cycles-pp.wait_for_unix_gc
      0.61 ±  2%       -0.1       0.51 ±  8%  perf-profile.children.cycles-pp.unix_write_space
      0.40             -0.1       0.31 ± 10%  perf-profile.children.cycles-pp.__x64_sys_read
      0.63             -0.1       0.55 ±  9%  perf-profile.children.cycles-pp.__might_sleep
      0.35 ±  3%       -0.1       0.29 ±  8%  perf-profile.children.cycles-pp.apparmor_socket_getpeersec_dgram
      0.13             -0.0       0.08 ±  8%  perf-profile.children.cycles-pp.check_stack_object
      0.13 ±  5%       -0.0       0.09 ± 10%  perf-profile.children.cycles-pp.unix_scm_to_skb
      0.45 ±  3%       -0.0       0.42 ±  8%  perf-profile.children.cycles-pp.security_socket_getpeersec_dgram
      0.09 ±  5%       -0.0       0.06 ± 11%  perf-profile.children.cycles-pp.rcu_nocb_flush_deferred_wakeup
      0.57 ±  2%       +0.1       0.71 ± 12%  perf-profile.children.cycles-pp.__ksize
      1.12 ±  2%       -0.8       0.37 ±  9%  perf-profile.self.cycles-pp.syscall_exit_to_user_mode_prepare
      0.50 ±  2%       -0.4       0.09 ± 12%  perf-profile.self.cycles-pp.sock_recvmsg
      1.13             -0.4       0.73 ±  9%  perf-profile.self.cycles-pp.sock_read_iter
      0.98 ±  3%       -0.3       0.68 ±  9%  perf-profile.self.cycles-pp.__x86_retpoline_rax
      0.36             -0.2       0.13 ±  8%  perf-profile.self.cycles-pp.skb_copy_datagram_from_iter
      0.89             -0.2       0.69 ±  9%  perf-profile.self.cycles-pp.fsnotify
      0.29 ±  3%       -0.2       0.10 ±  9%  perf-profile.self.cycles-pp.security_socket_recvmsg
      0.93 ±  2%       -0.2       0.74 ±  9%  perf-profile.self.cycles-pp.sock_write_iter
      0.92 ±  4%       -0.2       0.76 ± 10%  perf-profile.self.cycles-pp.ftrace_syscall_exit
      0.40 ±  2%       -0.2       0.25 ± 10%  perf-profile.self.cycles-pp.exit_to_user_mode_prepare
      1.19 ±  2%       -0.1       1.04 ±  9%  perf-profile.self.cycles-pp.unix_stream_sendmsg
      0.56 ±  2%       -0.1       0.42 ±  9%  perf-profile.self.cycles-pp.__virt_addr_valid
      0.46 ±  6%       -0.1       0.34 ±  9%  perf-profile.self.cycles-pp.syscall_trace_enter
      0.37 ±  2%       -0.1       0.26 ±  8%  perf-profile.self.cycles-pp.new_sync_write
      0.25 ±  3%       -0.1       0.14 ± 11%  perf-profile.self.cycles-pp.alloc_skb_with_frags
      0.34             -0.1       0.24 ± 10%  perf-profile.self.cycles-pp.__x64_sys_read
      0.59             -0.1       0.50 ±  8%  perf-profile.self.cycles-pp.unix_write_space
      0.40 ±  2%       -0.1       0.31 ±  9%  perf-profile.self.cycles-pp.unix_destruct_scm
      0.55 ±  2%       -0.1       0.47 ±  9%  perf-profile.self.cycles-pp.__alloc_skb
      0.28 ±  2%       -0.1       0.20 ±  9%  perf-profile.self.cycles-pp.ksys_write
      0.48 ±  2%       -0.1       0.41 ± 10%  perf-profile.self.cycles-pp.vfs_read
      0.16 ±  6%       -0.1       0.09 ± 13%  perf-profile.self.cycles-pp.skb_copy_datagram_iter
      0.49 ±  2%       -0.1       0.43 ±  9%  perf-profile.self.cycles-pp.unix_stream_recvmsg
      0.13 ±  5%       -0.1       0.08 ± 11%  perf-profile.self.cycles-pp.wait_for_unix_gc
      0.13 ±  5%       -0.1       0.07 ± 10%  perf-profile.self.cycles-pp.unix_scm_to_skb
      0.21 ±  5%       -0.1       0.16 ± 11%  perf-profile.self.cycles-pp.sock_alloc_send_pskb
      0.48             -0.1       0.43 ± 10%  perf-profile.self.cycles-pp.vfs_write
      0.08 ±  7%       -0.0       0.03 ± 70%  perf-profile.self.cycles-pp.rcu_nocb_flush_deferred_wakeup
      0.11 ±  4%       -0.0       0.06 ± 11%  perf-profile.self.cycles-pp.check_stack_object
      0.30 ±  3%       -0.0       0.26 ±  8%  perf-profile.self.cycles-pp.apparmor_socket_getpeersec_dgram
      0.14 ±  4%       -0.0       0.10 ±  9%  perf-profile.self.cycles-pp.sock_sendmsg
      0.22 ±  4%       -0.0       0.18 ±  9%  perf-profile.self.cycles-pp.do_syscall_64
      0.21 ±  2%       -0.0       0.18 ± 12%  perf-profile.self.cycles-pp.__skb_datagram_iter
      0.27 ±  2%       -0.0       0.23 ±  9%  perf-profile.self.cycles-pp.__x64_sys_write
      0.23 ±  4%       -0.0       0.19 ±  8%  perf-profile.self.cycles-pp.__x86_indirect_thunk_rax
      0.18 ±  3%       +0.1       0.24 ±  8%  perf-profile.self.cycles-pp.security_file_permission
      0.23 ±  2%       +0.1       0.31 ±  8%  perf-profile.self.cycles-pp.ksys_read
      0.22 ±  3%       +0.1       0.30 ±  9%  perf-profile.self.cycles-pp.apparmor_socket_recvmsg
      0.55             +0.1       0.70 ± 12%  perf-profile.self.cycles-pp.__ksize



                            will-it-scale.per_thread_ops

  [ASCII plot of will-it-scale.per_thread_ops per sample; y-axis 425000 to 480000]

[*] bisect-good sample
[O] bisect-bad sample



Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


---
0DAY/LKP+ Test Infrastructure                   Open Source Technology Center
https://lists.01.org/hyperkitty/list/lkp@lists.01.org       Intel Corporation

Thanks,
Oliver Sang