FYI, we noticed a 9.3% improvement of will-it-scale.per_process_ops due to commit:

commit 65ea11ec6a82b1d44aba62b59e9eb20247e57c6e ("x86/hweight: Don't clobber %rdi")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

in testcase: will-it-scale
on test machine: 32 threads Sandy Bridge-EP with 64G memory
with following parameters:

	test: unix1
	cpufreq_governor: performance

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase:
  gcc-6/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-sb03/unix1/will-it-scale

commit:
  v4.8-rc1
  65ea11ec6a ("x86/hweight: Don't clobber %rdi")

        v4.8-rc1 65ea11ec6a82b1d44aba62b59e
---------------- --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          1:8          -12%            :4     last_state.is_incomplete_run
          4:8          -50%            :4     kmsg.DHCP/BOOTP:Reply_not_for_us,op[#]xid[#]
          7:8          -88%            :4     kmsg.drm:drm_edid_block_valid[drm]]*ERROR*EDID_checksum_is_invalid,remainder_is
          7:8          -88%            :4     kmsg.i8042:Can't_read_CTR_while_initializing_i8042
         %stddev     %change         %stddev
             \          |                \
   1063041 ±  0%      +9.3%    1161810 ±  0%  will-it-scale.per_process_ops
    976004 ±  0%      +9.0%    1063615 ±  0%  will-it-scale.per_thread_ops
      0.57 ±  0%      -6.7%       0.53 ±  1%  will-it-scale.scalability
    175.96 ±  0%      +8.0%     190.10 ±  0%  will-it-scale.time.user_time
      0.00 ± 20%     -31.5%       0.00 ± 26%  sched_debug.cpu.next_balance.stddev
    101.14 ± 11%   +9639.4%       9850 ±121%  latency_stats.avg.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_getattr.[nfsv4].nfs4_proc_getattr.[nfsv4].__nfs_revalidate_inode.nfs_do_access.nfs_permission.__inode_permission.inode_permission
    148.57 ± 15%  +57704.4%      85880 ±125%  latency_stats.max.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_getattr.[nfsv4].nfs4_proc_getattr.[nfsv4].__nfs_revalidate_inode.nfs_do_access.nfs_permission.__inode_permission.inode_permission
    886.00 ± 14%   +9757.0%      87333 ±123%  latency_stats.sum.rpc_wait_bit_killable.__rpc_execute.rpc_execute.rpc_run_task.nfs4_call_sync_sequence.[nfsv4]._nfs4_proc_getattr.[nfsv4].nfs4_proc_getattr.[nfsv4].__nfs_revalidate_inode.nfs_do_access.nfs_permission.__inode_permission.inode_permission
 3.041e+12 ±  1%      +7.4%  3.267e+12 ±  1%  perf-stat.branch-instructions
      0.31 ±  0%     -86.6%       0.04 ±  4%  perf-stat.branch-miss-rate
 9.456e+09 ±  1%     -85.6%  1.364e+09 ±  3%  perf-stat.branch-misses
 5.147e+12 ±  1%      +5.4%  5.427e+12 ±  1%  perf-stat.dTLB-loads
 3.869e+12 ±  0%      +6.7%  4.128e+12 ±  1%  perf-stat.dTLB-stores
     29.02 ± 13%    +223.2%      93.80 ±  0%  perf-stat.iTLB-load-miss-rate
 2.353e+08 ± 21%    +733.0%   1.96e+09 ±  0%  perf-stat.iTLB-load-misses
   5.7e+08 ±  9%     -77.2%  1.297e+08 ± 10%  perf-stat.iTLB-loads
 1.696e+13 ±  0%      +6.9%  1.814e+13 ±  0%  perf-stat.instructions
     75030 ± 18%     -87.7%       9251 ±  1%  perf-stat.instructions-per-iTLB-miss
      1.04 ±  0%      +7.6%       1.12 ±  1%  perf-stat.ipc
  24064971 ±  3%      -6.6%   22469931 ±  3%  perf-stat.node-load-misses
  53705459 ±  1%      -3.1%   52034054 ±  2%  perf-stat.node-loads
      7.32 ±  5%     +23.3%       9.03 ±  4%  perf-profile.cycles.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg
      1.29 ±  4%     +11.7%       1.44 ±  5%  perf-profile.cycles.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
      1.15 ±  4%     +12.1%       1.29 ±  4%  perf-profile.cycles.__fget.__fget_light.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
      1.22 ±  5%     +11.7%       1.36 ±  5%  perf-profile.cycles.__fget_light.__fdget_pos.sys_write.entry_SYSCALL_64_fastpath
      1.86 ±  4%     -58.4%       0.77 ±  7%  perf-profile.cycles.__inode_security_revalidate.selinux_file_permission.security_file_permission.rw_verify_area.vfs_write
      0.00 ± -1%      +Inf%       2.65 ±  5%  perf-profile.cycles.__kmalloc_node_track_caller.__kmalloc_reserve.isra.33.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      1.89 ±  8%    -100.0%       0.00 ± -1%  perf-profile.cycles.__kmalloc_node_track_caller.__kmalloc_reserve.isra.35.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb
      0.00 ± -1%      +Inf%       3.55 ±  5%  perf-profile.cycles.__kmalloc_reserve.isra.33.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
      2.52 ±  8%    -100.0%       0.00 ± -1%  perf-profile.cycles.__kmalloc_reserve.isra.35.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
      1.43 ±  4%     -91.1%       0.13 ±173%  perf-profile.cycles.__might_sleep.__inode_security_revalidate.selinux_file_permission.security_file_permission.rw_verify_area
      1.15 ±  5%     -65.7%       0.40 ± 57%  perf-profile.cycles.__might_sleep.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      1.33 ±  7%     +14.0%       1.52 ±  2%  perf-profile.cycles._raw_spin_lock_irqsave.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
      1.37 ±  6%     +20.4%       1.65 ±  3%  perf-profile.cycles._raw_spin_lock_irqsave.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      1.09 ±  9%     +15.6%       1.26 ±  5%  perf-profile.cycles._raw_spin_unlock_irqrestore.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
      1.01 ±  6%     +15.4%       1.17 ±  7%  perf-profile.cycles._raw_spin_unlock_irqrestore.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      8.01 ±  6%     +22.5%       9.82 ±  4%  perf-profile.cycles.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
      7.33 ±  6%     +14.8%       8.42 ±  4%  perf-profile.cycles.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      0.98 ±  8%     +15.0%       1.12 ±  4%  perf-profile.cycles.consume_skb.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read
      1.60 ±  5%     +18.7%       1.91 ±  3%  perf-profile.cycles.copy_from_iter.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
      2.30 ±  4%     +11.5%       2.56 ±  6%  perf-profile.cycles.entry_SYSCALL_64
      2.10 ±  3%     +18.1%       2.48 ±  5%  perf-profile.cycles.entry_SYSCALL_64_after_swapgs
      2.82 ±  7%     -34.6%       1.85 ±  6%  perf-profile.cycles.file_has_perm.selinux_file_permission.security_file_permission.rw_verify_area.vfs_read
      1.55 ±  6%     +21.3%       1.89 ±  5%  perf-profile.cycles.fput.entry_SYSCALL_64_fastpath
      1.13 ±  9%     +17.0%       1.32 ±  3%  perf-profile.cycles.kfree.skb_free_head.skb_release_data.skb_release_all.consume_skb
      0.76 ±  8%     +21.9%       0.93 ±  5%  perf-profile.cycles.kfree_skbmem.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      0.77 ± 10%     +27.0%       0.98 ±  5%  perf-profile.cycles.ksize.__alloc_skb.alloc_skb_with_frags.sock_alloc_send_pskb.unix_stream_sendmsg
      2.08 ±  6%     -31.5%       1.42 ±  6%  perf-profile.cycles.mutex_lock.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      0.89 ±  9%     +18.8%       1.06 ±  6%  perf-profile.cycles.mutex_unlock.unix_stream_recvmsg.sock_recvmsg.sock_read_iter.__vfs_read
      6.80 ±  3%     -19.3%       5.49 ±  3%  perf-profile.cycles.rw_verify_area.vfs_read.sys_read.entry_SYSCALL_64_fastpath
      5.54 ±  4%     -23.5%       4.24 ±  5%  perf-profile.cycles.rw_verify_area.vfs_write.sys_write.entry_SYSCALL_64_fastpath
      6.21 ±  4%     -19.5%       5.00 ±  3%  perf-profile.cycles.security_file_permission.rw_verify_area.vfs_read.sys_read.entry_SYSCALL_64_fastpath
      5.23 ±  4%     -25.6%       3.89 ±  5%  perf-profile.cycles.security_file_permission.rw_verify_area.vfs_write.sys_write.entry_SYSCALL_64_fastpath
      4.67 ±  4%     -24.1%       3.55 ±  4%  perf-profile.cycles.selinux_file_permission.security_file_permission.rw_verify_area.vfs_read.sys_read
      4.87 ±  5%     -28.0%       3.51 ±  5%  perf-profile.cycles.selinux_file_permission.security_file_permission.rw_verify_area.vfs_write.sys_write
      2.43 ±  5%     +29.8%       3.15 ±  3%  perf-profile.cycles.skb_copy_datagram_from_iter.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
      1.18 ±  8%     +16.1%       1.36 ±  2%  perf-profile.cycles.skb_free_head.skb_release_data.skb_release_all.consume_skb.unix_stream_read_generic
      2.60 ±  7%     +15.4%       3.00 ±  3%  perf-profile.cycles.skb_queue_tail.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
      6.30 ±  6%     +15.2%       7.26 ±  4%  perf-profile.cycles.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg
      1.45 ±  7%     +19.4%       1.73 ±  2%  perf-profile.cycles.skb_release_data.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
      4.63 ±  6%     +14.4%       5.30 ±  5%  perf-profile.cycles.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic.unix_stream_recvmsg
      1.01 ±  4%     +16.7%       1.18 ±  5%  perf-profile.cycles.skb_set_owner_w.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter
      2.59 ±  6%     +18.2%       3.07 ±  4%  perf-profile.cycles.skb_unlink.unix_stream_read_generic.unix_stream_recvmsg.sock_recvmsg.sock_read_iter
      9.66 ±  5%     +21.1%      11.70 ±  3%  perf-profile.cycles.sock_alloc_send_pskb.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write
     25.86 ±  5%     +14.8%      29.68 ±  4%  perf-profile.cycles.sock_sendmsg.sock_write_iter.__vfs_write.vfs_write.sys_write
      3.88 ±  7%     +13.1%       4.38 ±  5%  perf-profile.cycles.sock_wfree.unix_destruct_scm.skb_release_head_state.skb_release_all.consume_skb
      4.24 ±  7%     +13.3%       4.80 ±  5%  perf-profile.cycles.unix_destruct_scm.skb_release_head_state.skb_release_all.consume_skb.unix_stream_read_generic
     21.96 ±  5%     +17.1%      25.71 ±  3%  perf-profile.cycles.unix_stream_sendmsg.sock_sendmsg.sock_write_iter.__vfs_write.vfs_write
      1.20 ±  6%    -100.0%       0.00 ± -1%  perf-profile.cycles.unix_stream_sendmsg.sock_write_iter.__vfs_write.vfs_write.sys_write
      2.28 ±  6%     +13.7%       2.60 ±  3%  perf-profile.cycles.unix_write_space.sock_wfree.unix_destruct_scm.skb_release_head_state.skb_release_all
      3.84 ±  5%     -16.8%       3.20 ±  2%  perf-profile.func.cycles.___might_sleep
      1.96 ±  7%     +20.8%       2.36 ±  4%  perf-profile.func.cycles.__alloc_skb
      2.40 ±  4%     +11.3%       2.67 ±  4%  perf-profile.func.cycles.__fget
      1.30 ±  9%     +48.7%       1.94 ±  4%  perf-profile.func.cycles.__kmalloc_node_track_caller
      1.05 ±  5%     +12.6%       1.19 ±  7%  perf-profile.func.cycles.__vfs_read
      0.99 ±  7%     +27.1%       1.26 ±  4%  perf-profile.func.cycles.__vfs_write
      1.01 ±  5%     -51.9%       0.48 ±  3%  perf-profile.func.cycles._cond_resched
      2.78 ±  6%     +17.0%       3.25 ±  2%  perf-profile.func.cycles._raw_spin_lock_irqsave
      2.19 ±  8%     +15.5%       2.53 ±  6%  perf-profile.func.cycles._raw_spin_unlock_irqrestore
      1.10 ±  8%     +11.2%       1.23 ±  4%  perf-profile.func.cycles.consume_skb
      0.97 ±  5%     +25.6%       1.22 ±  3%  perf-profile.func.cycles.copy_from_iter
      2.30 ±  4%     +11.5%       2.56 ±  6%  perf-profile.func.cycles.entry_SYSCALL_64
      2.10 ±  3%     +18.1%       2.48 ±  5%  perf-profile.func.cycles.entry_SYSCALL_64_after_swapgs
      2.26 ±  4%     -38.4%       1.39 ±  5%  perf-profile.func.cycles.file_has_perm
      1.55 ±  6%     +21.3%       1.89 ±  5%  perf-profile.func.cycles.fput
      1.18 ±  8%     +17.2%       1.38 ±  3%  perf-profile.func.cycles.kfree
      0.86 ± 10%     +22.0%       1.05 ±  4%  perf-profile.func.cycles.ksize
      0.90 ±  8%     +18.7%       1.06 ±  5%  perf-profile.func.cycles.mutex_unlock
      1.91 ±  6%     -13.1%       1.66 ±  3%  perf-profile.func.cycles.selinux_file_permission
      1.05 ±  5%     +16.7%       1.23 ±  5%  perf-profile.func.cycles.skb_set_owner_w
      1.66 ±  8%     +16.3%       1.93 ±  7%  perf-profile.func.cycles.sock_wfree
      2.44 ±  4%     -39.7%       1.47 ±  2%  perf-profile.func.cycles.sock_write_iter
      4.20 ±  6%     -21.1%       3.32 ±  3%  perf-profile.func.cycles.unix_stream_sendmsg
      2.35 ±  6%     +14.3%       2.69 ±  3%  perf-profile.func.cycles.unix_write_space

Thanks,
Xiaolong