Greeting, FYI, we noticed a 18.5% improvement of stress-ng.msg.ops_per_sec due to commit: commit: 3d3d9c072e377a0c5fa0a1f661a95134e787db3a ("trace: replace WB_REASON_FOREIGN_FLUSH with a string") https://git.kernel.org/cgit/linux/kernel/git/rostedt/linux-trace.git for-next in testcase: stress-ng on test machine: 96 threads 2 sockets Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory with following parameters: nr_threads: 10% disk: 1HDD testtime: 60s fs: ext4 class: os test: msg cpufreq_governor: performance ucode: 0x5003006 Details are as below: --------------------------------------------------------------------------------------------------> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run bin/lkp run generated-yaml-file ========================================================================================= class/compiler/cpufreq_governor/disk/fs/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode: os/gcc-9/performance/1HDD/ext4/x86_64-rhel-8.3/10%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp5/msg/stress-ng/60s/0x5003006 commit: 957cdcd9bd ("ring-buffer: Use fallthrough pseudo-keyword") 3d3d9c072e ("trace: replace WB_REASON_FOREIGN_FLUSH with a string") 957cdcd9bd7e035d 3d3d9c072e377a0c5fa0a1f661a ---------------- --------------------------- %stddev %change %stddev \ | \ 1.997e+08 +18.5% 2.367e+08 stress-ng.msg.ops 3327615 +18.5% 3943995 stress-ng.msg.ops_per_sec 1246 ± 10% +62.2% 2022 ± 5% stress-ng.time.involuntary_context_switches 1565 +5.7% 1655 stress-ng.time.percent_of_cpu_this_job_got 955.26 +5.8% 1010 stress-ng.time.system_time 40528886 ± 9% -77.6% 9060408 ± 40% stress-ng.time.voluntary_context_switches 20278090 ± 4% -29.3% 14340237 turbostat.IRQ 37628340 ± 9% -76.4% 8880621 ± 51% cpuidle.POLL.time 30309198 ± 9% -77.7% 6751779 ± 55% cpuidle.POLL.usage 83.75 -1.3% 82.62 iostat.cpu.idle 15.92 +7.0% 17.03 iostat.cpu.system 1248983 ± 9% -77.5% 280750 ± 40% vmstat.system.cs 312514 ± 4% -29.2% 221116 vmstat.system.in 16252 ± 19% -30.5% 11301 ± 16% softirqs.CPU34.SCHED 11413 ± 16% -25.4% 8518 ± 28% softirqs.CPU38.SCHED 14279 ± 30% -43.8% 8025 ± 33% softirqs.CPU39.SCHED 119136 ± 35% +316.6% 496375 ± 35% numa-numastat.node0.local_node 181940 ± 6% +202.3% 549926 ± 33% numa-numastat.node0.numa_hit 141089 ± 30% +107.4% 292570 ± 35% numa-numastat.node1.local_node 164880 ± 7% +97.5% 325601 ± 34% numa-numastat.node1.numa_hit 381961 +137.1% 905716 ± 30% proc-vmstat.numa_hit 295354 +177.3% 819154 ± 33% proc-vmstat.numa_local 393066 +137.5% 933616 ± 30% proc-vmstat.pgalloc_normal 263755 +204.3% 802655 ± 35% proc-vmstat.pgfree 11952 ± 4% +36.4% 16297 ± 6% slabinfo.kmalloc-512.active_objs 376.33 ± 4% +36.4% 513.50 ± 6% slabinfo.kmalloc-512.active_slabs 12055 ± 4% +36.4% 16445 ± 6% slabinfo.kmalloc-512.num_objs 376.33 ± 4% +36.4% 513.50 ± 6% slabinfo.kmalloc-512.num_slabs 54896 ± 3% +47.6% 81015 ± 7% slabinfo.kmalloc-64.active_objs 858.17 ± 3% +47.7% 1267 ± 7% slabinfo.kmalloc-64.active_slabs 54962 ± 3% +47.6% 81140 ± 7% slabinfo.kmalloc-64.num_objs 858.17 ± 3% +47.7% 1267 ± 7% slabinfo.kmalloc-64.num_slabs 7755346 ± 12% -76.8% 1802412 ± 7% interrupts.CAL:Function_call_interrupts 456.17 ±141% -92.8% 32.67 ± 63% interrupts.CPU0.RES:Rescheduling_interrupts 281.67 ± 86% -89.3% 30.17 ±120% interrupts.CPU20.RES:Rescheduling_interrupts 37397 ± 75% -80.2% 7409 ± 68% interrupts.CPU34.CAL:Function_call_interrupts 2076 ± 96% -83.4% 343.67 ±167% interrupts.CPU34.NMI:Non-maskable_interrupts 2076 ± 96% -83.4% 343.67 ±167% interrupts.CPU34.PMI:Performance_monitoring_interrupts 45020 ± 71% -95.8% 1880 ± 80% interrupts.CPU39.CAL:Function_call_interrupts 407.50 ± 47% -77.1% 93.50 ±148% interrupts.CPU68.RES:Rescheduling_interrupts 443.50 ±103% -82.1% 79.50 ± 20% interrupts.CPU93.NMI:Non-maskable_interrupts 443.50 ±103% -82.1% 79.50 ± 20% interrupts.CPU93.PMI:Performance_monitoring_interrupts 19756 ± 18% -77.7% 4403 ± 21% interrupts.RES:Rescheduling_interrupts 3.43e+09 ± 3% -7.1% 3.185e+09 ± 2% perf-stat.i.branch-instructions 0.74 ± 4% -0.2 0.58 ± 6% perf-stat.i.branch-miss-rate% 24452178 ± 5% -29.2% 17322512 ± 5% perf-stat.i.branch-misses 1290940 ± 9% -77.6% 289762 ± 39% perf-stat.i.context-switches 4.379e+09 ± 3% -9.4% 3.967e+09 ± 3% perf-stat.i.dTLB-loads 2.503e+09 ± 3% -11.6% 2.212e+09 ± 3% perf-stat.i.dTLB-stores 87.84 -5.0 82.86 perf-stat.i.iTLB-load-miss-rate% 13313579 ± 6% -36.5% 8456850 ± 6% perf-stat.i.iTLB-load-misses 1.688e+10 ± 3% -6.4% 1.58e+10 ± 2% perf-stat.i.instructions 1321 ± 4% +45.4% 1921 ± 6% perf-stat.i.instructions-per-iTLB-miss 0.35 ± 2% -6.0% 0.33 ± 3% perf-stat.i.ipc 111.30 ± 3% -9.2% 101.04 ± 2% perf-stat.i.metric.M/sec 16080 ± 3% +371.9% 75880 ± 32% perf-stat.i.node-stores 0.71 ± 3% -0.2 0.54 ± 3% perf-stat.overall.branch-miss-rate% 2.93 ± 2% +6.7% 3.12 ± 2% perf-stat.overall.cpi 89.24 -5.1 84.11 perf-stat.overall.iTLB-load-miss-rate% 1272 ± 4% +47.5% 1876 ± 6% perf-stat.overall.instructions-per-iTLB-miss 0.34 ± 3% -6.3% 0.32 ± 2% perf-stat.overall.ipc 3.376e+09 ± 3% -7.1% 3.135e+09 ± 2% perf-stat.ps.branch-instructions 24066583 ± 5% -29.2% 17045901 ± 5% perf-stat.ps.branch-misses 1270763 ± 9% -77.6% 285171 ± 40% perf-stat.ps.context-switches 4.31e+09 ± 3% -9.4% 3.905e+09 ± 3% perf-stat.ps.dTLB-loads 2.463e+09 ± 3% -11.6% 2.177e+09 ± 3% perf-stat.ps.dTLB-stores 13105434 ± 6% -36.5% 8323297 ± 6% perf-stat.ps.iTLB-load-misses 1.662e+10 ± 3% -6.4% 1.556e+10 ± 2% perf-stat.ps.instructions 15767 ± 3% +374.0% 74743 ± 32% perf-stat.ps.node-stores 1.048e+12 ± 2% -6.4% 9.805e+11 ± 3% perf-stat.total.instructions 1.08 ± 17% +0.4 1.49 ± 6% perf-profile.calltrace.cycles-pp.get_obj_cgroup_from_current.__kmalloc.load_msg.do_msgsnd.do_syscall_64 2.12 ± 9% +0.6 2.77 ± 9% perf-profile.calltrace.cycles-pp.__mod_memcg_state.__mod_memcg_lruvec_state.kfree.free_msg.do_msgrcv 2.22 ± 9% +0.7 2.88 ± 10% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.kfree.free_msg.do_msgrcv.do_syscall_64 1.66 ± 23% +0.7 2.34 ± 9% perf-profile.calltrace.cycles-pp.ipc_obtain_object_check.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe 2.74 ± 10% +0.9 3.60 ± 10% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_msgrcv.do_syscall_64.entry_SYSCALL_64_after_hwframe 1.94 ± 14% +1.0 2.98 ± 10% perf-profile.calltrace.cycles-pp.__mod_memcg_state.__mod_memcg_lruvec_state.__kmalloc.load_msg.do_msgsnd 2.02 ± 14% +1.1 3.09 ± 9% perf-profile.calltrace.cycles-pp.__mod_memcg_lruvec_state.__kmalloc.load_msg.do_msgsnd.do_syscall_64 1.83 ± 28% +1.9 3.76 ± 20% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe 6.22 ± 14% +2.4 8.66 ± 8% perf-profile.calltrace.cycles-pp.__kmalloc.load_msg.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe 7.17 ± 15% +2.6 9.80 ± 9% perf-profile.calltrace.cycles-pp.load_msg.do_msgsnd.do_syscall_64.entry_SYSCALL_64_after_hwframe 3.12 ± 8% -2.4 0.76 ± 42% perf-profile.children.cycles-pp.__schedule 2.49 ± 8% -1.9 0.60 ± 58% perf-profile.children.cycles-pp.poll_idle 1.82 ± 9% -1.3 0.48 ± 55% perf-profile.children.cycles-pp.schedule 1.78 ± 8% -1.3 0.51 ± 44% perf-profile.children.cycles-pp.wake_up_q 1.69 ± 8% -1.2 0.45 ± 49% perf-profile.children.cycles-pp.try_to_wake_up 1.38 ± 9% -1.1 0.30 ± 22% perf-profile.children.cycles-pp.schedule_idle 0.84 ± 8% -0.6 0.22 ± 56% perf-profile.children.cycles-pp.dequeue_task_fair 0.77 ± 6% -0.6 0.18 ± 37% perf-profile.children.cycles-pp.ttwu_do_activate 0.76 ± 9% -0.6 0.19 ± 59% perf-profile.children.cycles-pp.dequeue_entity 0.73 ± 5% -0.6 0.17 ± 35% perf-profile.children.cycles-pp.enqueue_task_fair 0.73 ± 21% -0.5 0.24 ±100% perf-profile.children.cycles-pp.flush_smp_call_function_from_idle 0.68 ± 20% -0.5 0.21 ± 81% perf-profile.children.cycles-pp.sched_ttwu_pending 0.58 ± 6% -0.4 0.13 ± 38% perf-profile.children.cycles-pp.enqueue_entity 0.55 ± 11% -0.4 0.12 ± 34% perf-profile.children.cycles-pp.pick_next_task_fair 0.76 ± 11% -0.4 0.39 ± 29% perf-profile.children.cycles-pp.menu_select 0.53 ± 10% -0.4 0.17 ± 42% perf-profile.children.cycles-pp.wake_q_add 0.44 ± 22% -0.3 0.11 ± 23% perf-profile.children.cycles-pp.asm_sysvec_call_function_single 0.44 ± 8% -0.3 0.11 ± 32% perf-profile.children.cycles-pp.update_load_avg 0.48 ± 24% -0.3 0.15 ± 59% perf-profile.children.cycles-pp.ttwu_queue_wakelist 0.36 ± 14% -0.3 0.06 ± 52% perf-profile.children.cycles-pp.set_next_entity 0.40 ± 13% -0.3 0.11 ± 84% perf-profile.children.cycles-pp.update_curr 0.36 ± 8% -0.3 0.09 ± 40% perf-profile.children.cycles-pp.__switch_to 0.35 ± 22% -0.3 0.08 ± 27% perf-profile.children.cycles-pp.finish_task_switch 0.33 ± 22% -0.2 0.08 ± 28% perf-profile.children.cycles-pp.sysvec_call_function_single 0.34 ± 6% -0.2 0.10 ± 33% perf-profile.children.cycles-pp.update_rq_clock 0.28 ± 23% -0.2 0.06 ±123% perf-profile.children.cycles-pp.flush_smp_call_function_queue 0.28 ± 23% -0.2 0.07 ± 30% perf-profile.children.cycles-pp.__sysvec_call_function_single 0.27 ± 9% -0.2 0.06 ±145% perf-profile.children.cycles-pp.tick_nohz_idle_exit 0.40 ± 15% -0.2 0.23 ± 31% perf-profile.children.cycles-pp.tick_nohz_get_sleep_length 0.26 ± 5% -0.2 0.10 ± 32% perf-profile.children.cycles-pp.sched_clock_cpu 0.22 ± 35% -0.2 0.06 ± 87% perf-profile.children.cycles-pp.__smp_call_single_queue 0.22 ± 35% -0.2 0.06 ± 87% perf-profile.children.cycles-pp.llist_add_batch 0.22 ± 20% -0.2 0.06 ±113% perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template 0.23 ± 7% -0.1 0.08 ± 37% perf-profile.children.cycles-pp.native_sched_clock 0.30 ± 17% -0.1 0.20 ± 34% perf-profile.children.cycles-pp.tick_nohz_next_event 0.11 ± 8% -0.1 0.05 ± 48% perf-profile.children.cycles-pp.read_tsc 0.12 ± 12% -0.0 0.08 ± 11% perf-profile.children.cycles-pp.native_irq_return_iret 0.07 ± 14% +0.0 0.10 ± 13% perf-profile.children.cycles-pp.get_mem_cgroup_from_objcg 0.16 ± 14% +0.0 0.19 ± 7% perf-profile.children.cycles-pp.__get_user_8 0.14 ± 9% +0.0 0.18 ± 8% perf-profile.children.cycles-pp.drain_obj_stock 0.10 ± 10% +0.1 0.16 ± 13% perf-profile.children.cycles-pp.obj_cgroup_charge_pages 0.18 ± 10% +0.1 0.23 ± 12% perf-profile.children.cycles-pp.obj_cgroup_charge 0.07 ± 15% +0.1 0.14 ± 12% perf-profile.children.cycles-pp.msgctl_info 0.13 ± 13% +0.1 0.22 ± 11% perf-profile.children.cycles-pp.ksys_msgctl 0.22 ± 10% +0.1 0.33 ± 19% perf-profile.children.cycles-pp.ipcperms 0.28 ± 18% +0.1 0.41 ± 32% perf-profile.children.cycles-pp.__list_del_entry_valid 1.09 ± 17% +0.4 1.51 ± 6% perf-profile.children.cycles-pp.get_obj_cgroup_from_current 3.46 ± 15% +0.9 4.38 ± 9% perf-profile.children.cycles-pp.ipc_obtain_object_check 4.07 ± 11% +1.7 5.77 ± 9% perf-profile.children.cycles-pp.__mod_memcg_state 4.26 ± 11% +1.7 6.00 ± 9% perf-profile.children.cycles-pp.__mod_memcg_lruvec_state 6.26 ± 14% +2.4 8.72 ± 8% perf-profile.children.cycles-pp.__kmalloc 7.18 ± 15% +2.6 9.82 ± 9% perf-profile.children.cycles-pp.load_msg 4.60 ± 16% +2.8 7.40 ± 9% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath 2.31 ± 8% -1.8 0.56 ± 59% perf-profile.self.cycles-pp.poll_idle 0.63 ± 9% -0.5 0.15 ± 38% perf-profile.self.cycles-pp.__schedule 0.53 ± 10% -0.4 0.17 ± 42% perf-profile.self.cycles-pp.wake_q_add 0.35 ± 8% -0.3 0.08 ± 40% perf-profile.self.cycles-pp.__switch_to 0.29 ± 8% -0.2 0.06 ± 59% perf-profile.self.cycles-pp.enqueue_entity 0.24 ± 6% -0.2 0.06 ± 50% perf-profile.self.cycles-pp.do_idle 0.22 ± 35% -0.2 0.06 ± 87% perf-profile.self.cycles-pp.llist_add_batch 0.29 ± 7% -0.2 0.13 ± 32% perf-profile.self.cycles-pp.menu_select 0.22 ± 6% -0.1 0.08 ± 39% perf-profile.self.cycles-pp.native_sched_clock 0.23 ± 9% -0.1 0.12 ± 32% perf-profile.self.cycles-pp.do_msg_fill 0.11 ± 7% -0.1 0.05 ± 48% perf-profile.self.cycles-pp.read_tsc 0.12 ± 12% -0.0 0.08 ± 8% perf-profile.self.cycles-pp.native_irq_return_iret 0.08 ± 8% -0.0 0.05 ± 45% perf-profile.self.cycles-pp.wake_up_q 0.07 ± 14% +0.0 0.10 ± 13% perf-profile.self.cycles-pp.get_mem_cgroup_from_objcg 0.15 ± 12% +0.0 0.19 ± 7% perf-profile.self.cycles-pp.__get_user_8 0.06 ± 13% +0.0 0.10 ± 14% perf-profile.self.cycles-pp.msgctl_info 0.00 +0.1 0.05 ± 7% perf-profile.self.cycles-pp.format_decode 0.26 ± 12% +0.1 0.34 ± 11% perf-profile.self.cycles-pp.sysvipc_find_ipc 0.21 ± 11% +0.1 0.32 ± 19% perf-profile.self.cycles-pp.ipcperms 0.06 ± 50% +0.1 0.17 ± 33% perf-profile.self.cycles-pp.free_msg 0.28 ± 17% +0.1 0.41 ± 31% perf-profile.self.cycles-pp.__list_del_entry_valid 0.57 ± 26% +0.3 0.83 ± 19% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe 1.07 ± 17% +0.4 1.48 ± 6% perf-profile.self.cycles-pp.get_obj_cgroup_from_current 2.18 ± 18% +0.7 2.90 ± 12% perf-profile.self.cycles-pp.ipc_obtain_object_check 4.07 ± 11% +1.7 5.77 ± 9% perf-profile.self.cycles-pp.__mod_memcg_state 4.55 ± 16% +2.8 7.34 ± 9% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath stress-ng.msg.ops_per_sec 4.4e+06 +-----------------------------------------------------------------+ | O | 4.2e+06 |-+ O O O O | | O O O O O | 4e+06 |-O O OO O O O OO O O O O O O O O O OO | | O O O O O O O O O | 3.8e+06 |-+ | | | 3.6e+06 |-+ .+ .+ | | + + +.++ + | 3.4e+06 |-+ +. : + ++ +. + | | .+ +.: : +.+ +.+ | 3.2e+06 |-++.+ + : .+ .+ + | |.+ + +.+ + | 3e+06 +-----------------------------------------------------------------+ [*] bisect-good sample [O] bisect-bad sample Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. --- 0DAY/LKP+ Test Infrastructure Open Source Technology Center https://lists.01.org/hyperkitty/list/lkp@lists.01.org Intel Corporation Thanks, Oliver Sang