Greetings,

FYI, we noticed a 9.3% improvement of unixbench.score due to commit:

commit: c469933e772132aad040bd6a2adc8edf9ad6f825 ("sched/fair: Fix cpu_util_wake() for 'execl' type workloads")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: unixbench
on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory
with the following parameters:

        runtime: 300s
        nr_task: 100%
        test: execl
        ucode: 0x7000013
        cpufreq_governor: performance

test-description: UnixBench is the original BYTE UNIX benchmark suite; it aims to test the performance of Unix-like systems.
test-url: https://github.com/kdlucas/byte-unixbench

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/nr_task/rootfs/runtime/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.2/100%/debian-x86_64-2018-04-03.cgz/300s/lkp-bdw-de1/execl/unixbench/0x7000013

commit:
  e1ff516a56 ("sched/fair: Fix a comment in task_numa_fault()")
  c469933e77 ("sched/fair: Fix cpu_util_wake() for 'execl' type workloads")

e1ff516a56ad56c4 c469933e772132aad040bd6a2a
---------------- --------------------------
         %stddev     %change         %stddev
             \          |                \
      8276            +9.3%       9049        unixbench.score
   9424325           -86.8%    1243368 ±  4%  unixbench.time.involuntary_context_switches
 5.012e+08           +12.0%  5.612e+08        unixbench.time.minor_page_faults
      1161 ±  3%     +14.4%       1328        unixbench.time.percent_of_cpu_this_job_got
      3314           +13.7%       3769        unixbench.time.system_time
    681.50           +11.7%     761.05        unixbench.time.user_time
   2983499          +205.1%    9103817        unixbench.time.voluntary_context_switches
  11984682            +9.8%   13162410        unixbench.workload
     18.50            +0.8%      18.65        boot-time.dhcp
   2759872           -98.0%      54789 ±  3%  interrupts.CAL:Function_call_interrupts
   6073453 ±  3%     +10.2%    6692409 ±  3%  softirqs.RCU
     23.55 ±  8%     -10.0       13.55 ±  4%  mpstat.cpu.idle%
     60.39 ±  2%      +8.3       68.70        mpstat.cpu.sys%
     91338 ±  2%     -34.0%      60255        vmstat.system.cs
     40263           -18.0%      33007        vmstat.system.in
 2.451e+08           -94.2%   14258526        cpuidle.C1.time
   7930768           -48.4%    4088639        cpuidle.C1.usage
 1.981e+08 ±  5%     -95.0%    9807359 ± 21%  cpuidle.C1E.time
   4505765 ±  3%     -93.0%     317380 ±  5%  cpuidle.C1E.usage
   2963542 ±  3%    +403.8%   14930439        cpuidle.POLL.time
    708625 ±  2%    +630.1%    5173851        cpuidle.POLL.usage
    771.25 ± 26%     -35.1%     500.75 ±  6%  slabinfo.dmaengine-unmap-16.active_objs
    771.25 ± 26%     -35.1%     500.75 ±  6%  slabinfo.dmaengine-unmap-16.num_objs
    248.75 ± 10%     -45.4%     135.75 ± 45%  slabinfo.secpath_cache.active_objs
    248.75 ± 10%     -45.4%     135.75 ± 45%  slabinfo.secpath_cache.num_objs
      1208 ±  6%     +32.5%       1600 ±  8%  slabinfo.skbuff_head_cache.active_objs
      1240 ±  9%     +29.0%       1600 ±  8%  slabinfo.skbuff_head_cache.num_objs
      8877            +7.8%       9571        proc-vmstat.nr_inactive_anon
      1439 ±  2%      +3.5%       1489        proc-vmstat.nr_page_table_pages
     10413            +4.4%      10867        proc-vmstat.nr_shmem
      8877            +7.8%       9571        proc-vmstat.nr_zone_inactive_anon
 3.892e+08           +11.3%  4.332e+08        proc-vmstat.numa_hit
 3.892e+08           +11.3%  4.332e+08        proc-vmstat.numa_local
     10988           -93.7%     691.25 ±  2%  proc-vmstat.pgactivate
 3.998e+08           +11.3%  4.449e+08        proc-vmstat.pgalloc_normal
 5.044e+08           +12.9%  5.695e+08        proc-vmstat.pgfault
 3.998e+08           +11.3%  4.448e+08        proc-vmstat.pgfree
   7929130           -87.1%    1022401 ±173%  turbostat.C1
      4.38 ±  2%      -4.3        0.07 ±173%  turbostat.C1%
   4505651 ±  3%     -98.3%      74544 ±173%  turbostat.C1E
      3.54 ±  4%      -3.5        0.04 ±173%  turbostat.C1E%
    562189 ± 54%     -96.1%      21843 ±173%  turbostat.C6
      7.81 ± 61%      -7.5        0.28 ±173%  turbostat.C6%
     15.91 ± 10%     -87.7%       1.96 ±173%  turbostat.CPU%c1
      4.40 ± 70%     -97.6%       0.10 ±173%  turbostat.CPU%c6
  16844954 ±  2%     -83.1%    2844682 ±173%  turbostat.IRQ
      0.51 ± 28%     -86.3%       0.07 ±173%  turbostat.Pkg%pc2
      2.72           -75.2%       0.67 ±173%  turbostat.RAMWatt
      2100           -75.1%     523.25 ±173%  turbostat.TSC_MHz
 1.387e+12           +10.8%  1.537e+12        perf-stat.branch-instructions
 3.543e+10            +9.8%  3.892e+10        perf-stat.branch-misses
     2e+11            -9.2%  1.817e+11        perf-stat.cache-misses
     2e+11            -9.2%  1.817e+11        perf-stat.cache-references
  32078199           -35.3%   20767724        perf-stat.context-switches
 1.071e+13           +10.6%  1.185e+13        perf-stat.cpu-cycles
   9385083           -92.8%     678167 ± 11%  perf-stat.cpu-migrations
      0.45 ±  2%      -0.1        0.40 ±  6%  perf-stat.dTLB-load-miss-rate%
 1.881e+12           +10.7%  2.082e+12        perf-stat.dTLB-loads
      0.12 ±  2%      -0.0        0.11        perf-stat.dTLB-store-miss-rate%
 1.011e+12            +9.3%  1.104e+12        perf-stat.dTLB-stores
 3.311e+09            +8.0%  3.575e+09        perf-stat.iTLB-load-misses
 2.874e+09            +6.1%  3.049e+09        perf-stat.iTLB-loads
 6.945e+12           +10.6%  7.681e+12        perf-stat.instructions
      2097            +2.4%       2148        perf-stat.instructions-per-iTLB-miss
 4.808e+08           +12.0%  5.388e+08        perf-stat.minor-faults
 4.808e+08           +12.0%  5.388e+08        perf-stat.page-faults
     40.50           -40.5        0.00        perf-profile.calltrace.cycles-pp.execve
     40.33           -40.3        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.execve
     40.33           -40.3        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
     40.32           -40.3        0.00        perf-profile.calltrace.cycles-pp.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
     40.25           -40.2        0.00        perf-profile.calltrace.cycles-pp.__do_execve_file.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe.execve
     29.41           -29.4        0.00        perf-profile.calltrace.cycles-pp.search_binary_handler.__do_execve_file.__x64_sys_execve.do_syscall_64.entry_SYSCALL_64_after_hwframe
     29.31           -29.3        0.00        perf-profile.calltrace.cycles-pp.load_elf_binary.search_binary_handler.__do_execve_file.__x64_sys_execve.do_syscall_64
     16.09 ±  5%     -16.1        0.00        perf-profile.calltrace.cycles-pp.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
     16.06 ±  2%     -16.1        0.00        perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe
     16.01 ±  2%     -16.0        0.00        perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe
     15.24           -15.2        0.00        perf-profile.calltrace.cycles-pp.page_fault
     15.02           -15.0        0.00        perf-profile.calltrace.cycles-pp.flush_old_exec.load_elf_binary.search_binary_handler.__do_execve_file.__x64_sys_execve
     14.64           -14.6        0.00        perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
     14.44           -14.4        0.00        perf-profile.calltrace.cycles-pp.mmput.flush_old_exec.load_elf_binary.search_binary_handler.__do_execve_file
     14.38           -14.4        0.00        perf-profile.calltrace.cycles-pp.exit_mmap.mmput.flush_old_exec.load_elf_binary.search_binary_handler
     14.30 ±  2%     -14.3        0.00        perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
     13.66           -13.7        0.00        perf-profile.calltrace.cycles-pp.secondary_startup_64
     12.50 ±  2%     -12.5        0.00        perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
     12.50           -12.5        0.00        perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
     12.49           -12.5        0.00        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
     12.47           -12.5        0.00        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
     11.53           -11.5        0.00        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
     11.05           -11.1        0.00        perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.do_idle.cpu_startup_entry.start_secondary
      7.58 ±  5%      -7.6        0.00        perf-profile.calltrace.cycles-pp.filemap_map_pages.__handle_mm_fault.handle_mm_fault.__do_page_fault.do_page_fault
      6.96 ±  2%      -7.0        0.00        perf-profile.calltrace.cycles-pp.unmap_vmas.exit_mmap.mmput.flush_old_exec.load_elf_binary
      6.63 ±  3%      -6.6        0.00        perf-profile.calltrace.cycles-pp.unmap_page_range.unmap_vmas.exit_mmap.mmput.flush_old_exec
     56.55           -56.5        0.00        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     56.48           -56.5        0.00        perf-profile.children.cycles-pp.do_syscall_64
     40.52           -40.5        0.00        perf-profile.children.cycles-pp.__x64_sys_execve
     40.51           -40.5        0.00        perf-profile.children.cycles-pp.execve
     40.44           -40.4        0.00        perf-profile.children.cycles-pp.__do_execve_file
     29.47           -29.5        0.00        perf-profile.children.cycles-pp.search_binary_handler
     29.36           -29.4        0.00        perf-profile.children.cycles-pp.load_elf_binary
     23.27           -23.3        0.00        perf-profile.children.cycles-pp.page_fault
     22.12           -22.1        0.00        perf-profile.children.cycles-pp.do_page_fault
     21.66           -21.7        0.00        perf-profile.children.cycles-pp.__do_page_fault
     20.36           -20.4        0.00        perf-profile.children.cycles-pp.handle_mm_fault
     19.64           -19.6        0.00        perf-profile.children.cycles-pp.__handle_mm_fault
     15.03           -15.0        0.00        perf-profile.children.cycles-pp.flush_old_exec
     14.46           -14.5        0.00        perf-profile.children.cycles-pp.mmput
     14.39           -14.4        0.00        perf-profile.children.cycles-pp.exit_mmap
     13.66           -13.7        0.00        perf-profile.children.cycles-pp.secondary_startup_64
     13.66           -13.7        0.00        perf-profile.children.cycles-pp.cpu_startup_entry
     13.66           -13.7        0.00        perf-profile.children.cycles-pp.do_idle
     12.67           -12.7        0.00        perf-profile.children.cycles-pp.cpuidle_enter_state
     12.50           -12.5        0.00        perf-profile.children.cycles-pp.start_secondary
     12.14           -12.1        0.00        perf-profile.children.cycles-pp.intel_idle
     10.19 ±  2%     -10.2        0.00        perf-profile.children.cycles-pp.filemap_map_pages
      7.43 ±  3%      -7.4        0.00        perf-profile.children.cycles-pp.unmap_vmas
      7.42 ±  3%      -7.4        0.00        perf-profile.children.cycles-pp.vm_mmap_pgoff
      7.15 ±  4%      -7.2        0.00        perf-profile.children.cycles-pp.unmap_page_range
      6.75 ±  3%      -6.8        0.00        perf-profile.children.cycles-pp.do_mmap
      6.04 ±  4%      -6.0        0.00        perf-profile.children.cycles-pp.mmap_region
      5.95 ±  5%      -5.9        0.00        perf-profile.children.cycles-pp.copy_strings
     12.13           -12.1        0.00        perf-profile.self.cycles-pp.intel_idle
      5.79 ±  3%      -5.8        0.00        perf-profile.self.cycles-pp.filemap_map_pages
    113208           +12.7%     127543        sched_debug.cfs_rq:/.exec_clock.avg
    118394           +12.2%     132857        sched_debug.cfs_rq:/.exec_clock.max
     81434           +13.0%      92036        sched_debug.cfs_rq:/.exec_clock.min
      8558 ±  2%     +11.8%       9569 ±  2%  sched_debug.cfs_rq:/.exec_clock.stddev
      2449 ±173%   +1297.3%      34233 ± 16%  sched_debug.cfs_rq:/.load.min
    137.71 ±  3%     -16.3%     115.21 ±  7%  sched_debug.cfs_rq:/.load_avg.avg
    788.71 ± 11%     -26.1%     582.54 ± 18%  sched_debug.cfs_rq:/.load_avg.max
      6.92 ± 13%    +489.8%      40.79 ± 12%  sched_debug.cfs_rq:/.load_avg.min
    202.92 ±  8%     -27.1%     147.99 ± 19%  sched_debug.cfs_rq:/.load_avg.stddev
   1576620           +28.1%    2018957        sched_debug.cfs_rq:/.min_vruntime.avg
   1635360           +29.6%    2119584        sched_debug.cfs_rq:/.min_vruntime.max
   1159560           +26.8%    1469979        sched_debug.cfs_rq:/.min_vruntime.min
    111863 ±  2%     +31.3%     146873 ±  3%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.04 ±173%   +1300.0%       0.58 ± 14%  sched_debug.cfs_rq:/.nr_running.min
      0.33 ±  7%     -44.0%       0.18 ± 11%  sched_debug.cfs_rq:/.nr_running.stddev
      1.03 ± 27%    +113.6%       2.20 ± 14%  sched_debug.cfs_rq:/.nr_spread_over.avg
      4.83 ± 28%    +490.5%      28.54 ± 17%  sched_debug.cfs_rq:/.nr_spread_over.max
      1.51 ± 23%    +360.5%       6.94 ± 16%  sched_debug.cfs_rq:/.nr_spread_over.stddev
     95.24 ±  6%     -26.6%      69.90 ± 12%  sched_debug.cfs_rq:/.runnable_load_avg.avg
    579.96 ± 10%     -39.6%     350.29 ± 38%  sched_debug.cfs_rq:/.runnable_load_avg.max
    145.61 ± 13%     -42.9%      83.20 ± 41%  sched_debug.cfs_rq:/.runnable_load_avg.stddev
     34888 ± 36%    +130.2%      80299 ± 20%  sched_debug.cfs_rq:/.spread0.max
   -440912           +29.1%    -569300        sched_debug.cfs_rq:/.spread0.min
    111863 ±  2%     +31.3%     146867 ±  3%  sched_debug.cfs_rq:/.spread0.stddev
      1670 ±  8%     -19.2%       1350 ±  7%  sched_debug.cfs_rq:/.util_avg.max
     79.79 ± 30%    +349.8%     358.92 ± 31%  sched_debug.cfs_rq:/.util_avg.min
    414.50 ±  5%     -44.1%     231.79 ± 13%  sched_debug.cfs_rq:/.util_avg.stddev
    566.45 ±  7%     +22.4%     693.12 ±  9%  sched_debug.cfs_rq:/.util_est_enqueued.avg
      1393 ±  6%     -25.6%       1036 ±  6%  sched_debug.cfs_rq:/.util_est_enqueued.max
     20.88 ±173%   +1416.8%     316.62 ± 34%  sched_debug.cfs_rq:/.util_est_enqueued.min
    359.45 ±  7%     -55.3%     160.66 ± 11%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
    374128 ± 12%     -49.1%     190255 ± 31%  sched_debug.cpu.avg_idle.avg
    617641 ±  7%     -38.9%     377525 ± 27%  sched_debug.cpu.avg_idle.max
     88961 ± 11%     -89.1%       9703 ± 60%  sched_debug.cpu.avg_idle.min
    137137 ±  7%     -27.5%      99404 ± 25%  sched_debug.cpu.avg_idle.stddev
      0.86 ± 17%    +137.8%       2.05 ± 18%  sched_debug.cpu.clock.stddev
      0.86 ± 16%    +137.5%       2.05 ± 18%  sched_debug.cpu.clock_task.stddev
     90.01 ±  7%     -20.8%      71.26 ± 10%  sched_debug.cpu.cpu_load[0].avg
    507.67 ± 10%     -30.6%     352.42 ± 36%  sched_debug.cpu.cpu_load[0].max
      2.38 ±173%   +1064.9%      27.67 ± 25%  sched_debug.cpu.cpu_load[0].min
    128.38 ± 11%     -34.6%      83.99 ± 37%  sched_debug.cpu.cpu_load[0].stddev
     90.14 ±  9%     -21.4%      70.84 ±  9%  sched_debug.cpu.cpu_load[1].avg
    500.12 ± 14%     -32.2%     339.04 ± 38%  sched_debug.cpu.cpu_load[1].max
     21.17 ± 28%     +47.8%      31.29 ± 16%  sched_debug.cpu.cpu_load[1].min
    117.56 ± 14%     -34.3%      77.22 ± 41%  sched_debug.cpu.cpu_load[1].stddev
     88.92 ±  9%     -21.7%      69.63 ±  8%  sched_debug.cpu.cpu_load[2].avg
    110.68 ± 15%     -34.4%      72.64 ± 43%  sched_debug.cpu.cpu_load[2].stddev
     87.49 ±  9%     -21.7%      68.48 ±  7%  sched_debug.cpu.cpu_load[3].avg
    481.67 ± 15%     -32.0%     327.71 ± 38%  sched_debug.cpu.cpu_load[3].max
    107.04 ± 15%     -34.3%      70.27 ± 44%  sched_debug.cpu.cpu_load[3].stddev
     86.54 ±  9%     -21.7%      67.74 ±  7%  sched_debug.cpu.cpu_load[4].avg
    495.17 ± 15%     -31.0%     341.50 ± 36%  sched_debug.cpu.cpu_load[4].max
     91.33 ± 97%   +1064.7%       1063 ± 16%  sched_debug.cpu.curr->pid.min
      1148 ±  9%     -45.6%     624.47 ± 11%  sched_debug.cpu.curr->pid.stddev
    102050 ±  5%     -19.1%      82570 ± 10%  sched_debug.cpu.load.avg
    571024 ± 11%     -23.8%     435288 ± 23%  sched_debug.cpu.load.max
      9574 ± 72%    +254.1%      33908 ± 24%  sched_debug.cpu.load.min
    145258 ± 10%     -28.5%     103804 ± 28%  sched_debug.cpu.load.stddev
      0.00 ± 11%     +34.8%       0.00 ±  6%  sched_debug.cpu.next_balance.stddev
      0.21 ± 66%    +200.0%       0.62 ± 22%  sched_debug.cpu.nr_running.min
      0.63 ±  4%     -36.7%       0.40 ± 11%  sched_debug.cpu.nr_running.stddev
    889005           -30.0%     622125        sched_debug.cpu.nr_switches.avg
    929176           -27.1%     677134        sched_debug.cpu.nr_switches.max
    620397           -24.5%     468674        sched_debug.cpu.nr_switches.min
     71419 ±  3%     -19.3%      57628 ±  3%  sched_debug.cpu.nr_switches.stddev
    874.54 ±  8%     -88.9%      97.08 ± 15%  sched_debug.cpu.nr_uninterruptible.max
     -1512           -91.7%    -125.08        sched_debug.cpu.nr_uninterruptible.min
    586.61 ±  7%     -89.1%      64.08 ±  7%  sched_debug.cpu.nr_uninterruptible.stddev
    886301           -30.1%     619461        sched_debug.cpu.sched_count.avg
    920687           -27.0%     672173        sched_debug.cpu.sched_count.max
    618791           -24.5%     467052        sched_debug.cpu.sched_count.min
     70471 ±  3%     -19.6%      56648 ±  3%  sched_debug.cpu.sched_count.stddev
    352570           -11.7%     311429        sched_debug.cpu.ttwu_count.avg
    255008           -88.5%      29409 ±  3%  sched_debug.cpu.ttwu_local.avg
    272187           -87.7%      33582 ±  4%  sched_debug.cpu.ttwu_local.max
    171375           -90.9%      15547 ±  6%  sched_debug.cpu.ttwu_local.min
     22073 ±  3%     -78.7%       4709 ±  5%  sched_debug.cpu.ttwu_local.stddev

[Per-sample ASCII plots: unixbench.time.user_time, unixbench.time.system_time,
 unixbench.time.minor_page_faults, unixbench.time.voluntary_context_switches,
 unixbench.time.involuntary_context_switches, unixbench.score]

[*] bisect-good sample
[O] bisect-bad  sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen