FYI, we noticed a -2.8% regression of aim9.fork_test.ops_per_sec due to commit:

commit 5903b0cc463db12dd495942d405e581783074905 ("sched: propagate load during synchronous attach/detach")
https://git.linaro.org/people/vincent.guittot/kernel.git sched/pelt

in testcase: aim9
on test machine: 144 threads Intel(R) Xeon(R) CPU E7-8890 v3 @ 2.50GHz with 512G memory
with following parameters:

	testtime: 300s
	test: fork_test
	cpufreq_governor: performance

aim9 is Suite IX, the "AIM Independent Resource Benchmark", a well-known synthetic benchmark suite; fork_test exercises process creation and reports operations per second.

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance.

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

	git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
	cd lkp-tests
	bin/lkp install job.yaml  # job file is attached in this email
	bin/lkp run     job.yaml
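For context, fork_test times a tight fork/exit/wait loop and reports it as ops_per_sec. The loop below is a minimal hand-written stand-in for illustration only (not the actual AIM9 source):

	/* fork-rate microbenchmark sketch; not the actual AIM9 source */
	#include <stdio.h>
	#include <sys/wait.h>
	#include <time.h>
	#include <unistd.h>

	int main(void)
	{
		long ops = 0;
		time_t end = time(NULL) + 300;		/* testtime: 300s */

		while (time(NULL) < end) {
			pid_t pid = fork();
			if (pid == 0)
				_exit(0);		/* child exits immediately */
			if (pid < 0)
				return 1;		/* fork failed */
			waitpid(pid, NULL, 0);		/* parent reaps the child */
			ops++;
		}
		printf("fork_test: %.1f ops/sec\n", ops / 300.0);
		return 0;
	}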
=========================================================================================
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-6/performance/x86_64-rhel-7.2/debian-x86_64-2016-08-31.cgz/lkp-hsx04/fork_test/aim9/300s

commit:
  c3c8a02759 ("sched: factorize PELT update")
  5903b0cc46 ("sched: propagate load during synchronous attach/detach")

c3c8a027596a40e1 5903b0cc463db12dd495942d40
---------------- --------------------------
       %stddev      %change        %stddev
           \            |              \
      5463 ±  0%      -2.8%       5308 ±  0%  aim9.fork_test.ops_per_sec
      9613 ±  0%    +125.0%      21630 ±  0%  aim9.time.involuntary_context_switches
      2553 ±  1%      -3.3%       2468 ±  0%  aim9.time.maximum_resident_set_size
   3265963 ±  0%      -3.4%    3154181 ±  0%  aim9.time.voluntary_context_switches

    452695 ±  2%      +3.7%     469383 ±  1%  interrupts.CAL:Function_call_interrupts

    937.75 ±  0%     +28.4%       1204 ±  1%  proc-vmstat.nr_page_table_pages

     24930 ±  0%      -1.4%      24586 ±  0%  vmstat.system.cs

    648224 ±  1%     +37.4%     890843 ±  0%  meminfo.Committed_AS
      3736 ±  1%     +28.9%       4815 ±  1%  meminfo.PageTables

     13.25 ±  6%     -66.0%       4.50 ±100%  numa-numastat.node2.other_node
      5803 ± 34%     +99.7%      11587 ± 20%  numa-numastat.node3.numa_foreign
      5803 ± 34%     +99.7%      11587 ± 20%  numa-numastat.node3.numa_miss

   3264239 ±  7%     +80.3%    5885636 ±  6%  cpuidle.C1-HSW.time
 2.079e+08 ±  4%     +31.6%  2.736e+08 ±  2%  cpuidle.C1E-HSW.time
 3.677e+08 ±  0%     -16.9%  3.056e+08 ±  0%  cpuidle.C3-HSW.time
   1722328 ±  0%     -19.2%    1390958 ±  0%  cpuidle.C3-HSW.usage
 3.853e+09 ±  1%     -55.5%  1.716e+09 ±  2%  cpuidle.POLL.time

     11.80 ±  1%     -43.2%       6.70 ±  1%  turbostat.%Busy
    324.25 ±  1%     -41.1%     191.00 ±  1%  turbostat.Avg_MHz
      0.19 ±  0%     +67.1%       0.32 ±  1%  turbostat.CPU%c3
     20.25 ±  4%     -58.4%       8.41 ±  3%  turbostat.Pkg%pc2
    279.46 ±  0%      -4.3%     267.33 ±  0%  turbostat.PkgWatt

    808.25 ± 23%     +65.2%       1335 ± 18%  numa-meminfo.node0.PageTables
      6746 ±173%    +210.1%      20919 ± 48%  numa-meminfo.node1.AnonHugePages
     42119 ±  6%     -27.7%      30432 ± 29%  numa-meminfo.node2.Active
     35629 ±  6%     -32.6%      24019 ± 37%  numa-meminfo.node2.Active(anon)
     26794 ±  7%     -62.2%      10127 ± 71%  numa-meminfo.node2.AnonHugePages
     34316 ±  5%     -52.0%      16479 ± 55%  numa-meminfo.node2.AnonPages
    521152 ±  3%     +19.9%     625041 ±  9%  numa-meminfo.node3.MemUsed
     13644 ± 31%     -65.4%       4725 ± 86%  numa-meminfo.node3.Shmem

    201.75 ± 23%     +62.6%     328.00 ± 17%  numa-vmstat.node0.nr_page_table_pages
      8909 ±  6%     -32.6%       6008 ± 37%  numa-vmstat.node2.nr_active_anon
      8581 ±  5%     -52.0%       4122 ± 55%  numa-vmstat.node2.nr_anon_pages
      8909 ±  6%     -32.6%       6008 ± 37%  numa-vmstat.node2.nr_zone_active_anon
     12.00 ±  5%     -70.8%       3.50 ±109%  numa-vmstat.node2.numa_other
      3410 ± 31%     -65.4%       1181 ± 86%  numa-vmstat.node3.nr_shmem
     82956 ±  2%      +6.7%      88479 ±  2%  numa-vmstat.node3.numa_foreign
     82956 ±  2%      +6.7%      88479 ±  2%  numa-vmstat.node3.numa_miss

  1.02e+12 ±  8%     -40.2%  6.097e+11 ±  6%  perf-stat.branch-instructions
      0.40 ±  9%    +133.3%       0.93 ±  4%  perf-stat.branch-miss-rate%
 4.049e+09 ±  4%     +40.0%  5.668e+09 ±  1%  perf-stat.branch-misses
      9.08 ±  3%     -22.5%       7.04 ±  2%  perf-stat.cache-miss-rate%
 2.399e+09 ±  2%      -6.8%  2.236e+09 ±  2%  perf-stat.cache-misses
 2.644e+10 ±  2%     +20.1%  3.176e+10 ±  0%  perf-stat.cache-references
   7515466 ±  0%      -1.4%    7410587 ±  0%  perf-stat.context-switches
 1.611e+13 ±  8%     -38.7%  9.871e+12 ±  6%  perf-stat.cpu-cycles
     84523 ±  0%     +26.1%     106582 ±  1%  perf-stat.cpu-migrations
      0.17 ±  0%     +89.9%       0.32 ±  2%  perf-stat.dTLB-load-miss-rate%
 1.562e+09 ±  0%     +16.3%  1.817e+09 ±  1%  perf-stat.dTLB-load-misses
 9.298e+11 ±  0%     -38.8%  5.687e+11 ±  1%  perf-stat.dTLB-loads
 2.945e+08 ±  1%     +11.0%  3.269e+08 ±  0%  perf-stat.dTLB-store-misses
 2.385e+11 ±  4%      +7.9%  2.574e+11 ±  0%  perf-stat.dTLB-stores
     53.06 ±  0%      +5.9%      56.17 ±  2%  perf-stat.iTLB-load-miss-rate%
 4.535e+08 ±  1%     +10.7%  5.022e+08 ±  4%  perf-stat.iTLB-load-misses
 4.011e+08 ±  0%      -2.4%  3.915e+08 ±  0%  perf-stat.iTLB-loads
 4.305e+12 ±  7%     -38.1%  2.666e+12 ±  6%  perf-stat.instructions
      9488 ±  7%     -43.9%       5318 ±  8%  perf-stat.instructions-per-iTLB-miss
      0.27 ±  0%      +1.0%       0.27 ±  0%  perf-stat.ipc
     99.18 ±  0%      -1.4%      97.79 ±  0%  perf-stat.node-load-miss-rate%
 1.427e+09 ±  1%     -11.8%  1.258e+09 ±  2%  perf-stat.node-load-misses
  11841576 ±  0%    +140.6%   28485390 ±  3%  perf-stat.node-loads
     78.98 ±  0%      -4.9%      75.11 ±  0%  perf-stat.node-store-miss-rate%
 3.361e+08 ±  0%     -10.5%  3.008e+08 ±  0%  perf-stat.node-store-misses
  89434382 ±  2%     +11.4%   99662914 ±  2%  perf-stat.node-stores
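The perf-stat counters above are collected system-wide by lkp; a comparable spot check on a live system uses only stock perf events, e.g.:

	# system-wide counters for 10 seconds while the benchmark runs
	perf stat -a -e context-switches,cpu-migrations,branch-misses,dTLB-load-misses,iTLB-load-misses sleep 10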
    276.81 ±  8%     +76.5%     488.51 ±  1%  sched_debug.cfs_rq:/.exec_clock.min
    193902 ± 41%     -39.1%     118044 ± 44%  sched_debug.cfs_rq:/.load.max
    182.25 ±  5%     +35.5%     247.03 ±  7%  sched_debug.cfs_rq:/.load_avg.avg
    297.25 ±  9%     +55.3%     461.54 ±  2%  sched_debug.cfs_rq:/.load_avg.max
    142.33 ±  5%     +25.0%     177.96 ±  9%  sched_debug.cfs_rq:/.load_avg.min
     27.50 ± 13%     +86.3%      51.24 ± 13%  sched_debug.cfs_rq:/.load_avg.stddev
     52563 ± 12%     +64.8%      86617 ±  9%  sched_debug.cfs_rq:/.min_vruntime.avg
    111274 ±  8%    +103.8%     226757 ± 14%  sched_debug.cfs_rq:/.min_vruntime.max
     15219 ± 15%    +126.2%      34421 ±  6%  sched_debug.cfs_rq:/.min_vruntime.min
     20511 ±  6%     +96.3%      40265 ±  2%  sched_debug.cfs_rq:/.min_vruntime.stddev
      0.31 ±  5%    +334.2%       1.33 ± 38%  sched_debug.cfs_rq:/.runnable_load_avg.avg
     28.79 ±  5%    +173.1%      78.62 ± 34%  sched_debug.cfs_rq:/.runnable_load_avg.max
      2.55 ±  4%    +219.9%       8.15 ± 35%  sched_debug.cfs_rq:/.runnable_load_avg.stddev
    -34683 ±-23%     +81.7%     -63037 ± -9%  sched_debug.cfs_rq:/.spread0.avg
     24059 ± 54%    +220.6%      77132 ± 33%  sched_debug.cfs_rq:/.spread0.max
    -72035 ±-15%     +60.0%    -115239 ± -9%  sched_debug.cfs_rq:/.spread0.min
     20515 ±  6%     +96.3%      40269 ±  2%  sched_debug.cfs_rq:/.spread0.stddev
     85.16 ±  4%    +162.5%     223.52 ±  3%  sched_debug.cfs_rq:/.util_avg.avg
    393.46 ±  8%    +117.0%     853.88 ± 10%  sched_debug.cfs_rq:/.util_avg.max
     48.57 ±  4%    +260.1%     174.88 ±  3%  sched_debug.cfs_rq:/.util_avg.stddev
    179135 ±  5%     +20.2%     215299 ±  4%  sched_debug.cpu.avg_idle.stddev
      0.30 ± 11%    +262.6%       1.07 ± 45%  sched_debug.cpu.cpu_load[0].avg
     28.37 ±  8%    +112.3%      60.25 ± 44%  sched_debug.cpu.cpu_load[0].max
      2.49 ±  8%    +141.4%       6.00 ± 49%  sched_debug.cpu.cpu_load[0].stddev
      0.32 ±  7%   +2166.1%       7.17 ± 13%  sched_debug.cpu.cpu_load[1].avg
     27.75 ±  5%    +403.9%     139.83 ± 17%  sched_debug.cpu.cpu_load[1].max
      2.45 ±  5%    +762.8%      21.16 ± 10%  sched_debug.cpu.cpu_load[1].stddev
      0.30 ±  7%   +1627.4%       5.21 ± 11%  sched_debug.cpu.cpu_load[2].avg
     26.00 ±  5%    +295.7%     102.88 ± 18%  sched_debug.cpu.cpu_load[2].max
      2.29 ±  5%    +556.8%      15.02 ±  9%  sched_debug.cpu.cpu_load[2].stddev
      0.29 ±  7%   +1185.9%       3.78 ± 10%  sched_debug.cpu.cpu_load[3].avg
     23.38 ±  4%    +207.7%      71.92 ± 15%  sched_debug.cpu.cpu_load[3].max
      2.11 ±  5%    +381.6%      10.15 ±  6%  sched_debug.cpu.cpu_load[3].stddev
      0.27 ±  8%    +890.3%       2.71 ± 10%  sched_debug.cpu.cpu_load[4].avg
     19.08 ±  8%    +152.2%      48.12 ± 15%  sched_debug.cpu.cpu_load[4].max
      1.81 ±  8%    +268.4%       6.68 ±  5%  sched_debug.cpu.cpu_load[4].stddev
    195679 ± 41%     -40.0%     117329 ± 45%  sched_debug.cpu.load.max
      4594 ±  9%     +75.9%       8079 ±  1%  sched_debug.cpu.nr_switches.min
    -28.83 ±-48%     -58.4%     -12.00 ±-21%  sched_debug.cpu.nr_uninterruptible.min
      4.73 ± 14%     -28.2%       3.40 ± 11%  sched_debug.cpu.nr_uninterruptible.stddev
      4317 ±  9%     +80.7%       7800 ±  1%  sched_debug.cpu.sched_count.min
      2102 ±  9%     +81.1%       3807 ±  1%  sched_debug.cpu.sched_goidle.min
      2333 ±  8%     +68.8%       3938 ±  1%  sched_debug.cpu.ttwu_count.min
    102.08 ±  2%     +38.4%     141.29 ±  4%  sched_debug.cpu.ttwu_local.min
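The jump in cfs_rq load_avg/util_avg and in every cpu_load[] index above is consistent with what the patch changes: when a task's PELT load is attached to or detached from a group cfs_rq, the delta is now propagated up the task-group hierarchy immediately instead of decaying in over later updates. A hand-simplified illustration of the idea (types and names invented here; this is not the kernel code, which lives in kernel/sched/fair.c and also handles clamping, runnable vs. blocked load, util, etc.):

	/* Illustration only: synchronous upward propagation of a load delta. */
	struct cfs_rq {
		struct cfs_rq *parent;	/* NULL at the root */
		long load_avg;
	};

	static void propagate_load_delta(struct cfs_rq *cfs_rq, long delta)
	{
		/* walk up the hierarchy, applying the change right away */
		for (; cfs_rq; cfs_rq = cfs_rq->parent)
			cfs_rq->load_avg += delta;
	}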
      0.81 ±  8%    +110.5%       1.70 ± 11%  perf-profile.calltrace.cycles-pp.__do_page_fault.do_page_fault.page_fault
      0.58 ±  7%     +78.1%       1.04 ± 10%  perf-profile.calltrace.cycles-pp.__hrtimer_run_queues.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt
      1.31 ±  4%     +54.4%       2.02 ± 10%  perf-profile.calltrace.cycles-pp._do_fork.sys_clone.do_syscall_64.return_from_SYSCALL_64
      2.25 ± 10%     +68.9%       3.80 ±  7%  perf-profile.calltrace.cycles-pp.apic_timer_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry.start_secondary
      1.14 ±  3%     +56.7%       1.78 ± 10%  perf-profile.calltrace.cycles-pp.copy_process._do_fork.sys_clone.do_syscall_64.return_from_SYSCALL_64
      1.62 ±  9%     +50.3%       2.44 ±  7%  perf-profile.calltrace.cycles-pp.do_exit.do_group_exit.sys_exit_group.entry_SYSCALL_64_fastpath
      1.64 ±  9%     +50.3%       2.46 ±  7%  perf-profile.calltrace.cycles-pp.do_group_exit.sys_exit_group.entry_SYSCALL_64_fastpath
      0.81 ±  7%    +111.1%       1.71 ± 11%  perf-profile.calltrace.cycles-pp.do_page_fault.page_fault
      1.31 ±  4%     +54.6%       2.02 ± 10%  perf-profile.calltrace.cycles-pp.do_syscall_64.return_from_SYSCALL_64
      1.83 ±  7%     +51.8%       2.77 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_fastpath
      1.20 ± 11%     +64.7%       1.98 ±  7%  perf-profile.calltrace.cycles-pp.exit_mmap.mmput.do_exit.do_group_exit.sys_exit_group
      0.00 ± -1%      +Inf%       0.87 ± 16%  perf-profile.calltrace.cycles-pp.filemap_map_pages.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      0.69 ±  8%    +111.6%       1.46 ± 13%  perf-profile.calltrace.cycles-pp.handle_mm_fault.__do_page_fault.do_page_fault.page_fault
      0.79 ±  5%     +75.0%       1.38 ± 10%  perf-profile.calltrace.cycles-pp.hrtimer_interrupt.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter
     32.78 ±  0%     +15.3%      37.81 ±  2%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
      0.66 ± 22%     +48.7%       0.99 ±  7%  perf-profile.calltrace.cycles-pp.irq_exit.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
      0.85 ±  6%     +77.6%       1.50 ±  9%  perf-profile.calltrace.cycles-pp.local_apic_timer_interrupt.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle
      1.21 ± 11%     +63.9%       1.99 ±  7%  perf-profile.calltrace.cycles-pp.mmput.do_exit.do_group_exit.sys_exit_group.entry_SYSCALL_64_fastpath
      0.81 ±  8%    +110.1%       1.71 ± 11%  perf-profile.calltrace.cycles-pp.page_fault
     58.29 ±  0%     -17.4%      48.18 ±  4%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.call_cpuidle.cpu_startup_entry
      1.31 ±  4%     +54.6%       2.02 ± 10%  perf-profile.calltrace.cycles-pp.return_from_SYSCALL_64
      2.09 ± 11%     +69.0%       3.54 ±  7%  perf-profile.calltrace.cycles-pp.smp_apic_timer_interrupt.apic_timer_interrupt.cpuidle_enter.call_cpuidle.cpu_startup_entry
      1.31 ±  4%     +54.4%       2.02 ± 10%  perf-profile.calltrace.cycles-pp.sys_clone.do_syscall_64.return_from_SYSCALL_64
      1.64 ±  9%     +50.3%       2.46 ±  7%  perf-profile.calltrace.cycles-pp.sys_exit_group.entry_SYSCALL_64_fastpath
      0.85 ±  7%    +107.0%       1.77 ± 11%  perf-profile.children.cycles-pp.__do_page_fault
      0.63 ±  6%     +76.6%       1.11 ±  9%  perf-profile.children.cycles-pp.__hrtimer_run_queues
      1.32 ±  4%     +54.4%       2.03 ± 10%  perf-profile.children.cycles-pp._do_fork
      2.60 ±  7%     +57.3%       4.09 ±  6%  perf-profile.children.cycles-pp.apic_timer_interrupt
      1.15 ±  3%     +55.9%       1.79 ± 10%  perf-profile.children.cycles-pp.copy_process
      1.62 ±  9%     +50.3%       2.44 ±  7%  perf-profile.children.cycles-pp.do_exit
      1.64 ±  9%     +50.5%       2.47 ±  7%  perf-profile.children.cycles-pp.do_group_exit
      0.86 ±  8%    +107.6%       1.78 ± 11%  perf-profile.children.cycles-pp.do_page_fault
      1.33 ±  3%     +54.5%       2.06 ± 10%  perf-profile.children.cycles-pp.do_syscall_64
      1.92 ±  7%     +52.2%       2.92 ±  8%  perf-profile.children.cycles-pp.entry_SYSCALL_64_fastpath
      1.21 ± 11%     +65.0%       1.99 ±  7%  perf-profile.children.cycles-pp.exit_mmap
      0.40 ± 16%    +120.1%       0.88 ± 16%  perf-profile.children.cycles-pp.filemap_map_pages
      0.70 ±  8%    +111.0%       1.48 ± 13%  perf-profile.children.cycles-pp.handle_mm_fault
      0.85 ±  5%     +73.1%       1.46 ± 10%  perf-profile.children.cycles-pp.hrtimer_interrupt
     32.86 ±  0%     +15.8%      38.04 ±  3%  perf-profile.children.cycles-pp.intel_idle
      0.73 ± 20%     +48.3%       1.08 ±  5%  perf-profile.children.cycles-pp.irq_exit
      0.90 ±  5%     +76.2%       1.59 ±  9%  perf-profile.children.cycles-pp.local_apic_timer_interrupt
      1.22 ± 12%     +64.1%       2.00 ±  7%  perf-profile.children.cycles-pp.mmput
      0.86 ±  8%    +107.6%       1.79 ± 11%  perf-profile.children.cycles-pp.page_fault
     58.52 ±  0%     -17.4%      48.32 ±  4%  perf-profile.children.cycles-pp.poll_idle
      1.33 ±  3%     +54.5%       2.06 ± 10%  perf-profile.children.cycles-pp.return_from_SYSCALL_64
      2.46 ±  8%     +56.5%       3.86 ±  6%  perf-profile.children.cycles-pp.smp_apic_timer_interrupt
      1.32 ±  4%     +54.0%       2.02 ± 10%  perf-profile.children.cycles-pp.sys_clone
      1.64 ±  9%     +50.5%       2.47 ±  7%  perf-profile.children.cycles-pp.sys_exit_group
     32.85 ±  0%     +15.8%      38.04 ±  3%  perf-profile.self.cycles-pp.intel_idle
     58.52 ±  0%     -17.4%      48.32 ±  4%  perf-profile.self.cycles-pp.poll_idle
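The perf-profile percentages above are self/children cycle fractions from call-graph sampling, i.e. roughly the view produced by:

	perf record -a -g sleep 10	# sample all CPUs with call graphs
	perf report --children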
[ASCII trend plots omitted: per-commit charts of perf-stat.dTLB-loads, perf-stat.node-loads, perf-stat.cpu-migrations, perf-stat.node-load-miss-rate%, turbostat.Avg_MHz, turbostat.%Busy and aim9.time.involuntary_context_switches across the bisected commits]

[*] bisect-good sample
[O] bisect-bad sample

Thanks,
Xiaolong