Greetings,

FYI, we noticed a -10.5% regression of will-it-scale.per_thread_ops due to commit:

commit: 76550daa4d1edaa8251460bd1d4a11b5df23c1c0 ("rcu: Support kfree_bulk() interface in kfree_rcu()")
https://git.kernel.org/cgit/linux/kernel/git/paulmck/linux-rcu.git dev.2020.02.13c

in testcase: will-it-scale
on test machine: 288 threads Intel(R) Xeon Phi(TM) CPU 7295 @ 1.50GHz with 80G memory
with the following parameters:

        nr_task: 100%
        mode: thread
        test: open2
        cpufreq_governor: performance
        ucode: 0x11

test-description: Will It Scale takes a testcase and runs it from 1 through to n parallel copies to see if the testcase will scale. It builds both a process-based and a thread-based test in order to see any differences between the two.
test-url: https://github.com/antonblanchard/will-it-scale


If you fix the issue, kindly add the following tag
Reported-by: kernel test robot


Details are as below:
-------------------------------------------------------------------------------------------------->


To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # the job file is attached to this email
        bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/kconfig/mode/nr_task/rootfs/tbox_group/test/testcase/ucode:
  gcc-7/performance/x86_64-rhel-7.6/thread/100%/debian-x86_64-20191114.cgz/lkp-knm01/open2/will-it-scale/0x11

commit:
  e8afd73e56 ("rcu: Don't flag non-starting GPs before GP kthread is running")
  76550daa4d ("rcu: Support kfree_bulk() interface in kfree_rcu()")

e8afd73e56e02b4d 76550daa4d1edaa8251460bd1d4
---------------- ---------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
          2:4          -50%            :4     dmesg.WARNING:at_ip__fsnotify_parent/0x
           :4           25%           1:4     dmesg.WARNING:at_ip__slab_free/0x
         %stddev     %change         %stddev
             \          |                \
      1021           -10.5%     914.50        will-it-scale.per_thread_ops
    807081           -33.7%     534989 ±  2%  will-it-scale.time.involuntary_context_switches
    489.19 ±  9%     -40.3%     292.19 ±  3%  will-it-scale.time.user_time
    294257           -10.5%     263455        will-it-scale.workload
      1.68 ±  2%     +30.8%       2.20        turbostat.RAMWatt
      6340           -28.8%       4512 ±  2%  vmstat.system.cs
      0.09 ± 12%       -0.0       0.06 ±  3%  mpstat.cpu.all.soft%
      0.71 ±  7%       -0.2       0.49 ±  3%  mpstat.cpu.all.usr%
    999.25 ±  6%     -17.3%     826.50 ± 13%  slabinfo.skbuff_fclone_cache.active_objs
    999.25 ±  6%     -17.3%     826.50 ± 13%  slabinfo.skbuff_fclone_cache.num_objs
     19880 ±  5%      -9.1%      18079        softirqs.CPU100.RCU
     21159 ±  2%      -6.7%      19750 ±  2%  softirqs.CPU13.RCU
    122401 ± 10%     +23.7%     151373 ±  8%  softirqs.CPU144.TIMER
     21180            -8.5%      19379        softirqs.CPU15.RCU
     21456 ±  7%     -11.7%      18945        softirqs.CPU24.RCU
    124140 ±  7%     +12.3%     139469 ±  7%  softirqs.CPU250.TIMER
     20964 ± 11%     -17.5%      17289        softirqs.CPU261.RCU
     20945            -8.8%      19102        softirqs.CPU30.RCU
     21274            -9.8%      19194        softirqs.CPU31.RCU
     21024 ±  2%     -10.3%      18853        softirqs.CPU33.RCU
     20571            -8.9%      18746        softirqs.CPU35.RCU
     21289 ±  2%      -8.0%      19588        softirqs.CPU5.RCU
      1199 ±  3%     +11.2%       1332 ±  2%  sched_debug.cfs_rq:/.exec_clock.stddev
    969782 ±  8%     -26.1%     716472 ± 22%  sched_debug.cfs_rq:/.load.max
    226.86 ±  6%     -70.5%      66.95 ± 18%  sched_debug.cfs_rq:/.nr_spread_over.avg
    321.05 ±  5%     -47.1%     169.95 ±  6%  sched_debug.cfs_rq:/.nr_spread_over.max
    107.45 ±  6%     -68.8%      33.55 ± 26%  sched_debug.cfs_rq:/.nr_spread_over.min
     39.24 ± 17%     -61.2%      15.24 ±  9%  sched_debug.cfs_rq:/.nr_spread_over.stddev
    969780 ±  8%     -26.1%     716278 ± 22%  sched_debug.cfs_rq:/.runnable_weight.max
      1145 ± 21%     -25.2%     856.65 ±  2%  sched_debug.cfs_rq:/.util_est_enqueued.max
    169.65 ± 46%     -80.8%      32.65 ± 79%  sched_debug.cfs_rq:/.util_est_enqueued.min
    871.94 ±  5%     +23.4%       1075 ±  3%  sched_debug.cpu.clock.stddev
    871.94 ±  5%     +23.4%       1075 ±  3%  sched_debug.cpu.clock_task.stddev
      5190 ± 11%     +22.1%       6338 ±  6%  sched_debug.cpu.curr->pid.max
    271.88 ±  6%     +30.5%     354.73 ±  9%  sched_debug.cpu.curr->pid.stddev
      0.00 ±  5%     +23.8%       0.00 ±  2%  sched_debug.cpu.next_balance.stddev
      5624           -13.1%       4890        sched_debug.cpu.nr_switches.avg
      2344 ±  3%     -31.0%       1617 ±  3%  sched_debug.cpu.nr_switches.min
      2803           -25.7%       2082 ±  2%  sched_debug.cpu.sched_count.avg
      1906           -30.9%       1316        sched_debug.cpu.sched_count.min
      1322           -27.8%     954.60 ±  2%  sched_debug.cpu.ttwu_count.avg
    919.25           -32.6%     619.90        sched_debug.cpu.ttwu_count.min
      1255           -29.3%     888.47 ±  2%  sched_debug.cpu.ttwu_local.avg
    887.45           -32.9%     595.50        sched_debug.cpu.ttwu_local.min
 8.476e+09            +1.3%  8.589e+09        perf-stat.i.branch-instructions
  48929091            -3.8%   47071099        perf-stat.i.cache-misses
      6396           -30.2%       4467 ±  2%  perf-stat.i.context-switches
     12.78            -1.4%      12.61        perf-stat.i.cpi
      9065            +3.7%       9404        perf-stat.i.cycles-between-cache-misses
  71922406 ±  2%      -7.7%   66350699 ±  5%  perf-stat.i.iTLB-load-misses
 3.475e+10            +1.3%  3.519e+10        perf-stat.i.iTLB-loads
  3.47e+10            +1.2%  3.512e+10        perf-stat.i.instructions
    482.82 ±  2%     +10.0%     531.13 ±  6%  perf-stat.i.instructions-per-iTLB-miss
      0.08            +1.4%       0.08        perf-stat.i.ipc
     12.81            -1.3%      12.64        perf-stat.overall.cpi
      9077            +3.8%       9423        perf-stat.overall.cycles-between-cache-misses
      0.21 ±  2%       -0.0       0.19 ±  5%  perf-stat.overall.iTLB-load-miss-rate%
    483.60 ±  2%      +9.9%     531.72 ±  6%  perf-stat.overall.instructions-per-iTLB-miss
      0.08            +1.3%       0.08        perf-stat.overall.ipc
  36101370           +13.0%   40812598        perf-stat.overall.path-length
  8.46e+09            +1.2%  8.565e+09        perf-stat.ps.branch-instructions
  48864485            -3.9%   46979669        perf-stat.ps.cache-misses
      6244           -29.3%       4417 ±  2%  perf-stat.ps.context-switches
  71662456 ±  2%      -7.8%   66078101 ±  5%  perf-stat.ps.iTLB-load-misses
 3.466e+10            +1.2%  3.507e+10        perf-stat.ps.iTLB-loads
 3.463e+10            +1.1%  3.502e+10        perf-stat.ps.instructions
 1.062e+13            +1.2%  1.075e+13        perf-stat.total.instructions
      1.31 ±  2%       -0.2       1.16 ±  4%  perf-profile.calltrace.cycles-pp.do_filp_open.do_sys_openat2.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1.26 ±  2%       -0.1       1.13 ±  3%  perf-profile.calltrace.cycles-pp.path_openat.do_filp_open.do_sys_openat2.do_sys_open.do_syscall_64
     48.05             +0.1      48.16        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__alloc_fd.do_sys_openat2.do_sys_open
     48.09             +0.1      48.21        perf-profile.calltrace.cycles-pp._raw_spin_lock.__alloc_fd.do_sys_openat2.do_sys_open.do_syscall_64
     48.16             +0.2      48.31        perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath._raw_spin_lock.__close_fd.__x64_sys_close.do_syscall_64
     48.20             +0.2      48.37        perf-profile.calltrace.cycles-pp._raw_spin_lock.__close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe
     48.32             +0.2      48.49        perf-profile.calltrace.cycles-pp.__close_fd.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_close
     48.24             +0.2      48.41        perf-profile.calltrace.cycles-pp.__alloc_fd.do_sys_openat2.do_sys_open.do_syscall_64.entry_SYSCALL_64_after_hwframe
     48.46             +0.2      48.64        perf-profile.calltrace.cycles-pp.__x64_sys_close.do_syscall_64.entry_SYSCALL_64_after_hwframe.__GI___libc_close
      1.32 ±  2%       -0.2       1.16 ±  3%  perf-profile.children.cycles-pp.do_filp_open
      0.35 ± 25%       -0.1       0.21 ± 14%  perf-profile.children.cycles-pp.update_curr
      1.27 ±  2%       -0.1       1.14 ±  4%  perf-profile.children.cycles-pp.path_openat
      0.44 ±  2%       -0.1       0.37 ±  2%  perf-profile.children.cycles-pp.alloc_empty_file
      0.19 ±  5%       -0.1       0.12 ±  4%  perf-profile.children.cycles-pp.syscall_return_via_sysret
      0.33 ±  3%       -0.1       0.27 ±  5%  perf-profile.children.cycles-pp.__fput
      0.50 ±  3%       -0.1       0.44 ±  5%  perf-profile.children.cycles-pp.exit_to_usermode_loop
      0.15 ±  4%       -0.1       0.09 ±  7%  perf-profile.children.cycles-pp.security_file_alloc
      0.45 ±  3%       -0.1       0.40 ±  4%  perf-profile.children.cycles-pp.task_work_run
      0.39 ±  2%       -0.0       0.35 ±  2%  perf-profile.children.cycles-pp.__alloc_file
      0.32             -0.0       0.29 ±  3%  perf-profile.children.cycles-pp.link_path_walk
      0.30 ±  5%       -0.0       0.27 ±  4%  perf-profile.children.cycles-pp.rcu_do_batch
      0.30 ±  4%       -0.0       0.28 ±  4%  perf-profile.children.cycles-pp.rcu_core
      0.22             -0.0       0.19 ±  2%  perf-profile.children.cycles-pp.kmem_cache_free
      0.12 ±  5%       -0.0       0.10 ±  7%  perf-profile.children.cycles-pp.inode_permission
      0.06 ±  7%       +0.0       0.07 ±  5%  perf-profile.children.cycles-pp.___might_sleep
      0.06 ± 11%       +0.0       0.09 ±  5%  perf-profile.children.cycles-pp.run_timer_softirq
      0.26 ±  2%       +0.0       0.29 ±  3%  perf-profile.children.cycles-pp.kmem_cache_alloc
      0.10 ±  9%       +0.0       0.14 ±  8%  perf-profile.children.cycles-pp.enqueue_hrtimer
      0.08 ± 10%       +0.0       0.13 ±  6%  perf-profile.children.cycles-pp.timerqueue_add
      0.04 ± 57%       +0.1       0.09 ± 11%  perf-profile.children.cycles-pp.expand_files
      0.05 ±  8%       +0.1       0.11 ±  7%  perf-profile.children.cycles-pp.perf_event_task_tick
      0.00             +0.1       0.06 ±  7%  perf-profile.children.cycles-pp.__might_sleep
      0.00             +0.1       0.06 ± 20%  perf-profile.children.cycles-pp.locks_remove_posix
      0.11 ±  7%       +0.1       0.17 ± 17%  perf-profile.children.cycles-pp._raw_spin_lock_irq
      0.08 ± 15%       +0.1       0.15 ± 19%  perf-profile.children.cycles-pp.rcu_irq_enter
     99.41             +0.1      99.49        perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
     99.37             +0.1      99.46        perf-profile.children.cycles-pp.do_syscall_64
     48.32             +0.2      48.49        perf-profile.children.cycles-pp.__close_fd
     48.27             +0.2      48.44        perf-profile.children.cycles-pp.__alloc_fd
     48.46             +0.2      48.64        perf-profile.children.cycles-pp.__x64_sys_close
     96.38             +0.3      96.64        perf-profile.children.cycles-pp._raw_spin_lock
     96.31             +0.3      96.59        perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      0.19 ±  5%       -0.1       0.12 ±  4%  perf-profile.self.cycles-pp.syscall_return_via_sysret
      0.11 ±  4%       -0.0       0.07 ±  7%  perf-profile.self.cycles-pp.__alloc_file
      0.21 ±  6%       -0.0       0.18 ±  4%  perf-profile.self.cycles-pp.file_free_rcu
      0.08 ±  5%       -0.0       0.07 ±  6%  perf-profile.self.cycles-pp.inode_permission
      0.15 ±  3%       -0.0       0.14 ±  3%  perf-profile.self.cycles-pp.kmem_cache_free
      0.07             +0.0       0.09        perf-profile.self.cycles-pp.do_syscall_64
      0.03 ±100%       +0.0       0.06 ±  6%  perf-profile.self.cycles-pp.run_timer_softirq
      0.06 ±  6%       +0.0       0.11 ±  8%  perf-profile.self.cycles-pp.timerqueue_add
      0.00             +0.1       0.05 ±  8%  perf-profile.self.cycles-pp.__might_sleep
      0.05 ±  8%       +0.1       0.11 ±  7%  perf-profile.self.cycles-pp.perf_event_task_tick
      0.00             +0.1       0.06 ± 20%  perf-profile.self.cycles-pp.locks_remove_posix
      0.11 ±  4%       +0.1       0.17 ± 17%  perf-profile.self.cycles-pp._raw_spin_lock_irq
      0.00             +0.1       0.06 ± 17%  perf-profile.self.cycles-pp.expand_files
      0.08 ± 15%       +0.1       0.15 ± 19%  perf-profile.self.cycles-pp.rcu_irq_enter
    396.75 ± 31%     +53.1%     607.25 ± 12%  interrupts.32:IR-PCI-MSI.2097155-edge.eth0-TxRx-2
      6719 ± 24%     -42.5%       3865        interrupts.CPU0.NMI:Non-maskable_interrupts
      6719 ± 24%     -42.5%       3865        interrupts.CPU0.PMI:Performance_monitoring_interrupts
      1541 ± 14%     -38.1%     954.75 ± 32%  interrupts.CPU0.RES:Rescheduling_interrupts
     85.00 ± 34%     -74.4%      21.75 ± 54%  interrupts.CPU104.RES:Rescheduling_interrupts
     67.75 ±109%     -80.8%      13.00 ± 14%  interrupts.CPU105.RES:Rescheduling_interrupts
      5717 ± 32%     -33.3%       3814        interrupts.CPU106.NMI:Non-maskable_interrupts
      5717 ± 32%     -33.3%       3814        interrupts.CPU106.PMI:Performance_monitoring_interrupts
      3828           +73.2%       6630 ± 24%  interrupts.CPU109.NMI:Non-maskable_interrupts
      3828           +73.2%       6630 ± 24%  interrupts.CPU109.PMI:Performance_monitoring_interrupts
    396.75 ± 31%     +53.1%     607.25 ± 12%  interrupts.CPU12.32:IR-PCI-MSI.2097155-edge.eth0-TxRx-2
      5694 ± 32%     -33.7%       3777        interrupts.CPU131.NMI:Non-maskable_interrupts
      5694 ± 32%     -33.7%       3777        interrupts.CPU131.PMI:Performance_monitoring_interrupts
      4725 ± 34%     +58.8%       7506        interrupts.CPU173.NMI:Non-maskable_interrupts
      4725 ± 34%     +58.8%       7506        interrupts.CPU173.PMI:Performance_monitoring_interrupts
      6578 ± 24%     -42.9%       3756        interrupts.CPU179.NMI:Non-maskable_interrupts
      6578 ± 24%     -42.9%       3756        interrupts.CPU179.PMI:Performance_monitoring_interrupts
      5716 ± 33%     -17.2%       4735 ± 35%  interrupts.CPU183.NMI:Non-maskable_interrupts
      5716 ± 33%     -17.2%       4735 ± 35%  interrupts.CPU183.PMI:Performance_monitoring_interrupts
      4686 ± 34%     +41.7%       6639 ± 24%  interrupts.CPU187.NMI:Non-maskable_interrupts
      4686 ± 34%     +41.7%       6639 ± 24%  interrupts.CPU187.PMI:Performance_monitoring_interrupts
     63.00 ± 81%     -66.3%      21.25 ± 91%  interrupts.CPU187.RES:Rescheduling_interrupts
      5799 ± 32%     -17.4%       4787 ± 33%  interrupts.CPU20.NMI:Non-maskable_interrupts
      5799 ± 32%     -17.4%       4787 ± 33%  interrupts.CPU20.PMI:Performance_monitoring_interrupts
      3802           +48.6%       5649 ± 31%  interrupts.CPU207.NMI:Non-maskable_interrupts
      3802           +48.6%       5649 ± 31%  interrupts.CPU207.PMI:Performance_monitoring_interrupts
     51.25 ± 96%     -82.4%       9.00 ±  7%  interrupts.CPU211.RES:Rescheduling_interrupts
      5692 ± 30%     -17.6%       4690 ± 32%  interrupts.CPU215.NMI:Non-maskable_interrupts
      5692 ± 30%     -17.6%       4690 ± 32%  interrupts.CPU215.PMI:Performance_monitoring_interrupts
     70.00 ±116%     -80.7%      13.50 ± 49%  interrupts.CPU217.RES:Rescheduling_interrupts
      7.25 ± 35%    +903.4%      72.75 ± 85%  interrupts.CPU222.RES:Rescheduling_interrupts
      3786           +49.4%       5655 ± 32%  interrupts.CPU224.NMI:Non-maskable_interrupts
      3786           +49.4%       5655 ± 32%  interrupts.CPU224.PMI:Performance_monitoring_interrupts
      3778           +48.2%       5599 ± 31%  interrupts.CPU227.NMI:Non-maskable_interrupts
      3778           +48.2%       5599 ± 31%  interrupts.CPU227.PMI:Performance_monitoring_interrupts
     66.75 ± 97%     -86.1%       9.25 ± 20%  interrupts.CPU228.RES:Rescheduling_interrupts
      4710 ± 33%     +60.1%       7539        interrupts.CPU240.NMI:Non-maskable_interrupts
      4710 ± 33%     +60.1%       7539        interrupts.CPU240.PMI:Performance_monitoring_interrupts
     18.75 ± 40%    +618.7%     134.75 ± 69%  interrupts.CPU240.RES:Rescheduling_interrupts
     12.50 ± 30%    +582.0%      85.25 ± 81%  interrupts.CPU25.RES:Rescheduling_interrupts
      4672 ± 33%     +39.4%       6515 ± 23%  interrupts.CPU284.NMI:Non-maskable_interrupts
      4672 ± 33%     +39.4%       6515 ± 23%  interrupts.CPU284.PMI:Performance_monitoring_interrupts
     48.75 ±123%     -78.5%      10.50 ± 24%  interrupts.CPU284.RES:Rescheduling_interrupts
      6705 ± 24%     -28.7%       4781 ± 34%  interrupts.CPU5.NMI:Non-maskable_interrupts
      6705 ± 24%     -28.7%       4781 ± 34%  interrupts.CPU5.PMI:Performance_monitoring_interrupts
      3801           +73.9%       6612 ± 24%  interrupts.CPU52.NMI:Non-maskable_interrupts
      3801           +73.9%       6612 ± 24%  interrupts.CPU52.PMI:Performance_monitoring_interrupts
      4731 ± 34%     +40.3%       6638 ± 24%  interrupts.CPU53.NMI:Non-maskable_interrupts
      4731 ± 34%     +40.3%       6638 ± 24%  interrupts.CPU53.PMI:Performance_monitoring_interrupts
      3808           +49.4%       5689 ± 32%  interrupts.CPU54.NMI:Non-maskable_interrupts
      3808           +49.4%       5689 ± 32%  interrupts.CPU54.PMI:Performance_monitoring_interrupts
      5676 ± 32%     -16.8%       4720 ± 34%  interrupts.CPU59.NMI:Non-maskable_interrupts
      5676 ± 32%     -16.8%       4720 ± 34%  interrupts.CPU59.PMI:Performance_monitoring_interrupts
      3778           +50.0%       5666 ± 32%  interrupts.CPU63.NMI:Non-maskable_interrupts
      3778           +50.0%       5666 ± 32%  interrupts.CPU63.PMI:Performance_monitoring_interrupts
      5739 ± 32%     -18.0%       4706 ± 33%  interrupts.CPU72.NMI:Non-maskable_interrupts
      5739 ± 32%     -18.0%       4706 ± 33%  interrupts.CPU72.PMI:Performance_monitoring_interrupts
      4750 ± 33%     +59.4%       7571        interrupts.CPU87.NMI:Non-maskable_interrupts
      4750 ± 33%     +59.4%       7571        interrupts.CPU87.PMI:Performance_monitoring_interrupts
      5720 ± 33%     -34.0%       3778        interrupts.CPU98.NMI:Non-maskable_interrupts
      5720 ± 33%     -34.0%       3778        interrupts.CPU98.PMI:Performance_monitoring_interrupts


[plot: will-it-scale.per_thread_ops, y-axis 860 to 1040]

[plot: will-it-scale.workload, y-axis 250000 to 300000]

[plot: will-it-scale.time.involuntary_context_switches, y-axis 500000 to 850000]

[*] bisect-good sample
[O] bisect-bad sample


Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


Thanks,
Rong Chen
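A note on reading the comparison table: for most rows the %change column is the relative change (new - old) / old of the two means. For rows whose values are themselves percentages (mpstat.cpu.all.*%, perf-stat.overall.iTLB-load-miss-rate%, and the perf-profile.* rows), the column appears to show the absolute difference instead; for example, mpstat.cpu.all.usr% going from 0.71 to 0.49 is listed as -0.2. The sketch below recomputes a few rows with two hypothetical shell helpers built on awk (pct_change and abs_change are illustrations only, not part of lkp-tests). Because the displayed means are rounded, a recomputed figure may be off in the last digit: will-it-scale.per_thread_ops (1021 -> 914.50) recomputes to -10.4% against the reported -10.5%.

```shell
#!/bin/sh
# Hypothetical helpers (not part of lkp-tests) that recompute the change
# column of the comparison table from the displayed base/head means.

# Relative change in percent: (new - old) / old * 100, signed, one decimal.
pct_change() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%+.1f%%\n", (new - old) / old * 100 }'
}

# Absolute difference, signed, one decimal. This appears to be what the
# table shows for rows whose values are already percentages.
abs_change() {
    awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%+.1f\n", new - old }'
}

pct_change 294257 263455   # will-it-scale.workload, prints -10.5%
pct_change 807081 534989   # will-it-scale.time.involuntary_context_switches, prints -33.7%
abs_change 0.71 0.49       # mpstat.cpu.all.usr%, prints -0.2
```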