Greetings,

FYI, we noticed a -30.5% regression of stress-ng.switch.ops_per_sec due to commit:

commit: fd4d9c7d0c71866ec0c2825189ebd2ce35bd95b8 ("mm: slub: add missing TID bump in kmem_cache_alloc_bulk()")
https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master

in testcase: stress-ng
on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory
with the following parameters:

        nr_threads: 100%
        disk: 1HDD
        testtime: 30s
        test: switch
        cpufreq_governor: performance
        ucode: 0x500002c

If you fix the issue, kindly add the following tag:
Reported-by: kernel test robot

Details are below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # the job file is attached to this email
        bin/lkp run job.yaml

=========================================================================================
compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/test/testcase/testtime/ucode:
  gcc-7/performance/1HDD/x86_64-rhel-7.6/100%/debian-x86_64-20191114.cgz/lkp-csl-2sp5/switch/stress-ng/30s/0x500002c

commit:
  ac309e7744 ("Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/hid/hid")
  fd4d9c7d0c ("mm: slub: add missing TID bump in kmem_cache_alloc_bulk()")

ac309e7744bee222 fd4d9c7d0c71866ec0c2825189e
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
  69076693           -30.5%   47993323        stress-ng.switch.ops
   2302520           -30.5%    1599758        stress-ng.switch.ops_per_sec
     26.79            -9.0%      24.37        stress-ng.time.user_time
      9242 ± 13%     -16.2%       7749 ±  2%  numa-meminfo.node0.KernelStack
      2.86 ±100%    -100.0%       0.00        iostat.sdb.await.max
      2.86 ±100%    -100.0%       0.00        iostat.sdb.r_await.max
      9243 ± 13%     -16.2%       7748 ±  2%  numa-vmstat.node0.nr_kernel_stack
    157380 ±  9%     -60.3%      62515 ± 90%  numa-vmstat.node0.numa_other
     22499 ± 28%     -41.5%      13173 ± 34%  sched_debug.cfs_rq:/.spread0.max
     -3319          +252.7%     -11706        sched_debug.cfs_rq:/.spread0.min
    -53.25           -45.1%     -29.25        sched_debug.cpu.nr_uninterruptible.min
     10425 ±  7%     +13.3%      11813 ±  5%  interrupts.CPU41.RES:Rescheduling_interrupts
     10605 ±  2%     +31.9%      13993 ± 23%  interrupts.CPU46.RES:Rescheduling_interrupts
     10804 ±  8%     +13.0%      12211 ±  8%  interrupts.CPU82.RES:Rescheduling_interrupts
     10708 ±  3%     +30.1%      13930 ± 22%  interrupts.CPU94.RES:Rescheduling_interrupts
      5456 ± 15%     +71.7%       9369 ± 20%  softirqs.CPU0.RCU
     18494 ±  4%      +6.9%      19771 ±  6%  softirqs.CPU0.TIMER
     20484 ± 14%     -22.5%      15866 ±  9%  softirqs.CPU27.TIMER
      5114 ± 10%     +64.9%       8433 ± 28%  softirqs.CPU5.RCU
      4841 ±  5%     +45.6%       7047 ± 32%  softirqs.CPU53.RCU
     17421 ±  3%      -9.3%      15796 ±  8%  softirqs.CPU53.TIMER
     18295 ±  4%     -11.7%      16155 ±  7%  softirqs.CPU59.TIMER
     19446 ± 10%     -13.6%      16803 ±  9%  softirqs.CPU7.TIMER
      4847 ±  7%     +62.3%       7866 ± 43%  softirqs.CPU8.RCU
     18.36            +5.3%      19.33        perf-stat.i.MPKI
      2.48 ±  3%       +0.2       2.63 ±  2%  perf-stat.i.cache-miss-rate%
  17934024 ±  4%     +10.0%   19730768        perf-stat.i.cache-misses
      4.13            +4.9%       4.33        perf-stat.i.cpi
      9504            -7.7%       8776        perf-stat.i.cycles-between-cache-misses
      0.02 ±  3%       +0.0       0.02 ±  5%  perf-stat.i.dTLB-store-miss-rate%
     58.48             -1.5      57.02        perf-stat.i.iTLB-load-miss-rate%
      0.25 ±  2%      -5.3%       0.23        perf-stat.i.ipc
     94.99             -1.0      94.02        perf-stat.i.node-load-miss-rate%
   6984752 ±  3%      +8.0%    7545390        perf-stat.i.node-load-misses
    336707 ±  4%     +36.2%     458652 ±  2%  perf-stat.i.node-loads
   5585196 ±  3%      +5.5%    5893365        perf-stat.i.node-store-misses
     18.76            +4.2%      19.55        perf-stat.overall.MPKI
      2.32             +0.2       2.52 ±  2%  perf-stat.overall.cache-miss-rate%
      4.21            +4.2%       4.38        perf-stat.overall.cpi
      9662            -8.0%       8891        perf-stat.overall.cycles-between-cache-misses
      0.02 ±  3%       +0.0       0.02 ±  5%  perf-stat.overall.dTLB-store-miss-rate%
     58.68             -1.6      57.07        perf-stat.overall.iTLB-load-miss-rate%
    987.32            +2.2%       1009        perf-stat.overall.instructions-per-iTLB-miss
      0.24            -4.0%       0.23        perf-stat.overall.ipc
     95.40             -1.1      94.27        perf-stat.overall.node-load-miss-rate%
  17353488 ±  4%     +10.0%   19087092        perf-stat.ps.cache-misses
 4.863e+09 ±  3%      -6.2%  4.562e+09        perf-stat.ps.dTLB-stores
   6758402 ±  3%      +8.0%    7299061        perf-stat.ps.node-load-misses
    325857 ±  4%     +36.2%     443722 ±  2%  perf-stat.ps.node-loads
   5404193 ±  3%      +5.5%    5700934        perf-stat.ps.node-store-misses
 1.275e+12            -6.1%  1.197e+12 ±  2%  perf-stat.total.instructions
     45.82 ± 36%      -27.5      18.30 ± 60%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry
     49.70 ± 31%      -25.4      24.31 ± 41%  perf-profile.calltrace.cycles-pp.secondary_startup_64
     49.70 ± 31%      -25.4      24.31 ± 41%  perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64
     49.70 ± 31%      -25.4      24.31 ± 41%  perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64
     49.70 ± 31%      -25.4      24.31 ± 41%  perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
     49.13 ± 32%      -24.8      24.31 ± 41%  perf-profile.calltrace.cycles-pp.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64
     48.65 ± 31%      -24.3      24.31 ± 41%  perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.do_idle.cpu_startup_entry.start_secondary
     17.04 ± 85%      +26.6      43.60 ± 25%  perf-profile.calltrace.cycles-pp.ret_from_fork
     17.04 ± 85%      +26.6      43.60 ± 25%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
     14.96 ±100%      +28.6      43.60 ± 25%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
     14.67 ±103%      +28.9      43.60 ± 25%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
     12.30 ±133%      +30.0      42.32 ± 29%  perf-profile.calltrace.cycles-pp.memcpy_erms.drm_fb_helper_dirty_work.process_one_work.worker_thread.kthread
     12.59 ±130%      +31.3      43.88 ± 24%  perf-profile.calltrace.cycles-pp.drm_fb_helper_dirty_work.process_one_work.worker_thread.kthread.ret_from_fork
     45.82 ± 36%      -27.5      18.30 ± 60%  perf-profile.children.cycles-pp.intel_idle
     49.70 ± 31%      -25.4      24.31 ± 41%  perf-profile.children.cycles-pp.secondary_startup_64
     49.70 ± 31%      -25.4      24.31 ± 41%  perf-profile.children.cycles-pp.start_secondary
     49.70 ± 31%      -25.4      24.31 ± 41%  perf-profile.children.cycles-pp.cpu_startup_entry
     49.70 ± 31%      -25.4      24.31 ± 41%  perf-profile.children.cycles-pp.do_idle
     49.13 ± 32%      -24.8      24.31 ± 41%  perf-profile.children.cycles-pp.cpuidle_enter
     49.13 ± 32%      -24.8      24.31 ± 41%  perf-profile.children.cycles-pp.cpuidle_enter_state
     17.04 ± 85%      +26.6      43.60 ± 25%  perf-profile.children.cycles-pp.ret_from_fork
     17.04 ± 85%      +26.6      43.60 ± 25%  perf-profile.children.cycles-pp.kthread
     14.96 ±100%      +28.6      43.60 ± 25%  perf-profile.children.cycles-pp.worker_thread
     14.67 ±103%      +28.9      43.60 ± 25%  perf-profile.children.cycles-pp.process_one_work
     12.59 ±130%      +31.0      43.60 ± 25%  perf-profile.children.cycles-pp.drm_fb_helper_dirty_work
     12.59 ±130%      +31.0      43.60 ± 25%  perf-profile.children.cycles-pp.memcpy_erms
     45.82 ± 36%      -27.5      18.30 ± 60%  perf-profile.self.cycles-pp.intel_idle
     12.13 ±128%      +31.5      43.60 ± 25%  perf-profile.self.cycles-pp.memcpy_erms

[Plots omitted: stress-ng.switch.ops and stress-ng.switch.ops_per_sec per sample]
[*] bisect-good sample
[O] bisect-bad sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Rong Chen
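Judging from the printed numbers, the %change column in the comparison table above mixes two conventions. Most rows show the relative change (new - old) / old, divided by the signed old value. This is why spread0.min, which goes from -3319 to -11706, reads +252.7%. Rows that already hold percentages (the *-rate% metrics and all perf-profile rows) instead show the absolute difference in percentage points. The Python sketch below reproduces that rule. The classification heuristic and the names `is_percentage_metric` and `format_change` are hypothetical, inferred from this report rather than taken from lkp-tests. Because the table prints rounded values, recomputing from them can be off by 0.1: perf-stat.i.cpi shows +4.9%, while 4.13 to 4.33 gives +4.8%.

```python
def is_percentage_metric(name: str) -> bool:
    """Hypothetical heuristic: rows whose values are already percentages."""
    return name.endswith("%") or name.startswith("perf-profile.")


def format_change(name: str, old: float, new: float) -> str:
    """Format the %change cell for one row (convention inferred, not official)."""
    if is_percentage_metric(name):
        # Absolute difference in percentage points, e.g. "-27.5".
        return f"{new - old:+.1f}"
    # Relative change. Dividing by the signed old value reproduces the
    # report's sign for negative metrics such as spread0.min.
    return f"{(new - old) / old * 100:+.1f}%"


# A few rows copied from the comparison table: (metric, old, new).
rows = [
    ("stress-ng.switch.ops", 69076693, 47993323),
    ("stress-ng.switch.ops_per_sec", 2302520, 1599758),
    ("sched_debug.cfs_rq:/.spread0.min", -3319, -11706),
    ("sched_debug.cpu.nr_uninterruptible.min", -53.25, -29.25),
    ("perf-stat.i.iTLB-load-miss-rate%", 58.48, 57.02),
    ("perf-profile.self.cycles-pp.intel_idle", 45.82, 18.30),
]

for name, old, new in rows:
    print(f"{old:>12} {format_change(name, old, new):>9} {new:>12}  {name}")
```

Up to rounding, every row in this report fits the name-based rule. Treat it as a heuristic, though, because lkp-tests may choose between relative and absolute change differently.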