Greeting, FYI, we noticed a -95.6% regression of stress-ng.vm-splice.ops_per_sec due to commit: commit: a308c71bf1e6e19cc2e4ced31853ee0fc7cb439a ("mm/gup: Remove enfornced COW mechanism") https://git.kernel.org/cgit/linux/kernel/git/torvalds/linux.git master in testcase: stress-ng on test machine: 96 threads Intel(R) Xeon(R) Gold 6252 CPU @ 2.10GHz with 192G memory with following parameters: nr_threads: 100% disk: 1HDD testtime: 30s class: pipe cpufreq_governor: performance ucode: 0x5002f01 If you fix the issue, kindly add following tag Reported-by: kernel test robot Details are as below: --------------------------------------------------------------------------------------------------> To reproduce: git clone https://github.com/intel/lkp-tests.git cd lkp-tests bin/lkp install job.yaml # job file is attached in this email bin/lkp run job.yaml ========================================================================================= class/compiler/cpufreq_governor/disk/kconfig/nr_threads/rootfs/tbox_group/testcase/testtime/ucode: pipe/gcc-9/performance/1HDD/x86_64-rhel-8.3/100%/debian-10.4-x86_64-20200603.cgz/lkp-csl-2sp5/stress-ng/30s/0x5002f01 commit: 1a0cf26323 ("mm/ksm: Remove reuse_ksm_page()") a308c71bf1 ("mm/gup: Remove enfornced COW mechanism") 1a0cf26323c80e2f a308c71bf1e6e19cc2e4ced3185 ---------------- --------------------------- %stddev %change %stddev \ | \ 18804 +2.1% 19191 stress-ng.time.system_time 1059 -35.9% 678.99 ± 4% stress-ng.time.user_time 1.556e+08 ± 7% +10.3% 1.716e+08 ± 4% stress-ng.time.voluntary_context_switches 3.406e+09 -95.6% 1.49e+08 stress-ng.vm-splice.ops 1.135e+08 -95.6% 4965911 stress-ng.vm-splice.ops_per_sec 5.11 -1.8 3.26 ± 4% mpstat.cpu.all.usr% 91.93 +2.0% 93.77 iostat.cpu.system 5.07 -36.2% 3.23 ± 4% iostat.cpu.user 28348 +6.7% 30235 ± 2% proc-vmstat.nr_kernel_stack 12148 ± 5% +11.2% 13513 ± 5% proc-vmstat.nr_page_table_pages 2767 ± 4% +8.0% 2987 ± 5% slabinfo.PING.active_objs 2767 ± 4% +8.0% 2987 ± 5% slabinfo.PING.num_objs 91.25 +1.9% 93.00 vmstat.cpu.sy 994917 ± 9% +11.4% 1108222 ± 4% vmstat.system.cs 13547 ± 6% -14.9% 11524 ± 9% numa-vmstat.node0.nr_slab_reclaimable 40767 ±112% +128.4% 93118 ± 61% numa-vmstat.node0.numa_other 12466 ± 6% +15.9% 14448 ± 6% numa-vmstat.node1.nr_slab_reclaimable 54181 ± 6% -14.9% 46101 ± 9% numa-meminfo.node0.KReclaimable 54181 ± 6% -14.9% 46101 ± 9% numa-meminfo.node0.SReclaimable 6689 ± 52% +73.2% 11588 ± 34% numa-meminfo.node1.Inactive 6689 ± 52% +73.2% 11588 ± 34% numa-meminfo.node1.Inactive(anon) 49867 ± 6% +15.9% 57793 ± 6% numa-meminfo.node1.KReclaimable 49867 ± 6% +15.9% 57793 ± 6% numa-meminfo.node1.SReclaimable 79.25 ± 12% -35.0% 51.50 ± 8% sched_debug.cfs_rq:/.nr_spread_over.max 9.76 ± 10% -22.7% 7.55 ± 9% sched_debug.cfs_rq:/.nr_spread_over.stddev 276.75 ± 11% -19.1% 224.01 ± 11% sched_debug.cfs_rq:/.runnable_avg.stddev 414265 ± 27% -44.1% 231543 ± 45% sched_debug.cfs_rq:/.spread0.avg 1199728 ± 14% -28.1% 863115 ± 23% sched_debug.cfs_rq:/.spread0.max 20.22 ± 49% -48.0% 10.52 ± 18% sched_debug.cpu.clock.stddev 1410577 ± 10% +13.1% 1594913 ± 5% sched_debug.cpu.nr_switches.avg 1409041 ± 10% +13.1% 1593368 ± 5% sched_debug.cpu.sched_count.avg 980469 ± 9% +12.5% 1102701 ± 4% sched_debug.cpu.ttwu_count.avg 866170 ± 10% +13.9% 986450 ± 5% sched_debug.cpu.ttwu_local.avg 38.20 ± 7% -6.3 31.86 ± 16% perf-profile.calltrace.cycles-pp.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe 37.36 ± 7% -6.2 31.12 ± 16% perf-profile.calltrace.cycles-pp.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe 33.38 ± 7% -5.7 27.65 ± 15% perf-profile.calltrace.cycles-pp.new_sync_read.vfs_read.ksys_read.do_syscall_64.entry_SYSCALL_64_after_hwframe 32.56 ± 7% -5.6 26.93 ± 15% perf-profile.calltrace.cycles-pp.pipe_read.new_sync_read.vfs_read.ksys_read.do_syscall_64 26.32 ± 7% -5.1 21.23 ± 14% perf-profile.calltrace.cycles-pp.__wake_up_common_lock.pipe_read.new_sync_read.vfs_read.ksys_read 26.00 ± 7% -5.1 20.92 ± 14% perf-profile.calltrace.cycles-pp.__wake_up_common.__wake_up_common_lock.pipe_read.new_sync_read.vfs_read 25.98 ± 7% -5.1 20.89 ± 14% perf-profile.calltrace.cycles-pp.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read.new_sync_read 25.96 ± 7% -5.1 20.88 ± 14% perf-profile.calltrace.cycles-pp.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock.pipe_read 24.74 ± 8% -5.0 19.70 ± 15% perf-profile.calltrace.cycles-pp.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common.__wake_up_common_lock 24.73 ± 8% -5.0 19.70 ± 15% perf-profile.calltrace.cycles-pp.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function.__wake_up_common 24.30 ± 8% -5.0 19.32 ± 15% perf-profile.calltrace.cycles-pp.enqueue_entity.enqueue_task_fair.ttwu_do_activate.try_to_wake_up.autoremove_wake_function 1.00 ± 23% +0.4 1.40 ± 16% perf-profile.calltrace.cycles-pp.arch_stack_walk.stack_trace_save_tsk.__account_scheduler_latency.update_stats_enqueue_sleeper.enqueue_entity 1.05 ± 23% +0.4 1.48 ± 16% perf-profile.calltrace.cycles-pp.stack_trace_save_tsk.__account_scheduler_latency.update_stats_enqueue_sleeper.enqueue_entity.enqueue_task_fair 11.09 +0.5 11.60 ± 3% perf-profile.calltrace.cycles-pp.do_select.core_sys_select.kern_select.__x64_sys_select.do_syscall_64 38.23 ± 7% -6.3 31.88 ± 16% perf-profile.children.cycles-pp.ksys_read 37.45 ± 7% -6.3 31.20 ± 16% perf-profile.children.cycles-pp.vfs_read 33.41 ± 7% -5.7 27.68 ± 15% perf-profile.children.cycles-pp.new_sync_read 32.63 ± 7% -5.6 26.99 ± 15% perf-profile.children.cycles-pp.pipe_read 25.98 ± 7% -5.1 20.90 ± 14% perf-profile.children.cycles-pp.autoremove_wake_function 0.09 ± 7% +0.0 0.11 ± 4% perf-profile.children.cycles-pp.pick_next_entity 0.27 ± 4% +0.0 0.31 ± 3% perf-profile.children.cycles-pp.load_new_mm_cr3 0.39 ± 4% +0.1 0.45 ± 3% perf-profile.children.cycles-pp.switch_mm_irqs_off 0.50 ± 4% +0.1 0.56 ± 2% perf-profile.children.cycles-pp.pick_next_task_fair 0.06 ± 60% +0.1 0.11 ± 21% perf-profile.children.cycles-pp.__mutex_unlock_slowpath 2.11 ± 4% +0.3 2.40 ± 4% perf-profile.children.cycles-pp.schedule 11.21 +0.5 11.71 ± 2% perf-profile.children.cycles-pp.do_select 0.05 +0.0 0.06 perf-profile.self.cycles-pp.finish_task_switch 0.05 ± 8% +0.0 0.07 ± 7% perf-profile.self.cycles-pp.dequeue_task_fair 0.23 ± 3% +0.0 0.26 ± 4% perf-profile.self.cycles-pp.__sched_text_start 0.27 ± 4% +0.0 0.31 ± 3% perf-profile.self.cycles-pp.load_new_mm_cr3 11.52 ± 3% +8.8% 12.53 perf-stat.i.MPKI 3.307e+10 -20.3% 2.636e+10 ± 2% perf-stat.i.branch-instructions 0.73 +0.0 0.75 perf-stat.i.branch-miss-rate% 1.957e+08 -27.3% 1.423e+08 ± 3% perf-stat.i.branch-misses 987322 ± 9% +13.0% 1115216 ± 4% perf-stat.i.context-switches 4.95 ± 3% +43.7% 7.11 perf-stat.i.cpi 4.722e+10 -20.9% 3.737e+10 ± 2% perf-stat.i.dTLB-loads 2.789e+10 -22.3% 2.168e+10 ± 3% perf-stat.i.dTLB-stores 1.659e+08 ± 3% -29.2% 1.175e+08 ± 5% perf-stat.i.iTLB-load-misses 1.654e+11 -21.1% 1.305e+11 ± 2% perf-stat.i.instructions 0.67 -20.7% 0.53 ± 2% perf-stat.i.ipc 1.19 ± 5% -23.8% 0.91 ± 5% perf-stat.i.metric.K/sec 1130 -21.0% 893.12 ± 3% perf-stat.i.metric.M/sec 91.95 +0.9 92.88 perf-stat.i.node-store-miss-rate% 2.03 ± 5% +31.1% 2.66 ± 5% perf-stat.overall.MPKI 0.59 -0.0 0.54 perf-stat.overall.branch-miss-rate% 1.54 +28.4% 1.98 ± 2% perf-stat.overall.cpi 0.00 ± 3% +0.0 0.00 ± 6% perf-stat.overall.dTLB-store-miss-rate% 97.08 -1.4 95.70 perf-stat.overall.iTLB-load-miss-rate% 991.26 ± 3% +10.9% 1099 ± 2% perf-stat.overall.instructions-per-iTLB-miss 0.65 -22.0% 0.51 ± 2% perf-stat.overall.ipc 87.39 +2.0 89.39 perf-stat.overall.node-store-miss-rate% 3.271e+10 -20.8% 2.592e+10 ± 3% perf-stat.ps.branch-instructions 1.945e+08 -27.4% 1.412e+08 ± 3% perf-stat.ps.branch-misses 1004771 ± 9% +11.5% 1119933 ± 4% perf-stat.ps.context-switches 4.669e+10 -21.3% 3.674e+10 ± 3% perf-stat.ps.dTLB-loads 2.756e+10 -22.7% 2.131e+10 ± 3% perf-stat.ps.dTLB-stores 1.652e+08 ± 3% -29.3% 1.168e+08 ± 5% perf-stat.ps.iTLB-load-misses 1.635e+11 -21.6% 1.283e+11 ± 3% perf-stat.ps.instructions 3.497e+13 -21.6% 2.741e+13 ± 3% perf-stat.total.instructions 6739 ± 14% +33.4% 8988 ± 12% softirqs.CPU14.SCHED 6775 ± 15% +36.0% 9217 ± 8% softirqs.CPU19.SCHED 7359 ± 17% +29.1% 9500 ± 14% softirqs.CPU2.SCHED 9708 ± 10% -19.5% 7811 ± 5% softirqs.CPU24.SCHED 9651 ± 8% -18.2% 7890 ± 3% softirqs.CPU25.SCHED 9444 ± 6% -18.1% 7737 ± 10% softirqs.CPU27.SCHED 9578 ± 10% -18.3% 7826 ± 5% softirqs.CPU30.SCHED 9469 ± 8% -16.8% 7882 ± 5% softirqs.CPU31.SCHED 9826 ± 8% -16.1% 8246 ± 7% softirqs.CPU32.SCHED 9319 ± 9% -15.0% 7922 ± 5% softirqs.CPU34.SCHED 10167 ± 10% -20.1% 8123 ± 3% softirqs.CPU35.SCHED 9208 ± 8% -15.6% 7772 ± 5% softirqs.CPU36.SCHED 9641 ± 6% -17.8% 7921 ± 7% softirqs.CPU37.SCHED 9399 ± 7% -16.0% 7896 ± 6% softirqs.CPU38.SCHED 9275 ± 6% -15.4% 7851 ± 6% softirqs.CPU39.SCHED 9937 ± 12% -21.6% 7786 ± 4% softirqs.CPU40.SCHED 9259 ± 6% -16.2% 7758 ± 6% softirqs.CPU41.SCHED 9490 ± 6% -16.7% 7901 ± 4% softirqs.CPU42.SCHED 9504 ± 8% -18.6% 7736 ± 5% softirqs.CPU43.SCHED 9602 ± 8% -17.4% 7926 ± 2% softirqs.CPU44.SCHED 9564 ± 7% -18.5% 7796 ± 5% softirqs.CPU46.SCHED 6581 ± 12% +26.9% 8353 ± 14% softirqs.CPU50.SCHED 6864 ± 17% +27.7% 8762 ± 16% softirqs.CPU55.SCHED 6495 ± 10% +31.4% 8537 ± 12% softirqs.CPU61.SCHED 6294 ± 2% +42.8% 8985 ± 20% softirqs.CPU69.SCHED 9312 ± 8% -20.1% 7439 ± 8% softirqs.CPU72.SCHED 9408 ± 6% -19.1% 7615 ± 5% softirqs.CPU74.SCHED 9348 ± 8% -15.9% 7863 ± 5% softirqs.CPU75.SCHED 9506 ± 7% -17.4% 7855 ± 5% softirqs.CPU76.SCHED 9433 ± 8% -16.8% 7851 ± 5% softirqs.CPU77.SCHED 9585 ± 10% -19.3% 7737 ± 5% softirqs.CPU78.SCHED 9381 ± 9% -17.2% 7765 ± 4% softirqs.CPU79.SCHED 9559 ± 7% -16.7% 7964 ± 5% softirqs.CPU80.SCHED 9357 ± 6% -15.3% 7927 ± 6% softirqs.CPU81.SCHED 9710 ± 9% -19.4% 7825 ± 6% softirqs.CPU82.SCHED 9761 ± 9% -19.8% 7827 ± 5% softirqs.CPU83.SCHED 9433 ± 6% -16.7% 7856 ± 5% softirqs.CPU84.SCHED 9325 ± 7% -15.7% 7863 ± 5% softirqs.CPU85.SCHED 9609 ± 8% -18.9% 7796 ± 4% softirqs.CPU88.SCHED 9290 ± 7% -17.6% 7654 ± 4% softirqs.CPU89.SCHED 9595 ± 9% -18.5% 7821 ± 5% softirqs.CPU90.SCHED 9528 ± 9% -18.6% 7752 ± 4% softirqs.CPU91.SCHED 9660 ± 8% -19.5% 7773 ± 5% softirqs.CPU92.SCHED 9641 ± 8% -17.5% 7949 ± 5% softirqs.CPU93.SCHED 9710 ± 4% -18.9% 7875 ± 5% softirqs.CPU94.SCHED 10107 ± 9% -22.6% 7821 ± 6% softirqs.CPU95.SCHED 1731 ±114% -88.0% 208.50 ±173% interrupts.43:PCI-MSI.31981576-edge.i40e-eth0-TxRx-7 5625 ± 41% +73.0% 9730 ± 28% interrupts.CPU11.CAL:Function_call_interrupts 3192 ± 51% +500.0% 19155 ± 99% interrupts.CPU12.CAL:Function_call_interrupts 5652 ± 59% +387.9% 27579 ± 96% interrupts.CPU13.CAL:Function_call_interrupts 3389 ± 92% +206.3% 10382 ± 17% interrupts.CPU16.CAL:Function_call_interrupts 3212 ± 64% +781.8% 28328 ± 55% interrupts.CPU18.CAL:Function_call_interrupts 5503 ± 51% +261.3% 19884 ± 65% interrupts.CPU19.CAL:Function_call_interrupts 4305 ± 6% +37.2% 5908 ± 24% interrupts.CPU20.RES:Rescheduling_interrupts 3653 ± 36% +248.2% 12720 ± 12% interrupts.CPU21.CAL:Function_call_interrupts 4937 ± 36% +91.9% 9477 ± 52% interrupts.CPU22.CAL:Function_call_interrupts 23314 ± 20% -75.5% 5720 ± 41% interrupts.CPU25.CAL:Function_call_interrupts 22533 ± 8% -71.8% 6360 ± 35% interrupts.CPU26.CAL:Function_call_interrupts 24106 ± 13% -62.7% 8987 ± 57% interrupts.CPU27.CAL:Function_call_interrupts 6111 ± 17% +367.9% 28593 ± 55% interrupts.CPU27.RES:Rescheduling_interrupts 28249 ± 32% -69.3% 8675 ± 50% interrupts.CPU28.CAL:Function_call_interrupts 5584 ± 26% +402.4% 28060 ± 26% interrupts.CPU28.RES:Rescheduling_interrupts 19155 ± 12% -52.9% 9015 ± 61% interrupts.CPU29.CAL:Function_call_interrupts 4946 ± 18% +368.8% 23186 ± 70% interrupts.CPU29.RES:Rescheduling_interrupts 25158 ± 27% -73.4% 6694 ± 69% interrupts.CPU30.CAL:Function_call_interrupts 20595 ± 28% -63.0% 7620 ± 63% interrupts.CPU31.CAL:Function_call_interrupts 6658 ± 23% -30.6% 4620 ± 33% interrupts.CPU32.NMI:Non-maskable_interrupts 6658 ± 23% -30.6% 4620 ± 33% interrupts.CPU32.PMI:Performance_monitoring_interrupts 19923 ± 27% -70.4% 5906 ± 43% interrupts.CPU33.CAL:Function_call_interrupts 21664 ± 35% -68.5% 6813 ± 44% interrupts.CPU35.CAL:Function_call_interrupts 18049 ± 22% -63.7% 6560 ± 47% interrupts.CPU36.CAL:Function_call_interrupts 22299 ± 15% -60.0% 8913 ± 44% interrupts.CPU37.CAL:Function_call_interrupts 16997 ± 16% -63.7% 6173 ± 56% interrupts.CPU38.CAL:Function_call_interrupts 22528 ± 43% -62.8% 8375 ± 41% interrupts.CPU39.CAL:Function_call_interrupts 5373 ± 39% +100.0% 10743 ± 19% interrupts.CPU4.CAL:Function_call_interrupts 20817 ± 16% -63.8% 7530 ± 62% interrupts.CPU40.CAL:Function_call_interrupts 18927 ± 33% -67.6% 6125 ± 35% interrupts.CPU41.CAL:Function_call_interrupts 28780 ± 37% -66.3% 9688 ± 58% interrupts.CPU42.CAL:Function_call_interrupts 23586 ± 20% -72.2% 6561 ± 37% interrupts.CPU43.CAL:Function_call_interrupts 21970 ± 14% -72.1% 6136 ± 53% interrupts.CPU44.CAL:Function_call_interrupts 21667 ± 21% -47.7% 11326 ± 36% interrupts.CPU47.CAL:Function_call_interrupts 4314 ± 42% +170.1% 11653 ± 44% interrupts.CPU5.CAL:Function_call_interrupts 3969 ± 69% +176.8% 10988 ± 33% interrupts.CPU53.CAL:Function_call_interrupts 8525 ± 37% -33.5% 5672 ± 10% interrupts.CPU58.RES:Rescheduling_interrupts 4521 ± 28% +255.7% 16080 ± 32% interrupts.CPU61.CAL:Function_call_interrupts 3924 ± 50% +242.6% 13445 ± 39% interrupts.CPU63.CAL:Function_call_interrupts 3298 ± 45% +255.7% 11732 ± 44% interrupts.CPU69.CAL:Function_call_interrupts 1730 ±114% -88.0% 208.25 ±173% interrupts.CPU7.43:PCI-MSI.31981576-edge.i40e-eth0-TxRx-7 21881 ± 31% -64.7% 7726 ± 47% interrupts.CPU73.CAL:Function_call_interrupts 23394 ± 27% -64.5% 8308 ± 66% interrupts.CPU74.CAL:Function_call_interrupts 6742 ± 17% +312.2% 27796 ± 38% interrupts.CPU75.RES:Rescheduling_interrupts 28453 ± 45% -72.8% 7728 ± 49% interrupts.CPU76.CAL:Function_call_interrupts 5517 ± 43% +354.3% 25062 ± 20% interrupts.CPU76.RES:Rescheduling_interrupts 4801 ± 16% +433.6% 25624 ± 55% interrupts.CPU77.RES:Rescheduling_interrupts 25911 ± 15% -67.8% 8348 ± 58% interrupts.CPU78.CAL:Function_call_interrupts 21248 ± 16% -67.7% 6863 ± 40% interrupts.CPU80.CAL:Function_call_interrupts 21098 ± 23% -66.8% 6996 ± 34% interrupts.CPU83.CAL:Function_call_interrupts 19567 ± 20% -52.4% 9306 ± 64% interrupts.CPU84.CAL:Function_call_interrupts 16578 ± 20% -58.4% 6902 ± 56% interrupts.CPU87.CAL:Function_call_interrupts 21823 ± 28% -75.4% 5363 ± 41% interrupts.CPU89.CAL:Function_call_interrupts 24409 ± 22% -52.0% 11718 ± 49% interrupts.CPU90.CAL:Function_call_interrupts 20699 ± 21% -74.0% 5389 ± 40% interrupts.CPU91.CAL:Function_call_interrupts 18159 ± 14% -63.7% 6587 ± 67% interrupts.CPU92.CAL:Function_call_interrupts 22992 ± 23% -56.8% 9925 ± 38% interrupts.CPU93.CAL:Function_call_interrupts 5656 ± 20% +368.6% 26505 ± 47% interrupts.CPU93.RES:Rescheduling_interrupts 21995 ± 26% -64.6% 7782 ± 68% interrupts.CPU94.CAL:Function_call_interrupts 868396 ± 3% +20.9% 1050234 ± 14% interrupts.RES:Rescheduling_interrupts stress-ng.time.system_time 19300 +-------------------------------------------------------------------+ | | 19200 |-+ O O O O | | O O O O O O | | O O | 19100 |-+O O O O | | O O | 19000 |-+ | | | 18900 |-+ | | | | ..+.. .+...+..+... | 18800 |.. ..+..+...+.. +... .+. +. +..| | +...+..+...+.. .+. .. +. | 18700 +-------------------------------------------------------------------+ stress-ng.vm-splice.ops 3.5e+09 +-----------------------------------------------------------------+ | | 3e+09 |-+ | | | 2.5e+09 |-+ | | | 2e+09 |-+ | | | 1.5e+09 |-+ | | | 1e+09 |-+ | | | 5e+08 |-+ | | O O O O O O O O O O O O O O O O O O | 0 +-----------------------------------------------------------------+ stress-ng.vm-splice.ops_per_sec 1.2e+08 +-----------------------------------------------------------------+ |..+...+..+..+...+..+..+..+...+..+..+...+..+..+...+..+..+..+...+..| 1e+08 |-+ | | | | | 8e+07 |-+ | | | 6e+07 |-+ | | | 4e+07 |-+ | | | | | 2e+07 |-+ | | O O O O O O O O O O O O O O O O O O | 0 +-----------------------------------------------------------------+ [*] bisect-good sample [O] bisect-bad sample Disclaimer: Results have been estimated based on internal Intel analysis and are provided for informational purposes only. Any difference in system hardware or software design or configuration may affect actual performance. Thanks, Rong Chen