Greetings,

FYI, we noticed a 92.0% improvement of fsmark.files_per_sec due to commit:

commit: 70bed0d5447e08702c7595d26c88ca37e8eb88b4 ("block: add a bdev_write_cache helper")
git://git.infradead.org/users/hch/block.git block-api

in testcase: fsmark
on test machine: 128 threads 2 sockets Intel(R) Xeon(R) Platinum 8358 CPU @ 2.60GHz with 128G memory
with the following parameters:

	iterations: 1x
	nr_threads: 1t
	disk: 1HDD
	fs: btrfs
	fs2: nfsv4
	filesize: 4K
	test_size: 40M
	sync_method: fsyncBeforeClose
	nr_files_per_directory: 1fpd
	cpufreq_governor: performance
	ucode: 0xd000331

test-description: fsmark is a file system benchmark that tests synchronous write workloads, for example, a mail server workload.
test-url: https://sourceforge.net/projects/fsmark/
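
For reference, the commit under test adds a small block-layer accessor named
bdev_write_cache(). Judging by the subject line, it wraps the write-cache
queue flag test behind a struct block_device based interface; a minimal
sketch of such a helper could look like the following (an illustration
inferred from the commit subject, not necessarily the exact upstream body):

	#include <linux/blkdev.h>

	/* Report whether the device has a volatile write cache. */
	static inline bool bdev_write_cache(struct block_device *bdev)
	{
		return test_bit(QUEUE_FLAG_WC, &bdev_get_queue(bdev)->queue_flags);
	}

Callers such as file systems can then query the device write cache through
the bdev instead of reaching into the underlying request_queue directly.
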
Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        sudo bin/lkp install job.yaml           # job file is attached in this email
        bin/lkp split-job --compatible job.yaml # generate the yaml file for lkp run
        sudo bin/lkp run generated-yaml-file

        # if you come across any failure that blocks the test,
        # please remove ~/.lkp and /lkp dir to run from a clean state.

=========================================================================================
compiler/cpufreq_governor/disk/filesize/fs2/fs/iterations/kconfig/nr_files_per_directory/nr_threads/rootfs/sync_method/tbox_group/test_size/testcase/ucode:
  gcc-11/performance/1HDD/4K/nfsv4/btrfs/1x/x86_64-rhel-8.3/1fpd/1t/debian-10.4-x86_64-20200603.cgz/fsyncBeforeClose/lkp-icl-2sp6/40M/fsmark/0xd000331

commit:
  6cccbfebc0 ("block: add a bdev_nonrot helper")
  70bed0d544 ("block: add a bdev_write_cache helper")

6cccbfebc02395ae 70bed0d5447e08702c7595d26c8
---------------- ---------------------------
         %stddev     %change         %stddev
             \          |                \
     19.10           +92.0%      36.67        fsmark.files_per_sec
    536.13           -47.9%     279.40        fsmark.time.elapsed_time
    536.13           -47.9%     279.40        fsmark.time.elapsed_time.max
     53273            +2.7%      54708        fsmark.time.voluntary_context_switches
      1.49            -2.1%       1.46        iostat.cpu.iowait
    908369 ± 17%     -39.4%     550808 ± 28%  numa-numastat.node1.numa_hit
 6.694e+10           -48.0%  3.482e+10        cpuidle..time
 1.385e+08           -47.7%   72505602        cpuidle..usage
      0.03            +0.0        0.04 ±  3%  mpstat.cpu.all.sys%
      0.01 ±  3%      +0.0        0.01 ±  4%  mpstat.cpu.all.usr%
    577.53           -44.4%     321.22        uptime.boot
     70827           -44.7%      39155        uptime.idle
      2334          +102.7%       4732        vmstat.io.bo
      3380           +45.5%       4919        vmstat.system.cs
  1.38e+08           -47.7%   72098570        turbostat.IRQ
     22732 ± 12%     -38.8%      13910 ±  6%  turbostat.POLL
     51.67            -3.9%      49.67 ±  2%  turbostat.PkgTmp
    134519           +15.4%     155275        meminfo.Active
     10873 ±  3%     -32.7%       7312        meminfo.Active(anon)
    123645           +19.7%     147962        meminfo.Active(file)
     29545           -12.3%      25909        meminfo.Shmem
    256478           -36.6%     162537 ± 39%  numa-meminfo.node0.AnonHugePages
      7918 ± 30%     -55.5%       3522 ±  9%  numa-meminfo.node1.Active(anon)
     20189 ± 46%    +484.8%     118058 ± 66%  numa-meminfo.node1.AnonPages
     55896 ± 34%    +176.2%     154400 ± 47%  numa-meminfo.node1.AnonPages.max
     25261 ± 31%    +383.3%     122094 ± 64%  numa-meminfo.node1.Inactive(anon)
      1467 ± 16%     +26.8%       1860 ± 11%  numa-meminfo.node1.PageTables
     12916 ± 22%     -45.2%       7081 ± 55%  numa-meminfo.node1.Shmem
      1978 ± 30%     -55.5%     880.00 ±  9%  numa-vmstat.node1.nr_active_anon
      5049 ± 46%    +484.5%      29514 ± 66%  numa-vmstat.node1.nr_anon_pages
      6319 ± 31%    +383.1%      30528 ± 64%  numa-vmstat.node1.nr_inactive_anon
    366.00 ± 17%     +26.8%     464.17 ± 10%  numa-vmstat.node1.nr_page_table_pages
      3231 ± 22%     -45.1%       1773 ± 55%  numa-vmstat.node1.nr_shmem
      1978 ± 30%     -55.5%     880.00 ±  9%  numa-vmstat.node1.nr_zone_active_anon
      6319 ± 31%    +383.1%      30528 ± 64%  numa-vmstat.node1.nr_zone_inactive_anon
    907485 ± 17%     -39.2%     551338 ± 28%  numa-vmstat.node1.numa_hit
      3311           +42.4%       4714        perf-stat.i.context-switches
    133.20            +1.8%     135.58        perf-stat.i.cpu-migrations
 2.952e+08            +4.3%  3.078e+08        perf-stat.i.dTLB-loads
 1.587e+08            +4.3%  1.655e+08        perf-stat.i.dTLB-stores
      2945            +4.7%       3084        perf-stat.i.minor-faults
     94.72            -1.8       92.97        perf-stat.i.node-load-miss-rate%
      6976 ± 19%     +65.2%      11527 ± 14%  perf-stat.i.node-loads
     56884 ± 12%     +51.6%      86264 ±  6%  perf-stat.i.node-stores
      2946            +4.7%       3085        perf-stat.i.page-faults
     92.90            -2.4       90.53        perf-stat.overall.node-load-miss-rate%
      3305           +42.1%       4697        perf-stat.ps.context-switches
 2.946e+08            +4.1%  3.067e+08        perf-stat.ps.dTLB-loads
 1.584e+08            +4.1%  1.649e+08        perf-stat.ps.dTLB-stores
      2939            +4.5%       3072        perf-stat.ps.minor-faults
      6962 ± 19%     +64.9%      11483 ± 14%  perf-stat.ps.node-loads
     56769 ± 12%     +51.4%      85938 ±  6%  perf-stat.ps.node-stores
      2940            +4.5%       3073        perf-stat.ps.page-faults
   5.8e+11 ±  3%     -46.4%  3.106e+11 ±  4%  perf-stat.total.instructions
      2718 ±  3%     -32.8%       1826        proc-vmstat.nr_active_anon
     30918           +19.5%      36954        proc-vmstat.nr_active_file
     82517            +2.3%      84385        proc-vmstat.nr_anon_pages
    170379            +5.1%     179015        proc-vmstat.nr_dirtied
    160.83           +32.7%     213.50        proc-vmstat.nr_dirty
     87111            +2.3%      89076        proc-vmstat.nr_inactive_anon
      9165            +1.9%       9340        proc-vmstat.nr_mapped
      1104            +7.4%       1186        proc-vmstat.nr_page_table_pages
      7386           -12.3%       6475        proc-vmstat.nr_shmem
    170150            +5.0%     178704        proc-vmstat.nr_written
      2718 ±  3%     -32.8%       1826        proc-vmstat.nr_zone_active_anon
     30918           +19.5%      36954        proc-vmstat.nr_zone_active_file
     87111            +2.3%      89076        proc-vmstat.nr_zone_inactive_anon
    161.33           +33.5%     215.33        proc-vmstat.nr_zone_write_pending
   1722532           -29.5%    1214402        proc-vmstat.numa_hit
   1606723           -31.6%    1098636        proc-vmstat.numa_local
   1722459           -29.5%    1214419        proc-vmstat.pgalloc_normal
   1723177           -42.0%     999350        proc-vmstat.pgfault
   1598401           -32.6%    1077857        proc-vmstat.pgfree
   1260337            +6.1%    1337822        proc-vmstat.pgpgout
    145698           -44.0%      81595        proc-vmstat.pgreuse
     34.69 ± 24%     +42.8%      49.55 ± 16%  sched_debug.cfs_rq:/.load_avg.avg
     49.89 ±  7%     +55.5%      77.57 ±  4%  sched_debug.cfs_rq:/.runnable_avg.avg
    633.30 ±  2%     +15.3%     730.40 ±  7%  sched_debug.cfs_rq:/.runnable_avg.max
    116.72 ±  8%     +25.6%     146.60 ±  6%  sched_debug.cfs_rq:/.runnable_avg.stddev
     49.75 ±  7%     +55.3%      77.28 ±  4%  sched_debug.cfs_rq:/.util_avg.avg
    632.78 ±  2%     +15.1%     728.53 ±  7%  sched_debug.cfs_rq:/.util_avg.max
    116.60 ±  8%     +25.5%     146.31 ±  6%  sched_debug.cfs_rq:/.util_avg.stddev
      4.18 ± 22%     +53.9%       6.44 ± 15%  sched_debug.cfs_rq:/.util_est_enqueued.avg
    178.60 ± 10%     +39.6%     249.40 ± 10%  sched_debug.cfs_rq:/.util_est_enqueued.max
     22.91 ± 16%     +41.2%      32.34 ±  8%  sched_debug.cfs_rq:/.util_est_enqueued.stddev
    116236 ±  8%     +24.6%     144804 ±  7%  sched_debug.cpu.avg_idle.stddev
    259878 ±  5%     -38.2%     160679        sched_debug.cpu.clock.avg
    259881 ±  5%     -38.2%     160683        sched_debug.cpu.clock.max
    259874 ±  5%     -38.2%     160675        sched_debug.cpu.clock.min
      1.97 ±  7%     +14.2%       2.26 ±  9%  sched_debug.cpu.clock.stddev
    255028 ±  4%     -38.2%     157678        sched_debug.cpu.clock_task.avg
    255665 ±  5%     -38.2%     158126        sched_debug.cpu.clock_task.max
    249556 ±  5%     -39.2%     151775        sched_debug.cpu.clock_task.min
     11619 ±  3%     -22.5%       9002        sched_debug.cpu.curr->pid.max
      1173 ±  4%      -9.7%       1059 ±  3%  sched_debug.cpu.curr->pid.stddev
      0.03 ±  7%     +26.7%       0.03 ±  5%  sched_debug.cpu.nr_running.avg
      0.15 ±  2%     +10.2%       0.16 ±  2%  sched_debug.cpu.nr_running.stddev
      8223 ±  4%     -15.8%       6924        sched_debug.cpu.nr_switches.avg
      1411 ±  9%     -22.7%       1090 ± 13%  sched_debug.cpu.nr_switches.min
    259875 ±  5%     -38.2%     160676        sched_debug.cpu_clk
    259153 ±  5%     -38.3%     159957        sched_debug.ktime
    261040 ±  5%     -38.2%     161334        sched_debug.sched_clk
     53.97 ±  6%      -6.4       47.54 ±  2%  perf-profile.calltrace.cycles-pp.mwait_idle_with_hints.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call
     54.36 ±  6%      -6.4       47.99 ±  2%  perf-profile.calltrace.cycles-pp.intel_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     84.36            -2.3       82.02        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     92.94            -2.0       90.93        perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.secondary_startup_64_no_verify
     85.53            -2.0       83.56        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.secondary_startup_64_no_verify
      0.92 ± 11%      +0.1        1.07 ±  4%  perf-profile.calltrace.cycles-pp.rcu_idle_exit.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
      0.29 ±100%      +0.4        0.74 ± 10%  perf-profile.calltrace.cycles-pp.rcu_core.__softirqentry_text_start.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt
      0.29 ±101%      +0.5        0.81 ± 11%  perf-profile.calltrace.cycles-pp.worker_thread.kthread.ret_from_fork
      0.10 ±223%      +0.5        0.64 ± 10%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork
      3.04 ±  8%      +0.5        3.58 ± 10%  perf-profile.calltrace.cycles-pp.__softirqentry_text_start.__irq_exit_rcu.sysvec_apic_timer_interrupt.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state
      1.18 ±  7%      +0.9        2.05 ± 18%  perf-profile.calltrace.cycles-pp.ret_from_fork
      1.18 ±  7%      +0.9        2.05 ± 18%  perf-profile.calltrace.cycles-pp.kthread.ret_from_fork
     28.17 ±  7%      +3.8       31.99 ±  2%  perf-profile.calltrace.cycles-pp.asm_sysvec_apic_timer_interrupt.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     54.25 ±  6%      -6.4       47.90 ±  2%  perf-profile.children.cycles-pp.mwait_idle_with_hints
     54.65 ±  6%      -6.3       48.32 ±  2%  perf-profile.children.cycles-pp.intel_idle
     86.06            -2.0       84.03        perf-profile.children.cycles-pp.cpuidle_enter_state
     86.29            -2.0       84.28        perf-profile.children.cycles-pp.cpuidle_enter
     93.82            -2.0       91.85        perf-profile.children.cycles-pp.cpuidle_idle_call
      0.07 ± 21%      +0.0        0.11 ± 12%  perf-profile.children.cycles-pp.can_stop_idle_tick
      0.05 ± 50%      +0.0        0.09 ± 26%  perf-profile.children.cycles-pp.mmap_region
      0.04 ± 47%      +0.0        0.09 ± 22%  perf-profile.children.cycles-pp.call_transmit
      0.04 ± 47%      +0.0        0.09 ± 22%  perf-profile.children.cycles-pp.xprt_transmit
      0.06 ± 11%      +0.0        0.11 ± 24%  perf-profile.children.cycles-pp.process_backlog
      0.06 ± 17%      +0.0        0.11 ± 20%  perf-profile.children.cycles-pp.__local_bh_enable_ip
      0.04 ± 72%      +0.0        0.09 ± 29%  perf-profile.children.cycles-pp.handle_irq_event
      0.04 ± 72%      +0.0        0.09 ± 29%  perf-profile.children.cycles-pp.__handle_irq_event_percpu
      0.05 ± 45%      +0.1        0.10 ± 20%  perf-profile.children.cycles-pp.ip6_protocol_deliver_rcu
      0.05 ± 45%      +0.1        0.10 ± 20%  perf-profile.children.cycles-pp.tcp_v6_rcv
      0.04 ± 74%      +0.1        0.10 ± 27%  perf-profile.children.cycles-pp.rpc_async_schedule
      0.07 ± 23%      +0.1        0.12 ± 21%  perf-profile.children.cycles-pp.ip6_finish_output2
      0.04 ± 72%      +0.1        0.09 ± 30%  perf-profile.children.cycles-pp.__common_interrupt
      0.05 ± 45%      +0.1        0.10 ± 20%  perf-profile.children.cycles-pp.ip6_input_finish
      0.05 ± 46%      +0.1        0.10 ± 19%  perf-profile.children.cycles-pp.__netif_receive_skb_one_core
      0.06 ± 13%      +0.1        0.11 ± 21%  perf-profile.children.cycles-pp.__napi_poll
      0.31 ± 10%      +0.1        0.37 ±  4%  perf-profile.children.cycles-pp.error_entry
      0.07 ± 23%      +0.1        0.13 ± 18%  perf-profile.children.cycles-pp.ip6_xmit
      0.04 ± 73%      +0.1        0.09 ± 23%  perf-profile.children.cycles-pp.xs_tcp_send_request
      0.06 ± 13%      +0.1        0.12 ± 18%  perf-profile.children.cycles-pp.net_rx_action
      0.04 ± 73%      +0.1        0.09 ± 22%  perf-profile.children.cycles-pp.xprt_request_transmit
      0.04 ± 71%      +0.1        0.09 ± 23%  perf-profile.children.cycles-pp.tcp_v6_do_rcv
      0.04 ± 71%      +0.1        0.09 ± 23%  perf-profile.children.cycles-pp.tcp_rcv_established
      0.02 ±145%      +0.1        0.08 ± 26%  perf-profile.children.cycles-pp.inode_permission
      0.07 ± 23%      +0.1        0.13 ± 17%  perf-profile.children.cycles-pp.inet6_csk_xmit
      0.08 ± 17%      +0.1        0.14 ± 14%  perf-profile.children.cycles-pp.__tcp_transmit_skb
      0.05 ± 48%      +0.1        0.11 ± 20%  perf-profile.children.cycles-pp.rpc_run_task
      0.04 ± 71%      +0.1        0.10 ± 23%  perf-profile.children.cycles-pp.queue_work_on
      0.05 ± 46%      +0.1        0.11 ± 20%  perf-profile.children.cycles-pp.rpc_execute
      0.08 ± 23%      +0.1        0.15 ± 23%  perf-profile.children.cycles-pp.svc_recv
      0.08 ± 25%      +0.1        0.15 ± 38%  perf-profile.children.cycles-pp.do_softirq
      0.07 ±  9%      +0.1        0.14 ± 16%  perf-profile.children.cycles-pp.__tcp_push_pending_frames
      0.07 ± 11%      +0.1        0.14 ± 16%  perf-profile.children.cycles-pp.tcp_write_xmit
      0.10 ± 23%      +0.1        0.18 ± 12%  perf-profile.children.cycles-pp.__rpc_execute
      0.08 ± 14%      +0.1        0.15 ± 14%  perf-profile.children.cycles-pp.__queue_work
      0.07 ± 10%      +0.1        0.15 ± 15%  perf-profile.children.cycles-pp.tcp_sock_set_cork
      0.15 ± 16%      +0.1        0.24 ± 14%  perf-profile.children.cycles-pp.perf_trace_sched_wakeup_template
      0.13 ± 27%      +0.1        0.23 ± 24%  perf-profile.children.cycles-pp.open
      0.22 ± 12%      +0.1        0.32 ± 11%  perf-profile.children.cycles-pp.try_to_wake_up
      0.18 ± 18%      +0.1        0.30 ± 19%  perf-profile.children.cycles-pp.perf_trace_sched_switch
      0.03 ±100%      +0.1        0.16 ± 45%  perf-profile.children.cycles-pp.btree_csum_one_bio
      0.03 ±100%      +0.1        0.16 ± 45%  perf-profile.children.cycles-pp.csum_one_extent_buffer
      0.29 ± 17%      +0.2        0.44 ± 10%  perf-profile.children.cycles-pp.unwind_next_frame
      0.32 ± 27%      +0.2        0.48 ± 14%  perf-profile.children.cycles-pp.io_serial_in
      0.40 ± 17%      +0.2        0.59 ± 10%  perf-profile.children.cycles-pp.get_perf_callchain
      0.40 ± 17%      +0.2        0.59 ± 10%  perf-profile.children.cycles-pp.perf_callchain
      0.34 ± 16%      +0.2        0.53 ± 10%  perf-profile.children.cycles-pp.perf_callchain_kernel
      0.45 ± 18%      +0.2        0.64 ± 10%  perf-profile.children.cycles-pp.process_one_work
      0.43 ± 16%      +0.2        0.62 ± 10%  perf-profile.children.cycles-pp.perf_prepare_sample
      0.36 ± 19%      +0.2        0.58 ± 12%  perf-profile.children.cycles-pp.note_gp_changes
      0.48 ± 16%      +0.2        0.71 ± 11%  perf-profile.children.cycles-pp.perf_event_output_forward
      0.55 ± 12%      +0.2        0.79 ±  8%  perf-profile.children.cycles-pp.rcu_core
      0.48 ± 15%      +0.2        0.72 ± 11%  perf-profile.children.cycles-pp.__perf_event_overflow
      0.50 ± 15%      +0.2        0.75 ± 11%  perf-profile.children.cycles-pp.perf_tp_event
      0.52 ± 14%      +0.3        0.81 ± 11%  perf-profile.children.cycles-pp.worker_thread
      0.99 ± 13%      +0.3        1.28 ±  5%  perf-profile.children.cycles-pp.irqtime_account_irq
      1.54 ± 12%      +0.4        1.90 ±  4%  perf-profile.children.cycles-pp.sched_clock_cpu
      3.20 ±  8%      +0.7        3.87 ±  9%  perf-profile.children.cycles-pp.__softirqentry_text_start
      3.88 ±  9%      +0.8        4.64 ± 10%  perf-profile.children.cycles-pp.__irq_exit_rcu
      1.18 ±  7%      +0.9        2.05 ± 18%  perf-profile.children.cycles-pp.kthread
      1.19 ±  7%      +0.9        2.07 ± 18%  perf-profile.children.cycles-pp.ret_from_fork
     25.11 ±  8%      +3.4       28.51 ±  2%  perf-profile.children.cycles-pp.asm_sysvec_apic_timer_interrupt
     54.23 ±  6%      -6.4       47.83 ±  2%  perf-profile.self.cycles-pp.mwait_idle_with_hints
      0.22 ± 11%      +0.1        0.30 ± 17%  perf-profile.self.cycles-pp.sched_clock_cpu
      0.32 ± 27%      +0.2        0.48 ± 14%  perf-profile.self.cycles-pp.io_serial_in
      1.19 ± 12%      +0.2        1.44 ±  4%  perf-profile.self.cycles-pp.native_sched_clock

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp