Greetings,

FYI, we noticed a +510.6% improvement in fio.write_bw_MBps due to commit:

commit: 96f8ba3dd632aff684cc7c67d9f4af435be0341c ("ext4: avoid split extents for DAX writes")
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master

in testcase: fio-basic
on test machine: 56 threads Intel(R) Xeon(R) CPU E5-2695 v3 @ 2.30GHz with 256G memory
with the following parameters:

        disk: 2pmem
        fs: ext4
        mount_option: dax
        runtime: 200s
        nr_task: 50%
        time_based: tb
        rw: randwrite
        bs: 4k
        ioengine: sync
        test_size: 200G
        cpufreq_governor: performance

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone git://git.kernel.org/pub/scm/linux/kernel/git/wfg/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

testcase/path_params/tbox_group/run: fio-basic/2pmem-ext4-dax-200s-50%-tb-randwrite-4k-sync-200G-performance/lkp-hsw-ep6

776722e85d3b0936 96f8ba3dd632aff684cc7c67d9
---------------- --------------------------
       fail:runs  %reproduction    fail:runs
           |             |             |
         %stddev     %change         %stddev
    820.67 ±  0%    +510.6%       5011 ±  4%  fio.write_bw_MBps
    210091 ±  0%    +510.6%    1282918 ±  4%  fio.write_iops
      0.14 ±  0%     -92.9%       0.01 ±  0%  fio.latency_100ms%
     24.00 ± 10%     -96.9%       0.74 ± 21%  fio.latency_100us%
      0.01 ± 57%  +2.9e+05%      21.76 ± 12%  fio.latency_10us%
      1.22 ± 37%   +3122.9%      39.40 ±  2%  fio.latency_20us%
      0.32 ±  8%     -93.8%       0.02 ±  0%  fio.latency_250us%
     74.28 ±  3%     -48.8%      38.01 ±  9%  fio.latency_50us%
      5511 ±  5%    +117.5%      11986 ±  3%  fio.time.involuntary_context_switches
    977.75 ±  1%    +149.2%       2436 ±  0%  fio.time.percent_of_cpu_this_job_got
      1874 ±  1%    +149.6%       4679 ±  0%  fio.time.system_time
     89.35 ±  3%    +137.8%     212.46 ±  3%  fio.time.user_time
    164733 ±  2%     -58.0%      69111 ±  3%  fio.time.voluntary_context_switches
     58.00 ±  2%     -44.8%      32.00 ±  2%  fio.write_clat_90%_us
     65.50 ±  1%     -44.5%      36.33 ±  2%  fio.write_clat_95%_us
     85.25 ±  1%     -44.9%      47.00 ±  3%  fio.write_clat_99%_us
    131.52 ±  0%     -83.9%      21.12 ±  4%  fio.write_clat_mean_us
      2270 ±  0%     -84.7%     347.78 ±  0%  fio.write_clat_stddev
    133959 ±  4%     +52.6%     204457 ±  3%  softirqs.RCU
   1433931 ±  1%     +94.7%    2791395 ±  0%  softirqs.TIMER
      5511 ±  5%    +117.5%      11986 ±  3%  time.involuntary_context_switches
    977.75 ±  1%    +149.2%       2436 ±  0%  time.percent_of_cpu_this_job_got
      1874 ±  1%    +149.6%       4679 ±  0%  time.system_time
     89.35 ±  3%    +137.8%     212.46 ±  3%  time.user_time
    164733 ±  2%     -58.0%      69111 ±  3%  time.voluntary_context_switches
   2766132 ±  0%     -49.2%    1405817 ±  0%  vmstat.io.bo
    613430 ±  0%     -60.9%     239670 ±  0%  vmstat.memory.buff
   1671149 ±  0%     -39.4%    1012059 ±  0%  vmstat.memory.cache
     10.00 ±  7%    +140.0%      24.00 ±  0%  vmstat.procs.r
     58099 ±  0%      +4.8%      60882 ±  0%  vmstat.system.in
    762597 ±  0%     -47.5%     400049 ±  0%  meminfo.Active
         0            5e+03       5067 ± 95%  latency_stats.max.do_get_write_access.jbd2_journal_get_write_access.__ext4_journal_get_write_access.ext4_split_extent_at.ext4_split_extent.ext4_ext_map_blocks.ext4_map_blocks.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter.__vfs_write
         0            5e+03       4996 ± 98%  latency_stats.max.do_get_write_access.jbd2_journal_get_write_access.__ext4_journal_get_write_access.ext4_ext_map_blocks.ext4_map_blocks.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter.__vfs_write.vfs_write.SyS_write
         0            2e+05     156829 ± 58%  latency_stats.sum.do_get_write_access.jbd2_journal_get_write_access.__ext4_journal_get_write_access.ext4_split_extent_at.ext4_split_extent.ext4_ext_map_blocks.ext4_map_blocks.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter.__vfs_write
         0            1e+04      12881 ± 59%  latency_stats.sum.do_get_write_access.jbd2_journal_get_write_access.__ext4_journal_get_write_access.ext4_ext_map_blocks.ext4_map_blocks.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter.__vfs_write.vfs_write.SyS_write
    209560 ± 33%     -2e+05          0        latency_stats.sum.do_get_write_access.jbd2_journal_get_write_access.__ext4_journal_get_write_access.ext4_split_extent_at.ext4_split_extent.ext4_split_convert_extents.ext4_ext_map_blocks.ext4_map_blocks.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter
   2492341 ±  4%     -2e+06     412777 ± 58%  latency_stats.sum.wait_transaction_locked.add_transaction_credits.start_this_handle.jbd2__journal_start.__ext4_journal_start_sb.ext4_iomap_begin.iomap_apply.dax_iomap_rw.ext4_file_write_iter.__vfs_write.vfs_write.SyS_write
   3976384 ±  8%     -4e+06     338379 ± 57%  latency_stats.sum.wait_transaction_locked.add_transaction_credits.start_this_handle.jbd2__journal_start.__ext4_journal_start_sb.ext4_iomap_end.iomap_apply.dax_iomap_rw.ext4_file_write_iter.__vfs_write.vfs_write.SyS_write
 1.549e+08           -1e+08   21221204 ± 57%  latency_stats.sum.jbd2_log_wait_commit.jbd2_log_do_checkpoint.__jbd2_log_wait_for_space.add_transaction_credits.start_this_handle.jbd2__journal_start.__ext4_journal_start_sb.ext4_iomap_end.iomap_apply.dax_iomap_rw.ext4_file_write_iter.__vfs_write
 1.549e+08           -1e+08   21221204 ± 57%  latency_stats.sum.max

[Trend plots: perf-stat.cpu-cycles, perf-stat.branch-misses, perf-stat.iTLB-loads, perf-stat.node-loads, perf-stat.branch-miss-rate_, perf-stat.ipc, fio.write_bw_MBps, fio.write_iops]

        [*] bisect-good sample
        [O] bisect-bad  sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Xiaolong
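For readers who want to check the headline figures: the %change column is (new - base) / base, and with bs=4k the two throughput metrics should agree once fio.write_iops is multiplied by 4096 bytes. The Python sketch below is illustrative only. The helper names `pct_change` and `bw_mbps_from_iops` are hypothetical (not part of lkp-tests or fio), and reading "MBps" as 2^20 bytes per second is an assumption that happens to match the reported values.

```python
# Illustrative sanity check (hypothetical helpers, not part of lkp-tests or fio).

def pct_change(base: float, new: float) -> float:
    """Relative change of `new` versus `base`, in percent (the %change column)."""
    return (new - base) / base * 100.0


def bw_mbps_from_iops(iops: float, bs_bytes: int = 4096) -> float:
    """Bandwidth implied by an IOPS figure at a fixed block size.

    Assumption: "MBps" means 2**20 bytes per second.
    """
    return iops * bs_bytes / (1024 * 1024)


# Mean values copied from the comparison table (parent 776722e85d3b0936 vs. 96f8ba3dd6).
base_bw, new_bw = 820.67, 5011
base_iops, new_iops = 210091, 1282918

print(f"fio.write_bw_MBps change: {pct_change(base_bw, new_bw):+.1f}%")
print(f"fio.write_iops change:    {pct_change(base_iops, new_iops):+.1f}%")
print(f"bandwidth implied by parent IOPS: {bw_mbps_from_iops(base_iops):.2f}")
print(f"bandwidth implied by commit IOPS: {bw_mbps_from_iops(new_iops):.2f}")
```

Both relative changes round to +510.6%, and 210091 IOPS × 4 KiB works out to about 820.67 per second in 2^20-byte units, matching the parent's fio.write_bw_MBps, so the bandwidth and IOPS rows describe the same improvement rather than two independent ones.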