Greetings,

FYI, we noticed a +104% improvement in fio.write_bw_MBps due to commit:

commit: 71027e97d796d1e9b210a2f64bf2cc25e225a4c0 ("block: stop using discards for zeroing")
https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-4.12/test

in testcase: fio-basic
on test machine: 16 threads Intel(R) Xeon(R) CPU D-1541 @ 2.10GHz with 8G memory
with the following parameters:

        runtime: 300s
        disk: 1SSD
        fs: ext4
        nr_task: 64
        rw: randwrite
        bs: 4k
        ioengine: sync
        test_size: 400g
        cpufreq_governor: performance

test-description: Fio is a tool that will spawn a number of threads or processes doing a particular type of I/O action as specified by the user.
test-url: https://github.com/axboe/fio

Details are as below:
-------------------------------------------------------------------------------------------------->

To reproduce:

        git clone https://github.com/01org/lkp-tests.git
        cd lkp-tests
        bin/lkp install job.yaml  # job file is attached in this email
        bin/lkp run     job.yaml

testcase/path_params/tbox_group/run: fio-basic/300s-1SSD-ext4-64-randwrite-4k-sync-400g-performance/lkp-bdw-de1

5d1429fead5beacc  71027e97d796d1e9b210a2f64b
----------------  --------------------------
         %stddev  change            %stddev
             \       |                  \
     41.36          104%        84.24        fio.write_bw_MBps
     10589          104%        21566        fio.write_iops
      0.01         1200%         0.13        fio.latency_250us%
      0.01          500%         0.06        fio.latency_100us%
      0.27 ± 5%     241%         0.91        fio.latency_100ms%
      0.09 ± 39%    200%         0.29 ± 5%   fio.latency_50ms%
      0.02 ± 47%    171%         0.05 ± 9%   fio.latency_20ms%
         8           12%            9        fio.write_clat_90%_us
     72.99           12%        81.78        fio.latency_10us%
      0.17 ± 6%     -30%         0.12 ± 3%   fio.latency_50us%
      3.02          -43%         1.73        fio.latency_250ms%
     19.43          -43%        11.00        fio.latency_4us%
    207872          -46%       112128        fio.write_clat_99%_us
     33495          -47%        17619        fio.write_clat_stddev
      0.02          -50%         0.01        fio.latency_750us%
      6040          -51%         2965        fio.write_clat_mean_us
  25427300          104%     51781684        fio.time.file_system_outputs
     20.43          101%        41.07        fio.time.system_time
         8           88%           15        fio.time.percent_of_cpu_this_job_got
    142609           75%       249305        fio.time.voluntary_context_switches
      1127 ± 5%     -60%          455 ± 25%  fio.time.involuntary_context_switches
     54253                      55305        interrupts.CAL:Function_call_interrupts
     41705          358%       190909        vmstat.io.bo
     27554           53%        42193        vmstat.system.in
     22038           -4%        21060        vmstat.system.cs
       121 ± 5%      67%          202        turbostat.Avg_MHz
      4.97 ± 4%      63%         8.13        turbostat.%Busy
     23.58            5%        24.74        turbostat.PkgWatt
      9.65                       9.81        turbostat.RAMWatt
       654          353%         2967        iostat.sda.wrqm/s
     80766          153%       204329        iostat.sda.wkB/s
     11226          130%        25808        iostat.sda.w/s
      3.40           85%         6.29        iostat.sda.avgqu-sz
 5.371e+08 ± 5%     116%    1.159e+09        perf-stat.branch-misses
  55430664 ± 10%    109%    1.156e+08 ± 4%   perf-stat.dTLB-load-misses
 6.733e+10           79%    1.208e+11        perf-stat.branch-instructions
 3.304e+11           78%    5.894e+11        perf-stat.instructions
 5.128e+11 ± 3%      76%    9.042e+11        perf-stat.cpu-cycles
 9.157e+10 ± 4%      75%    1.604e+11        perf-stat.dTLB-loads
  3.76e+09 ± 5%      70%    6.397e+09        perf-stat.cache-references
  3.76e+09 ± 5%      70%    6.397e+09        perf-stat.cache-misses
 4.454e+10           67%    7.449e+10        perf-stat.dTLB-stores
   3933552 ± 3%      50%      5919721        perf-stat.dTLB-store-misses
     28404 ± 4%      35%        38265 ± 6%   perf-stat.instructions-per-iTLB-miss
  11646070 ± 3%      33%     15457926 ± 5%   perf-stat.iTLB-load-misses
 1.399e+08           25%    1.744e+08        perf-stat.iTLB-loads
      0.80 ± 5%      20%         0.96        perf-stat.branch-miss-rate%
      0.06 ± 6%      19%         0.07 ± 3%   perf-stat.dTLB-load-miss-rate%
     24844 ± 5%      17%        28972        perf-stat.cpu-migrations
   6652601           -5%      6352172        perf-stat.context-switches
      0.01 ± 3%     -10%         0.01        perf-stat.dTLB-store-miss-rate%
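To read the comparison table, it helps to check a few of its numbers. The short Python sketch below recomputes several entries of the change column and cross-checks bandwidth against IOPS. It rests on two assumptions that the report does not state:

- The change column is the relative change (new - old) / old, rounded to a whole percent.
- fio's MBps figure means MiB/s. With bs: 4k, 1 MiB/s then corresponds to 256 IOPS.

The names pct_change, iops_from_bw and ROWS are hypothetical helpers introduced only for illustration.

```python
# Sketch: recompute the "change" column and cross-check bandwidth vs. IOPS.
# Assumption: "change" is (new - old) / old, rounded to a whole percent.
# Assumption: fio's "MBps" means MiB/s (1 MiB = 1024 KiB).

BLOCK_KIB = 4  # bs: 4k from the job parameters


def pct_change(old: float, new: float) -> int:
    """Relative change from the parent commit (old) to the tested commit (new)."""
    return round((new - old) / old * 100)


def iops_from_bw(mib_per_s: float, block_kib: int = BLOCK_KIB) -> float:
    """IOPS implied by a bandwidth in MiB/s for fixed-size blocks of block_kib KiB."""
    return mib_per_s * 1024 / block_kib


# (old, new) pairs copied from the comparison table.
ROWS = {
    "fio.write_bw_MBps": (41.36, 84.24),
    "fio.write_iops": (10589, 21566),
    "fio.write_clat_mean_us": (6040, 2965),
    "vmstat.io.bo": (41705, 190909),
}

for metric, (old, new) in ROWS.items():
    print(f"{metric:24} {pct_change(old, new):+d}%")

# Bandwidth before and after the commit, converted to 4 KiB IOPS.
for bw in (41.36, 84.24):
    print(f"{bw} MiB/s at {BLOCK_KIB} KiB -> {iops_from_bw(bw):.0f} IOPS")
```

The implied IOPS values (about 10588 and 21565) agree with fio.write_iops to within the rounding of the bandwidth column. The table shows rounded values, so recomputing small entries from them can drift. For example, fio.latency_100ms% gives (0.91 - 0.27) / 0.27, about 237%, against the reported 241%.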
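If you do not have the lkp-tests harness, you can approximate the job parameters at the top of the report with a plain fio command line. The Python sketch below builds one.

The fio options it emits are standard fio options: --name, --directory, --rw, --bs, --ioengine, --numjobs, --size, --runtime and --group_reporting. The mapping to those options, however, is hypothetical:

- It assumes test_size is split evenly across the nr_task jobs, giving 400g / 64 = 6400 MiB per job.
- It uses a made-up job name and a made-up mount point, /mnt/ext4-ssd.

The attached job.yaml, not this sketch, defines what lkp actually runs.

```python
import shlex

# Job parameters copied from the report header.
PARAMS = {
    "runtime": "300s",
    "nr_task": 64,
    "rw": "randwrite",
    "bs": "4k",
    "ioengine": "sync",
    "test_size": "400g",
}

# Multipliers from a size suffix to MiB (only the suffixes needed here).
SIZE_TO_MIB = {"m": 1, "g": 1024, "t": 1024 * 1024}


def per_job_size_mib(test_size: str, nr_task: int) -> int:
    """Assumption: test_size is the total, split evenly across nr_task jobs."""
    value, unit = int(test_size[:-1]), test_size[-1].lower()
    return value * SIZE_TO_MIB[unit] // nr_task


def runtime_seconds(runtime: str) -> int:
    """Parse a seconds value such as "300s" into an integer number of seconds."""
    return int(runtime.rstrip("s"))


def fio_argv(params: dict, directory: str) -> list:
    """Build a fio argument vector; 'directory' is a hypothetical mount point."""
    return [
        "fio",
        "--name=randwrite-4k",  # hypothetical job name
        f"--directory={directory}",
        f"--rw={params['rw']}",
        f"--bs={params['bs']}",
        f"--ioengine={params['ioengine']}",
        f"--numjobs={params['nr_task']}",
        f"--size={per_job_size_mib(params['test_size'], params['nr_task'])}m",
        f"--runtime={runtime_seconds(params['runtime'])}",
        "--group_reporting",
    ]


print(shlex.join(fio_argv(PARAMS, "/mnt/ext4-ssd")))
```

Running it prints a single fio command that includes --numjobs=64 and --size=6400m. The translation lives in small helpers, so you can swap in a different size-splitting rule if the real job file turns out to use one.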
Per-run sample plots (bisect-good vs. bisect-bad) for: fio.write_bw_MBps, fio.write_iops, fio.write_clat_mean_us, fio.write_clat_stddev, fio.write_clat_99%_us, fio.latency_4us%, fio.latency_10us%, fio.latency_100us%, fio.latency_250us%, fio.latency_750us%, fio.latency_100ms%, fio.latency_250ms%, turbostat.Avg_MHz

        [*] bisect-good sample
        [O] bisect-bad  sample

Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.

Thanks,
Xiaolong