* [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
@ 2024-01-31 15:42 kernel test robot
2024-01-31 18:17 ` Bart Van Assche
2024-02-09 21:06 ` Jens Axboe
0 siblings, 2 replies; 9+ messages in thread
From: kernel test robot @ 2024-01-31 15:42 UTC (permalink / raw)
To: Bart Van Assche
Cc: oe-lkp, lkp, Linux Memory Management List, Jens Axboe,
Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
feng.tang, fengwei.yin, oliver.sang
Hello,
kernel test robot noticed a -72.9% regression of fio.write_iops on:
commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
testcase: fio-basic
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:
runtime: 300s
disk: 1HDD
fs: xfs
nr_task: 100%
test_size: 128G
rw: write
bs: 4k
ioengine: io_uring
direct: direct
cpufreq_governor: performance
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202401312320.a335db14-oliver.sang@intel.com
Details are as below:
-------------------------------------------------------------------------------------------------->
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com
=========================================================================================
bs/compiler/cpufreq_governor/direct/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
4k/gcc-12/performance/direct/1HDD/xfs/io_uring/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/300s/write/lkp-icl-2sp9/128G/fio-basic
commit:
8f764b91fd ("block/mq-deadline: skip expensive merge lookups if contended")
574e7779cf ("block/mq-deadline: use separate insertion lists")
8f764b91fdf29659          574e7779cf583171acb5bf63650
----------------          ---------------------------
      %stddev     %change       %stddev
          \          |              \
10031 -68.5% 3157 ± 7% uptime.idle
38.88 -93.1% 2.67 ± 4% iostat.cpu.idle
60.16 +60.2% 96.39 iostat.cpu.iowait
2128969 ± 58% -86.2% 293345 ±114% numa-vmstat.node0.nr_foll_pin_acquired
2127446 ± 58% -86.2% 293180 ±114% numa-vmstat.node0.nr_foll_pin_released
169.33 ± 11% +41.4% 239.50 ± 9% perf-c2c.DRAM.remote
98.67 ± 14% +45.1% 143.17 ± 12% perf-c2c.HITM.remote
96520 ± 3% +50.1% 144877 ± 2% meminfo.Active
96358 ± 3% +50.3% 144805 ± 2% meminfo.Active(anon)
116390 ± 3% +42.2% 165520 ± 2% meminfo.Shmem
38.49 -36.5 2.03 ± 6% mpstat.cpu.all.idle%
60.56 +36.5 97.03 mpstat.cpu.all.iowait%
0.19 -0.0 0.16 mpstat.cpu.all.sys%
0.16 ± 3% -8.5% 0.14 ± 3% turbostat.IPC
75414 ± 5% +278.1% 285115 turbostat.POLL
0.01 +0.1 0.07 turbostat.POLL%
38.89 -93.1% 2.70 ± 4% vmstat.cpu.id
60.14 +60.2% 96.37 vmstat.cpu.wa
92653 -72.9% 25112 vmstat.io.bo
63.11 +340.1% 277.79 vmstat.procs.b
18314 +54.0% 28195 vmstat.system.cs
72218 +5.8% 76405 vmstat.system.in
584892 ± 15% +885.8% 5766120 ±123% sched_debug.cfs_rq:/.load.max
950.75 ± 10% +47.6% 1403 ± 20% sched_debug.cfs_rq:/.load_avg.max
203.10 ± 15% +32.3% 268.69 ± 14% sched_debug.cfs_rq:/.load_avg.stddev
46333 +51.0% 69954 sched_debug.cpu.nr_switches.avg
29986 ± 9% +84.3% 55251 sched_debug.cpu.nr_switches.min
0.00 ± 35% +1e+05% 3.64 ± 9% sched_debug.cpu.nr_uninterruptible.avg
23.03 ± 27% +187.8% 66.28 ± 8% sched_debug.cpu.nr_uninterruptible.max
-18.67 +207.7% -57.44 sched_debug.cpu.nr_uninterruptible.min
7.23 ± 5% +258.7% 25.92 ± 10% sched_debug.cpu.nr_uninterruptible.stddev
24079 ± 3% +50.3% 36201 ± 2% proc-vmstat.nr_active_anon
754201 +1.6% 766356 proc-vmstat.nr_file_pages
3467173 -72.7% 946413 proc-vmstat.nr_foll_pin_acquired
3464673 -72.7% 945922 proc-vmstat.nr_foll_pin_released
11862 +1.6% 12051 proc-vmstat.nr_mapped
29088 ± 3% +42.2% 41375 ± 2% proc-vmstat.nr_shmem
24079 ± 3% +50.3% 36201 ± 2% proc-vmstat.nr_zone_active_anon
1051599 -4.5% 1004608 proc-vmstat.numa_hit
984923 -4.8% 937714 proc-vmstat.numa_local
71376 ± 10% +27.2% 90810 ± 8% proc-vmstat.numa_pte_updates
38600 ± 2% +37.7% 53153 proc-vmstat.pgactivate
1308586 -6.9% 1218899 proc-vmstat.pgalloc_normal
856034 +2.4% 876179 proc-vmstat.pgfault
1224361 -7.3% 1135057 proc-vmstat.pgfree
28164246 -72.9% 7638316 proc-vmstat.pgpgout
0.12 ± 28% +0.8 0.92 ± 9% fio.latency_1000ms%
17.72 ± 3% -10.3 7.39 ± 4% fio.latency_100ms%
0.05 ± 16% +0.2 0.23 ± 5% fio.latency_10ms%
19.15 ± 6% -19.0 0.19 ± 6% fio.latency_20ms%
23.99 ± 3% +7.5 31.50 fio.latency_250ms%
0.01 +0.0 0.03 ± 7% fio.latency_4ms%
5.84 ± 3% +36.8 42.63 fio.latency_500ms%
32.66 ± 6% -29.9 2.73 ± 4% fio.latency_50ms%
0.42 ± 16% +13.9 14.27 fio.latency_750ms%
56330658 -72.9% 15275141 fio.time.file_system_outputs
3139 ± 3% -62.4% 1182 ± 2% fio.time.involuntary_context_switches
8.17 ± 4% -14.3% 7.00 fio.time.percent_of_cpu_this_job_got
1902903 +24.6% 2371272 fio.time.voluntary_context_switches
7041332 -72.9% 1909392 fio.workload
91.62 -72.9% 24.85 fio.write_bw_MBps
2.045e+08 +166.0% 5.439e+08 fio.write_clat_90%_us
3.055e+08 ± 2% +99.1% 6.082e+08 fio.write_clat_95%_us
4.264e+08 ± 3% +76.4% 7.522e+08 fio.write_clat_99%_us
87286745 +257.4% 3.12e+08 fio.write_clat_mean_us
99184810 ± 3% +75.3% 1.739e+08 fio.write_clat_stddev
23454 -72.9% 6361 fio.write_iops
1727 +5.7e+05% 9845156 fio.write_slat_mean_us
7460 ± 39% +4.9e+05% 36525171 fio.write_slat_stddev
1.66 +22.5% 2.04 perf-stat.i.MPKI
0.97 +0.1 1.09 perf-stat.i.branch-miss-rate%
2609625 +7.1% 2794459 ± 4% perf-stat.i.branch-misses
16.66 -0.6 16.05 perf-stat.i.cache-miss-rate%
1901031 +11.6% 2120745 perf-stat.i.cache-misses
11266152 +16.9% 13166895 perf-stat.i.cache-references
18410 +54.3% 28416 perf-stat.i.context-switches
2.10 ± 2% +10.4% 2.32 perf-stat.i.cpi
2.299e+09 ± 2% +5.1% 2.416e+09 perf-stat.i.cpu-cycles
901.94 ± 2% -34.2% 593.78 perf-stat.i.cpu-migrations
1315 ± 2% -12.7% 1148 perf-stat.i.cycles-between-cache-misses
0.04 ± 5% +0.0 0.04 ± 4% perf-stat.i.dTLB-load-miss-rate%
111249 ± 5% +18.1% 131362 ± 4% perf-stat.i.dTLB-load-misses
3.359e+08 -6.7% 3.134e+08 perf-stat.i.dTLB-loads
48568 -14.5% 41547 ± 2% perf-stat.i.dTLB-store-misses
1.737e+08 -11.7% 1.534e+08 perf-stat.i.dTLB-stores
1.215e+09 -5.6% 1.147e+09 perf-stat.i.instructions
0.49 ± 2% -10.2% 0.44 perf-stat.i.ipc
0.04 ± 2% +5.1% 0.04 perf-stat.i.metric.GHz
191.35 +17.5% 224.82 perf-stat.i.metric.K/sec
11.77 -7.1% 10.94 perf-stat.i.metric.M/sec
2459 +2.4% 2518 perf-stat.i.minor-faults
90.60 +3.0 93.58 perf-stat.i.node-load-miss-rate%
452504 +31.3% 594058 perf-stat.i.node-load-misses
51347 ± 3% -9.0% 46745 perf-stat.i.node-loads
209653 ± 3% +17.1% 245593 ± 3% perf-stat.i.node-store-misses
293009 ± 2% +23.4% 361448 perf-stat.i.node-stores
2459 +2.4% 2519 perf-stat.i.page-faults
1.56 +18.2% 1.85 perf-stat.overall.MPKI
1.07 +0.1 1.20 ± 2% perf-stat.overall.branch-miss-rate%
16.87 -0.8 16.11 perf-stat.overall.cache-miss-rate%
1.89 ± 2% +11.3% 2.11 perf-stat.overall.cpi
1209 -5.8% 1139 perf-stat.overall.cycles-between-cache-misses
0.03 ± 5% +0.0 0.04 ± 3% perf-stat.overall.dTLB-load-miss-rate%
0.53 ± 2% -10.2% 0.47 perf-stat.overall.ipc
89.81 +2.9 92.70 perf-stat.overall.node-load-miss-rate%
51848 +248.4% 180654 perf-stat.overall.path-length
2602422 +7.1% 2786733 ± 4% perf-stat.ps.branch-misses
1894822 +11.6% 2113909 perf-stat.ps.cache-misses
11229305 +16.9% 13124260 perf-stat.ps.cache-references
18349 +54.4% 28323 perf-stat.ps.context-switches
2.291e+09 ± 2% +5.1% 2.408e+09 perf-stat.ps.cpu-cycles
898.94 ± 2% -34.2% 591.81 perf-stat.ps.cpu-migrations
110886 ± 5% +18.1% 130940 ± 4% perf-stat.ps.dTLB-load-misses
3.349e+08 -6.7% 3.124e+08 perf-stat.ps.dTLB-loads
48409 -14.4% 41414 ± 2% perf-stat.ps.dTLB-store-misses
1.732e+08 -11.7% 1.529e+08 perf-stat.ps.dTLB-stores
1.211e+09 -5.6% 1.144e+09 perf-stat.ps.instructions
2451 +2.4% 2511 perf-stat.ps.minor-faults
451006 +31.3% 592114 perf-stat.ps.node-load-misses
51197 ± 3% -9.0% 46611 perf-stat.ps.node-loads
208958 ± 3% +17.1% 244787 ± 3% perf-stat.ps.node-store-misses
292047 ± 2% +23.4% 360278 perf-stat.ps.node-stores
2451 +2.4% 2511 perf-stat.ps.page-faults
3.651e+11 -5.5% 3.45e+11 ± 2% perf-stat.total.instructions
0.00 ± 43% +368.2% 0.02 ± 75% perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
0.01 -14.3% 0.01 perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.xfs_ilock_for_iomap
0.01 ± 42% -35.1% 0.00 perf-sched.sch_delay.avg.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
0.01 -33.3% 0.01 perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.01 ± 19% -27.8% 0.00 ± 10% perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
0.00 ±223% +1912.5% 0.03 ± 94% perf-sched.sch_delay.max.ms.__cond_resched.io_assign_current_work.io_worker_handle_work.io_wq_worker.ret_from_fork
0.00 ± 9% +126.1% 0.01 ± 38% perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.io_run_task_work.io_cqring_wait.__do_sys_io_uring_enter
56.45 ± 3% -32.6% 38.05 perf-sched.total_wait_and_delay.average.ms
41251 ± 4% +99.0% 82087 perf-sched.total_wait_and_delay.count.ms
56.44 ± 3% -32.6% 38.04 perf-sched.total_wait_time.average.ms
10.80 ± 2% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
376.03 ± 48% -94.4% 21.05 ±223% perf-sched.wait_and_delay.avg.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
433.85 ± 59% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
815.52 ± 92% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
486.10 ± 48% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
240.19 ± 23% -100.0% 0.00 perf-sched.wait_and_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
2.32 ± 10% +22.5% 2.84 ± 5% perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
58.50 ± 6% -80.3% 11.54 ± 6% perf-sched.wait_and_delay.avg.ms.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
434.72 ± 12% -49.8% 218.09 ± 4% perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
48.19 ± 5% -71.8% 13.60 ±141% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.xfs_ilock_for_iomap
37.76 ± 8% -83.0% 6.43 ±223% perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
36.84 ± 5% +27.8% 47.07 ± 4% perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
620.47 ± 3% -10.9% 552.59 ± 3% perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
64.08 ± 4% -72.5% 17.65 ± 2% perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
384.00 -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
18.00 ± 29% -99.1% 0.17 ±223% perf-sched.wait_and_delay.count.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
2.50 ± 38% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
5.00 ± 65% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
4.00 ± 38% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
45.17 ± 13% -100.0% 0.00 perf-sched.wait_and_delay.count.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
5281 ± 5% -62.8% 1967 ± 12% perf-sched.wait_and_delay.count.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
33.33 ± 13% +110.5% 70.17 ± 3% perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
4684 ± 6% -98.2% 82.17 ±141% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.xfs_ilock_for_iomap
420.83 ± 8% -99.1% 3.67 ±223% perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
11733 ± 5% -20.3% 9354 ± 4% perf-sched.wait_and_delay.count.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
622.67 ± 4% -18.7% 506.17 ± 5% perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
11804 ± 4% +203.5% 35824 perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
1000 -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
2061 ± 50% -99.0% 21.05 ±223% perf-sched.wait_and_delay.max.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
782.25 ± 68% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
1601 ± 51% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
1092 ± 35% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
2843 ± 29% -100.0% 0.00 perf-sched.wait_and_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
511.01 -25.1% 382.56 ± 9% perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
511.46 ± 23% -38.4% 315.16 ± 3% perf-sched.wait_and_delay.max.ms.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
1060 ± 58% -92.6% 78.42 ±223% perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
1165 ± 15% +74.8% 2037 ± 17% perf-sched.wait_and_delay.max.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
3997 ± 9% -38.1% 2474 ± 34% perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
376.03 ± 48% -92.9% 26.88 ±169% perf-sched.wait_time.avg.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
433.84 ± 59% -99.9% 0.43 ±223% perf-sched.wait_time.avg.ms.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
815.52 ± 92% -100.0% 0.06 ±223% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
486.09 ± 48% -100.0% 0.04 ±223% perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
240.19 ± 23% -95.8% 10.16 ± 78% perf-sched.wait_time.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
2.32 ± 10% +22.5% 2.84 ± 5% perf-sched.wait_time.avg.ms.do_task_dead.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
58.50 ± 6% -80.3% 11.53 ± 6% perf-sched.wait_time.avg.ms.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
434.72 ± 12% -49.8% 218.09 ± 4% perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
48.18 ± 5% -35.6% 31.03 ± 23% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.xfs_ilock_for_iomap
37.74 ± 8% -33.5% 25.11 ± 36% perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
36.83 ± 5% +27.8% 47.07 ± 4% perf-sched.wait_time.avg.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
620.47 ± 3% -10.9% 552.58 ± 3% perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
64.07 ± 4% -72.5% 17.64 ± 2% perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
2061 ± 50% -98.7% 26.88 ±169% perf-sched.wait_time.max.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
782.24 ± 68% -99.9% 0.43 ±223% perf-sched.wait_time.max.ms.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
28.38 ±223% +539.6% 181.50 ± 73% perf-sched.wait_time.max.ms.__cond_resched.io_assign_current_work.io_worker_handle_work.io_wq_worker.ret_from_fork
1601 ± 51% -100.0% 0.06 ±223% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
1092 ± 35% -100.0% 0.04 ±223% perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
2842 ± 29% -99.0% 27.95 ± 81% perf-sched.wait_time.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
511.01 -25.1% 382.56 ± 9% perf-sched.wait_time.max.ms.do_task_dead.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
511.46 ± 23% -38.4% 315.16 ± 3% perf-sched.wait_time.max.ms.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
1060 ± 58% -82.5% 185.64 ± 77% perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
1165 ± 15% +74.8% 2037 ± 17% perf-sched.wait_time.max.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
3997 ± 9% -38.1% 2474 ± 34% perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
6.92 ± 10% -3.5 3.42 ± 8% perf-profile.calltrace.cycles-pp.fio_ioring_commit
6.68 ± 11% -3.4 3.31 ± 8% perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fio_ioring_commit
6.41 ± 11% -3.2 3.21 ± 8% perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fio_ioring_commit
6.34 ± 12% -3.2 3.18 ± 8% perf-profile.calltrace.cycles-pp.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe.fio_ioring_commit
4.54 ± 6% -2.8 1.72 ± 17% perf-profile.calltrace.cycles-pp.iomap_dio_complete_work.process_one_work.worker_thread.kthread.ret_from_fork
4.51 ± 6% -2.8 1.71 ± 17% perf-profile.calltrace.cycles-pp.iomap_dio_complete.iomap_dio_complete_work.process_one_work.worker_thread.kthread
4.73 ± 12% -2.6 2.08 ± 9% perf-profile.calltrace.cycles-pp.io_submit_sqes.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe.fio_ioring_commit
4.20 ± 6% -2.6 1.61 ± 18% perf-profile.calltrace.cycles-pp.xfs_dio_write_end_io.iomap_dio_complete.iomap_dio_complete_work.process_one_work.worker_thread
4.16 ± 6% -2.6 1.57 ± 18% perf-profile.calltrace.cycles-pp.xfs_iomap_write_unwritten.xfs_dio_write_end_io.iomap_dio_complete.iomap_dio_complete_work.process_one_work
5.32 ± 7% -2.5 2.80 ± 11% perf-profile.calltrace.cycles-pp.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
4.17 ± 15% -2.5 1.72 ± 10% perf-profile.calltrace.cycles-pp.io_issue_sqe.io_submit_sqes.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
4.05 ± 15% -2.4 1.64 ± 11% perf-profile.calltrace.cycles-pp.io_write.io_issue_sqe.io_submit_sqes.__do_sys_io_uring_enter.do_syscall_64
6.65 ± 7% -2.4 4.28 ± 8% perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
3.84 ± 16% -2.3 1.56 ± 10% perf-profile.calltrace.cycles-pp.xfs_file_write_iter.io_write.io_issue_sqe.io_submit_sqes.__do_sys_io_uring_enter
4.30 ± 10% -2.3 2.03 ± 18% perf-profile.calltrace.cycles-pp.exit_notify.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
3.74 ± 16% -2.2 1.52 ± 10% perf-profile.calltrace.cycles-pp.xfs_file_dio_write_aligned.xfs_file_write_iter.io_write.io_issue_sqe.io_submit_sqes
2.58 ± 8% -1.2 1.35 ± 12% perf-profile.calltrace.cycles-pp.release_task.exit_notify.do_exit.io_wq_worker.ret_from_fork
1.68 ± 18% -1.2 0.50 ± 81% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.exit_notify.do_exit.io_wq_worker
1.71 ± 18% -1.1 0.60 ± 58% perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.exit_notify.do_exit.io_wq_worker.ret_from_fork
1.98 ± 10% -1.0 0.96 ± 20% perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.release_task.exit_notify.do_exit.io_wq_worker
1.92 ± 11% -1.0 0.93 ± 20% perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.release_task.exit_notify.do_exit
1.36 ± 11% -0.7 0.67 ± 6% perf-profile.calltrace.cycles-pp.__xfs_trans_commit.xfs_iomap_write_unwritten.xfs_dio_write_end_io.iomap_dio_complete.iomap_dio_complete_work
1.42 ± 5% -0.6 0.82 ± 4% perf-profile.calltrace.cycles-pp.blk_update_request.scsi_end_request.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion
1.04 ± 11% -0.5 0.56 ± 5% perf-profile.calltrace.cycles-pp.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten.xfs_dio_write_end_io.iomap_dio_complete
0.96 ± 10% -0.3 0.68 ± 6% perf-profile.calltrace.cycles-pp.iomap_dio_bio_end_io.blk_update_request.scsi_end_request.scsi_io_completion.complete_cmd_fusion
0.74 ± 7% +0.2 0.93 ± 10% perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
0.76 ± 7% +0.2 0.97 ± 10% perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
0.82 ± 7% +0.3 1.14 ± 11% perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__schedule.schedule.schedule_timeout
0.87 ± 6% +0.3 1.19 ± 11% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.schedule_timeout.io_wq_worker
1.76 ± 3% +0.6 2.32 ± 3% perf-profile.calltrace.cycles-pp.scsi_end_request.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion.__handle_irq_event_percpu
1.76 ± 3% +0.6 2.32 ± 3% perf-profile.calltrace.cycles-pp.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion.__handle_irq_event_percpu.handle_irq_event
0.00 +0.6 0.57 ± 7% perf-profile.calltrace.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue
0.00 +0.6 0.60 ± 7% perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single
0.00 +0.6 0.63 ± 10% perf-profile.calltrace.cycles-pp.__blk_mq_free_request.scsi_end_request.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion
0.00 +0.7 0.69 ± 9% perf-profile.calltrace.cycles-pp.blk_mq_run_hw_queues.scsi_end_request.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion
0.09 ±223% +0.7 0.82 ± 8% perf-profile.calltrace.cycles-pp.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single
0.39 ± 71% +0.7 1.13 ± 7% perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single
0.00 +0.8 0.77 ± 12% perf-profile.calltrace.cycles-pp.xfs_vn_update_time.kiocb_modified.xfs_file_write_checks.xfs_file_dio_write_aligned.xfs_file_write_iter
0.00 +0.8 0.80 ± 13% perf-profile.calltrace.cycles-pp.dd_dispatch_request.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_work_fn
0.00 +0.8 0.81 ± 11% perf-profile.calltrace.cycles-pp.kiocb_modified.xfs_file_write_checks.xfs_file_dio_write_aligned.xfs_file_write_iter.io_write
0.58 ± 46% +0.8 1.39 ± 9% perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt
0.00 +0.8 0.82 ± 20% perf-profile.calltrace.cycles-pp.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_get_tag
0.00 +0.8 0.83 ± 20% perf-profile.calltrace.cycles-pp.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_get_tag.__blk_mq_alloc_requests
0.00 +0.8 0.83 ± 20% perf-profile.calltrace.cycles-pp.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
0.60 ± 46% +0.8 1.44 ± 9% perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter
0.00 +0.8 0.84 ± 10% perf-profile.calltrace.cycles-pp.xfs_file_write_checks.xfs_file_dio_write_aligned.xfs_file_write_iter.io_write.io_issue_sqe
0.73 ± 6% +0.9 1.60 ± 9% perf-profile.calltrace.cycles-pp.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_work_fn.process_one_work.worker_thread
0.73 ± 6% +0.9 1.60 ± 9% perf-profile.calltrace.cycles-pp.blk_mq_sched_dispatch_requests.blk_mq_run_work_fn.process_one_work.worker_thread.kthread
0.73 ± 6% +0.9 1.60 ± 9% perf-profile.calltrace.cycles-pp.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_work_fn.process_one_work
0.73 ± 6% +0.9 1.60 ± 9% perf-profile.calltrace.cycles-pp.blk_mq_run_work_fn.process_one_work.worker_thread.kthread.ret_from_fork
1.99 ± 3% +0.9 2.87 ± 2% perf-profile.calltrace.cycles-pp.complete_cmd_fusion.megasas_isr_fusion.__handle_irq_event_percpu.handle_irq_event.handle_edge_irq
0.79 ± 17% +0.9 1.69 ± 8% perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
0.00 +0.9 0.90 ± 18% perf-profile.calltrace.cycles-pp.blk_mq_run_hw_queue.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio.submit_bio_noacct_nocheck
2.01 ± 3% +0.9 2.91 ± 2% perf-profile.calltrace.cycles-pp.__handle_irq_event_percpu.handle_irq_event.handle_edge_irq.__common_interrupt.common_interrupt
2.00 ± 3% +0.9 2.90 ± 2% perf-profile.calltrace.cycles-pp.megasas_isr_fusion.__handle_irq_event_percpu.handle_irq_event.handle_edge_irq.__common_interrupt
2.03 ± 3% +0.9 2.95 ± 2% perf-profile.calltrace.cycles-pp.handle_irq_event.handle_edge_irq.__common_interrupt.common_interrupt.asm_common_interrupt
2.05 ± 3% +0.9 2.98 ± 3% perf-profile.calltrace.cycles-pp.handle_edge_irq.__common_interrupt.common_interrupt.asm_common_interrupt.acpi_safe_halt
2.06 ± 3% +0.9 3.00 ± 3% perf-profile.calltrace.cycles-pp.__common_interrupt.common_interrupt.asm_common_interrupt.acpi_safe_halt.acpi_idle_enter
0.00 +1.0 0.97 ± 6% perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__schedule.schedule.io_schedule
0.00 +1.0 0.98 ± 6% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.io_schedule.blk_mq_get_tag
2.09 ± 3% +1.0 3.10 ± 2% perf-profile.calltrace.cycles-pp.common_interrupt.asm_common_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
2.10 ± 4% +1.0 3.12 ± 2% perf-profile.calltrace.cycles-pp.asm_common_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
0.00 +1.5 1.52 ± 13% perf-profile.calltrace.cycles-pp.__schedule.schedule.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests
0.00 +1.6 1.61 ± 12% perf-profile.calltrace.cycles-pp.schedule.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
0.00 +1.7 1.68 ± 12% perf-profile.calltrace.cycles-pp.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio.submit_bio_noacct_nocheck
0.50 ± 47% +1.7 2.23 ± 8% perf-profile.calltrace.cycles-pp.__schedule.schedule.worker_thread.kthread.ret_from_fork
0.50 ± 47% +1.7 2.25 ± 7% perf-profile.calltrace.cycles-pp.schedule.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
0.17 ±141% +1.8 1.94 ± 7% perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.worker_thread.kthread
0.08 ±223% +1.8 1.88 ± 7% perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__schedule.schedule.worker_thread
0.00 +2.0 2.00 ± 5% perf-profile.calltrace.cycles-pp.__irqentry_text_start.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
0.61 ± 9% +2.2 2.80 ± 6% perf-profile.calltrace.cycles-pp.update_sg_lb_stats.update_sd_lb_stats.find_busiest_group.load_balance.newidle_balance
0.66 ± 9% +2.4 3.07 ± 6% perf-profile.calltrace.cycles-pp.update_sd_lb_stats.find_busiest_group.load_balance.newidle_balance.pick_next_task_fair
0.67 ± 8% +2.4 3.12 ± 6% perf-profile.calltrace.cycles-pp.find_busiest_group.load_balance.newidle_balance.pick_next_task_fair.__schedule
0.70 ± 7% +2.6 3.30 ± 7% perf-profile.calltrace.cycles-pp.load_balance.newidle_balance.pick_next_task_fair.__schedule.schedule
2.47 ± 6% +2.9 5.34 ± 6% perf-profile.calltrace.cycles-pp.iomap_dio_bio_iter.__iomap_dio_rw.iomap_dio_rw.xfs_file_dio_write_aligned.xfs_file_write_iter
3.37 ± 11% +3.3 6.66 ± 8% perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
1.27 ± 10% +3.5 4.82 ± 6% perf-profile.calltrace.cycles-pp.submit_bio_noacct_nocheck.iomap_dio_bio_iter.__iomap_dio_rw.iomap_dio_rw.xfs_file_dio_write_aligned
1.02 ± 31% +3.8 4.78 ± 6% perf-profile.calltrace.cycles-pp.blk_mq_submit_bio.submit_bio_noacct_nocheck.iomap_dio_bio_iter.__iomap_dio_rw.iomap_dio_rw
0.00 +4.0 4.04 ± 7% perf-profile.calltrace.cycles-pp.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio.submit_bio_noacct_nocheck.iomap_dio_bio_iter
0.00 +4.2 4.15 ± 7% perf-profile.calltrace.cycles-pp.__blk_mq_alloc_requests.blk_mq_submit_bio.submit_bio_noacct_nocheck.iomap_dio_bio_iter.__iomap_dio_rw
0.86 ± 10% +5.1 5.97 ± 3% perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
63.38 ± 2% +5.7 69.08 perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
66.58 ± 2% +5.8 72.38 perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
63.90 ± 2% +5.9 69.85 perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
68.18 ± 2% +5.9 74.13 perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
68.18 ± 2% +5.9 74.12 perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
68.04 ± 2% +6.0 74.00 perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
69.10 ± 2% +6.3 75.35 perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
9.01 ± 7% -4.0 5.00 ± 5% perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
8.73 ± 7% -3.8 4.88 ± 5% perf-profile.children.cycles-pp.do_syscall_64
6.94 ± 9% -3.7 3.26 ± 7% perf-profile.children.cycles-pp.__do_sys_io_uring_enter
6.96 ± 10% -3.5 3.44 ± 7% perf-profile.children.cycles-pp.fio_ioring_commit
6.32 ± 28% -3.4 2.90 ± 16% perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
4.56 ± 6% -2.8 1.72 ± 17% perf-profile.children.cycles-pp.iomap_dio_complete
4.55 ± 6% -2.8 1.72 ± 17% perf-profile.children.cycles-pp.iomap_dio_complete_work
4.74 ± 12% -2.7 2.09 ± 9% perf-profile.children.cycles-pp.io_submit_sqes
4.17 ± 6% -2.6 1.57 ± 18% perf-profile.children.cycles-pp.xfs_iomap_write_unwritten
4.21 ± 6% -2.6 1.61 ± 18% perf-profile.children.cycles-pp.xfs_dio_write_end_io
5.50 ± 7% -2.5 2.95 ± 10% perf-profile.children.cycles-pp.do_exit
6.66 ± 7% -2.4 4.28 ± 8% perf-profile.children.cycles-pp.process_one_work
4.30 ± 10% -2.3 2.04 ± 18% perf-profile.children.cycles-pp.exit_notify
3.69 ± 13% -2.1 1.64 ± 25% perf-profile.children.cycles-pp.queued_write_lock_slowpath
1.64 ± 19% -1.3 0.32 ± 16% perf-profile.children.cycles-pp.iomap_iter
1.53 ± 21% -1.3 0.27 ± 14% perf-profile.children.cycles-pp.xfs_direct_write_iomap_begin
2.59 ± 8% -1.2 1.35 ± 12% perf-profile.children.cycles-pp.release_task
1.48 ±121% -1.1 0.34 ± 19% perf-profile.children.cycles-pp.dd_insert_requests
1.89 ± 94% -1.1 0.82 ± 9% perf-profile.children.cycles-pp.blk_finish_plug
1.88 ± 94% -1.1 0.82 ± 9% perf-profile.children.cycles-pp.__blk_flush_plug
1.85 ± 96% -1.0 0.81 ± 9% perf-profile.children.cycles-pp.blk_mq_flush_plug_list
1.84 ± 97% -1.0 0.80 ± 9% perf-profile.children.cycles-pp.blk_mq_dispatch_plug_list
1.46 ± 3% -1.0 0.43 ± 39% perf-profile.children.cycles-pp.xfs_trans_alloc_inode
0.95 ± 11% -0.9 0.10 ± 25% perf-profile.children.cycles-pp.xfs_ilock_for_iomap
0.83 ± 9% -0.7 0.14 ± 18% perf-profile.children.cycles-pp.down_read
0.81 ± 5% -0.6 0.20 ±107% perf-profile.children.cycles-pp.xfs_ilock
0.78 ± 5% -0.6 0.18 ±117% perf-profile.children.cycles-pp.down_write
0.69 ± 24% -0.6 0.10 ± 37% perf-profile.children.cycles-pp.fio_ioring_getevents
1.44 ± 6% -0.6 0.86 ± 3% perf-profile.children.cycles-pp.blk_update_request
1.51 ± 13% -0.5 0.97 ± 21% perf-profile.children.cycles-pp.io_cqring_wait
0.67 ± 8% -0.5 0.15 ± 15% perf-profile.children.cycles-pp.xfs_iunlock
0.76 ± 9% -0.5 0.25 ± 22% perf-profile.children.cycles-pp.bio_iov_iter_get_pages
0.75 ± 8% -0.5 0.25 ± 22% perf-profile.children.cycles-pp.__bio_iov_iter_get_pages
0.64 ± 24% -0.5 0.14 ± 14% perf-profile.children.cycles-pp.__io_run_local_work
0.69 ± 8% -0.5 0.23 ± 25% perf-profile.children.cycles-pp.iov_iter_extract_pages
0.67 ± 9% -0.5 0.22 ± 25% perf-profile.children.cycles-pp.pin_user_pages_fast
0.66 ± 9% -0.4 0.22 ± 25% perf-profile.children.cycles-pp.internal_get_user_pages_fast
0.44 ± 16% -0.4 0.04 ± 73% perf-profile.children.cycles-pp.up_write
0.84 ± 13% -0.4 0.45 ± 33% perf-profile.children.cycles-pp.xfs_bmapi_write
1.52 ± 10% -0.4 1.15 ± 6% perf-profile.children.cycles-pp.__xfs_trans_commit
0.55 ± 11% -0.3 0.20 ± 26% perf-profile.children.cycles-pp.lockless_pages_from_mm
0.78 ± 14% -0.3 0.45 ± 9% perf-profile.children.cycles-pp.dd_bio_merge
0.64 ± 19% -0.3 0.34 ± 8% perf-profile.children.cycles-pp.blk_mq_sched_try_merge
0.46 ± 15% -0.3 0.17 ± 23% perf-profile.children.cycles-pp.gup_pgd_range
0.98 ± 10% -0.3 0.72 ± 5% perf-profile.children.cycles-pp.iomap_dio_bio_end_io
0.51 ± 24% -0.3 0.24 ± 14% perf-profile.children.cycles-pp.kmem_cache_free
0.40 ± 16% -0.3 0.14 ± 24% perf-profile.children.cycles-pp.__slab_free
0.38 ± 16% -0.2 0.14 ± 23% perf-profile.children.cycles-pp.gup_pte_range
0.32 ± 75% -0.2 0.10 ± 25% perf-profile.children.cycles-pp.xfs_bmapi_read
0.60 ± 9% -0.2 0.39 ± 21% perf-profile.children.cycles-pp.xfs_trans_alloc
0.38 ± 15% -0.2 0.17 ± 15% perf-profile.children.cycles-pp.xfs_bmapi_convert_unwritten
0.27 ± 31% -0.2 0.06 ± 52% perf-profile.children.cycles-pp.xfs_ilock_iocb_for_write
0.31 ± 18% -0.2 0.11 ± 27% perf-profile.children.cycles-pp.try_grab_folio
0.24 ± 19% -0.2 0.05 ± 54% perf-profile.children.cycles-pp.fio_gettime
0.34 ± 16% -0.2 0.15 ± 6% perf-profile.children.cycles-pp.xfs_bmap_add_extent_unwritten_real
0.25 ± 12% -0.2 0.06 ± 59% perf-profile.children.cycles-pp.__io_req_task_work_add
0.27 ± 8% -0.2 0.09 ± 12% perf-profile.children.cycles-pp.__bio_release_pages
0.48 ± 11% -0.2 0.31 ± 20% perf-profile.children.cycles-pp.xfs_trans_reserve
0.20 ± 14% -0.2 0.03 ±100% perf-profile.children.cycles-pp.xfs_ilock_nowait
0.31 ± 15% -0.2 0.16 ± 23% perf-profile.children.cycles-pp.kmem_cache_alloc
0.24 ± 15% -0.2 0.09 ± 24% perf-profile.children.cycles-pp.kfree
0.35 ± 44% -0.2 0.20 ± 20% perf-profile.children.cycles-pp.llist_reverse_order
0.24 ± 11% -0.1 0.09 ± 23% perf-profile.children.cycles-pp.xlog_ticket_alloc
0.17 ± 76% -0.1 0.02 ± 99% perf-profile.children.cycles-pp.io_req_rw_complete
0.42 ± 9% -0.1 0.27 ± 22% perf-profile.children.cycles-pp.xfs_log_reserve
0.23 ± 10% -0.1 0.09 ± 29% perf-profile.children.cycles-pp.bio_alloc_bioset
0.17 ± 23% -0.1 0.03 ±100% perf-profile.children.cycles-pp.down_read_trylock
0.47 ± 19% -0.1 0.33 ± 20% perf-profile.children.cycles-pp.io_queue_iowq
0.16 ± 30% -0.1 0.04 ± 75% perf-profile.children.cycles-pp.get_io_u
0.22 ± 11% -0.1 0.11 ± 8% perf-profile.children.cycles-pp.up_read
0.16 ± 33% -0.1 0.05 ± 76% perf-profile.children.cycles-pp.xfs_trans_run_precommits
0.16 ± 8% -0.1 0.05 ± 71% perf-profile.children.cycles-pp.__cond_resched
0.15 ± 26% -0.1 0.04 ± 72% perf-profile.children.cycles-pp.__io_submit_flush_completions
0.27 ± 18% -0.1 0.16 ± 5% perf-profile.children.cycles-pp.__blk_bios_map_sg
0.12 ± 26% -0.1 0.03 ±100% perf-profile.children.cycles-pp.io_free_batch_list
0.20 ± 22% -0.1 0.10 ± 14% perf-profile.children.cycles-pp.xfs_trans_ijoin
0.12 ± 33% -0.1 0.03 ±100% perf-profile.children.cycles-pp.xfs_bmbt_to_iomap
0.27 ± 17% -0.1 0.18 ± 10% perf-profile.children.cycles-pp.__blk_rq_map_sg
0.18 ± 16% -0.1 0.09 ± 42% perf-profile.children.cycles-pp.memset_orig
0.32 ± 17% -0.1 0.24 ± 15% perf-profile.children.cycles-pp.scsi_alloc_sgtables
0.21 ± 15% -0.1 0.13 ± 17% perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
0.10 ± 27% -0.1 0.03 ±102% perf-profile.children.cycles-pp.mempool_alloc
0.33 ± 16% -0.1 0.26 ± 13% perf-profile.children.cycles-pp.sd_setup_read_write_cmnd
0.09 ± 17% -0.1 0.03 ±100% perf-profile.children.cycles-pp.get_random_u32
0.18 ± 18% -0.1 0.12 ± 28% perf-profile.children.cycles-pp.run_timer_softirq
0.12 ± 13% -0.1 0.06 ± 48% perf-profile.children.cycles-pp.xfs_iext_get_extent
0.10 ± 29% -0.1 0.04 ± 73% perf-profile.children.cycles-pp.bio_associate_blkg
0.01 ±223% +0.1 0.06 ± 15% perf-profile.children.cycles-pp.xfs_ag_block_count
0.12 ± 30% +0.1 0.18 ± 13% perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
0.00 +0.1 0.06 ± 13% perf-profile.children.cycles-pp.finish_wait
0.03 ±141% +0.1 0.08 ± 38% perf-profile.children.cycles-pp.__enqueue_entity
0.12 ± 13% +0.1 0.19 ± 12% perf-profile.children.cycles-pp.set_next_entity
0.00 +0.1 0.07 ± 20% perf-profile.children.cycles-pp.try_to_grab_pending
0.12 ± 23% +0.1 0.20 ± 30% perf-profile.children.cycles-pp.wakeup_preempt
0.02 ±142% +0.1 0.10 ± 40% perf-profile.children.cycles-pp.megasas_build_ldio_fusion
0.16 ± 30% +0.1 0.25 ± 18% perf-profile.children.cycles-pp.switch_mm_irqs_off
0.12 ± 32% +0.1 0.22 ± 12% perf-profile.children.cycles-pp.llist_add_batch
0.11 ± 9% +0.1 0.22 ± 21% perf-profile.children.cycles-pp.rb_erase
0.13 ± 18% +0.1 0.24 ± 23% perf-profile.children.cycles-pp.elv_attempt_insert_merge
0.17 ± 15% +0.1 0.28 ± 15% perf-profile.children.cycles-pp.scsi_dispatch_cmd
0.08 ± 58% +0.1 0.19 ± 13% perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
0.07 ± 20% +0.1 0.19 ± 19% perf-profile.children.cycles-pp.xas_load
0.00 +0.1 0.12 ± 26% perf-profile.children.cycles-pp.sbitmap_finish_wait
0.00 +0.1 0.13 ± 16% perf-profile.children.cycles-pp.elv_rb_del
0.56 ± 9% +0.1 0.69 ± 7% perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
0.28 ± 13% +0.1 0.41 ± 6% perf-profile.children.cycles-pp.queue_work_on
0.00 +0.1 0.14 ± 28% perf-profile.children.cycles-pp.sbitmap_queue_get_shallow
0.06 ± 50% +0.1 0.20 ± 20% perf-profile.children.cycles-pp.kblockd_mod_delayed_work_on
0.06 ± 50% +0.1 0.20 ± 20% perf-profile.children.cycles-pp.mod_delayed_work_on
0.11 ± 14% +0.2 0.27 ± 11% perf-profile.children.cycles-pp.xas_find
0.02 ±142% +0.2 0.18 ± 21% perf-profile.children.cycles-pp.dd_has_work
0.24 ± 20% +0.2 0.40 ± 15% perf-profile.children.cycles-pp.ttwu_queue_wakelist
0.10 ± 24% +0.2 0.27 ± 13% perf-profile.children.cycles-pp._find_next_zero_bit
0.09 ± 25% +0.2 0.26 ± 14% perf-profile.children.cycles-pp.elv_rqhash_find
0.37 ± 14% +0.2 0.54 ± 14% perf-profile.children.cycles-pp.idle_cpu
0.22 ± 11% +0.2 0.40 ± 9% perf-profile.children.cycles-pp._find_next_and_bit
0.55 ± 10% +0.2 0.75 ± 11% perf-profile.children.cycles-pp.update_load_avg
0.00 +0.2 0.21 ± 19% perf-profile.children.cycles-pp.prepare_to_wait_exclusive
0.10 ± 20% +0.2 0.31 ± 15% perf-profile.children.cycles-pp.__dd_dispatch_request
0.66 ± 9% +0.2 0.88 ± 9% perf-profile.children.cycles-pp.update_rq_clock
0.01 ±223% +0.2 0.24 ± 27% perf-profile.children.cycles-pp.__sbitmap_weight
0.01 ±223% +0.2 0.25 ± 25% perf-profile.children.cycles-pp.sbitmap_weight
0.07 ± 28% +0.3 0.32 ± 15% perf-profile.children.cycles-pp.sbitmap_get_shallow
0.47 ± 21% +0.3 0.72 ± 12% perf-profile.children.cycles-pp.enqueue_entity
0.28 ± 10% +0.3 0.54 ± 4% perf-profile.children.cycles-pp.__queue_work
0.77 ± 7% +0.3 1.04 ± 8% perf-profile.children.cycles-pp.schedule_idle
1.04 ± 5% +0.3 1.32 ± 7% perf-profile.children.cycles-pp.try_to_wake_up
0.22 ± 14% +0.3 0.50 ± 6% perf-profile.children.cycles-pp.kick_pool
0.12 ± 27% +0.3 0.40 ± 12% perf-profile.children.cycles-pp.xa_find_after
0.25 ± 10% +0.3 0.53 ± 9% perf-profile.children.cycles-pp.cpu_util
0.55 ± 12% +0.3 0.88 ± 9% perf-profile.children.cycles-pp.xfs_file_write_checks
0.60 ± 17% +0.3 0.94 ± 6% perf-profile.children.cycles-pp.activate_task
0.56 ± 17% +0.4 0.92 ± 6% perf-profile.children.cycles-pp.enqueue_task_fair
0.42 ± 14% +0.4 0.82 ± 10% perf-profile.children.cycles-pp.kiocb_modified
0.00 +0.4 0.41 ± 18% perf-profile.children.cycles-pp.__dd_do_insert
0.31 ± 13% +0.5 0.77 ± 12% perf-profile.children.cycles-pp.xfs_vn_update_time
0.00 +0.5 0.48 ± 7% perf-profile.children.cycles-pp.autoremove_wake_function
0.23 ± 11% +0.5 0.72 ± 8% perf-profile.children.cycles-pp.blk_mq_run_hw_queues
0.65 ± 16% +0.5 1.16 ± 6% perf-profile.children.cycles-pp.ttwu_do_activate
0.65 ± 18% +0.5 1.18 ± 8% perf-profile.children.cycles-pp.sched_ttwu_pending
0.18 ± 10% +0.6 0.74 ± 3% perf-profile.children.cycles-pp.sbitmap_find_bit
0.00 +0.6 0.56 ± 9% perf-profile.children.cycles-pp.__wake_up_common
0.00 +0.6 0.56 ± 9% perf-profile.children.cycles-pp.__wake_up
0.00 +0.6 0.58 ± 12% perf-profile.children.cycles-pp.sbitmap_queue_wake_up
1.80 ± 4% +0.6 2.40 ± 2% perf-profile.children.cycles-pp.scsi_end_request
1.80 ± 4% +0.6 2.41 ± 2% perf-profile.children.cycles-pp.scsi_io_completion
0.41 ± 6% +0.6 1.02 ± 6% perf-profile.children.cycles-pp.__irqentry_text_start
0.02 ±145% +0.6 0.64 ± 11% perf-profile.children.cycles-pp.sbitmap_queue_clear
0.16 ± 22% +0.6 0.80 ± 19% perf-profile.children.cycles-pp.sbitmap_get
0.10 ± 25% +0.6 0.74 ± 11% perf-profile.children.cycles-pp.__blk_mq_free_request
0.82 ± 17% +0.7 1.47 ± 9% perf-profile.children.cycles-pp.__flush_smp_call_function_queue
0.83 ± 16% +0.7 1.49 ± 8% perf-profile.children.cycles-pp.__sysvec_call_function_single
0.17 ± 36% +0.8 0.94 ± 11% perf-profile.children.cycles-pp.dd_dispatch_request
0.95 ± 15% +0.8 1.73 ± 7% perf-profile.children.cycles-pp.sysvec_call_function_single
0.73 ± 6% +0.9 1.61 ± 9% perf-profile.children.cycles-pp.blk_mq_run_work_fn
2.04 ± 4% +0.9 2.98 perf-profile.children.cycles-pp.complete_cmd_fusion
2.05 ± 3% +1.0 3.01 perf-profile.children.cycles-pp.megasas_isr_fusion
2.06 ± 4% +1.0 3.02 perf-profile.children.cycles-pp.__handle_irq_event_percpu
2.08 ± 3% +1.0 3.05 perf-profile.children.cycles-pp.handle_irq_event
2.10 ± 3% +1.0 3.09 ± 2% perf-profile.children.cycles-pp.handle_edge_irq
2.11 ± 3% +1.0 3.10 ± 2% perf-profile.children.cycles-pp.__common_interrupt
2.14 ± 4% +1.1 3.21 perf-profile.children.cycles-pp.common_interrupt
0.20 ± 28% +1.1 1.28 ± 12% perf-profile.children.cycles-pp.scsi_mq_get_budget
2.14 ± 4% +1.1 3.24 perf-profile.children.cycles-pp.asm_common_interrupt
0.29 ± 32% +1.3 1.58 ± 9% perf-profile.children.cycles-pp.blk_mq_run_hw_queue
1.85 ± 6% +1.8 3.66 ± 5% perf-profile.children.cycles-pp.update_sg_lb_stats
2.12 ± 4% +1.8 3.97 ± 5% perf-profile.children.cycles-pp.update_sd_lb_stats
2.18 ± 4% +1.9 4.03 ± 5% perf-profile.children.cycles-pp.find_busiest_group
2.79 ± 4% +1.9 4.73 ± 6% perf-profile.children.cycles-pp.load_balance
2.33 ± 12% +2.0 4.30 ± 7% perf-profile.children.cycles-pp.asm_sysvec_call_function_single
0.00 +2.0 2.01 ± 12% perf-profile.children.cycles-pp.io_schedule
0.98 ± 12% +2.0 3.00 ± 7% perf-profile.children.cycles-pp.__blk_mq_do_dispatch_sched
0.98 ± 13% +2.0 3.01 ± 7% perf-profile.children.cycles-pp.__blk_mq_sched_dispatch_requests
0.98 ± 13% +2.0 3.02 ± 7% perf-profile.children.cycles-pp.blk_mq_sched_dispatch_requests
2.09 ± 5% +2.3 4.43 ± 6% perf-profile.children.cycles-pp.newidle_balance
2.26 ± 4% +2.5 4.75 ± 5% perf-profile.children.cycles-pp.pick_next_task_fair
2.48 ± 6% +2.9 5.34 ± 6% perf-profile.children.cycles-pp.iomap_dio_bio_iter
2.90 ± 7% +2.9 5.81 ± 5% perf-profile.children.cycles-pp.schedule
3.83 ± 5% +3.1 6.90 ± 4% perf-profile.children.cycles-pp.__schedule
1.27 ± 10% +3.6 4.82 ± 6% perf-profile.children.cycles-pp.submit_bio_noacct_nocheck
1.18 ± 9% +3.6 4.78 ± 6% perf-profile.children.cycles-pp.blk_mq_submit_bio
0.09 ± 17% +3.9 4.04 ± 7% perf-profile.children.cycles-pp.blk_mq_get_tag
0.15 ± 15% +4.0 4.15 ± 7% perf-profile.children.cycles-pp.__blk_mq_alloc_requests
0.88 ± 10% +5.3 6.14 ± 3% perf-profile.children.cycles-pp.poll_idle
64.25 ± 2% +5.9 70.18 perf-profile.children.cycles-pp.cpuidle_enter
68.18 ± 2% +5.9 74.13 perf-profile.children.cycles-pp.start_secondary
63.95 ± 2% +6.0 69.91 perf-profile.children.cycles-pp.cpuidle_enter_state
67.55 ± 2% +6.0 73.59 perf-profile.children.cycles-pp.cpuidle_idle_call
69.08 ± 2% +6.3 75.34 perf-profile.children.cycles-pp.do_idle
69.10 ± 2% +6.3 75.35 perf-profile.children.cycles-pp.cpu_startup_entry
69.10 ± 2% +6.3 75.35 perf-profile.children.cycles-pp.secondary_startup_64_no_verify
6.32 ± 28% -3.4 2.89 ± 16% perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
0.46 ± 16% -0.3 0.12 ± 16% perf-profile.self.cycles-pp.__iomap_dio_rw
0.39 ± 18% -0.3 0.14 ± 25% perf-profile.self.cycles-pp.__slab_free
0.45 ± 20% -0.2 0.23 ± 13% perf-profile.self.cycles-pp.iomap_dio_bio_end_io
0.23 ± 20% -0.2 0.04 ± 77% perf-profile.self.cycles-pp.fio_gettime
0.29 ± 19% -0.2 0.10 ± 27% perf-profile.self.cycles-pp.try_grab_folio
0.29 ± 40% -0.2 0.11 ± 30% perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
0.26 ± 11% -0.2 0.09 ± 51% perf-profile.self.cycles-pp.try_to_wake_up
0.16 ± 24% -0.1 0.02 ± 99% perf-profile.self.cycles-pp.down_read_trylock
0.22 ± 9% -0.1 0.11 ± 10% perf-profile.self.cycles-pp.up_read
0.18 ± 23% -0.1 0.08 ± 19% perf-profile.self.cycles-pp.kmem_cache_free
0.14 ± 30% -0.1 0.04 ± 77% perf-profile.self.cycles-pp.get_io_u
0.26 ± 18% -0.1 0.16 ± 5% perf-profile.self.cycles-pp.__blk_bios_map_sg
0.14 ± 20% -0.1 0.05 ± 47% perf-profile.self.cycles-pp.blk_mq_submit_bio
0.14 ± 20% -0.1 0.05 ± 86% perf-profile.self.cycles-pp.__io_req_task_work_add
0.14 ± 18% -0.1 0.05 ± 48% perf-profile.self.cycles-pp.__bio_release_pages
0.29 ± 13% -0.1 0.20 ± 20% perf-profile.self.cycles-pp.llist_reverse_order
0.17 ± 19% -0.1 0.09 ± 41% perf-profile.self.cycles-pp.memset_orig
0.13 ± 31% -0.1 0.06 ± 51% perf-profile.self.cycles-pp.xfs_inode_to_log_dinode
0.12 ± 14% -0.1 0.05 ± 47% perf-profile.self.cycles-pp.xfs_bmap_add_extent_unwritten_real
0.11 ± 23% -0.1 0.05 ± 73% perf-profile.self.cycles-pp.inode_maybe_inc_iversion
0.14 ± 12% -0.1 0.08 ± 14% perf-profile.self.cycles-pp.xfs_file_write_iter
0.14 ± 23% -0.1 0.09 ± 22% perf-profile.self.cycles-pp.fget
0.10 ± 13% -0.0 0.06 ± 46% perf-profile.self.cycles-pp.xfs_iext_get_extent
0.03 ±102% +0.1 0.09 ± 21% perf-profile.self.cycles-pp.pick_next_task_fair
0.02 ±141% +0.1 0.08 ± 38% perf-profile.self.cycles-pp.__enqueue_entity
0.10 ± 21% +0.1 0.16 ± 24% perf-profile.self.cycles-pp.irqentry_enter
0.01 ±223% +0.1 0.08 ± 37% perf-profile.self.cycles-pp.xas_load
0.00 +0.1 0.08 ± 18% perf-profile.self.cycles-pp.__blk_mq_free_request
0.01 ±223% +0.1 0.09 ± 22% perf-profile.self.cycles-pp.dd_dispatch_request
0.20 ± 21% +0.1 0.28 ± 16% perf-profile.self.cycles-pp.update_load_avg
0.00 +0.1 0.08 ± 25% perf-profile.self.cycles-pp.ttwu_do_activate
0.16 ± 31% +0.1 0.24 ± 20% perf-profile.self.cycles-pp.switch_mm_irqs_off
0.08 ± 27% +0.1 0.17 ± 23% perf-profile.self.cycles-pp.__dd_dispatch_request
0.09 ± 21% +0.1 0.19 ± 25% perf-profile.self.cycles-pp.enqueue_task_fair
0.10 ± 14% +0.1 0.20 ± 23% perf-profile.self.cycles-pp.rb_erase
0.12 ± 32% +0.1 0.22 ± 12% perf-profile.self.cycles-pp.llist_add_batch
0.00 +0.1 0.11 ± 31% perf-profile.self.cycles-pp.prepare_to_wait_exclusive
0.54 ± 12% +0.1 0.67 ± 6% perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
0.00 +0.1 0.14 ± 27% perf-profile.self.cycles-pp.sbitmap_queue_get_shallow
0.01 ±223% +0.1 0.15 ± 22% perf-profile.self.cycles-pp.xa_find_after
0.73 ± 11% +0.1 0.87 ± 4% perf-profile.self.cycles-pp._raw_spin_lock_irqsave
0.02 ±142% +0.2 0.17 ± 22% perf-profile.self.cycles-pp.dd_has_work
0.36 ± 13% +0.2 0.52 ± 12% perf-profile.self.cycles-pp.idle_cpu
0.10 ± 25% +0.2 0.26 ± 15% perf-profile.self.cycles-pp._find_next_zero_bit
0.86 ± 5% +0.2 1.02 ± 12% perf-profile.self.cycles-pp.menu_select
0.09 ± 27% +0.2 0.26 ± 14% perf-profile.self.cycles-pp.elv_rqhash_find
0.20 ± 12% +0.2 0.37 ± 14% perf-profile.self.cycles-pp._find_next_and_bit
0.08 ± 53% +0.2 0.26 ± 19% perf-profile.self.cycles-pp.complete_cmd_fusion
0.00 +0.2 0.19 ± 26% perf-profile.self.cycles-pp.__sbitmap_weight
0.24 ± 11% +0.2 0.48 ± 12% perf-profile.self.cycles-pp.__schedule
0.21 ± 7% +0.2 0.46 ± 11% perf-profile.self.cycles-pp.cpu_util
0.01 ±223% +0.3 0.28 ± 15% perf-profile.self.cycles-pp.scsi_mq_get_budget
0.03 ±101% +0.3 0.30 ± 21% perf-profile.self.cycles-pp.sbitmap_get
0.08 ± 24% +0.4 0.48 ± 5% perf-profile.self.cycles-pp.sbitmap_find_bit
1.37 ± 8% +1.2 2.58 ± 8% perf-profile.self.cycles-pp.update_sg_lb_stats
0.82 ± 12% +5.0 5.84 ± 3% perf-profile.self.cycles-pp.poll_idle
Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
2024-01-31 15:42 [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression kernel test robot
@ 2024-01-31 18:17 ` Bart Van Assche
2024-01-31 18:42 ` Jens Axboe
2024-02-09 21:06 ` Jens Axboe
1 sibling, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2024-01-31 18:17 UTC (permalink / raw)
To: kernel test robot
Cc: oe-lkp, lkp, Linux Memory Management List, Jens Axboe,
Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
feng.tang, fengwei.yin
On 1/31/24 07:42, kernel test robot wrote:
> kernel test robot noticed a -72.9% regression of fio.write_iops on:
>
>
> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: fio-basic
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
> runtime: 300s
> disk: 1HDD
> fs: xfs
> nr_task: 100%
> test_size: 128G
> rw: write
> bs: 4k
> ioengine: io_uring
> direct: direct
> cpufreq_governor: performance
The actual test is available in this file:
https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
I haven't found anything in that file for disabling merging. Merging
requests decreases IOPS. Does this perhaps mean that this test is
broken?
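(For reference, a sketch of how merging could be switched off via the block
queue's nomerges attribute; this assumes the disk under test is sdb, as in
the job file from the repro script:)

  # check the active scheduler and the current merge setting
  cat /sys/block/sdb/queue/scheduler
  cat /sys/block/sdb/queue/nomerges
  # 1 disables only the complex merge lookups, 2 disables merging entirely
  echo 2 > /sys/block/sdb/queue/nomerges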
Thanks,
Bart.
* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
2024-01-31 18:17 ` Bart Van Assche
@ 2024-01-31 18:42 ` Jens Axboe
2024-02-01 7:18 ` Oliver Sang
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2024-01-31 18:42 UTC (permalink / raw)
To: Bart Van Assche, kernel test robot
Cc: oe-lkp, lkp, Linux Memory Management List, Oleksandr Natalenko,
Johannes Thumshirn, linux-block, ying.huang, feng.tang,
fengwei.yin
On 1/31/24 11:17 AM, Bart Van Assche wrote:
> On 1/31/24 07:42, kernel test robot wrote:
>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
>>
>>
>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> testcase: fio-basic
>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
>> parameters:
>>
>> runtime: 300s
>> disk: 1HDD
>> fs: xfs
>> nr_task: 100%
>> test_size: 128G
>> rw: write
>> bs: 4k
>> ioengine: io_uring
>> direct: direct
>> cpufreq_governor: performance
>
> The actual test is available in this file:
> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
>
> I haven't found anything in that file for disabling merging. Merging
> requests decreases IOPS. Does this perhaps mean that this test is
> broken?
It's hard to know, as nothing in this email or the links includes the actual
output of the job...
But if it's fio IOPS, then those are application-side numbers and don't
necessarily correlate to drive IOPS due to merging. E.g. for fio IOPS, if
it does 4k sequential writes and we merge them to 128k, then the fio-perceived
IOPS will be 32 times larger than on the device side.
I'll take a look, but it seems like there might be something there. By
inserting into the other list, the request is also not available for
merging. And the test in question does single IOs at a time.
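(A back-of-the-envelope sketch of that factor, assuming the job's 4k
sequential writes were merged fully into 128k device requests:)

  # 32 application-side 4k completions collapse into one 128k device request,
  # so fio-side IOPS overstate device-side IOPS by merged_bs / bs
  echo "$((128 / 4))x"   # prints 32x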
--
Jens Axboe
* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
2024-01-31 18:42 ` Jens Axboe
@ 2024-02-01 7:18 ` Oliver Sang
2024-02-01 13:40 ` Jens Axboe
0 siblings, 1 reply; 9+ messages in thread
From: Oliver Sang @ 2024-02-01 7:18 UTC (permalink / raw)
To: Jens Axboe
Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
feng.tang, fengwei.yin, oliver.sang
[-- Attachment #1: Type: text/plain, Size: 1953 bytes --]
hi, Jens Axboe,
On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
> On 1/31/24 11:17 AM, Bart Van Assche wrote:
> > On 1/31/24 07:42, kernel test robot wrote:
> >> kernel test robot noticed a -72.9% regression of fio.write_iops on:
> >>
> >>
> >> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>
> >> testcase: fio-basic
> >> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> >> parameters:
> >>
> >> runtime: 300s
> >> disk: 1HDD
> >> fs: xfs
> >> nr_task: 100%
> >> test_size: 128G
> >> rw: write
> >> bs: 4k
> >> ioengine: io_uring
> >> direct: direct
> >> cpufreq_governor: performance
> >
> > The actual test is available in this file:
> > https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
> >
> > I haven't found anything in that file for disabling merging. Merging
> > requests decreases IOPS. Does this perhaps mean that this test is
> > broken?
>
> It's hard to know, as nothing in this email or the links includes the actual
> output of the job...
I attached a dmesg and two outputs from running tests on 574e7779cf. Not sure
if they are helpful?
>
> But if it's fio IOPS, then those are application-side numbers and don't
> necessarily correlate to drive IOPS due to merging. E.g. for fio IOPS, if
> it does 4k sequential writes and we merge them to 128k, then the fio-perceived
> IOPS will be 32 times larger than on the device side.
>
> I'll take a look, but it seems like there might be something there. By
> inserting into the other list, the request is also not available for
> merging. And the test in question does single IOs at a time.
If you have any debug patch you want us to run, please just let us know.
It will be our great pleasure!
>
> --
> Jens Axboe
>
[-- Attachment #2: dmesg.xz --]
[-- Type: application/x-xz, Size: 37396 bytes --]
[-- Attachment #3: fio --]
[-- Type: text/plain, Size: 348 bytes --]
2024-01-30 16:11:43 echo '[global]
bs=4k
ioengine=io_uring
iodepth=32
size=2147483648
nr_files=1
filesize=2147483648
direct=1
runtime=300
invalidate=1
fallocate=posix
io_size=2147483648
file_service_type=roundrobin
random_distribution=random
group_reporting
pre_read=0
[task_0]
rw=write
directory=/fs/sdb1
numjobs=64' | fio --output-format=json -
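(The headline write numbers can be pulled straight out of the JSON in the
next attachment, e.g. with jq, assuming the output is saved as fio.output:)

  # application-side write IOPS and bandwidth (KiB/s) as reported by fio
  jq '.jobs[0].write.iops, .jobs[0].write.bw' fio.output
  # -> 6351.05182 and 25404, consistent with the regressed fio.write_iops
  #    figure in the report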
[-- Attachment #4: fio.output --]
[-- Type: text/plain, Size: 6913 bytes --]
{
"fio version" : "fio-3.33",
"timestamp" : 1706631404,
"timestamp_ms" : 1706631404540,
"time" : "Tue Jan 30 16:16:44 2024",
"global options" : {
"bs" : "4k",
"ioengine" : "io_uring",
"iodepth" : "32",
"size" : "2147483648",
"nrfiles" : "1",
"filesize" : "2147483648",
"direct" : "1",
"runtime" : "300",
"invalidate" : "1",
"fallocate" : "posix",
"io_size" : "2147483648",
"file_service_type" : "roundrobin",
"random_distribution" : "random",
"pre_read" : "0"
},
"jobs" : [
{
"jobname" : "task_0",
"groupid" : 0,
"error" : 0,
"eta" : 0,
"elapsed" : 301,
"job options" : {
"rw" : "write",
"directory" : "/fs/sdb1",
"numjobs" : "64"
},
"read" : {
"io_bytes" : 0,
"io_kbytes" : 0,
"bw_bytes" : 0,
"bw" : 0,
"iops" : 0.000000,
"runtime" : 0,
"total_ios" : 0,
"short_ios" : 0,
"drop_ios" : 0,
"slat_ns" : {
"min" : 0,
"max" : 0,
"mean" : 0.000000,
"stddev" : 0.000000,
"N" : 0
},
"clat_ns" : {
"min" : 0,
"max" : 0,
"mean" : 0.000000,
"stddev" : 0.000000,
"N" : 0
},
"lat_ns" : {
"min" : 0,
"max" : 0,
"mean" : 0.000000,
"stddev" : 0.000000,
"N" : 0
},
"bw_min" : 0,
"bw_max" : 0,
"bw_agg" : 0.000000,
"bw_mean" : 0.000000,
"bw_dev" : 0.000000,
"bw_samples" : 0,
"iops_min" : 0,
"iops_max" : 0,
"iops_mean" : 0.000000,
"iops_stddev" : 0.000000,
"iops_samples" : 0
},
"write" : {
"io_bytes" : 7811690496,
"io_kbytes" : 7628604,
"bw_bytes" : 26013908,
"bw" : 25404,
"iops" : 6351.051820,
"runtime" : 300289,
"total_ios" : 1907151,
"short_ios" : 0,
"drop_ios" : 0,
"slat_ns" : {
"min" : 326,
"max" : 866214376,
"mean" : 9843550.857783,
"stddev" : 36978079.656282,
"N" : 1907151
},
"clat_ns" : {
"min" : 182009,
"max" : 1499097001,
"mean" : 312417107.300301,
"stddev" : 178075607.629576,
"N" : 1907151,
"percentile" : {
"1.000000" : 36438016,
"5.000000" : 64749568,
"10.000000" : 92798976,
"20.000000" : 145752064,
"30.000000" : 191889408,
"40.000000" : 238026752,
"50.000000" : 291504128,
"60.000000" : 350224384,
"70.000000" : 408944640,
"80.000000" : 471859200,
"90.000000" : 549453824,
"95.000000" : 616562688,
"99.000000" : 775946240,
"99.500000" : 859832320,
"99.900000" : 1035993088,
"99.950000" : 1115684864,
"99.990000" : 1249902592
}
},
"lat_ns" : {
"min" : 318070,
"max" : 1499097433,
"mean" : 322260658.158085,
"stddev" : 180107939.833017,
"N" : 1907151
},
"bw_min" : 2800,
"bw_max" : 80728,
"bw_agg" : 99.991312,
"bw_mean" : 25402.113478,
"bw_dev" : 183.115387,
"bw_samples" : 38394,
"iops_min" : 700,
"iops_max" : 20182,
"iops_mean" : 6350.528370,
"iops_stddev" : 45.778847,
"iops_samples" : 38394
},
"trim" : {
"io_bytes" : 0,
"io_kbytes" : 0,
"bw_bytes" : 0,
"bw" : 0,
"iops" : 0.000000,
"runtime" : 0,
"total_ios" : 0,
"short_ios" : 0,
"drop_ios" : 0,
"slat_ns" : {
"min" : 0,
"max" : 0,
"mean" : 0.000000,
"stddev" : 0.000000,
"N" : 0
},
"clat_ns" : {
"min" : 0,
"max" : 0,
"mean" : 0.000000,
"stddev" : 0.000000,
"N" : 0
},
"lat_ns" : {
"min" : 0,
"max" : 0,
"mean" : 0.000000,
"stddev" : 0.000000,
"N" : 0
},
"bw_min" : 0,
"bw_max" : 0,
"bw_agg" : 0.000000,
"bw_mean" : 0.000000,
"bw_dev" : 0.000000,
"bw_samples" : 0,
"iops_min" : 0,
"iops_max" : 0,
"iops_mean" : 0.000000,
"iops_stddev" : 0.000000,
"iops_samples" : 0
},
"sync" : {
"total_ios" : 0,
"lat_ns" : {
"min" : 0,
"max" : 0,
"mean" : 0.000000,
"stddev" : 0.000000,
"N" : 0
}
},
"job_runtime" : 19210274,
"usr_cpu" : 0.005950,
"sys_cpu" : 0.030385,
"ctx" : 446567,
"majf" : 0,
"minf" : 883,
"iodepth_level" : {
"1" : 0.100000,
"2" : 0.100000,
"4" : 0.100000,
"8" : 0.100000,
"16" : 0.100000,
"32" : 99.895970,
">=64" : 0.000000
},
"iodepth_submit" : {
"0" : 0.000000,
"4" : 100.000000,
"8" : 0.000000,
"16" : 0.000000,
"32" : 0.000000,
"64" : 0.000000,
">=64" : 0.000000
},
"iodepth_complete" : {
"0" : 0.000000,
"4" : 99.996641,
"8" : 0.000000,
"16" : 0.000000,
"32" : 0.100000,
"64" : 0.000000,
">=64" : 0.000000
},
"latency_ns" : {
"2" : 0.000000,
"4" : 0.000000,
"10" : 0.000000,
"20" : 0.000000,
"50" : 0.000000,
"100" : 0.000000,
"250" : 0.000000,
"500" : 0.000000,
"750" : 0.000000,
"1000" : 0.000000
},
"latency_us" : {
"2" : 0.000000,
"4" : 0.000000,
"10" : 0.000000,
"20" : 0.000000,
"50" : 0.000000,
"100" : 0.000000,
"250" : 0.010000,
"500" : 0.010000,
"750" : 0.010000,
"1000" : 0.010000
},
"latency_ms" : {
"2" : 0.010000,
"4" : 0.033925,
"10" : 0.247909,
"20" : 0.190231,
"50" : 2.933171,
"100" : 7.702536,
"250" : 31.258039,
"500" : 41.803822,
"750" : 14.596065,
"1000" : 1.079935,
"2000" : 0.153056,
">=2000" : 0.000000
},
"latency_depth" : 32,
"latency_target" : 0,
"latency_percentile" : 100.000000,
"latency_window" : 0
}
],
"disk_util" : [
{
"name" : "sdb",
"read_ios" : 0,
"write_ios" : 1013029,
"read_merges" : 0,
"write_merges" : 894106,
"read_ticks" : 0,
"write_ticks" : 53607123,
"in_queue" : 53609981,
"util" : 96.247980
}
]
}
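A detail worth pulling out of this output: the disk_util section shows
1013029 write_ios plus 894106 write_merges, which together account for
essentially all of fio's 1907151 total_ios. So even on 574e7779cf roughly
47% of the 4k writes were still merged before reaching the disk, and the
device-side rate works out to about 1013029 / 300 s ~= 3377 IOPS against
the 6351 write IOPS fio reports - a concrete instance of the
application-side versus device-side IOPS gap described above.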
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
2024-02-01 7:18 ` Oliver Sang
@ 2024-02-01 13:40 ` Jens Axboe
2024-02-01 14:03 ` Oliver Sang
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2024-02-01 13:40 UTC (permalink / raw)
To: Oliver Sang
Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
feng.tang, fengwei.yin
On 2/1/24 12:18 AM, Oliver Sang wrote:
> hi, Jens Axboe,
>
> On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
>> On 1/31/24 11:17 AM, Bart Van Assche wrote:
>>> On 1/31/24 07:42, kernel test robot wrote:
>>>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
>>>>
>>>>
>>>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>
>>>> testcase: fio-basic
>>>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
>>>> parameters:
>>>>
>>>> runtime: 300s
>>>> disk: 1HDD
>>>> fs: xfs
>>>> nr_task: 100%
>>>> test_size: 128G
>>>> rw: write
>>>> bs: 4k
>>>> ioengine: io_uring
>>>> direct: direct
>>>> cpufreq_governor: performance
>>>
>>> The actual test is available in this file:
>>> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
>>>
>>> I haven't found anything in that file for disabling merging. Merging
>>> requests decreases IOPS. Does this perhaps mean that this test is
>>> broken?
>>
>> It's hard to know, as nothing in this email or the links includes the
>> actual output of the job...
>
> I attached a dmesg and two outputs captured while running tests on
> 574e7779cf. Not sure if they are helpful?
Both fio outputs are all I need, but I only see one of them attached?
>> But if it's fio IOPS, then those are application-side and don't
>> necessarily correlate to drive IOPS due to merging. E.g. for fio IOPS,
>> if it does 4k sequential writes and we merge to 128k, then the
>> fio-perceived IOPS will be 32 times larger than the device-side IOPS.
>>
>> I'll take a look, but it seems like there might be something there. By
>> inserting into the other list, the request is also not available for
>> merging. And the test in question does single IOs at a time.
>
> If you have any debug patch you want us to run, please just let us know.
> It will be our great pleasure!
Thanks, I might take you up on that, though I probably won't have time
for this until next week.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
2024-02-01 13:40 ` Jens Axboe
@ 2024-02-01 14:03 ` Oliver Sang
2024-02-01 14:30 ` Jens Axboe
0 siblings, 1 reply; 9+ messages in thread
From: Oliver Sang @ 2024-02-01 14:03 UTC (permalink / raw)
To: Jens Axboe
Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
feng.tang, fengwei.yin, oliver.sang
[-- Attachment #1: Type: text/plain, Size: 2538 bytes --]
hi, Jens Axboe,
On Thu, Feb 01, 2024 at 06:40:07AM -0700, Jens Axboe wrote:
> On 2/1/24 12:18 AM, Oliver Sang wrote:
> > hi, Jens Axboe,
> >
> > On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
> >> On 1/31/24 11:17 AM, Bart Van Assche wrote:
> >>> On 1/31/24 07:42, kernel test robot wrote:
> >>>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
> >>>>
> >>>>
> >>>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> >>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>>>
> >>>> testcase: fio-basic
> >>>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> >>>> parameters:
> >>>>
> >>>> runtime: 300s
> >>>> disk: 1HDD
> >>>> fs: xfs
> >>>> nr_task: 100%
> >>>> test_size: 128G
> >>>> rw: write
> >>>> bs: 4k
> >>>> ioengine: io_uring
> >>>> direct: direct
> >>>> cpufreq_governor: performance
> >>>
> >>> The actual test is available in this file:
> >>> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
> >>>
> >>> I haven't found anything in that file for disabling merging. Merging
> >>> requests decreases IOPS. Does this perhaps mean that this test is
> >>> broken?
> >>
> >> It's hard to know, as nothing in this email or the links includes the
> >> actual output of the job...
> >
> > I attached a dmesg and two outputs captured while running tests on
> > 574e7779cf. Not sure if they are helpful?
>
> Both fio outputs are all I need, but I only see one of them attached?
While running fio, the following logs are captured:
fio
fio.output
fio.task
fio.time
I tarred them into fio.tar.gz, attached.
You can extract them with 'tar xzvf fio.tar.gz'.
>
> >> But if it's fio IOPS, then those are application-side and don't
> >> necessarily correlate to drive IOPS due to merging. E.g. for fio IOPS,
> >> if it does 4k sequential writes and we merge to 128k, then the
> >> fio-perceived IOPS will be 32 times larger than the device-side IOPS.
> >>
> >> I'll take a look, but it seems like there might be something there. By
> >> inserting into the other list, the request is also not available for
> >> merging. And the test in question does single IOs at a time.
> >
> > If you have any debug patch you want us to run, please just let us know.
> > It will be our great pleasure!
>
> Thanks, I might take you up on that, though I probably won't have time
> for this until next week.
>
> --
> Jens Axboe
>
[-- Attachment #2: fio.tar.gz --]
[-- Type: application/gzip, Size: 2208 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
2024-02-01 14:03 ` Oliver Sang
@ 2024-02-01 14:30 ` Jens Axboe
2024-02-01 14:45 ` Oliver Sang
0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2024-02-01 14:30 UTC (permalink / raw)
To: Oliver Sang
Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
feng.tang, fengwei.yin
On 2/1/24 7:03 AM, Oliver Sang wrote:
> hi, Jens Axboe,
>
> On Thu, Feb 01, 2024 at 06:40:07AM -0700, Jens Axboe wrote:
>> On 2/1/24 12:18 AM, Oliver Sang wrote:
>>> hi, Jens Axboe,
>>>
>>> On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
>>>> On 1/31/24 11:17 AM, Bart Van Assche wrote:
>>>>> On 1/31/24 07:42, kernel test robot wrote:
>>>>>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
>>>>>>
>>>>>>
>>>>>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>>>
>>>>>> testcase: fio-basic
>>>>>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
>>>>>> parameters:
>>>>>>
>>>>>> runtime: 300s
>>>>>> disk: 1HDD
>>>>>> fs: xfs
>>>>>> nr_task: 100%
>>>>>> test_size: 128G
>>>>>> rw: write
>>>>>> bs: 4k
>>>>>> ioengine: io_uring
>>>>>> direct: direct
>>>>>> cpufreq_governor: performance
>>>>>
>>>>> The actual test is available in this file:
>>>>> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
>>>>>
>>>>> I haven't found anything in that file for disabling merging. Merging
>>>>> requests decreases IOPS. Does this perhaps mean that this test is
>>>>> broken?
>>>>
>>>> It's hard to know, as nothing in this email or the links includes the
>>>> actual output of the job...
>>>
>>> I attached a dmesg and two outputs captured while running tests on
>>> 574e7779cf. Not sure if they are helpful?
>>
>> Both fio outputs are all I need, but I only see one of them attached?
>
> While running fio, the following logs are captured:
> fio
> fio.output
> fio.task
> fio.time
>
> I tarred them into fio.tar.gz, attached.
> You can extract them with 'tar xzvf fio.tar.gz'.
Right, but I need BOTH outputs - one from before the commit and one on
the commit. The report is a regression, hence there must be both a good
and a bad run output... This looks like just the same output again; I
can't really do much with just one output.
--
Jens Axboe
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
2024-02-01 14:30 ` Jens Axboe
@ 2024-02-01 14:45 ` Oliver Sang
0 siblings, 0 replies; 9+ messages in thread
From: Oliver Sang @ 2024-02-01 14:45 UTC (permalink / raw)
To: Jens Axboe
Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
feng.tang, fengwei.yin, oliver.sang
[-- Attachment #1: Type: text/plain, Size: 2429 bytes --]
hi, Jens Axboe,
On Thu, Feb 01, 2024 at 07:30:53AM -0700, Jens Axboe wrote:
> On 2/1/24 7:03 AM, Oliver Sang wrote:
> > hi, Jens Axboe,
> >
> > On Thu, Feb 01, 2024 at 06:40:07AM -0700, Jens Axboe wrote:
> >> On 2/1/24 12:18 AM, Oliver Sang wrote:
> >>> hi, Jens Axboe,
> >>>
> >>> On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
> >>>> On 1/31/24 11:17 AM, Bart Van Assche wrote:
> >>>>> On 1/31/24 07:42, kernel test robot wrote:
> >>>>>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
> >>>>>>
> >>>>>>
> >>>>>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> >>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>>>>>
> >>>>>> testcase: fio-basic
> >>>>>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> >>>>>> parameters:
> >>>>>>
> >>>>>> runtime: 300s
> >>>>>> disk: 1HDD
> >>>>>> fs: xfs
> >>>>>> nr_task: 100%
> >>>>>> test_size: 128G
> >>>>>> rw: write
> >>>>>> bs: 4k
> >>>>>> ioengine: io_uring
> >>>>>> direct: direct
> >>>>>> cpufreq_governor: performance
> >>>>>
> >>>>> The actual test is available in this file:
> >>>>> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
> >>>>>
> >>>>> I haven't found anything in that file for disabling merging. Merging
> >>>>> requests decreases IOPS. Does this perhaps mean that this test is
> >>>>> broken?
> >>>>
> >>>> It's hard to know, as nothing in this email or the links includes the
> >>>> actual output of the job...
> >>>
> >>> I attached a dmesg and two outputs captured while running tests on
> >>> 574e7779cf. Not sure if they are helpful?
> >>
> >> Both fio outputs are all I need, but I only see one of them attached?
> >
> > While running fio, the following logs are captured:
> > fio
> > fio.output
> > fio.task
> > fio.time
> >
> > I tarred them into fio.tar.gz, attached.
> > You can extract them with 'tar xzvf fio.tar.gz'.
>
> Right, but I need BOTH outputs - one from before the commit and one on
> the commit. The report is a regression, hence there must be both a good
> and a bad run output... This looks like just the same output again; I
> can't really do much with just one output.
Oh, I didn't get you... sorry.
The attached fio-8f764b91fd.tar.gz is from a run on the parent commit.
>
> --
> Jens Axboe
>
[-- Attachment #2: fio-8f764b91fd.tar.gz --]
[-- Type: application/gzip, Size: 2189 bytes --]
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
2024-01-31 15:42 [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression kernel test robot
2024-01-31 18:17 ` Bart Van Assche
@ 2024-02-09 21:06 ` Jens Axboe
1 sibling, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2024-02-09 21:06 UTC (permalink / raw)
To: kernel test robot, Bart Van Assche
Cc: oe-lkp, lkp, Linux Memory Management List, Oleksandr Natalenko,
Johannes Thumshirn, linux-block, ying.huang, feng.tang,
fengwei.yin
On 1/31/24 8:42 AM, kernel test robot wrote:
>
>
> Hello,
>
> kernel test robot noticed a -72.9% regression of fio.write_iops on:
>
>
> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>
> testcase: fio-basic
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
>
> runtime: 300s
> disk: 1HDD
> fs: xfs
> nr_task: 100%
> test_size: 128G
> rw: write
> bs: 4k
> ioengine: io_uring
> direct: direct
> cpufreq_governor: performance
I looked into this, and I think I see what is happening. We do still do
insertion merges, but they are now postponed to dispatch time. This means
that for this crazy case, where you have 64 threads doing sequential
writes, we run out of tags (64 by default) and hence dispatch sooner than
we would have before. Previously, we would have queued one request, then
allocated a new one, and queued that. When that queue event happened, we
would merge with the previous request - either upfront, or when the
request was inserted. Either way, we would end up with one bigger request
rather than two smaller ones that still need merging, which leaves more
requests free.
I think we can solve this by doing smarter merging at insertion time.
I've dropped the series from my for-next branch for now; it will need
revisiting, and then I'll post it again.
--
Jens Axboe
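For concreteness, the mechanism under discussion follows a common pattern:
insertion takes only a short lock and parks the request on a side list,
and the sorting and merge-checking are deferred until dispatch. Below is a
rough sketch of that pattern, with illustrative names and a placeholder
sort_and_merge() helper - it is not the actual 574e7779cf patch:

struct dd_side_list {
        spinlock_t lock;
        struct list_head head;  /* FIFO of not-yet-sorted requests */
};

/* Insertion: O(1) under a short spinlock; no tree walk, no merge check. */
static void dd_insert(struct dd_side_list *sl, struct request *rq)
{
        spin_lock(&sl->lock);
        list_add_tail(&rq->queuelist, &sl->head);
        spin_unlock(&sl->lock);
}

/* Dispatch: splice the side list off, and only now sort and merge-check. */
static void dd_flush_for_dispatch(struct dd_side_list *sl,
                                  struct list_head *sorted)
{
        LIST_HEAD(local);

        spin_lock(&sl->lock);
        list_splice_init(&sl->head, &local);
        spin_unlock(&sl->lock);

        /* placeholder for the real elevator sort/merge work */
        sort_and_merge(sorted, &local);
}

While a request sits on the side list it is invisible to merge lookups.
With numjobs=64 at iodepth=32 the workload can keep up to 2048 writes
pending against the 64 default tags, so tag exhaustion forces dispatch -
and flushes the side list - before many sequential neighbors have arrived
to merge with, which matches the explanation above.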
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-02-09 21:06 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-31 15:42 [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression kernel test robot
2024-01-31 18:17 ` Bart Van Assche
2024-01-31 18:42 ` Jens Axboe
2024-02-01 7:18 ` Oliver Sang
2024-02-01 13:40 ` Jens Axboe
2024-02-01 14:03 ` Oliver Sang
2024-02-01 14:30 ` Jens Axboe
2024-02-01 14:45 ` Oliver Sang
2024-02-09 21:06 ` Jens Axboe