Subject: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
From: kernel test robot
Date: 2024-01-31 15:42 UTC
  To: Bart Van Assche
  Cc: oe-lkp, lkp, Linux Memory Management List, Jens Axboe,
	Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
	feng.tang, fengwei.yin, oliver.sang



Hello,

kernel test robot noticed a -72.9% regression of fio.write_iops on:


commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
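
For reference, one way to inspect the offending commit in a local linux-next
checkout (illustrative git commands, assuming the commit is reachable in your
clone; the touched file is the mq-deadline scheduler, block/mq-deadline.c):

	git log -1 574e7779cf583171acb5bf6365047bb0941b387c
	git show 574e7779cf -- block/mq-deadline.c
	git log --oneline -2 574e7779cf   # parent 8f764b91fd is the comparison baseline below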

testcase: fio-basic
test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
parameters:

	runtime: 300s
	disk: 1HDD
	fs: xfs
	nr_task: 100%
	test_size: 128G
	rw: write
	bs: 4k
	ioengine: io_uring
	direct: direct
	cpufreq_governor: performance
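
The parameters above map onto a plain fio command line roughly as follows.
This is a hand-written sketch, not the exact LKP job file (see the archive
link below for that): nr_task=100% on this 64-thread machine is assumed to
mean numjobs=64, and /mnt/xfs stands in for the xfs-formatted 1HDD test disk.

	fio --name=write --rw=write --bs=4k --ioengine=io_uring --direct=1 \
	    --time_based --runtime=300 --size=128G --numjobs=64 \
	    --directory=/mnt/xfs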




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add the following tags:
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202401312320.a335db14-oliver.sang@intel.com


Details are as follows:
-------------------------------------------------------------------------------------------------->


The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com
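
A typical lkp-tests reproduction flow for this kind of report looks like the
following (a sketch of the standard lkp-tests workflow; job.yaml refers to the
job file shipped in the archive above):

	git clone https://github.com/intel/lkp-tests.git
	cd lkp-tests
	sudo bin/lkp install job.yaml           # install dependencies for the job
	bin/lkp split-job --compatible job.yaml # generate a yaml file runnable by lkp
	sudo bin/lkp run generated-yaml-file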

=========================================================================================
bs/compiler/cpufreq_governor/direct/disk/fs/ioengine/kconfig/nr_task/rootfs/runtime/rw/tbox_group/test_size/testcase:
  4k/gcc-12/performance/direct/1HDD/xfs/io_uring/x86_64-rhel-8.3/100%/debian-11.1-x86_64-20220510.cgz/300s/write/lkp-icl-2sp9/128G/fio-basic

commit: 
  8f764b91fd ("block/mq-deadline: skip expensive merge lookups if contended")
  574e7779cf ("block/mq-deadline: use separate insertion lists")

8f764b91fdf29659 574e7779cf583171acb5bf63650 
---------------- --------------------------- 
         %stddev     %change         %stddev
             \          |                \  
     10031           -68.5%       3157 ±  7%  uptime.idle
     38.88           -93.1%       2.67 ±  4%  iostat.cpu.idle
     60.16           +60.2%      96.39        iostat.cpu.iowait
   2128969 ± 58%     -86.2%     293345 ±114%  numa-vmstat.node0.nr_foll_pin_acquired
   2127446 ± 58%     -86.2%     293180 ±114%  numa-vmstat.node0.nr_foll_pin_released
    169.33 ± 11%     +41.4%     239.50 ±  9%  perf-c2c.DRAM.remote
     98.67 ± 14%     +45.1%     143.17 ± 12%  perf-c2c.HITM.remote
     96520 ±  3%     +50.1%     144877 ±  2%  meminfo.Active
     96358 ±  3%     +50.3%     144805 ±  2%  meminfo.Active(anon)
    116390 ±  3%     +42.2%     165520 ±  2%  meminfo.Shmem
     38.49           -36.5        2.03 ±  6%  mpstat.cpu.all.idle%
     60.56           +36.5       97.03        mpstat.cpu.all.iowait%
      0.19            -0.0        0.16        mpstat.cpu.all.sys%
      0.16 ±  3%      -8.5%       0.14 ±  3%  turbostat.IPC
     75414 ±  5%    +278.1%     285115        turbostat.POLL
      0.01            +0.1        0.07        turbostat.POLL%
     38.89           -93.1%       2.70 ±  4%  vmstat.cpu.id
     60.14           +60.2%      96.37        vmstat.cpu.wa
     92653           -72.9%      25112        vmstat.io.bo
     63.11          +340.1%     277.79        vmstat.procs.b
     18314           +54.0%      28195        vmstat.system.cs
     72218            +5.8%      76405        vmstat.system.in
    584892 ± 15%    +885.8%    5766120 ±123%  sched_debug.cfs_rq:/.load.max
    950.75 ± 10%     +47.6%       1403 ± 20%  sched_debug.cfs_rq:/.load_avg.max
    203.10 ± 15%     +32.3%     268.69 ± 14%  sched_debug.cfs_rq:/.load_avg.stddev
     46333           +51.0%      69954        sched_debug.cpu.nr_switches.avg
     29986 ±  9%     +84.3%      55251        sched_debug.cpu.nr_switches.min
      0.00 ± 35%    +1e+05%       3.64 ±  9%  sched_debug.cpu.nr_uninterruptible.avg
     23.03 ± 27%    +187.8%      66.28 ±  8%  sched_debug.cpu.nr_uninterruptible.max
    -18.67          +207.7%     -57.44        sched_debug.cpu.nr_uninterruptible.min
      7.23 ±  5%    +258.7%      25.92 ± 10%  sched_debug.cpu.nr_uninterruptible.stddev
     24079 ±  3%     +50.3%      36201 ±  2%  proc-vmstat.nr_active_anon
    754201            +1.6%     766356        proc-vmstat.nr_file_pages
   3467173           -72.7%     946413        proc-vmstat.nr_foll_pin_acquired
   3464673           -72.7%     945922        proc-vmstat.nr_foll_pin_released
     11862            +1.6%      12051        proc-vmstat.nr_mapped
     29088 ±  3%     +42.2%      41375 ±  2%  proc-vmstat.nr_shmem
     24079 ±  3%     +50.3%      36201 ±  2%  proc-vmstat.nr_zone_active_anon
   1051599            -4.5%    1004608        proc-vmstat.numa_hit
    984923            -4.8%     937714        proc-vmstat.numa_local
     71376 ± 10%     +27.2%      90810 ±  8%  proc-vmstat.numa_pte_updates
     38600 ±  2%     +37.7%      53153        proc-vmstat.pgactivate
   1308586            -6.9%    1218899        proc-vmstat.pgalloc_normal
    856034            +2.4%     876179        proc-vmstat.pgfault
   1224361            -7.3%    1135057        proc-vmstat.pgfree
  28164246           -72.9%    7638316        proc-vmstat.pgpgout
      0.12 ± 28%      +0.8        0.92 ±  9%  fio.latency_1000ms%
     17.72 ±  3%     -10.3        7.39 ±  4%  fio.latency_100ms%
      0.05 ± 16%      +0.2        0.23 ±  5%  fio.latency_10ms%
     19.15 ±  6%     -19.0        0.19 ±  6%  fio.latency_20ms%
     23.99 ±  3%      +7.5       31.50        fio.latency_250ms%
      0.01            +0.0        0.03 ±  7%  fio.latency_4ms%
      5.84 ±  3%     +36.8       42.63        fio.latency_500ms%
     32.66 ±  6%     -29.9        2.73 ±  4%  fio.latency_50ms%
      0.42 ± 16%     +13.9       14.27        fio.latency_750ms%
  56330658           -72.9%   15275141        fio.time.file_system_outputs
      3139 ±  3%     -62.4%       1182 ±  2%  fio.time.involuntary_context_switches
      8.17 ±  4%     -14.3%       7.00        fio.time.percent_of_cpu_this_job_got
   1902903           +24.6%    2371272        fio.time.voluntary_context_switches
   7041332           -72.9%    1909392        fio.workload
     91.62           -72.9%      24.85        fio.write_bw_MBps
 2.045e+08          +166.0%  5.439e+08        fio.write_clat_90%_us
 3.055e+08 ±  2%     +99.1%  6.082e+08        fio.write_clat_95%_us
 4.264e+08 ±  3%     +76.4%  7.522e+08        fio.write_clat_99%_us
  87286745          +257.4%   3.12e+08        fio.write_clat_mean_us
  99184810 ±  3%     +75.3%  1.739e+08        fio.write_clat_stddev
     23454           -72.9%       6361        fio.write_iops
      1727        +5.7e+05%    9845156        fio.write_slat_mean_us
      7460 ± 39%  +4.9e+05%   36525171        fio.write_slat_stddev
      1.66           +22.5%       2.04        perf-stat.i.MPKI
      0.97            +0.1        1.09        perf-stat.i.branch-miss-rate%
   2609625            +7.1%    2794459 ±  4%  perf-stat.i.branch-misses
     16.66            -0.6       16.05        perf-stat.i.cache-miss-rate%
   1901031           +11.6%    2120745        perf-stat.i.cache-misses
  11266152           +16.9%   13166895        perf-stat.i.cache-references
     18410           +54.3%      28416        perf-stat.i.context-switches
      2.10 ±  2%     +10.4%       2.32        perf-stat.i.cpi
 2.299e+09 ±  2%      +5.1%  2.416e+09        perf-stat.i.cpu-cycles
    901.94 ±  2%     -34.2%     593.78        perf-stat.i.cpu-migrations
      1315 ±  2%     -12.7%       1148        perf-stat.i.cycles-between-cache-misses
      0.04 ±  5%      +0.0        0.04 ±  4%  perf-stat.i.dTLB-load-miss-rate%
    111249 ±  5%     +18.1%     131362 ±  4%  perf-stat.i.dTLB-load-misses
 3.359e+08            -6.7%  3.134e+08        perf-stat.i.dTLB-loads
     48568           -14.5%      41547 ±  2%  perf-stat.i.dTLB-store-misses
 1.737e+08           -11.7%  1.534e+08        perf-stat.i.dTLB-stores
 1.215e+09            -5.6%  1.147e+09        perf-stat.i.instructions
      0.49 ±  2%     -10.2%       0.44        perf-stat.i.ipc
      0.04 ±  2%      +5.1%       0.04        perf-stat.i.metric.GHz
    191.35           +17.5%     224.82        perf-stat.i.metric.K/sec
     11.77            -7.1%      10.94        perf-stat.i.metric.M/sec
      2459            +2.4%       2518        perf-stat.i.minor-faults
     90.60            +3.0       93.58        perf-stat.i.node-load-miss-rate%
    452504           +31.3%     594058        perf-stat.i.node-load-misses
     51347 ±  3%      -9.0%      46745        perf-stat.i.node-loads
    209653 ±  3%     +17.1%     245593 ±  3%  perf-stat.i.node-store-misses
    293009 ±  2%     +23.4%     361448        perf-stat.i.node-stores
      2459            +2.4%       2519        perf-stat.i.page-faults
      1.56           +18.2%       1.85        perf-stat.overall.MPKI
      1.07            +0.1        1.20 ±  2%  perf-stat.overall.branch-miss-rate%
     16.87            -0.8       16.11        perf-stat.overall.cache-miss-rate%
      1.89 ±  2%     +11.3%       2.11        perf-stat.overall.cpi
      1209            -5.8%       1139        perf-stat.overall.cycles-between-cache-misses
      0.03 ±  5%      +0.0        0.04 ±  3%  perf-stat.overall.dTLB-load-miss-rate%
      0.53 ±  2%     -10.2%       0.47        perf-stat.overall.ipc
     89.81            +2.9       92.70        perf-stat.overall.node-load-miss-rate%
     51848          +248.4%     180654        perf-stat.overall.path-length
   2602422            +7.1%    2786733 ±  4%  perf-stat.ps.branch-misses
   1894822           +11.6%    2113909        perf-stat.ps.cache-misses
  11229305           +16.9%   13124260        perf-stat.ps.cache-references
     18349           +54.4%      28323        perf-stat.ps.context-switches
 2.291e+09 ±  2%      +5.1%  2.408e+09        perf-stat.ps.cpu-cycles
    898.94 ±  2%     -34.2%     591.81        perf-stat.ps.cpu-migrations
    110886 ±  5%     +18.1%     130940 ±  4%  perf-stat.ps.dTLB-load-misses
 3.349e+08            -6.7%  3.124e+08        perf-stat.ps.dTLB-loads
     48409           -14.4%      41414 ±  2%  perf-stat.ps.dTLB-store-misses
 1.732e+08           -11.7%  1.529e+08        perf-stat.ps.dTLB-stores
 1.211e+09            -5.6%  1.144e+09        perf-stat.ps.instructions
      2451            +2.4%       2511        perf-stat.ps.minor-faults
    451006           +31.3%     592114        perf-stat.ps.node-load-misses
     51197 ±  3%      -9.0%      46611        perf-stat.ps.node-loads
    208958 ±  3%     +17.1%     244787 ±  3%  perf-stat.ps.node-store-misses
    292047 ±  2%     +23.4%     360278        perf-stat.ps.node-stores
      2451            +2.4%       2511        perf-stat.ps.page-faults
 3.651e+11            -5.5%   3.45e+11 ±  2%  perf-stat.total.instructions
      0.00 ± 43%    +368.2%       0.02 ± 75%  perf-sched.sch_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      0.01           -14.3%       0.01        perf-sched.sch_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.xfs_ilock_for_iomap
      0.01 ± 42%     -35.1%       0.00        perf-sched.sch_delay.avg.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
      0.01           -33.3%       0.01        perf-sched.sch_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.01 ± 19%     -27.8%       0.00 ± 10%  perf-sched.sch_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      0.00 ±223%   +1912.5%       0.03 ± 94%  perf-sched.sch_delay.max.ms.__cond_resched.io_assign_current_work.io_worker_handle_work.io_wq_worker.ret_from_fork
      0.00 ±  9%    +126.1%       0.01 ± 38%  perf-sched.sch_delay.max.ms.__cond_resched.task_work_run.io_run_task_work.io_cqring_wait.__do_sys_io_uring_enter
     56.45 ±  3%     -32.6%      38.05        perf-sched.total_wait_and_delay.average.ms
     41251 ±  4%     +99.0%      82087        perf-sched.total_wait_and_delay.count.ms
     56.44 ±  3%     -32.6%      38.04        perf-sched.total_wait_time.average.ms
     10.80 ±  2%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
    376.03 ± 48%     -94.4%      21.05 ±223%  perf-sched.wait_and_delay.avg.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
    433.85 ± 59%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
    815.52 ± 92%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
    486.10 ± 48%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
    240.19 ± 23%    -100.0%       0.00        perf-sched.wait_and_delay.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      2.32 ± 10%     +22.5%       2.84 ±  5%  perf-sched.wait_and_delay.avg.ms.do_task_dead.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
     58.50 ±  6%     -80.3%      11.54 ±  6%  perf-sched.wait_and_delay.avg.ms.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
    434.72 ± 12%     -49.8%     218.09 ±  4%  perf-sched.wait_and_delay.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
     48.19 ±  5%     -71.8%      13.60 ±141%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.xfs_ilock_for_iomap
     37.76 ±  8%     -83.0%       6.43 ±223%  perf-sched.wait_and_delay.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
     36.84 ±  5%     +27.8%      47.07 ±  4%  perf-sched.wait_and_delay.avg.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
    620.47 ±  3%     -10.9%     552.59 ±  3%  perf-sched.wait_and_delay.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     64.08 ±  4%     -72.5%      17.65 ±  2%  perf-sched.wait_and_delay.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    384.00          -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
     18.00 ± 29%     -99.1%       0.17 ±223%  perf-sched.wait_and_delay.count.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
      2.50 ± 38%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
      5.00 ± 65%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
      4.00 ± 38%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
     45.17 ± 13%    -100.0%       0.00        perf-sched.wait_and_delay.count.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      5281 ±  5%     -62.8%       1967 ± 12%  perf-sched.wait_and_delay.count.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
     33.33 ± 13%    +110.5%      70.17 ±  3%  perf-sched.wait_and_delay.count.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
      4684 ±  6%     -98.2%      82.17 ±141%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.xfs_ilock_for_iomap
    420.83 ±  8%     -99.1%       3.67 ±223%  perf-sched.wait_and_delay.count.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
     11733 ±  5%     -20.3%       9354 ±  4%  perf-sched.wait_and_delay.count.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
    622.67 ±  4%     -18.7%     506.17 ±  5%  perf-sched.wait_and_delay.count.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     11804 ±  4%    +203.5%      35824        perf-sched.wait_and_delay.count.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      1000          -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.__wait_for_common.affine_move_task.__set_cpus_allowed_ptr.__sched_setaffinity
      2061 ± 50%     -99.0%      21.05 ±223%  perf-sched.wait_and_delay.max.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
    782.25 ± 68%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
      1601 ± 51%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
      1092 ± 35%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
      2843 ± 29%    -100.0%       0.00        perf-sched.wait_and_delay.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
    511.01           -25.1%     382.56 ±  9%  perf-sched.wait_and_delay.max.ms.do_task_dead.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
    511.46 ± 23%     -38.4%     315.16 ±  3%  perf-sched.wait_and_delay.max.ms.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1060 ± 58%     -92.6%      78.42 ±223%  perf-sched.wait_and_delay.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
      1165 ± 15%     +74.8%       2037 ± 17%  perf-sched.wait_and_delay.max.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
      3997 ±  9%     -38.1%       2474 ± 34%  perf-sched.wait_and_delay.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
    376.03 ± 48%     -92.9%      26.88 ±169%  perf-sched.wait_time.avg.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
    433.84 ± 59%     -99.9%       0.43 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
    815.52 ± 92%    -100.0%       0.06 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
    486.09 ± 48%    -100.0%       0.04 ±223%  perf-sched.wait_time.avg.ms.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
    240.19 ± 23%     -95.8%      10.16 ± 78%  perf-sched.wait_time.avg.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
      2.32 ± 10%     +22.5%       2.84 ±  5%  perf-sched.wait_time.avg.ms.do_task_dead.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
     58.50 ±  6%     -80.3%      11.53 ±  6%  perf-sched.wait_time.avg.ms.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
    434.72 ± 12%     -49.8%     218.09 ±  4%  perf-sched.wait_time.avg.ms.schedule_hrtimeout_range_clock.do_poll.constprop.0.do_sys_poll
     48.18 ±  5%     -35.6%      31.03 ± 23%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_read_slowpath.down_read.xfs_ilock_for_iomap
     37.74 ±  8%     -33.5%      25.11 ± 36%  perf-sched.wait_time.avg.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
     36.83 ±  5%     +27.8%      47.07 ±  4%  perf-sched.wait_time.avg.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
    620.47 ±  3%     -10.9%     552.58 ±  3%  perf-sched.wait_time.avg.ms.smpboot_thread_fn.kthread.ret_from_fork.ret_from_fork_asm
     64.07 ±  4%     -72.5%      17.64 ±  2%  perf-sched.wait_time.avg.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      2061 ± 50%     -98.7%      26.88 ±169%  perf-sched.wait_time.max.ms.__cond_resched.down_read.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten
    782.24 ± 68%     -99.9%       0.43 ±223%  perf-sched.wait_time.max.ms.__cond_resched.down_write.xfs_ilock.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
     28.38 ±223%    +539.6%     181.50 ± 73%  perf-sched.wait_time.max.ms.__cond_resched.io_assign_current_work.io_worker_handle_work.io_wq_worker.ret_from_fork
      1601 ± 51%    -100.0%       0.06 ±223%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.xfs_trans_alloc.xfs_trans_alloc_inode.xfs_iomap_write_unwritten
      1092 ± 35%    -100.0%       0.04 ±223%  perf-sched.wait_time.max.ms.__cond_resched.kmem_cache_alloc.xlog_ticket_alloc.xfs_log_reserve.xfs_trans_reserve
      2842 ± 29%     -99.0%      27.95 ± 81%  perf-sched.wait_time.max.ms.__cond_resched.process_one_work.worker_thread.kthread.ret_from_fork
    511.01           -25.1%     382.56 ±  9%  perf-sched.wait_time.max.ms.do_task_dead.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
    511.46 ± 23%     -38.4%     315.16 ±  3%  perf-sched.wait_time.max.ms.io_cqring_wait.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
      1060 ± 58%     -82.5%     185.64 ± 77%  perf-sched.wait_time.max.ms.schedule_preempt_disabled.rwsem_down_write_slowpath.down_write.xfs_ilock
      1165 ± 15%     +74.8%       2037 ± 17%  perf-sched.wait_time.max.ms.schedule_timeout.io_wq_worker.ret_from_fork.ret_from_fork_asm
      3997 ±  9%     -38.1%       2474 ± 34%  perf-sched.wait_time.max.ms.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      6.92 ± 10%      -3.5        3.42 ±  8%  perf-profile.calltrace.cycles-pp.fio_ioring_commit
      6.68 ± 11%      -3.4        3.31 ±  8%  perf-profile.calltrace.cycles-pp.entry_SYSCALL_64_after_hwframe.fio_ioring_commit
      6.41 ± 11%      -3.2        3.21 ±  8%  perf-profile.calltrace.cycles-pp.do_syscall_64.entry_SYSCALL_64_after_hwframe.fio_ioring_commit
      6.34 ± 12%      -3.2        3.18 ±  8%  perf-profile.calltrace.cycles-pp.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe.fio_ioring_commit
      4.54 ±  6%      -2.8        1.72 ± 17%  perf-profile.calltrace.cycles-pp.iomap_dio_complete_work.process_one_work.worker_thread.kthread.ret_from_fork
      4.51 ±  6%      -2.8        1.71 ± 17%  perf-profile.calltrace.cycles-pp.iomap_dio_complete.iomap_dio_complete_work.process_one_work.worker_thread.kthread
      4.73 ± 12%      -2.6        2.08 ±  9%  perf-profile.calltrace.cycles-pp.io_submit_sqes.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe.fio_ioring_commit
      4.20 ±  6%      -2.6        1.61 ± 18%  perf-profile.calltrace.cycles-pp.xfs_dio_write_end_io.iomap_dio_complete.iomap_dio_complete_work.process_one_work.worker_thread
      4.16 ±  6%      -2.6        1.57 ± 18%  perf-profile.calltrace.cycles-pp.xfs_iomap_write_unwritten.xfs_dio_write_end_io.iomap_dio_complete.iomap_dio_complete_work.process_one_work
      5.32 ±  7%      -2.5        2.80 ± 11%  perf-profile.calltrace.cycles-pp.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
      4.17 ± 15%      -2.5        1.72 ± 10%  perf-profile.calltrace.cycles-pp.io_issue_sqe.io_submit_sqes.__do_sys_io_uring_enter.do_syscall_64.entry_SYSCALL_64_after_hwframe
      4.05 ± 15%      -2.4        1.64 ± 11%  perf-profile.calltrace.cycles-pp.io_write.io_issue_sqe.io_submit_sqes.__do_sys_io_uring_enter.do_syscall_64
      6.65 ±  7%      -2.4        4.28 ±  8%  perf-profile.calltrace.cycles-pp.process_one_work.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      3.84 ± 16%      -2.3        1.56 ± 10%  perf-profile.calltrace.cycles-pp.xfs_file_write_iter.io_write.io_issue_sqe.io_submit_sqes.__do_sys_io_uring_enter
      4.30 ± 10%      -2.3        2.03 ± 18%  perf-profile.calltrace.cycles-pp.exit_notify.do_exit.io_wq_worker.ret_from_fork.ret_from_fork_asm
      3.74 ± 16%      -2.2        1.52 ± 10%  perf-profile.calltrace.cycles-pp.xfs_file_dio_write_aligned.xfs_file_write_iter.io_write.io_issue_sqe.io_submit_sqes
      2.58 ±  8%      -1.2        1.35 ± 12%  perf-profile.calltrace.cycles-pp.release_task.exit_notify.do_exit.io_wq_worker.ret_from_fork
      1.68 ± 18%      -1.2        0.50 ± 81%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.exit_notify.do_exit.io_wq_worker
      1.71 ± 18%      -1.1        0.60 ± 58%  perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.exit_notify.do_exit.io_wq_worker.ret_from_fork
      1.98 ± 10%      -1.0        0.96 ± 20%  perf-profile.calltrace.cycles-pp.queued_write_lock_slowpath.release_task.exit_notify.do_exit.io_wq_worker
      1.92 ± 11%      -1.0        0.93 ± 20%  perf-profile.calltrace.cycles-pp.native_queued_spin_lock_slowpath.queued_write_lock_slowpath.release_task.exit_notify.do_exit
      1.36 ± 11%      -0.7        0.67 ±  6%  perf-profile.calltrace.cycles-pp.__xfs_trans_commit.xfs_iomap_write_unwritten.xfs_dio_write_end_io.iomap_dio_complete.iomap_dio_complete_work
      1.42 ±  5%      -0.6        0.82 ±  4%  perf-profile.calltrace.cycles-pp.blk_update_request.scsi_end_request.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion
      1.04 ± 11%      -0.5        0.56 ±  5%  perf-profile.calltrace.cycles-pp.xlog_cil_commit.__xfs_trans_commit.xfs_iomap_write_unwritten.xfs_dio_write_end_io.iomap_dio_complete
      0.96 ± 10%      -0.3        0.68 ±  6%  perf-profile.calltrace.cycles-pp.iomap_dio_bio_end_io.blk_update_request.scsi_end_request.scsi_io_completion.complete_cmd_fusion
      0.74 ±  7%      +0.2        0.93 ± 10%  perf-profile.calltrace.cycles-pp.__schedule.schedule_idle.do_idle.cpu_startup_entry.start_secondary
      0.76 ±  7%      +0.2        0.97 ± 10%  perf-profile.calltrace.cycles-pp.schedule_idle.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
      0.82 ±  7%      +0.3        1.14 ± 11%  perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__schedule.schedule.schedule_timeout
      0.87 ±  6%      +0.3        1.19 ± 11%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.schedule_timeout.io_wq_worker
      1.76 ±  3%      +0.6        2.32 ±  3%  perf-profile.calltrace.cycles-pp.scsi_end_request.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion.__handle_irq_event_percpu
      1.76 ±  3%      +0.6        2.32 ±  3%  perf-profile.calltrace.cycles-pp.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion.__handle_irq_event_percpu.handle_irq_event
      0.00            +0.6        0.57 ±  7%  perf-profile.calltrace.cycles-pp.enqueue_task_fair.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue
      0.00            +0.6        0.60 ±  7%  perf-profile.calltrace.cycles-pp.activate_task.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single
      0.00            +0.6        0.63 ± 10%  perf-profile.calltrace.cycles-pp.__blk_mq_free_request.scsi_end_request.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion
      0.00            +0.7        0.69 ±  9%  perf-profile.calltrace.cycles-pp.blk_mq_run_hw_queues.scsi_end_request.scsi_io_completion.complete_cmd_fusion.megasas_isr_fusion
      0.09 ±223%      +0.7        0.82 ±  8%  perf-profile.calltrace.cycles-pp.ttwu_do_activate.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single
      0.39 ± 71%      +0.7        1.13 ±  7%  perf-profile.calltrace.cycles-pp.sched_ttwu_pending.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single
      0.00            +0.8        0.77 ± 12%  perf-profile.calltrace.cycles-pp.xfs_vn_update_time.kiocb_modified.xfs_file_write_checks.xfs_file_dio_write_aligned.xfs_file_write_iter
      0.00            +0.8        0.80 ± 13%  perf-profile.calltrace.cycles-pp.dd_dispatch_request.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_work_fn
      0.00            +0.8        0.81 ± 11%  perf-profile.calltrace.cycles-pp.kiocb_modified.xfs_file_write_checks.xfs_file_dio_write_aligned.xfs_file_write_iter.io_write
      0.58 ± 46%      +0.8        1.39 ±  9%  perf-profile.calltrace.cycles-pp.__flush_smp_call_function_queue.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt
      0.00            +0.8        0.82 ± 20%  perf-profile.calltrace.cycles-pp.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_get_tag
      0.00            +0.8        0.83 ± 20%  perf-profile.calltrace.cycles-pp.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_get_tag.__blk_mq_alloc_requests
      0.00            +0.8        0.83 ± 20%  perf-profile.calltrace.cycles-pp.blk_mq_sched_dispatch_requests.blk_mq_run_hw_queue.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      0.60 ± 46%      +0.8        1.44 ±  9%  perf-profile.calltrace.cycles-pp.__sysvec_call_function_single.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter
      0.00            +0.8        0.84 ± 10%  perf-profile.calltrace.cycles-pp.xfs_file_write_checks.xfs_file_dio_write_aligned.xfs_file_write_iter.io_write.io_issue_sqe
      0.73 ±  6%      +0.9        1.60 ±  9%  perf-profile.calltrace.cycles-pp.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_work_fn.process_one_work.worker_thread
      0.73 ±  6%      +0.9        1.60 ±  9%  perf-profile.calltrace.cycles-pp.blk_mq_sched_dispatch_requests.blk_mq_run_work_fn.process_one_work.worker_thread.kthread
      0.73 ±  6%      +0.9        1.60 ±  9%  perf-profile.calltrace.cycles-pp.__blk_mq_do_dispatch_sched.__blk_mq_sched_dispatch_requests.blk_mq_sched_dispatch_requests.blk_mq_run_work_fn.process_one_work
      0.73 ±  6%      +0.9        1.60 ±  9%  perf-profile.calltrace.cycles-pp.blk_mq_run_work_fn.process_one_work.worker_thread.kthread.ret_from_fork
      1.99 ±  3%      +0.9        2.87 ±  2%  perf-profile.calltrace.cycles-pp.complete_cmd_fusion.megasas_isr_fusion.__handle_irq_event_percpu.handle_irq_event.handle_edge_irq
      0.79 ± 17%      +0.9        1.69 ±  8%  perf-profile.calltrace.cycles-pp.sysvec_call_function_single.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
      0.00            +0.9        0.90 ± 18%  perf-profile.calltrace.cycles-pp.blk_mq_run_hw_queue.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio.submit_bio_noacct_nocheck
      2.01 ±  3%      +0.9        2.91 ±  2%  perf-profile.calltrace.cycles-pp.__handle_irq_event_percpu.handle_irq_event.handle_edge_irq.__common_interrupt.common_interrupt
      2.00 ±  3%      +0.9        2.90 ±  2%  perf-profile.calltrace.cycles-pp.megasas_isr_fusion.__handle_irq_event_percpu.handle_irq_event.handle_edge_irq.__common_interrupt
      2.03 ±  3%      +0.9        2.95 ±  2%  perf-profile.calltrace.cycles-pp.handle_irq_event.handle_edge_irq.__common_interrupt.common_interrupt.asm_common_interrupt
      2.05 ±  3%      +0.9        2.98 ±  3%  perf-profile.calltrace.cycles-pp.handle_edge_irq.__common_interrupt.common_interrupt.asm_common_interrupt.acpi_safe_halt
      2.06 ±  3%      +0.9        3.00 ±  3%  perf-profile.calltrace.cycles-pp.__common_interrupt.common_interrupt.asm_common_interrupt.acpi_safe_halt.acpi_idle_enter
      0.00            +1.0        0.97 ±  6%  perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__schedule.schedule.io_schedule
      0.00            +1.0        0.98 ±  6%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.io_schedule.blk_mq_get_tag
      2.09 ±  3%      +1.0        3.10 ±  2%  perf-profile.calltrace.cycles-pp.common_interrupt.asm_common_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state
      2.10 ±  4%      +1.0        3.12 ±  2%  perf-profile.calltrace.cycles-pp.asm_common_interrupt.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.00            +1.5        1.52 ± 13%  perf-profile.calltrace.cycles-pp.__schedule.schedule.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests
      0.00            +1.6        1.61 ± 12%  perf-profile.calltrace.cycles-pp.schedule.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio
      0.00            +1.7        1.68 ± 12%  perf-profile.calltrace.cycles-pp.io_schedule.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio.submit_bio_noacct_nocheck
      0.50 ± 47%      +1.7        2.23 ±  8%  perf-profile.calltrace.cycles-pp.__schedule.schedule.worker_thread.kthread.ret_from_fork
      0.50 ± 47%      +1.7        2.25 ±  7%  perf-profile.calltrace.cycles-pp.schedule.worker_thread.kthread.ret_from_fork.ret_from_fork_asm
      0.17 ±141%      +1.8        1.94 ±  7%  perf-profile.calltrace.cycles-pp.pick_next_task_fair.__schedule.schedule.worker_thread.kthread
      0.08 ±223%      +1.8        1.88 ±  7%  perf-profile.calltrace.cycles-pp.newidle_balance.pick_next_task_fair.__schedule.schedule.worker_thread
      0.00            +2.0        2.00 ±  5%  perf-profile.calltrace.cycles-pp.__irqentry_text_start.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      0.61 ±  9%      +2.2        2.80 ±  6%  perf-profile.calltrace.cycles-pp.update_sg_lb_stats.update_sd_lb_stats.find_busiest_group.load_balance.newidle_balance
      0.66 ±  9%      +2.4        3.07 ±  6%  perf-profile.calltrace.cycles-pp.update_sd_lb_stats.find_busiest_group.load_balance.newidle_balance.pick_next_task_fair
      0.67 ±  8%      +2.4        3.12 ±  6%  perf-profile.calltrace.cycles-pp.find_busiest_group.load_balance.newidle_balance.pick_next_task_fair.__schedule
      0.70 ±  7%      +2.6        3.30 ±  7%  perf-profile.calltrace.cycles-pp.load_balance.newidle_balance.pick_next_task_fair.__schedule.schedule
      2.47 ±  6%      +2.9        5.34 ±  6%  perf-profile.calltrace.cycles-pp.iomap_dio_bio_iter.__iomap_dio_rw.iomap_dio_rw.xfs_file_dio_write_aligned.xfs_file_write_iter
      3.37 ± 11%      +3.3        6.66 ±  8%  perf-profile.calltrace.cycles-pp.asm_sysvec_call_function_single.acpi_safe_halt.acpi_idle_enter.cpuidle_enter_state.cpuidle_enter
      1.27 ± 10%      +3.5        4.82 ±  6%  perf-profile.calltrace.cycles-pp.submit_bio_noacct_nocheck.iomap_dio_bio_iter.__iomap_dio_rw.iomap_dio_rw.xfs_file_dio_write_aligned
      1.02 ± 31%      +3.8        4.78 ±  6%  perf-profile.calltrace.cycles-pp.blk_mq_submit_bio.submit_bio_noacct_nocheck.iomap_dio_bio_iter.__iomap_dio_rw.iomap_dio_rw
      0.00            +4.0        4.04 ±  7%  perf-profile.calltrace.cycles-pp.blk_mq_get_tag.__blk_mq_alloc_requests.blk_mq_submit_bio.submit_bio_noacct_nocheck.iomap_dio_bio_iter
      0.00            +4.2        4.15 ±  7%  perf-profile.calltrace.cycles-pp.__blk_mq_alloc_requests.blk_mq_submit_bio.submit_bio_noacct_nocheck.iomap_dio_bio_iter.__iomap_dio_rw
      0.86 ± 10%      +5.1        5.97 ±  3%  perf-profile.calltrace.cycles-pp.poll_idle.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle
     63.38 ±  2%      +5.7       69.08        perf-profile.calltrace.cycles-pp.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary
     66.58 ±  2%      +5.8       72.38        perf-profile.calltrace.cycles-pp.cpuidle_idle_call.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     63.90 ±  2%      +5.9       69.85        perf-profile.calltrace.cycles-pp.cpuidle_enter_state.cpuidle_enter.cpuidle_idle_call.do_idle.cpu_startup_entry
     68.18 ±  2%      +5.9       74.13        perf-profile.calltrace.cycles-pp.start_secondary.secondary_startup_64_no_verify
     68.18 ±  2%      +5.9       74.12        perf-profile.calltrace.cycles-pp.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     68.04 ±  2%      +6.0       74.00        perf-profile.calltrace.cycles-pp.do_idle.cpu_startup_entry.start_secondary.secondary_startup_64_no_verify
     69.10 ±  2%      +6.3       75.35        perf-profile.calltrace.cycles-pp.secondary_startup_64_no_verify
      9.01 ±  7%      -4.0        5.00 ±  5%  perf-profile.children.cycles-pp.entry_SYSCALL_64_after_hwframe
      8.73 ±  7%      -3.8        4.88 ±  5%  perf-profile.children.cycles-pp.do_syscall_64
      6.94 ±  9%      -3.7        3.26 ±  7%  perf-profile.children.cycles-pp.__do_sys_io_uring_enter
      6.96 ± 10%      -3.5        3.44 ±  7%  perf-profile.children.cycles-pp.fio_ioring_commit
      6.32 ± 28%      -3.4        2.90 ± 16%  perf-profile.children.cycles-pp.native_queued_spin_lock_slowpath
      4.56 ±  6%      -2.8        1.72 ± 17%  perf-profile.children.cycles-pp.iomap_dio_complete
      4.55 ±  6%      -2.8        1.72 ± 17%  perf-profile.children.cycles-pp.iomap_dio_complete_work
      4.74 ± 12%      -2.7        2.09 ±  9%  perf-profile.children.cycles-pp.io_submit_sqes
      4.17 ±  6%      -2.6        1.57 ± 18%  perf-profile.children.cycles-pp.xfs_iomap_write_unwritten
      4.21 ±  6%      -2.6        1.61 ± 18%  perf-profile.children.cycles-pp.xfs_dio_write_end_io
      5.50 ±  7%      -2.5        2.95 ± 10%  perf-profile.children.cycles-pp.do_exit
      6.66 ±  7%      -2.4        4.28 ±  8%  perf-profile.children.cycles-pp.process_one_work
      4.30 ± 10%      -2.3        2.04 ± 18%  perf-profile.children.cycles-pp.exit_notify
      3.69 ± 13%      -2.1        1.64 ± 25%  perf-profile.children.cycles-pp.queued_write_lock_slowpath
      1.64 ± 19%      -1.3        0.32 ± 16%  perf-profile.children.cycles-pp.iomap_iter
      1.53 ± 21%      -1.3        0.27 ± 14%  perf-profile.children.cycles-pp.xfs_direct_write_iomap_begin
      2.59 ±  8%      -1.2        1.35 ± 12%  perf-profile.children.cycles-pp.release_task
      1.48 ±121%      -1.1        0.34 ± 19%  perf-profile.children.cycles-pp.dd_insert_requests
      1.89 ± 94%      -1.1        0.82 ±  9%  perf-profile.children.cycles-pp.blk_finish_plug
      1.88 ± 94%      -1.1        0.82 ±  9%  perf-profile.children.cycles-pp.__blk_flush_plug
      1.85 ± 96%      -1.0        0.81 ±  9%  perf-profile.children.cycles-pp.blk_mq_flush_plug_list
      1.84 ± 97%      -1.0        0.80 ±  9%  perf-profile.children.cycles-pp.blk_mq_dispatch_plug_list
      1.46 ±  3%      -1.0        0.43 ± 39%  perf-profile.children.cycles-pp.xfs_trans_alloc_inode
      0.95 ± 11%      -0.9        0.10 ± 25%  perf-profile.children.cycles-pp.xfs_ilock_for_iomap
      0.83 ±  9%      -0.7        0.14 ± 18%  perf-profile.children.cycles-pp.down_read
      0.81 ±  5%      -0.6        0.20 ±107%  perf-profile.children.cycles-pp.xfs_ilock
      0.78 ±  5%      -0.6        0.18 ±117%  perf-profile.children.cycles-pp.down_write
      0.69 ± 24%      -0.6        0.10 ± 37%  perf-profile.children.cycles-pp.fio_ioring_getevents
      1.44 ±  6%      -0.6        0.86 ±  3%  perf-profile.children.cycles-pp.blk_update_request
      1.51 ± 13%      -0.5        0.97 ± 21%  perf-profile.children.cycles-pp.io_cqring_wait
      0.67 ±  8%      -0.5        0.15 ± 15%  perf-profile.children.cycles-pp.xfs_iunlock
      0.76 ±  9%      -0.5        0.25 ± 22%  perf-profile.children.cycles-pp.bio_iov_iter_get_pages
      0.75 ±  8%      -0.5        0.25 ± 22%  perf-profile.children.cycles-pp.__bio_iov_iter_get_pages
      0.64 ± 24%      -0.5        0.14 ± 14%  perf-profile.children.cycles-pp.__io_run_local_work
      0.69 ±  8%      -0.5        0.23 ± 25%  perf-profile.children.cycles-pp.iov_iter_extract_pages
      0.67 ±  9%      -0.5        0.22 ± 25%  perf-profile.children.cycles-pp.pin_user_pages_fast
      0.66 ±  9%      -0.4        0.22 ± 25%  perf-profile.children.cycles-pp.internal_get_user_pages_fast
      0.44 ± 16%      -0.4        0.04 ± 73%  perf-profile.children.cycles-pp.up_write
      0.84 ± 13%      -0.4        0.45 ± 33%  perf-profile.children.cycles-pp.xfs_bmapi_write
      1.52 ± 10%      -0.4        1.15 ±  6%  perf-profile.children.cycles-pp.__xfs_trans_commit
      0.55 ± 11%      -0.3        0.20 ± 26%  perf-profile.children.cycles-pp.lockless_pages_from_mm
      0.78 ± 14%      -0.3        0.45 ±  9%  perf-profile.children.cycles-pp.dd_bio_merge
      0.64 ± 19%      -0.3        0.34 ±  8%  perf-profile.children.cycles-pp.blk_mq_sched_try_merge
      0.46 ± 15%      -0.3        0.17 ± 23%  perf-profile.children.cycles-pp.gup_pgd_range
      0.98 ± 10%      -0.3        0.72 ±  5%  perf-profile.children.cycles-pp.iomap_dio_bio_end_io
      0.51 ± 24%      -0.3        0.24 ± 14%  perf-profile.children.cycles-pp.kmem_cache_free
      0.40 ± 16%      -0.3        0.14 ± 24%  perf-profile.children.cycles-pp.__slab_free
      0.38 ± 16%      -0.2        0.14 ± 23%  perf-profile.children.cycles-pp.gup_pte_range
      0.32 ± 75%      -0.2        0.10 ± 25%  perf-profile.children.cycles-pp.xfs_bmapi_read
      0.60 ±  9%      -0.2        0.39 ± 21%  perf-profile.children.cycles-pp.xfs_trans_alloc
      0.38 ± 15%      -0.2        0.17 ± 15%  perf-profile.children.cycles-pp.xfs_bmapi_convert_unwritten
      0.27 ± 31%      -0.2        0.06 ± 52%  perf-profile.children.cycles-pp.xfs_ilock_iocb_for_write
      0.31 ± 18%      -0.2        0.11 ± 27%  perf-profile.children.cycles-pp.try_grab_folio
      0.24 ± 19%      -0.2        0.05 ± 54%  perf-profile.children.cycles-pp.fio_gettime
      0.34 ± 16%      -0.2        0.15 ±  6%  perf-profile.children.cycles-pp.xfs_bmap_add_extent_unwritten_real
      0.25 ± 12%      -0.2        0.06 ± 59%  perf-profile.children.cycles-pp.__io_req_task_work_add
      0.27 ±  8%      -0.2        0.09 ± 12%  perf-profile.children.cycles-pp.__bio_release_pages
      0.48 ± 11%      -0.2        0.31 ± 20%  perf-profile.children.cycles-pp.xfs_trans_reserve
      0.20 ± 14%      -0.2        0.03 ±100%  perf-profile.children.cycles-pp.xfs_ilock_nowait
      0.31 ± 15%      -0.2        0.16 ± 23%  perf-profile.children.cycles-pp.kmem_cache_alloc
      0.24 ± 15%      -0.2        0.09 ± 24%  perf-profile.children.cycles-pp.kfree
      0.35 ± 44%      -0.2        0.20 ± 20%  perf-profile.children.cycles-pp.llist_reverse_order
      0.24 ± 11%      -0.1        0.09 ± 23%  perf-profile.children.cycles-pp.xlog_ticket_alloc
      0.17 ± 76%      -0.1        0.02 ± 99%  perf-profile.children.cycles-pp.io_req_rw_complete
      0.42 ±  9%      -0.1        0.27 ± 22%  perf-profile.children.cycles-pp.xfs_log_reserve
      0.23 ± 10%      -0.1        0.09 ± 29%  perf-profile.children.cycles-pp.bio_alloc_bioset
      0.17 ± 23%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.down_read_trylock
      0.47 ± 19%      -0.1        0.33 ± 20%  perf-profile.children.cycles-pp.io_queue_iowq
      0.16 ± 30%      -0.1        0.04 ± 75%  perf-profile.children.cycles-pp.get_io_u
      0.22 ± 11%      -0.1        0.11 ±  8%  perf-profile.children.cycles-pp.up_read
      0.16 ± 33%      -0.1        0.05 ± 76%  perf-profile.children.cycles-pp.xfs_trans_run_precommits
      0.16 ±  8%      -0.1        0.05 ± 71%  perf-profile.children.cycles-pp.__cond_resched
      0.15 ± 26%      -0.1        0.04 ± 72%  perf-profile.children.cycles-pp.__io_submit_flush_completions
      0.27 ± 18%      -0.1        0.16 ±  5%  perf-profile.children.cycles-pp.__blk_bios_map_sg
      0.12 ± 26%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.io_free_batch_list
      0.20 ± 22%      -0.1        0.10 ± 14%  perf-profile.children.cycles-pp.xfs_trans_ijoin
      0.12 ± 33%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.xfs_bmbt_to_iomap
      0.27 ± 17%      -0.1        0.18 ± 10%  perf-profile.children.cycles-pp.__blk_rq_map_sg
      0.18 ± 16%      -0.1        0.09 ± 42%  perf-profile.children.cycles-pp.memset_orig
      0.32 ± 17%      -0.1        0.24 ± 15%  perf-profile.children.cycles-pp.scsi_alloc_sgtables
      0.21 ± 15%      -0.1        0.13 ± 17%  perf-profile.children.cycles-pp.asm_sysvec_reschedule_ipi
      0.10 ± 27%      -0.1        0.03 ±102%  perf-profile.children.cycles-pp.mempool_alloc
      0.33 ± 16%      -0.1        0.26 ± 13%  perf-profile.children.cycles-pp.sd_setup_read_write_cmnd
      0.09 ± 17%      -0.1        0.03 ±100%  perf-profile.children.cycles-pp.get_random_u32
      0.18 ± 18%      -0.1        0.12 ± 28%  perf-profile.children.cycles-pp.run_timer_softirq
      0.12 ± 13%      -0.1        0.06 ± 48%  perf-profile.children.cycles-pp.xfs_iext_get_extent
      0.10 ± 29%      -0.1        0.04 ± 73%  perf-profile.children.cycles-pp.bio_associate_blkg
      0.01 ±223%      +0.1        0.06 ± 15%  perf-profile.children.cycles-pp.xfs_ag_block_count
      0.12 ± 30%      +0.1        0.18 ± 13%  perf-profile.children.cycles-pp._raw_spin_unlock_irqrestore
      0.00            +0.1        0.06 ± 13%  perf-profile.children.cycles-pp.finish_wait
      0.03 ±141%      +0.1        0.08 ± 38%  perf-profile.children.cycles-pp.__enqueue_entity
      0.12 ± 13%      +0.1        0.19 ± 12%  perf-profile.children.cycles-pp.set_next_entity
      0.00            +0.1        0.07 ± 20%  perf-profile.children.cycles-pp.try_to_grab_pending
      0.12 ± 23%      +0.1        0.20 ± 30%  perf-profile.children.cycles-pp.wakeup_preempt
      0.02 ±142%      +0.1        0.10 ± 40%  perf-profile.children.cycles-pp.megasas_build_ldio_fusion
      0.16 ± 30%      +0.1        0.25 ± 18%  perf-profile.children.cycles-pp.switch_mm_irqs_off
      0.12 ± 32%      +0.1        0.22 ± 12%  perf-profile.children.cycles-pp.llist_add_batch
      0.11 ±  9%      +0.1        0.22 ± 21%  perf-profile.children.cycles-pp.rb_erase
      0.13 ± 18%      +0.1        0.24 ± 23%  perf-profile.children.cycles-pp.elv_attempt_insert_merge
      0.17 ± 15%      +0.1        0.28 ± 15%  perf-profile.children.cycles-pp.scsi_dispatch_cmd
      0.08 ± 58%      +0.1        0.19 ± 13%  perf-profile.children.cycles-pp.raw_spin_rq_lock_nested
      0.07 ± 20%      +0.1        0.19 ± 19%  perf-profile.children.cycles-pp.xas_load
      0.00            +0.1        0.12 ± 26%  perf-profile.children.cycles-pp.sbitmap_finish_wait
      0.00            +0.1        0.13 ± 16%  perf-profile.children.cycles-pp.elv_rb_del
      0.56 ±  9%      +0.1        0.69 ±  7%  perf-profile.children.cycles-pp.__update_load_avg_cfs_rq
      0.28 ± 13%      +0.1        0.41 ±  6%  perf-profile.children.cycles-pp.queue_work_on
      0.00            +0.1        0.14 ± 28%  perf-profile.children.cycles-pp.sbitmap_queue_get_shallow
      0.06 ± 50%      +0.1        0.20 ± 20%  perf-profile.children.cycles-pp.kblockd_mod_delayed_work_on
      0.06 ± 50%      +0.1        0.20 ± 20%  perf-profile.children.cycles-pp.mod_delayed_work_on
      0.11 ± 14%      +0.2        0.27 ± 11%  perf-profile.children.cycles-pp.xas_find
      0.02 ±142%      +0.2        0.18 ± 21%  perf-profile.children.cycles-pp.dd_has_work
      0.24 ± 20%      +0.2        0.40 ± 15%  perf-profile.children.cycles-pp.ttwu_queue_wakelist
      0.10 ± 24%      +0.2        0.27 ± 13%  perf-profile.children.cycles-pp._find_next_zero_bit
      0.09 ± 25%      +0.2        0.26 ± 14%  perf-profile.children.cycles-pp.elv_rqhash_find
      0.37 ± 14%      +0.2        0.54 ± 14%  perf-profile.children.cycles-pp.idle_cpu
      0.22 ± 11%      +0.2        0.40 ±  9%  perf-profile.children.cycles-pp._find_next_and_bit
      0.55 ± 10%      +0.2        0.75 ± 11%  perf-profile.children.cycles-pp.update_load_avg
      0.00            +0.2        0.21 ± 19%  perf-profile.children.cycles-pp.prepare_to_wait_exclusive
      0.10 ± 20%      +0.2        0.31 ± 15%  perf-profile.children.cycles-pp.__dd_dispatch_request
      0.66 ±  9%      +0.2        0.88 ±  9%  perf-profile.children.cycles-pp.update_rq_clock
      0.01 ±223%      +0.2        0.24 ± 27%  perf-profile.children.cycles-pp.__sbitmap_weight
      0.01 ±223%      +0.2        0.25 ± 25%  perf-profile.children.cycles-pp.sbitmap_weight
      0.07 ± 28%      +0.3        0.32 ± 15%  perf-profile.children.cycles-pp.sbitmap_get_shallow
      0.47 ± 21%      +0.3        0.72 ± 12%  perf-profile.children.cycles-pp.enqueue_entity
      0.28 ± 10%      +0.3        0.54 ±  4%  perf-profile.children.cycles-pp.__queue_work
      0.77 ±  7%      +0.3        1.04 ±  8%  perf-profile.children.cycles-pp.schedule_idle
      1.04 ±  5%      +0.3        1.32 ±  7%  perf-profile.children.cycles-pp.try_to_wake_up
      0.22 ± 14%      +0.3        0.50 ±  6%  perf-profile.children.cycles-pp.kick_pool
      0.12 ± 27%      +0.3        0.40 ± 12%  perf-profile.children.cycles-pp.xa_find_after
      0.25 ± 10%      +0.3        0.53 ±  9%  perf-profile.children.cycles-pp.cpu_util
      0.55 ± 12%      +0.3        0.88 ±  9%  perf-profile.children.cycles-pp.xfs_file_write_checks
      0.60 ± 17%      +0.3        0.94 ±  6%  perf-profile.children.cycles-pp.activate_task
      0.56 ± 17%      +0.4        0.92 ±  6%  perf-profile.children.cycles-pp.enqueue_task_fair
      0.42 ± 14%      +0.4        0.82 ± 10%  perf-profile.children.cycles-pp.kiocb_modified
      0.00            +0.4        0.41 ± 18%  perf-profile.children.cycles-pp.__dd_do_insert
      0.31 ± 13%      +0.5        0.77 ± 12%  perf-profile.children.cycles-pp.xfs_vn_update_time
      0.00            +0.5        0.48 ±  7%  perf-profile.children.cycles-pp.autoremove_wake_function
      0.23 ± 11%      +0.5        0.72 ±  8%  perf-profile.children.cycles-pp.blk_mq_run_hw_queues
      0.65 ± 16%      +0.5        1.16 ±  6%  perf-profile.children.cycles-pp.ttwu_do_activate
      0.65 ± 18%      +0.5        1.18 ±  8%  perf-profile.children.cycles-pp.sched_ttwu_pending
      0.18 ± 10%      +0.6        0.74 ±  3%  perf-profile.children.cycles-pp.sbitmap_find_bit
      0.00            +0.6        0.56 ±  9%  perf-profile.children.cycles-pp.__wake_up_common
      0.00            +0.6        0.56 ±  9%  perf-profile.children.cycles-pp.__wake_up
      0.00            +0.6        0.58 ± 12%  perf-profile.children.cycles-pp.sbitmap_queue_wake_up
      1.80 ±  4%      +0.6        2.40 ±  2%  perf-profile.children.cycles-pp.scsi_end_request
      1.80 ±  4%      +0.6        2.41 ±  2%  perf-profile.children.cycles-pp.scsi_io_completion
      0.41 ±  6%      +0.6        1.02 ±  6%  perf-profile.children.cycles-pp.__irqentry_text_start
      0.02 ±145%      +0.6        0.64 ± 11%  perf-profile.children.cycles-pp.sbitmap_queue_clear
      0.16 ± 22%      +0.6        0.80 ± 19%  perf-profile.children.cycles-pp.sbitmap_get
      0.10 ± 25%      +0.6        0.74 ± 11%  perf-profile.children.cycles-pp.__blk_mq_free_request
      0.82 ± 17%      +0.7        1.47 ±  9%  perf-profile.children.cycles-pp.__flush_smp_call_function_queue
      0.83 ± 16%      +0.7        1.49 ±  8%  perf-profile.children.cycles-pp.__sysvec_call_function_single
      0.17 ± 36%      +0.8        0.94 ± 11%  perf-profile.children.cycles-pp.dd_dispatch_request
      0.95 ± 15%      +0.8        1.73 ±  7%  perf-profile.children.cycles-pp.sysvec_call_function_single
      0.73 ±  6%      +0.9        1.61 ±  9%  perf-profile.children.cycles-pp.blk_mq_run_work_fn
      2.04 ±  4%      +0.9        2.98        perf-profile.children.cycles-pp.complete_cmd_fusion
      2.05 ±  3%      +1.0        3.01        perf-profile.children.cycles-pp.megasas_isr_fusion
      2.06 ±  4%      +1.0        3.02        perf-profile.children.cycles-pp.__handle_irq_event_percpu
      2.08 ±  3%      +1.0        3.05        perf-profile.children.cycles-pp.handle_irq_event
      2.10 ±  3%      +1.0        3.09 ±  2%  perf-profile.children.cycles-pp.handle_edge_irq
      2.11 ±  3%      +1.0        3.10 ±  2%  perf-profile.children.cycles-pp.__common_interrupt
      2.14 ±  4%      +1.1        3.21        perf-profile.children.cycles-pp.common_interrupt
      0.20 ± 28%      +1.1        1.28 ± 12%  perf-profile.children.cycles-pp.scsi_mq_get_budget
      2.14 ±  4%      +1.1        3.24        perf-profile.children.cycles-pp.asm_common_interrupt
      0.29 ± 32%      +1.3        1.58 ±  9%  perf-profile.children.cycles-pp.blk_mq_run_hw_queue
      1.85 ±  6%      +1.8        3.66 ±  5%  perf-profile.children.cycles-pp.update_sg_lb_stats
      2.12 ±  4%      +1.8        3.97 ±  5%  perf-profile.children.cycles-pp.update_sd_lb_stats
      2.18 ±  4%      +1.9        4.03 ±  5%  perf-profile.children.cycles-pp.find_busiest_group
      2.79 ±  4%      +1.9        4.73 ±  6%  perf-profile.children.cycles-pp.load_balance
      2.33 ± 12%      +2.0        4.30 ±  7%  perf-profile.children.cycles-pp.asm_sysvec_call_function_single
      0.00            +2.0        2.01 ± 12%  perf-profile.children.cycles-pp.io_schedule
      0.98 ± 12%      +2.0        3.00 ±  7%  perf-profile.children.cycles-pp.__blk_mq_do_dispatch_sched
      0.98 ± 13%      +2.0        3.01 ±  7%  perf-profile.children.cycles-pp.__blk_mq_sched_dispatch_requests
      0.98 ± 13%      +2.0        3.02 ±  7%  perf-profile.children.cycles-pp.blk_mq_sched_dispatch_requests
      2.09 ±  5%      +2.3        4.43 ±  6%  perf-profile.children.cycles-pp.newidle_balance
      2.26 ±  4%      +2.5        4.75 ±  5%  perf-profile.children.cycles-pp.pick_next_task_fair
      2.48 ±  6%      +2.9        5.34 ±  6%  perf-profile.children.cycles-pp.iomap_dio_bio_iter
      2.90 ±  7%      +2.9        5.81 ±  5%  perf-profile.children.cycles-pp.schedule
      3.83 ±  5%      +3.1        6.90 ±  4%  perf-profile.children.cycles-pp.__schedule
      1.27 ± 10%      +3.6        4.82 ±  6%  perf-profile.children.cycles-pp.submit_bio_noacct_nocheck
      1.18 ±  9%      +3.6        4.78 ±  6%  perf-profile.children.cycles-pp.blk_mq_submit_bio
      0.09 ± 17%      +3.9        4.04 ±  7%  perf-profile.children.cycles-pp.blk_mq_get_tag
      0.15 ± 15%      +4.0        4.15 ±  7%  perf-profile.children.cycles-pp.__blk_mq_alloc_requests
      0.88 ± 10%      +5.3        6.14 ±  3%  perf-profile.children.cycles-pp.poll_idle
     64.25 ±  2%      +5.9       70.18        perf-profile.children.cycles-pp.cpuidle_enter
     68.18 ±  2%      +5.9       74.13        perf-profile.children.cycles-pp.start_secondary
     63.95 ±  2%      +6.0       69.91        perf-profile.children.cycles-pp.cpuidle_enter_state
     67.55 ±  2%      +6.0       73.59        perf-profile.children.cycles-pp.cpuidle_idle_call
     69.08 ±  2%      +6.3       75.34        perf-profile.children.cycles-pp.do_idle
     69.10 ±  2%      +6.3       75.35        perf-profile.children.cycles-pp.cpu_startup_entry
     69.10 ±  2%      +6.3       75.35        perf-profile.children.cycles-pp.secondary_startup_64_no_verify
      6.32 ± 28%      -3.4        2.89 ± 16%  perf-profile.self.cycles-pp.native_queued_spin_lock_slowpath
      0.46 ± 16%      -0.3        0.12 ± 16%  perf-profile.self.cycles-pp.__iomap_dio_rw
      0.39 ± 18%      -0.3        0.14 ± 25%  perf-profile.self.cycles-pp.__slab_free
      0.45 ± 20%      -0.2        0.23 ± 13%  perf-profile.self.cycles-pp.iomap_dio_bio_end_io
      0.23 ± 20%      -0.2        0.04 ± 77%  perf-profile.self.cycles-pp.fio_gettime
      0.29 ± 19%      -0.2        0.10 ± 27%  perf-profile.self.cycles-pp.try_grab_folio
      0.29 ± 40%      -0.2        0.11 ± 30%  perf-profile.self.cycles-pp.entry_SYSCALL_64_after_hwframe
      0.26 ± 11%      -0.2        0.09 ± 51%  perf-profile.self.cycles-pp.try_to_wake_up
      0.16 ± 24%      -0.1        0.02 ± 99%  perf-profile.self.cycles-pp.down_read_trylock
      0.22 ±  9%      -0.1        0.11 ± 10%  perf-profile.self.cycles-pp.up_read
      0.18 ± 23%      -0.1        0.08 ± 19%  perf-profile.self.cycles-pp.kmem_cache_free
      0.14 ± 30%      -0.1        0.04 ± 77%  perf-profile.self.cycles-pp.get_io_u
      0.26 ± 18%      -0.1        0.16 ±  5%  perf-profile.self.cycles-pp.__blk_bios_map_sg
      0.14 ± 20%      -0.1        0.05 ± 47%  perf-profile.self.cycles-pp.blk_mq_submit_bio
      0.14 ± 20%      -0.1        0.05 ± 86%  perf-profile.self.cycles-pp.__io_req_task_work_add
      0.14 ± 18%      -0.1        0.05 ± 48%  perf-profile.self.cycles-pp.__bio_release_pages
      0.29 ± 13%      -0.1        0.20 ± 20%  perf-profile.self.cycles-pp.llist_reverse_order
      0.17 ± 19%      -0.1        0.09 ± 41%  perf-profile.self.cycles-pp.memset_orig
      0.13 ± 31%      -0.1        0.06 ± 51%  perf-profile.self.cycles-pp.xfs_inode_to_log_dinode
      0.12 ± 14%      -0.1        0.05 ± 47%  perf-profile.self.cycles-pp.xfs_bmap_add_extent_unwritten_real
      0.11 ± 23%      -0.1        0.05 ± 73%  perf-profile.self.cycles-pp.inode_maybe_inc_iversion
      0.14 ± 12%      -0.1        0.08 ± 14%  perf-profile.self.cycles-pp.xfs_file_write_iter
      0.14 ± 23%      -0.1        0.09 ± 22%  perf-profile.self.cycles-pp.fget
      0.10 ± 13%      -0.0        0.06 ± 46%  perf-profile.self.cycles-pp.xfs_iext_get_extent
      0.03 ±102%      +0.1        0.09 ± 21%  perf-profile.self.cycles-pp.pick_next_task_fair
      0.02 ±141%      +0.1        0.08 ± 38%  perf-profile.self.cycles-pp.__enqueue_entity
      0.10 ± 21%      +0.1        0.16 ± 24%  perf-profile.self.cycles-pp.irqentry_enter
      0.01 ±223%      +0.1        0.08 ± 37%  perf-profile.self.cycles-pp.xas_load
      0.00            +0.1        0.08 ± 18%  perf-profile.self.cycles-pp.__blk_mq_free_request
      0.01 ±223%      +0.1        0.09 ± 22%  perf-profile.self.cycles-pp.dd_dispatch_request
      0.20 ± 21%      +0.1        0.28 ± 16%  perf-profile.self.cycles-pp.update_load_avg
      0.00            +0.1        0.08 ± 25%  perf-profile.self.cycles-pp.ttwu_do_activate
      0.16 ± 31%      +0.1        0.24 ± 20%  perf-profile.self.cycles-pp.switch_mm_irqs_off
      0.08 ± 27%      +0.1        0.17 ± 23%  perf-profile.self.cycles-pp.__dd_dispatch_request
      0.09 ± 21%      +0.1        0.19 ± 25%  perf-profile.self.cycles-pp.enqueue_task_fair
      0.10 ± 14%      +0.1        0.20 ± 23%  perf-profile.self.cycles-pp.rb_erase
      0.12 ± 32%      +0.1        0.22 ± 12%  perf-profile.self.cycles-pp.llist_add_batch
      0.00            +0.1        0.11 ± 31%  perf-profile.self.cycles-pp.prepare_to_wait_exclusive
      0.54 ± 12%      +0.1        0.67 ±  6%  perf-profile.self.cycles-pp.__update_load_avg_cfs_rq
      0.00            +0.1        0.14 ± 27%  perf-profile.self.cycles-pp.sbitmap_queue_get_shallow
      0.01 ±223%      +0.1        0.15 ± 22%  perf-profile.self.cycles-pp.xa_find_after
      0.73 ± 11%      +0.1        0.87 ±  4%  perf-profile.self.cycles-pp._raw_spin_lock_irqsave
      0.02 ±142%      +0.2        0.17 ± 22%  perf-profile.self.cycles-pp.dd_has_work
      0.36 ± 13%      +0.2        0.52 ± 12%  perf-profile.self.cycles-pp.idle_cpu
      0.10 ± 25%      +0.2        0.26 ± 15%  perf-profile.self.cycles-pp._find_next_zero_bit
      0.86 ±  5%      +0.2        1.02 ± 12%  perf-profile.self.cycles-pp.menu_select
      0.09 ± 27%      +0.2        0.26 ± 14%  perf-profile.self.cycles-pp.elv_rqhash_find
      0.20 ± 12%      +0.2        0.37 ± 14%  perf-profile.self.cycles-pp._find_next_and_bit
      0.08 ± 53%      +0.2        0.26 ± 19%  perf-profile.self.cycles-pp.complete_cmd_fusion
      0.00            +0.2        0.19 ± 26%  perf-profile.self.cycles-pp.__sbitmap_weight
      0.24 ± 11%      +0.2        0.48 ± 12%  perf-profile.self.cycles-pp.__schedule
      0.21 ±  7%      +0.2        0.46 ± 11%  perf-profile.self.cycles-pp.cpu_util
      0.01 ±223%      +0.3        0.28 ± 15%  perf-profile.self.cycles-pp.scsi_mq_get_budget
      0.03 ±101%      +0.3        0.30 ± 21%  perf-profile.self.cycles-pp.sbitmap_get
      0.08 ± 24%      +0.4        0.48 ±  5%  perf-profile.self.cycles-pp.sbitmap_find_bit
      1.37 ±  8%      +1.2        2.58 ±  8%  perf-profile.self.cycles-pp.update_sg_lb_stats
      0.82 ± 12%      +5.0        5.84 ±  3%  perf-profile.self.cycles-pp.poll_idle




Disclaimer:
Results have been estimated based on internal Intel analysis and are provided
for informational purposes only. Any difference in system hardware or software
design or configuration may affect actual performance.


-- 
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki



* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
  2024-01-31 15:42 [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression kernel test robot
@ 2024-01-31 18:17 ` Bart Van Assche
  2024-01-31 18:42   ` Jens Axboe
  2024-02-09 21:06 ` Jens Axboe
  1 sibling, 1 reply; 9+ messages in thread
From: Bart Van Assche @ 2024-01-31 18:17 UTC (permalink / raw)
  To: kernel test robot
  Cc: oe-lkp, lkp, Linux Memory Management List, Jens Axboe,
	Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
	feng.tang, fengwei.yin

On 1/31/24 07:42, kernel test robot wrote:
> kernel test robot noticed a -72.9% regression of fio.write_iops on:
> 
> 
> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> testcase: fio-basic
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
> 
> 	runtime: 300s
> 	disk: 1HDD
> 	fs: xfs
> 	nr_task: 100%
> 	test_size: 128G
> 	rw: write
> 	bs: 4k
> 	ioengine: io_uring
> 	direct: direct
> 	cpufreq_governor: performance

The actual test is available in this file:
https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script

I haven't found anything in that file that disables merging. Merging
requests decreases IOPS. Does this perhaps mean that this test is
broken?
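
For what it's worth, merging can be disabled per device via the
nomerges sysfs attribute; a minimal sketch, assuming the test disk is
/dev/sdb as in the repro script:

  # 0 = merging enabled, 1 = simple one-hit merges only, 2 = no merging
  echo 2 > /sys/block/sdb/queue/nomerges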

Thanks,

Bart.


* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
  2024-01-31 18:17 ` Bart Van Assche
@ 2024-01-31 18:42   ` Jens Axboe
  2024-02-01  7:18     ` Oliver Sang
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2024-01-31 18:42 UTC (permalink / raw)
  To: Bart Van Assche, kernel test robot
  Cc: oe-lkp, lkp, Linux Memory Management List, Oleksandr Natalenko,
	Johannes Thumshirn, linux-block, ying.huang, feng.tang,
	fengwei.yin

On 1/31/24 11:17 AM, Bart Van Assche wrote:
> On 1/31/24 07:42, kernel test robot wrote:
>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
>>
>>
>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>
>> testcase: fio-basic
>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
>> parameters:
>>
>>     runtime: 300s
>>     disk: 1HDD
>>     fs: xfs
>>     nr_task: 100%
>>     test_size: 128G
>>     rw: write
>>     bs: 4k
>>     ioengine: io_uring
>>     direct: direct
>>     cpufreq_governor: performance
> 
> The actual test is available in this file:
> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
> 
> I haven't found anything in that file for disabling merging. Merging
> requests decreases IOPS. Does this perhaps mean that this test is
> broken?

It's hard to know, as nothing in this email or the links includes the
actual output of the job...

But if it's fio IOPS, then those are application-side and don't
necessarily correlate to drive IOPS due to merging. E.g., if fio does 4k
sequential writes and we merge them to 128k, then the fio-perceived IOPS
will be 32 times larger than the device-side IOPS.

I'll take a look, but it seems like there might be something there. By
inserting into the other list, the request is also not available for
merging. And the test in question does single IOs at a time.
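
As a back-of-the-envelope sketch of that arithmetic, assuming 4k
sequential writes merge fully into 128k device requests:

  # one 128k device IO absorbs 32 application-side 4k writes, so fio
  # would report 32x the device-side IOPS:
  echo $((131072 / 4096))   # -> 32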

-- 
Jens Axboe



* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
  2024-01-31 18:42   ` Jens Axboe
@ 2024-02-01  7:18     ` Oliver Sang
  2024-02-01 13:40       ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Sang @ 2024-02-01  7:18 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
	Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
	feng.tang, fengwei.yin, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 1953 bytes --]

hi, Jens Axboe,

On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
> On 1/31/24 11:17 AM, Bart Van Assche wrote:
> > On 1/31/24 07:42, kernel test robot wrote:
> >> kernel test robot noticed a -72.9% regression of fio.write_iops on:
> >>
> >>
> >> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> >> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>
> >> testcase: fio-basic
> >> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> >> parameters:
> >>
> >>     runtime: 300s
> >>     disk: 1HDD
> >>     fs: xfs
> >>     nr_task: 100%
> >>     test_size: 128G
> >>     rw: write
> >>     bs: 4k
> >>     ioengine: io_uring
> >>     direct: direct
> >>     cpufreq_governor: performance
> > 
> > The actual test is available in this file:
> > https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
> > 
> > I haven't found anything in that file for disabling merging. Merging
> > requests decreases IOPS. Does this perhaps mean that this test is
> > broken?
> 
> It's hard to know as nothing in this email or links include the actual
> output of the job...

I attached a dmesg and two outputs captured while running tests on
574e7779cf. Not sure if they are helpful?

> 
> But if it's fio IOPS, then those are application side and don't
> necessarily correlate to drive IOPS due to merging. Eg for fio iops, if
> it does 4k sequential and we merge to 128k, then the fio perceived iops
> will be 32 times larger than the device side.
> 
> I'll take a look, but seems like there might be something there. By
> inserting into the other list, the request is also not available for
> merging. And the test in question does single IOs at the time.

If you have any debug patch you want us to run, please just let us know.
It will be our great pleasure!

> 
> -- 
> Jens Axboe
> 

[-- Attachment #2: dmesg.xz --]
[-- Type: application/x-xz, Size: 37396 bytes --]

[-- Attachment #3: fio --]
[-- Type: text/plain, Size: 348 bytes --]

2024-01-30 16:11:43 echo '[global]
bs=4k
ioengine=io_uring
iodepth=32
size=2147483648
nr_files=1
filesize=2147483648
direct=1
runtime=300
invalidate=1
fallocate=posix
io_size=2147483648
file_service_type=roundrobin
random_distribution=random
group_reporting
pre_read=0

[task_0]
rw=write
directory=/fs/sdb1
numjobs=64' | fio --output-format=json -

[-- Attachment #4: fio.output --]
[-- Type: text/plain, Size: 6913 bytes --]

{
  "fio version" : "fio-3.33",
  "timestamp" : 1706631404,
  "timestamp_ms" : 1706631404540,
  "time" : "Tue Jan 30 16:16:44 2024",
  "global options" : {
    "bs" : "4k",
    "ioengine" : "io_uring",
    "iodepth" : "32",
    "size" : "2147483648",
    "nrfiles" : "1",
    "filesize" : "2147483648",
    "direct" : "1",
    "runtime" : "300",
    "invalidate" : "1",
    "fallocate" : "posix",
    "io_size" : "2147483648",
    "file_service_type" : "roundrobin",
    "random_distribution" : "random",
    "pre_read" : "0"
  },
  "jobs" : [
    {
      "jobname" : "task_0",
      "groupid" : 0,
      "error" : 0,
      "eta" : 0,
      "elapsed" : 301,
      "job options" : {
        "rw" : "write",
        "directory" : "/fs/sdb1",
        "numjobs" : "64"
      },
      "read" : {
        "io_bytes" : 0,
        "io_kbytes" : 0,
        "bw_bytes" : 0,
        "bw" : 0,
        "iops" : 0.000000,
        "runtime" : 0,
        "total_ios" : 0,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "clat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "bw_min" : 0,
        "bw_max" : 0,
        "bw_agg" : 0.000000,
        "bw_mean" : 0.000000,
        "bw_dev" : 0.000000,
        "bw_samples" : 0,
        "iops_min" : 0,
        "iops_max" : 0,
        "iops_mean" : 0.000000,
        "iops_stddev" : 0.000000,
        "iops_samples" : 0
      },
      "write" : {
        "io_bytes" : 7811690496,
        "io_kbytes" : 7628604,
        "bw_bytes" : 26013908,
        "bw" : 25404,
        "iops" : 6351.051820,
        "runtime" : 300289,
        "total_ios" : 1907151,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 326,
          "max" : 866214376,
          "mean" : 9843550.857783,
          "stddev" : 36978079.656282,
          "N" : 1907151
        },
        "clat_ns" : {
          "min" : 182009,
          "max" : 1499097001,
          "mean" : 312417107.300301,
          "stddev" : 178075607.629576,
          "N" : 1907151,
          "percentile" : {
            "1.000000" : 36438016,
            "5.000000" : 64749568,
            "10.000000" : 92798976,
            "20.000000" : 145752064,
            "30.000000" : 191889408,
            "40.000000" : 238026752,
            "50.000000" : 291504128,
            "60.000000" : 350224384,
            "70.000000" : 408944640,
            "80.000000" : 471859200,
            "90.000000" : 549453824,
            "95.000000" : 616562688,
            "99.000000" : 775946240,
            "99.500000" : 859832320,
            "99.900000" : 1035993088,
            "99.950000" : 1115684864,
            "99.990000" : 1249902592
          }
        },
        "lat_ns" : {
          "min" : 318070,
          "max" : 1499097433,
          "mean" : 322260658.158085,
          "stddev" : 180107939.833017,
          "N" : 1907151
        },
        "bw_min" : 2800,
        "bw_max" : 80728,
        "bw_agg" : 99.991312,
        "bw_mean" : 25402.113478,
        "bw_dev" : 183.115387,
        "bw_samples" : 38394,
        "iops_min" : 700,
        "iops_max" : 20182,
        "iops_mean" : 6350.528370,
        "iops_stddev" : 45.778847,
        "iops_samples" : 38394
      },
      "trim" : {
        "io_bytes" : 0,
        "io_kbytes" : 0,
        "bw_bytes" : 0,
        "bw" : 0,
        "iops" : 0.000000,
        "runtime" : 0,
        "total_ios" : 0,
        "short_ios" : 0,
        "drop_ios" : 0,
        "slat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "clat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        },
        "bw_min" : 0,
        "bw_max" : 0,
        "bw_agg" : 0.000000,
        "bw_mean" : 0.000000,
        "bw_dev" : 0.000000,
        "bw_samples" : 0,
        "iops_min" : 0,
        "iops_max" : 0,
        "iops_mean" : 0.000000,
        "iops_stddev" : 0.000000,
        "iops_samples" : 0
      },
      "sync" : {
        "total_ios" : 0,
        "lat_ns" : {
          "min" : 0,
          "max" : 0,
          "mean" : 0.000000,
          "stddev" : 0.000000,
          "N" : 0
        }
      },
      "job_runtime" : 19210274,
      "usr_cpu" : 0.005950,
      "sys_cpu" : 0.030385,
      "ctx" : 446567,
      "majf" : 0,
      "minf" : 883,
      "iodepth_level" : {
        "1" : 0.100000,
        "2" : 0.100000,
        "4" : 0.100000,
        "8" : 0.100000,
        "16" : 0.100000,
        "32" : 99.895970,
        ">=64" : 0.000000
      },
      "iodepth_submit" : {
        "0" : 0.000000,
        "4" : 100.000000,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.000000,
        "64" : 0.000000,
        ">=64" : 0.000000
      },
      "iodepth_complete" : {
        "0" : 0.000000,
        "4" : 99.996641,
        "8" : 0.000000,
        "16" : 0.000000,
        "32" : 0.100000,
        "64" : 0.000000,
        ">=64" : 0.000000
      },
      "latency_ns" : {
        "2" : 0.000000,
        "4" : 0.000000,
        "10" : 0.000000,
        "20" : 0.000000,
        "50" : 0.000000,
        "100" : 0.000000,
        "250" : 0.000000,
        "500" : 0.000000,
        "750" : 0.000000,
        "1000" : 0.000000
      },
      "latency_us" : {
        "2" : 0.000000,
        "4" : 0.000000,
        "10" : 0.000000,
        "20" : 0.000000,
        "50" : 0.000000,
        "100" : 0.000000,
        "250" : 0.010000,
        "500" : 0.010000,
        "750" : 0.010000,
        "1000" : 0.010000
      },
      "latency_ms" : {
        "2" : 0.010000,
        "4" : 0.033925,
        "10" : 0.247909,
        "20" : 0.190231,
        "50" : 2.933171,
        "100" : 7.702536,
        "250" : 31.258039,
        "500" : 41.803822,
        "750" : 14.596065,
        "1000" : 1.079935,
        "2000" : 0.153056,
        ">=2000" : 0.000000
      },
      "latency_depth" : 32,
      "latency_target" : 0,
      "latency_percentile" : 100.000000,
      "latency_window" : 0
    }
  ],
  "disk_util" : [
    {
      "name" : "sdb",
      "read_ios" : 0,
      "write_ios" : 1013029,
      "read_merges" : 0,
      "write_merges" : 894106,
      "read_ticks" : 0,
      "write_ticks" : 53607123,
      "in_queue" : 53609981,
      "util" : 96.247980
    }
  ]
}
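
Worth noting in the disk_util section above: the write_ios and
write_merges counters already show heavy merging. A quick sanity check,
using the numbers as posted:

  # 4k writes that reached the block layer:
  echo $((1013029 + 894106))        # 1907135, ~= fio total_ios 1907151
  # share of fio's writes merged away before reaching the disk:
  echo $((894106 * 100 / 1907151))  # ~46%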


* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
  2024-02-01  7:18     ` Oliver Sang
@ 2024-02-01 13:40       ` Jens Axboe
  2024-02-01 14:03         ` Oliver Sang
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2024-02-01 13:40 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
	Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
	feng.tang, fengwei.yin

On 2/1/24 12:18 AM, Oliver Sang wrote:
> hi, Jens Axboe,
> 
> On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
>> On 1/31/24 11:17 AM, Bart Van Assche wrote:
>>> On 1/31/24 07:42, kernel test robot wrote:
>>>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
>>>>
>>>>
>>>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>
>>>> testcase: fio-basic
>>>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
>>>> parameters:
>>>>
>>>>     runtime: 300s
>>>>     disk: 1HDD
>>>>     fs: xfs
>>>>     nr_task: 100%
>>>>     test_size: 128G
>>>>     rw: write
>>>>     bs: 4k
>>>>     ioengine: io_uring
>>>>     direct: direct
>>>>     cpufreq_governor: performance
>>>
>>> The actual test is available in this file:
>>> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
>>>
>>> I haven't found anything in that file for disabling merging. Merging
>>> requests decreases IOPS. Does this perhaps mean that this test is
>>> broken?
>>
>> It's hard to know as nothing in this email or links include the actual
>> output of the job...
> 
> I attached a dmesg and 2 outputs while running tests on 574e7779cf.
> not sure if they are helpful?

Both fio outputs are all I need, but I only see one of them attached?

>> But if it's fio IOPS, then those are application side and don't
>> necessarily correlate to drive IOPS due to merging. Eg for fio iops,
>> if it does 4k sequential and we merge to 128k, then the fio perceived
>> iops will be 32 times larger than the device side.
>>
>> I'll take a look, but seems like there might be something there. By
>> inserting into the other list, the request is also not available for
>> merging. And the test in question does single IOs at the time.
> 
> if you have any debug patch want us to run, please just let us know.
> it will be our great pleasure!

Thanks, might take you up on that, probably won't have time for this
until next week however.

-- 
Jens Axboe



* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
  2024-02-01 13:40       ` Jens Axboe
@ 2024-02-01 14:03         ` Oliver Sang
  2024-02-01 14:30           ` Jens Axboe
  0 siblings, 1 reply; 9+ messages in thread
From: Oliver Sang @ 2024-02-01 14:03 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
	Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
	feng.tang, fengwei.yin, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 2538 bytes --]

hi, Jens Axboe,

On Thu, Feb 01, 2024 at 06:40:07AM -0700, Jens Axboe wrote:
> On 2/1/24 12:18 AM, Oliver Sang wrote:
> > hi, Jens Axboe,
> > 
> > On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
> >> On 1/31/24 11:17 AM, Bart Van Assche wrote:
> >>> On 1/31/24 07:42, kernel test robot wrote:
> >>>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
> >>>>
> >>>>
> >>>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> >>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>>>
> >>>> testcase: fio-basic
> >>>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> >>>> parameters:
> >>>>
> >>>>     runtime: 300s
> >>>>     disk: 1HDD
> >>>>     fs: xfs
> >>>>     nr_task: 100%
> >>>>     test_size: 128G
> >>>>     rw: write
> >>>>     bs: 4k
> >>>>     ioengine: io_uring
> >>>>     direct: direct
> >>>>     cpufreq_governor: performance
> >>>
> >>> The actual test is available in this file:
> >>> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
> >>>
> >>> I haven't found anything in that file for disabling merging. Merging
> >>> requests decreases IOPS. Does this perhaps mean that this test is
> >>> broken?
> >>
> >> It's hard to know as nothing in this email or links include the actual
> >> output of the job...
> > 
> > I attached a dmesg and 2 outputs while running tests on 574e7779cf.
> > not sure if they are helpful?
> 
> Both fio outputs is all I need, but I only see one of them attached?

While running fio, the following logs are captured:
fio
fio.output
fio.task
fio.time

I tarred them into fio.tar.gz, as attached.
You can extract them with 'tar xzvf fio.tar.gz'.
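
For a quick look at the headline number, the write IOPS can be pulled
straight from fio.output once extracted, assuming jq is available:

  tar xzvf fio.tar.gz
  jq '.jobs[0].write.iops' fio.output   # 6351.05 on 574e7779cf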


> 
> >> But if it's fio IOPS, then those are application side and don't
> >> necessarily correlate to drive IOPS due to merging. Eg for fio iops,
> >> if it does 4k sequential and we merge to 128k, then the fio perceived
> >> iops will be 32 times larger than the device side.
> >>
> >> I'll take a look, but seems like there might be something there. By
> >> inserting into the other list, the request is also not available for
> >> merging. And the test in question does single IOs at the time.
> > 
> > if you have any debug patch want us to run, please just let us know.
> > it will be our great pleasure!
> 
> Thanks, might take you up on that, probably won't have time for this
> until next week however.
> 
> -- 
> Jens Axboe
> 

[-- Attachment #2: fio.tar.gz --]
[-- Type: application/gzip, Size: 2208 bytes --]


* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
  2024-02-01 14:03         ` Oliver Sang
@ 2024-02-01 14:30           ` Jens Axboe
  2024-02-01 14:45             ` Oliver Sang
  0 siblings, 1 reply; 9+ messages in thread
From: Jens Axboe @ 2024-02-01 14:30 UTC (permalink / raw)
  To: Oliver Sang
  Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
	Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
	feng.tang, fengwei.yin

On 2/1/24 7:03 AM, Oliver Sang wrote:
> hi, Jens Axboe,
> 
> On Thu, Feb 01, 2024 at 06:40:07AM -0700, Jens Axboe wrote:
>> On 2/1/24 12:18 AM, Oliver Sang wrote:
>>> hi, Jens Axboe,
>>>
>>> On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
>>>> On 1/31/24 11:17 AM, Bart Van Assche wrote:
>>>>> On 1/31/24 07:42, kernel test robot wrote:
>>>>>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
>>>>>>
>>>>>>
>>>>>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
>>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
>>>>>>
>>>>>> testcase: fio-basic
>>>>>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
>>>>>> parameters:
>>>>>>
>>>>>>     runtime: 300s
>>>>>>     disk: 1HDD
>>>>>>     fs: xfs
>>>>>>     nr_task: 100%
>>>>>>     test_size: 128G
>>>>>>     rw: write
>>>>>>     bs: 4k
>>>>>>     ioengine: io_uring
>>>>>>     direct: direct
>>>>>>     cpufreq_governor: performance
>>>>>
>>>>> The actual test is available in this file:
>>>>> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
>>>>>
>>>>> I haven't found anything in that file for disabling merging. Merging
>>>>> requests decreases IOPS. Does this perhaps mean that this test is
>>>>> broken?
>>>>
>>>> It's hard to know as nothing in this email or links include the actual
>>>> output of the job...
>>>
>>> I attached a dmesg and 2 outputs while running tests on 574e7779cf.
>>> not sure if they are helpful?
>>
>> Both fio outputs is all I need, but I only see one of them attached?
> 
> while we running fio, there are below logs captured:
> fio
> fio.output
> fio.task
> fio.time
> 
> I tar them in fio.tar.gz as attached.
> you can get them by 'tar xzvf fio.tar.gz'

Right, but I need BOTH outputs - one from before the commit and the one
on the commit. The report is a regression, hence there must be both a
good and a bad run output... This looks like just the same output
again; I can't really do much with just one output.

-- 
Jens Axboe



* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
  2024-02-01 14:30           ` Jens Axboe
@ 2024-02-01 14:45             ` Oliver Sang
  0 siblings, 0 replies; 9+ messages in thread
From: Oliver Sang @ 2024-02-01 14:45 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Bart Van Assche, oe-lkp, lkp, Linux Memory Management List,
	Oleksandr Natalenko, Johannes Thumshirn, linux-block, ying.huang,
	feng.tang, fengwei.yin, oliver.sang

[-- Attachment #1: Type: text/plain, Size: 2429 bytes --]

hi, Jens Axboe,

On Thu, Feb 01, 2024 at 07:30:53AM -0700, Jens Axboe wrote:
> On 2/1/24 7:03 AM, Oliver Sang wrote:
> > hi, Jens Axboe,
> > 
> > On Thu, Feb 01, 2024 at 06:40:07AM -0700, Jens Axboe wrote:
> >> On 2/1/24 12:18 AM, Oliver Sang wrote:
> >>> hi, Jens Axboe,
> >>>
> >>> On Wed, Jan 31, 2024 at 11:42:46AM -0700, Jens Axboe wrote:
> >>>> On 1/31/24 11:17 AM, Bart Van Assche wrote:
> >>>>> On 1/31/24 07:42, kernel test robot wrote:
> >>>>>> kernel test robot noticed a -72.9% regression of fio.write_iops on:
> >>>>>>
> >>>>>>
> >>>>>> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> >>>>>> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> >>>>>>
> >>>>>> testcase: fio-basic
> >>>>>> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> >>>>>> parameters:
> >>>>>>
> >>>>>>     runtime: 300s
> >>>>>>     disk: 1HDD
> >>>>>>     fs: xfs
> >>>>>>     nr_task: 100%
> >>>>>>     test_size: 128G
> >>>>>>     rw: write
> >>>>>>     bs: 4k
> >>>>>>     ioengine: io_uring
> >>>>>>     direct: direct
> >>>>>>     cpufreq_governor: performance
> >>>>>
> >>>>> The actual test is available in this file:
> >>>>> https://download.01.org/0day-ci/archive/20240131/202401312320.a335db14-oliver.sang@intel.com/repro-script
> >>>>>
> >>>>> I haven't found anything in that file for disabling merging. Merging
> >>>>> requests decreases IOPS. Does this perhaps mean that this test is
> >>>>> broken?
> >>>>
> >>>> It's hard to know as nothing in this email or links include the actual
> >>>> output of the job...
> >>>
> >>> I attached a dmesg and 2 outputs while running tests on 574e7779cf.
> >>> not sure if they are helpful?
> >>
> >> Both fio outputs is all I need, but I only see one of them attached?
> > 
> > while we running fio, there are below logs captured:
> > fio
> > fio.output
> > fio.task
> > fio.time
> > 
> > I tar them in fio.tar.gz as attached.
> > you can get them by 'tar xzvf fio.tar.gz'
> 
> Right, but I need BOTH outputs - one from before the commit and the one
> on the commit. The report is a regression, hence there must be both a
> good and a bad run output... This looks like just the same output again,
> I can't really do much with just one output.

Oh, I didn't get you... sorry.
The attached fio-8f764b91fd.tar.gz is from a run on the parent commit.

> 
> -- 
> Jens Axboe
> 

[-- Attachment #2: fio-8f764b91fd.tar.gz --]
[-- Type: application/gzip, Size: 2189 bytes --]


* Re: [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression
  2024-01-31 15:42 [linux-next:master] [block/mq] 574e7779cf: fio.write_iops -72.9% regression kernel test robot
  2024-01-31 18:17 ` Bart Van Assche
@ 2024-02-09 21:06 ` Jens Axboe
  1 sibling, 0 replies; 9+ messages in thread
From: Jens Axboe @ 2024-02-09 21:06 UTC (permalink / raw)
  To: kernel test robot, Bart Van Assche
  Cc: oe-lkp, lkp, Linux Memory Management List, Oleksandr Natalenko,
	Johannes Thumshirn, linux-block, ying.huang, feng.tang,
	fengwei.yin

On 1/31/24 8:42 AM, kernel test robot wrote:
> 
> 
> Hello,
> 
> kernel test robot noticed a -72.9% regression of fio.write_iops on:
> 
> 
> commit: 574e7779cf583171acb5bf6365047bb0941b387c ("block/mq-deadline: use separate insertion lists")
> https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git master
> 
> testcase: fio-basic
> test machine: 64 threads 2 sockets Intel(R) Xeon(R) Gold 6346 CPU @ 3.10GHz (Ice Lake) with 256G memory
> parameters:
> 
> 	runtime: 300s
> 	disk: 1HDD
> 	fs: xfs
> 	nr_task: 100%
> 	test_size: 128G
> 	rw: write
> 	bs: 4k
> 	ioengine: io_uring
> 	direct: direct
> 	cpufreq_governor: performance

I looked into this, and I think I see what is happening. We still do
insertion merges, but they are now postponed to dispatch time. This
means that for this crazy case, where you have 64 threads doing
sequential writes, we run out of tags (which is 64 by default) and hence
dispatch sooner than we would've before. Before, we would've queued one
request, then allocated a new one, and queued that. When that queue
event happened, we would merge with the previous one - either upfront,
or when the request was inserted. Either way, we ended up with one
bigger request rather than two smaller ones that still needed merging.

That left more requests free.

I think we can solve this by doing smarter merging at insertion time.
I've dropped the series from my for-next branch for now; it will need
revisiting, and then I'll post it again.
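
For anyone reproducing this, the tag depth referred to above should be
visible via sysfs; a quick check, assuming the test disk is /dev/sdb and
that nr_requests reflects the scheduler-level tag depth:

  cat /sys/block/sdb/queue/nr_requests   # expected: 64 on this setup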

-- 
Jens Axboe


