Huge reduction in write bandwidth with filesystem vs direct block device

* Huge reduction in write bandwidth with filesystem vs direct block device
@ 2021-01-25 17:23 Rick Warner
  2021-01-25 20:58 ` Dave Chinner
  0 siblings, 1 reply; 2+ messages in thread
From: Rick Warner @ 2021-01-25 17:23 UTC (permalink / raw)
  To: linux-xfs

Hi all,

We're working with NVME storage systems and are seeing a significant 
reduction in write speed with an XFS filesystem vs direct access to the 
block device.

Using a 5 disk software RAID5, we're able to get ~16GB/s write speed 
direct to the device.  If we put an XFS filesystem on the software RAID 
and run the same fio command (except --directory /xfs instead of 
--filename /dev/md11) we only get ~2.5GB/s write speed.

Are there any tunables that could improve this? Is performance 
degradation this big considered a bug?

The fio runs showing this are below:

*******Direct to /dev/md11 block device
[root@flashstore ~]# fio --filename=/dev/md11 --rw=write --numjobs=32 
--size=12G --bs=1M --name=1m --group_reporting
1m: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 
1024KiB-1024KiB, ioengine=psync, iodepth=1
...
fio-3.7
Starting 32 processes
Jobs: 26 (f=26): 
[f(1),_(1),f(16),_(1),f(3),_(2),f(3),_(1),f(2),_(1),f(1)][100.0%][r=0KiB/s,w=0KiB/s][r=0,w=0 
IOPS][eta 00m:00s]
1m: (groupid=0, jobs=32): err= 0: pid=74592: Mon Jan 25 12:13:28 2021
   write: IOPS=15.4k, BW=15.0GiB/s (16.1GB/s)(384GiB/25551msec)
     clat (usec): min=230, max=31691, avg=2044.43, stdev=778.36
      lat (usec): min=245, max=31710, avg=2067.97, stdev=783.21
     clat percentiles (usec):
      |  1.00th=[  420],  5.00th=[ 1745], 10.00th=[ 1811], 20.00th=[ 1860],
      | 30.00th=[ 1893], 40.00th=[ 1926], 50.00th=[ 1942], 60.00th=[ 1975],
      | 70.00th=[ 2024], 80.00th=[ 2089], 90.00th=[ 2180], 95.00th=[ 2900],
      | 99.00th=[ 4490], 99.50th=[ 4883], 99.90th=[13829], 99.95th=[14746],
      | 99.99th=[20841]
    bw (  KiB/s): min=400606, max=679936, per=3.13%, avg=492489.85, 
stdev=53436.36, samples=1632
    iops        : min=  391, max=  664, avg=480.90, stdev=52.18, 
samples=1632
   lat (usec)   : 250=0.01%, 500=1.72%, 750=0.80%, 1000=0.09%
   lat (msec)   : 2=62.47%, 4=32.77%, 10=1.94%, 20=0.20%, 50=0.01%
   cpu          : usr=1.37%, sys=62.91%, ctx=38028757, majf=0, minf=60496
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
 >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      issued rwts: total=0,393216,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   WRITE: bw=15.0GiB/s (16.1GB/s), 15.0GiB/s-15.0GiB/s 
(16.1GB/s-16.1GB/s), io=384GiB (412GB), run=25551-25551msec

Disk stats (read/write):
     md11: ios=98/2237881, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, 
aggrios=10252/73455, aggrmerge=20863/582117, aggrticks=4425/138224, 
aggrin_queue=130116, aggrutil=17.71%
   nvme2n1: ios=12427/88141, merge=25370/698549, ticks=5030/163534, 
in_queue=153382, util=16.71%
   nvme3n1: ios=12210/88148, merge=24728/698544, ticks=4979/162745, 
in_queue=152592, util=16.84%
   nvme4n1: ios=12246/88150, merge=24861/698524, ticks=4875/165703, 
in_queue=156034, util=16.81%
   nvme5n1: ios=12289/88146, merge=25200/698533, ticks=5013/164900, 
in_queue=154398, util=16.96%
   nvme6n1: ios=12343/88149, merge=25021/698553, ticks=6655/172464, 
in_queue=164291, util=17.71%
   nvme22n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

******* mkfs.xfs on /dev/md11 (w/ no flags) and fio run on that mount
[root@flashstore ~]# fio --directory=/xfs --rw=write --numjobs=32 
--size=12G --bs=1M --name=1m --group_reporting
1m: (g=0): rw=write, bs=(R) 1024KiB-1024KiB, (W) 1024KiB-1024KiB, (T) 
1024KiB-1024KiB, ioengine=psync, iodepth=1
...
fio-3.7
Starting 32 processes
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
1m: Laying out IO file (1 file / 12288MiB)
Jobs: 11 (f=11): 
[_(6),W(1),_(4),W(1),_(8),W(1),_(3),W(8)][99.4%][r=0KiB/s,w=1213MiB/s][r=0,w=1213 
IOPS][eta 00m:01s]
1m: (groupid=0, jobs=32): err= 0: pid=74782: Mon Jan 25 12:20:32 2021
   write: IOPS=2431, BW=2432MiB/s (2550MB/s)(384GiB/161704msec)
     clat (usec): min=251, max=117777, avg=13006.54, stdev=23856.18
      lat (usec): min=270, max=117787, avg=13027.39, stdev=23851.96
     clat percentiles (usec):
      |  1.00th=[  359],  5.00th=[  371], 10.00th=[  383], 20.00th=[ 408],
      | 30.00th=[  424], 40.00th=[  453], 50.00th=[  537], 60.00th=[ 578],
      | 70.00th=[  619], 80.00th=[55313], 90.00th=[58459], 95.00th=[60556],
      | 99.00th=[63177], 99.50th=[64226], 99.90th=[66323], 99.95th=[68682],
      | 99.99th=[80217]
    bw (  KiB/s): min=55296, max=1054720, per=3.15%, avg=78557.24, 
stdev=41896.99, samples=10233
    iops        : min=   54, max= 1030, avg=76.67, stdev=40.92, 
samples=10233
   lat (usec)   : 500=46.14%, 750=31.31%, 1000=0.78%
   lat (msec)   : 2=0.03%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.59%
   lat (msec)   : 100=21.13%, 250=0.01%
   cpu          : usr=0.12%, sys=3.83%, ctx=86515, majf=0, minf=22227
   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, 
 >=64=0.0%
      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, 
 >=64=0.0%
      issued rwts: total=0,393216,0,0 short=0,0,0,0 dropped=0,0,0,0
      latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   WRITE: bw=2432MiB/s (2550MB/s), 2432MiB/s-2432MiB/s 
(2550MB/s-2550MB/s), io=384GiB (412GB), run=161704-161704msec

Disk stats (read/write):
     md11: ios=1/6097731, merge=0/0, ticks=0/0, in_queue=0, util=0.00%, 
aggrios=23774/2849499, aggrmerge=34232/17493878, 
aggrticks=28040/18125298, aggrin_queue=18363574, aggrutil=80.03%
   nvme2n1: ios=28860/3419298, merge=41127/20992122, 
ticks=39496/23053174, in_queue=23421586, util=75.76%
   nvme3n1: ios=28440/3419396, merge=41081/20992524, 
ticks=34881/23067448, in_queue=23411872, util=80.03%
   nvme4n1: ios=28457/3419413, merge=41361/20992713, 
ticks=30990/21139316, in_queue=21420720, util=78.03%
   nvme5n1: ios=28131/3419446, merge=40331/20992920, 
ticks=29288/20184431, in_queue=20418749, util=77.00%
   nvme6n1: ios=28759/3419446, merge=41496/20992991, 
ticks=33587/21307424, in_queue=21508518, util=77.04%
   nvme22n1: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%

Thanks,
Rick

^ permalink raw reply	[flat|nested] 2+ messages in thread