Understanding fio results on a new ZFS server

* Understanding  fio results on a new ZFS server
@ 2018-07-25 12:35 Aravindh Sampathkumar
  2018-07-25 19:03 ` Bryan Gurney
  2018-07-25 19:03 ` Jens Axboe
  0 siblings, 2 replies; 3+ messages in thread
From: Aravindh Sampathkumar @ 2018-07-25 12:35 UTC (permalink / raw)
  To: fio

[-- Attachment #1: Type: text/plain, Size: 9036 bytes --]

Hello fio users.

This is my first attempt at using fio, and I'm puzzled by the results I
obtained while running fio on a ZFS server I built recently. I'm running
these commands mostly out of curiosity, and just to understand what the
server is capable of in a ballpark range.
Here goes my naive attempts and results. 

Context: FreeBSD 11.2, ZFS, Compression enabled, Deduplication disabled, 120 X 10TB SAS HDD disks, no SSD caching, 768GB of DDR4 memory, 2X intel Xeon silver 4110 processors.All commands were run directly on the FreeBSD server.
Running fio-3.8

*_Test # 1:_*
Use dd to create a 16 GiB file with random contents.

root@delorean:/sec_stor/backup/fiotest # time dd if=/dev/random
of=./test_file bs=128k count=131072131072+0 records in
131072+0 records out
17179869184 bytes transferred in 137.402721 secs (125032962 bytes/sec)
0.031u 136.915s 2:17.40 99.6% 30+172k 4+131072io 0pf+0w

This shows that I was able to create a 16GiB file with random data in
137 secs with a throughput of approx. *117 MiB/s.*
*_Test # 2:_*
Use fio to perform seq_writes on a 16 GiB file with default
random contents.
root@delorean:/sec_stor/backup/fiotest/fio-master # ./fio --
name=seqwrite --rw=write --bs=128k --numjobs=1 --size=16G --runtime=60
--iodepth=1 --group_reportingseqwrite: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=1fio-3.8

Starting 1 process

Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=2188MiB/s][r=0,w=17.5k IOPS][eta 00m:00s]seqwrite: (groupid=0, jobs=1): err= 0: pid=8693: Wed Jul 25 14:11:54 2018  write: IOPS=16.5k, BW=2065MiB/s (2165MB/s)(16.0GiB/7935msec)

    clat (usec): min=18, max=17174, avg=58.18, stdev=285.68

     lat (usec): min=18, max=17180, avg=59.94, stdev=285.88

    clat percentiles (usec):

     |  1.00th=[   23],  5.00th=[   39], 10.00th=[   41],
     |    20.00th=[   42],     | 30.00th=[   43], 40.00th=[   44], 50.00th=[   45],
     |    60.00th=[   47],     | 70.00th=[   51], 80.00th=[   57], 90.00th=[   73],
     |    95.00th=[   89],     | 99.00th=[  135], 99.50th=[  165], 99.90th=[  396],
     |    99.95th=[10421],     | 99.99th=[14877]

   bw (  MiB/s): min= 1648, max= 2388, per=99.00%, avg=2044.07,
   stdev=218.90, samples=15   iops        : min=13185, max=19108, avg=16352.07, stdev=1751.17,
   samples=15  lat (usec)   : 20=0.33%, 50=67.57%, 100=28.78%, 250=3.12%, 500=0.13%  lat (usec)   : 750=0.01%, 1000=0.01%

  lat (msec)   : 2=0.01%, 4=0.01%, 20=0.05%

  cpu          : usr=5.08%, sys=90.27%, ctx=40602, majf=0, minf=0

  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
  32=0.0%, >=64=0.0%     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
     64=0.0%, >=64=0.0%     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
     64=0.0%, >=64=0.0%     issued rwts: total=0,131072,0,0 short=0,0,0,0 dropped=0,0,0,0

     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):

  WRITE: bw=2065MiB/s (2165MB/s), 2065MiB/s-2065MiB/s (2165MB/s-
  2165MB/s), io=16.0GiB (17.2GB), run=7935-7935msecroot@delorean:/sec_stor/backup/fiotest/fio-master # du -sh seqwrite.0.0
16G seqwrite.0.0
As you can see, it hits a throughput of 2065 MiB/s.
Question: Why such a drastic difference between dd and this? 
My understanding is that dd command must have used createcall to create
a file, then issue open, and write calls to write the random data into
the open file. Finally closethe file. I chose block size of 128 K to
match with ZFS default record size.The fio test should be measuring just the write calls, but everything
else, the same. Why is there so much difference in throughput?While that question is still open in mind, I ventured into testing with
compressible data.
*_Test # 3:_*
Run fio with buffer_compress_percent=0 (0% compression) - My expectation
is this to match the results of test #2. But, it didnt actually create
0% compressible data. More like 100% compressible data!
root@delorean:/sec_stor/backup/fiotest/fio-master # ./fio --
name=seqwrite --rw=write --bs=128k --numjobs=1 --size=16G --runtime=60
--iodepth=1 --buffer_compress_percentage=0 --buffer_pattern=0xdeadbeef
--refill_buffers --group_reportingseqwrite: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=1fio-3.8

Starting 1 process

Jobs: 1 (f=1): [W(1)][100.0%][r=0KiB/s,w=2470MiB/s][r=0,w=19.8k IOPS][eta 00m:00s]seqwrite: (groupid=0, jobs=1): err= 0: pid=10204: Wed Jul 25 14:14:22 2018  write: IOPS=19.8k, BW=2477MiB/s (2597MB/s)(16.0GiB/6615msec)

    clat (usec): min=25, max=529, avg=45.34, stdev= 7.71

     lat (usec): min=25, max=532, avg=45.39, stdev= 7.73

    clat percentiles (usec):

     |  1.00th=[   34],  5.00th=[   39], 10.00th=[   41],
     |    20.00th=[   42],     | 30.00th=[   42], 40.00th=[   43], 50.00th=[   44],
     |    60.00th=[   45],     | 70.00th=[   46], 80.00th=[   49], 90.00th=[   54],
     |    95.00th=[   59],     | 99.00th=[   71], 99.50th=[   77], 99.90th=[  106], 99.95th=[
     |    121],     | 99.99th=[  239]

   bw (  MiB/s): min= 2404, max= 2513, per=99.54%, avg=2465.45,
   stdev=34.80, samples=13   iops        : min=19235, max=20109, avg=19723.23, stdev=278.34,
   samples=13  lat (usec)   : 50=83.01%, 100=16.84%, 250=0.15%, 500=0.01%, 750=0.01%  cpu          : usr=13.09%, sys=86.82%, ctx=305, majf=0, minf=0

  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
  32=0.0%, >=64=0.0%     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
     64=0.0%, >=64=0.0%     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
     64=0.0%, >=64=0.0%     issued rwts: total=0,131072,0,0 short=0,0,0,0 dropped=0,0,0,0

     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):

  WRITE: bw=2477MiB/s (2597MB/s), 2477MiB/s-2477MiB/s (2597MB/s-
  2597MB/s), io=16.0GiB (17.2GB), run=6615-6615msecroot@delorean:/sec_stor/backup/fiotest/fio-master # du -sh seqwrite.0.0
1.1G seqwrite.0.0
As you see, this test created a 1.1GiB file when I was expecting a 16
GiB file. Why? What options did I miss or specify wrong?
I just couldn't stop, so I went on with another test,

*_Test # 4:_*
 Run fio with buffer_compress_percent=50 (50% compression). It looks
 like it did create a 50% compressible dataset. But, yielded relatively
 low throughput.
root@delorean:/sec_stor/backup/fiotest/fio-master # ./fio --
name=seqwrite --rw=write --bs=128k --numjobs=1 --size=16G --runtime=60
--iodepth=1 --buffer_compress_percentage=50 --buffer_pattern=0xdeadbeef
--refill_buffers --group_reportingseqwrite: (g=0): rw=write, bs=(R) 128KiB-128KiB, (W) 128KiB-128KiB, (T) 128KiB-128KiB, ioengine=psync, iodepth=1fio-3.8

Starting 1 process

Jobs: 1 (f=1): [W(1)][95.7%][r=0KiB/s,w=715MiB/s][r=0,w=5721 IOPS][eta 00m:01s]seqwrite: (groupid=0, jobs=1): err= 0: pid=13543: Wed Jul 25 14:18:41 2018  write: IOPS=5729, BW=716MiB/s (751MB/s)(16.0GiB/22876msec)

    clat (usec): min=26, max=1228, avg=46.59, stdev=19.26

     lat (usec): min=26, max=1229, avg=46.79, stdev=19.73

    clat percentiles (usec):

     |  1.00th=[   35],  5.00th=[   39], 10.00th=[   40],
     |    20.00th=[   42],     | 30.00th=[   42], 40.00th=[   43], 50.00th=[   44],
     |    60.00th=[   44],     | 70.00th=[   45], 80.00th=[   47], 90.00th=[   52],
     |    95.00th=[   64],     | 99.00th=[  120], 99.50th=[  153], 99.90th=[  269], 99.95th=[
     |    359],     | 99.99th=[  627]

   bw (  KiB/s): min=689820, max=748346, per=99.16%, avg=727266.49,
   stdev=13577.99, samples=45   iops        : min= 5389, max= 5846, avg=5681.29, stdev=106.15,
   samples=45  lat (usec)   : 50=88.05%, 100=10.38%, 250=1.45%, 500=0.10%, 750=0.02%  lat (usec)   : 1000=0.01%

  lat (msec)   : 2=0.01%

  cpu          : usr=70.79%, sys=27.64%, ctx=24546, majf=0, minf=0

  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%,
  32=0.0%, >=64=0.0%     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
     64=0.0%, >=64=0.0%     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%,
     64=0.0%, >=64=0.0%     issued rwts: total=0,131072,0,0 short=0,0,0,0 dropped=0,0,0,0

     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):

  WRITE: bw=716MiB/s (751MB/s), 716MiB/s-716MiB/s (751MB/s-751MB/s),
  io=16.0GiB (17.2GB), run=22876-22876msecroot@delorean:/sec_stor/backup/fiotest/fio-master # du -sh seqwrite.0.0
9.2G seqwrite.0.0
As you see, 9.2G file seems reasonable at 50% compression. But, why 716
MiB/s when compared to test #2(no compression) - 2065 MiB/s? At 50%
compression, I understand there is work for the CPUs to compress the
data inline, but, would'nt that cost be negated by the reduction in
amount of data to be written to disks? Is this kind of drop in
throughput reasonable in the field for enabling compression?
Apologies in advance if I missed anything obvious in the documentation.
This is my first use of fio.
Thanks,
--
  Aravindh Sampathkumar
  aravindh@fastmail.com

[-- Attachment #2: Type: text/html, Size: 147195 bytes --]

^ permalink raw reply	[flat|nested] 3+ messages in thread