* Verify concentrated on first few drives
From: Gavin Martin @ 2015-06-17 14:45 UTC
  To: fio

Hi,

Running fio (2.2.8) with the verify option, I see that the verify reads
seem to be concentrated on the first few drives and tail off across the
remaining drives, as reported in the Disk stats.

Is this normal? Does the Verify follow the Round-Robin approach of
normal workloads?

Config file:-
[global]
ioengine=libaio
iodepth=2
direct=1
runtime=30s
time_based
bs=64K
verify=meta
verify_backlog=32
verify_state_save=0
stonewall
norandommap
atomic=1

filename=/dev/sdb
filename=/dev/sdc
filename=/dev/sdd
filename=/dev/sde
filename=/dev/sdf
filename=/dev/sdg
filename=/dev/sdh
filename=/dev/sdi
filename=/dev/sdj
filename=/dev/sdk
filename=/dev/sdl
filename=/dev/sdm
filename=/dev/sdn
filename=/dev/sdo
filename=/dev/sdp
filename=/dev/sdq
filename=/dev/sdr
filename=/dev/sds
filename=/dev/sdt
filename=/dev/sdu
filename=/dev/sdv
filename=/dev/sdw
filename=/dev/sdx
filename=/dev/sdy

[RUN01]
rw=randwrite
do_verify=1

Output:-
RUN01: (g=0): rw=randwrite, bs=64K-64K/64K-64K/64K-64K,
ioengine=libaio, iodepth=2
fio-2.2.8
Starting 1 process
Jobs: 1 (f=24): [w(1)] [100.0% done] [10199KB/9816KB/0KB /s]
[159/153/0 iops] [eta 00m:00s]
RUN01: (groupid=0, jobs=1): err= 0: pid=22749: Wed Jun 17 10:28:23 2015
  write: io=296000KB, bw=9862.5KB/s, iops=154, runt= 30013msec
    slat (usec): min=10, max=86, avg=32.74, stdev= 8.18
    clat (msec): min=1, max=50, avg=11.80, stdev= 3.94
     lat (msec): min=2, max=50, avg=11.83, stdev= 3.94
    clat percentiles (usec):
     |  1.00th=[ 3120],  5.00th=[ 9408], 10.00th=[10432], 20.00th=[10816],
     | 30.00th=[10944], 40.00th=[11072], 50.00th=[11200], 60.00th=[11328],
     | 70.00th=[11456], 80.00th=[11584], 90.00th=[12480], 95.00th=[19328],
     | 99.00th=[30336], 99.50th=[33024], 99.90th=[44288], 99.95th=[44800],
     | 99.99th=[50432]
    bw (KB  /s): min= 8431, max=11241, per=100.00%, avg=9876.58, stdev=453.15
    lat (usec) : 250=6.93%, 500=36.19%, 750=3.56%, 1000=0.17%
    lat (msec) : 2=0.19%, 4=1.19%, 10=4.42%, 20=44.42%, 50=2.91%
    lat (msec) : 100=0.01%
  cpu          : usr=0.68%, sys=0.99%, ctx=9001, majf=0, minf=87
  IO depths    : 1=0.1%, 2=100.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=4608/w=4625/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=2

Run status group 0 (all jobs):
  WRITE: io=296000KB, aggrb=9862KB/s, minb=9862KB/s, maxb=9862KB/s,
mint=30013msec, maxt=30013msec

Disk stats (read/write):
  sdb: ios=274/192, merge=0/0, ticks=305/2299, in_queue=2604, util=8.65%
  sdc: ios=225/192, merge=0/0, ticks=359/2232, in_queue=2591, util=8.61%
  sdd: ios=191/192, merge=0/0, ticks=222/2278, in_queue=2500, util=8.31%
  sde: ios=217/192, merge=0/0, ticks=509/2235, in_queue=2744, util=9.12%
  sdf: ios=191/192, merge=0/0, ticks=199/2209, in_queue=2408, util=8.01%
  sdg: ios=191/192, merge=0/0, ticks=390/2254, in_queue=2644, util=8.80%
  sdh: ios=191/192, merge=0/0, ticks=122/2339, in_queue=2461, util=8.19%
  sdi: ios=191/192, merge=0/0, ticks=123/2270, in_queue=2393, util=7.95%
  sdj: ios=191/191, merge=0/0, ticks=107/2188, in_queue=2302, util=7.64%
  sdk: ios=191/191, merge=0/0, ticks=82/2168, in_queue=2255, util=7.51%
  sdl: ios=191/191, merge=0/0, ticks=134/2322, in_queue=2456, util=8.17%
  sdm: ios=191/191, merge=0/0, ticks=131/2152, in_queue=2283, util=7.61%
  sdn: ios=191/191, merge=0/0, ticks=127/2288, in_queue=2415, util=8.05%
  sdo: ios=191/191, merge=0/0, ticks=134/2312, in_queue=2446, util=8.13%
  sdp: ios=191/191, merge=0/0, ticks=128/2258, in_queue=2386, util=7.95%
  sdq: ios=191/191, merge=0/0, ticks=292/2296, in_queue=2587, util=8.49%
  sdr: ios=191/191, merge=0/0, ticks=189/2341, in_queue=2530, util=8.27%
  sds: ios=191/191, merge=0/0, ticks=167/2379, in_queue=2545, util=8.47%
  sdt: ios=191/191, merge=0/0, ticks=297/2286, in_queue=2583, util=8.39%
  sdu: ios=191/191, merge=0/0, ticks=284/2182, in_queue=2465, util=8.15%
  sdv: ios=189/191, merge=0/0, ticks=191/2134, in_queue=2325, util=7.72%
  sdw: ios=189/191, merge=0/0, ticks=189/2337, in_queue=2526, util=8.36%
  sdx: ios=189/191, merge=0/0, ticks=122/2291, in_queue=2413, util=8.06%
  sdy: ios=189/191, merge=0/0, ticks=195/2164, in_queue=2359, util=7.86%

If I increase the iodepth to 8 it gets even worse. The writes and
utilisation have increased, but the last few drives are not going
through any verification:-

Disk stats (read/write):
  sdb: ios=970/899, merge=0/0, ticks=624/9107, in_queue=9736, util=32.25%
  sdc: ios=901/899, merge=0/0, ticks=622/8927, in_queue=9549, util=31.58%
  sdd: ios=899/899, merge=0/0, ticks=679/9151, in_queue=9831, util=32.60%
  sde: ios=899/899, merge=0/0, ticks=657/8991, in_queue=9652, util=32.02%
  sdf: ios=899/900, merge=0/0, ticks=591/9044, in_queue=9634, util=31.96%
  sdg: ios=899/899, merge=0/0, ticks=543/9260, in_queue=9805, util=32.56%
  sdh: ios=899/899, merge=0/0, ticks=547/9435, in_queue=9982, util=32.82%
  sdi: ios=899/899, merge=0/0, ticks=488/9365, in_queue=9853, util=32.68%
  sdj: ios=899/899, merge=0/0, ticks=422/9349, in_queue=9771, util=32.46%
  sdk: ios=899/899, merge=0/0, ticks=577/9414, in_queue=9991, util=33.11%
  sdl: ios=899/899, merge=0/0, ticks=507/9199, in_queue=9703, util=32.19%
  sdm: ios=899/899, merge=0/0, ticks=495/9125, in_queue=9619, util=31.92%
  sdn: ios=899/899, merge=0/0, ticks=485/9103, in_queue=9587, util=31.82%
  sdo: ios=899/899, merge=0/0, ticks=501/9240, in_queue=9741, util=32.23%
  sdp: ios=899/899, merge=0/0, ticks=477/9110, in_queue=9586, util=31.76%
  sdq: ios=899/899, merge=0/0, ticks=557/9169, in_queue=9726, util=32.11%
  sdr: ios=899/899, merge=0/0, ticks=545/11115, in_queue=11656, util=35.65%
  sds: ios=899/899, merge=0/0, ticks=1266/11542, in_queue=12808, util=36.97%
  sdt: ios=330/899, merge=0/0, ticks=363/9283, in_queue=9644, util=31.43%
  sdu: ios=0/899, merge=0/0, ticks=0/9211, in_queue=9209, util=30.77%
  sdv: ios=0/899, merge=0/0, ticks=0/9232, in_queue=9231, util=30.86%
  sdw: ios=0/899, merge=0/0, ticks=0/9470, in_queue=9470, util=31.68%
  sdx: ios=0/899, merge=0/0, ticks=0/9342, in_queue=9342, util=31.26%
  sdy: ios=0/898, merge=0/0, ticks=0/9174, in_queue=9181, util=30.74%

Have I missed an option, or do I have the wrong one in my config file,
that could be causing this?

Thanks,


* Re: Verify concentrated on first few drives
From: Jens Axboe @ 2015-06-20 16:53 UTC
  To: Gavin Martin, fio

On 06/17/2015 10:45 AM, Gavin Martin wrote:
> [snip]
>
> Have I missed an option, or do I have the wrong one in my config file,
> that could be causing this?

That does look very odd. I'll take a look at this. As a temporary 
work-around, you could have separate jobs for each of the devices. Or 
try using your original job file, but add:

verifysort=0

I suspect the latter would also fix it for you.

-- 
Jens Axboe
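
For reference, a minimal sketch of how the verifysort=0 workaround might
look when folded into the job file above (the device list is shortened to
three drives here purely for brevity; the rest of the original options are
assumed unchanged):

[global]
ioengine=libaio
iodepth=2
direct=1
runtime=30s
time_based
bs=64K
verify=meta
verify_backlog=32
verify_state_save=0
stonewall
norandommap
atomic=1
; suggested workaround: do not sort the verify backlog
verifysort=0

filename=/dev/sdb
filename=/dev/sdc
filename=/dev/sdd

[RUN01]
rw=randwrite
do_verify=1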




* Re: Verify concentrated on first few drives
From: Jens Axboe @ 2015-06-20 17:01 UTC
  To: Gavin Martin, fio

On 06/20/2015 12:53 PM, Jens Axboe wrote:
> On 06/17/2015 10:45 AM, Gavin Martin wrote:
>> [snip]
>>
>> Have I missed an option, or do I have the wrong one in my config file,
>> that could be causing this?
>
> That does look very odd. I'll take a look at this. As a temporary
> work-around, you could have separate jobs for each of the devices. Or
> try using your original job file, but add:
>
> verifysort=0
>
> I suspect the latter would also fix it for you.

I looked at the code, and it is an artifact of how the verify sorting
works. It has a built-in check for overlaps, and includes the file in the
sorting. This means that it will favor the starting files when you do
verify like this. If the job ran to completion, you would get full
verification. Setting verifysort=0 should make the time-limited backlog
verification fairer for you, since it'll turn the backlog into FIFO
behavior.

-- 
Jens Axboe
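
For reference, a minimal sketch of the other workaround mentioned above,
one job section per device. Only two devices are shown here as an
assumption of the layout, and 'stonewall' from the original global section
is left out because it would serialize the per-device jobs:

[global]
ioengine=libaio
iodepth=2
direct=1
runtime=30s
time_based
bs=64K
verify=meta
verify_backlog=32
verify_state_save=0
norandommap
atomic=1
rw=randwrite
do_verify=1

; one job per device, so each device runs its own verify backlog
[sdb]
filename=/dev/sdb

[sdc]
filename=/dev/sdc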




* Re: Verify concentrated on first few drives
From: Gavin Martin @ 2015-06-22  8:41 UTC
  To: Jens Axboe; +Cc: fio

On 20 June 2015 at 18:01, Jens Axboe <axboe@kernel.dk> wrote:
> On 06/20/2015 12:53 PM, Jens Axboe wrote:
>> [snip]
>>
>> I suspect the latter would also fix it for you.
>
> I looked at the code, and it is an artifact of how the verify sorting works.
> It has a built-in check for overlaps, and includes the file in the sorting.
> This means that it will favor the starting files when you do verify like
> this. If the job ran to completion, you would get full verification. Setting
> verifysort=0 should make the time-limited backlog verification fairer for
> you, since it'll turn the backlog into FIFO behavior.

Hi Jens,

Thanks for the reply and the suggestion of using verifysort.

I gave it a try, but unfortunately it did not make much difference for me:

Disk stats (read/write):
  sdb: ios=971/904, merge=0/0, ticks=602/9354, in_queue=9954, util=33.02%
  sdc: ios=904/904, merge=0/0, ticks=722/8983, in_queue=9704, util=32.18%
  sdd: ios=903/903, merge=0/0, ticks=678/9246, in_queue=9933, util=32.96%
  sde: ios=903/904, merge=0/0, ticks=610/9269, in_queue=9878, util=32.65%
  sdf: ios=903/904, merge=0/0, ticks=619/9087, in_queue=9703, util=32.19%
  sdg: ios=903/904, merge=0/0, ticks=677/9221, in_queue=9896, util=32.73%
  sdh: ios=903/903, merge=0/0, ticks=545/9333, in_queue=9883, util=32.83%
  sdi: ios=903/903, merge=0/0, ticks=489/9217, in_queue=9710, util=32.26%
  sdj: ios=903/903, merge=0/0, ticks=494/9556, in_queue=10053, util=33.42%
  sdk: ios=903/903, merge=0/0, ticks=482/9509, in_queue=9991, util=33.15%
  sdl: ios=903/903, merge=0/0, ticks=489/9347, in_queue=9836, util=32.64%
  sdm: ios=903/903, merge=0/0, ticks=467/9434, in_queue=9902, util=32.77%
  sdn: ios=903/903, merge=0/0, ticks=541/9252, in_queue=9792, util=32.46%
  sdo: ios=903/903, merge=0/0, ticks=434/9298, in_queue=9729, util=32.28%
  sdp: ios=903/903, merge=0/0, ticks=542/9200, in_queue=9741, util=32.07%
  sdq: ios=903/903, merge=0/0, ticks=519/9233, in_queue=9751, util=32.16%
  sdr: ios=895/903, merge=0/0, ticks=550/9814, in_queue=10363, util=33.86%
  sds: ios=892/903, merge=0/0, ticks=1371/10481, in_queue=11851, util=35.37%
  sdt: ios=769/903, merge=0/0, ticks=642/9403, in_queue=10044, util=32.17%
  sdu: ios=0/903, merge=0/0, ticks=0/9371, in_queue=9370, util=31.29%
  sdv: ios=0/903, merge=0/0, ticks=0/9333, in_queue=9333, util=31.22%
  sdw: ios=0/903, merge=0/0, ticks=0/9553, in_queue=9553, util=31.97%
  sdx: ios=0/903, merge=0/0, ticks=0/9441, in_queue=9440, util=31.60%
  sdy: ios=0/903, merge=0/0, ticks=0/9271, in_queue=9270, util=30.94%

The last 5 drives get ignored for reads. I'll look at separate jobs for
each drive, but the problem is that I would like the script to stay as
generic as possible: we use it in different scenarios with a number of
different drive configurations, and the original script contains a number
of different jobs that it runs across all the drives. I only broke it
down here to demonstrate the issue.

I had a play with a couple of other settings in the script and found
that increasing verify_backlog makes the read commands noticeably
fairer. After increasing it from 32 to 256, I get the following
results:

Disk stats (read/write):
  sdb: ios=921/866, merge=0/0, ticks=577/9022, in_queue=9599, util=31.26%
  sdc: ios=933/866, merge=0/0, ticks=726/8590, in_queue=9315, util=29.76%
  sdd: ios=864/866, merge=0/0, ticks=803/8870, in_queue=9671, util=30.72%
  sde: ios=864/866, merge=0/0, ticks=837/8677, in_queue=9510, util=30.08%
  sdf: ios=864/866, merge=0/0, ticks=828/8827, in_queue=9654, util=30.62%
  sdg: ios=864/866, merge=0/0, ticks=777/8956, in_queue=9731, util=30.87%
  sdh: ios=864/866, merge=0/0, ticks=832/9123, in_queue=9955, util=31.63%
  sdi: ios=864/866, merge=0/0, ticks=825/9255, in_queue=10077, util=31.86%
  sdj: ios=864/865, merge=0/0, ticks=832/9147, in_queue=9995, util=31.70%
  sdk: ios=864/866, merge=0/0, ticks=776/9151, in_queue=9926, util=31.60%
  sdl: ios=864/865, merge=0/0, ticks=842/9018, in_queue=9875, util=31.20%
  sdm: ios=864/866, merge=0/0, ticks=762/8915, in_queue=9676, util=30.67%
  sdn: ios=864/866, merge=0/0, ticks=743/9165, in_queue=9907, util=31.42%
  sdo: ios=864/866, merge=0/0, ticks=806/9206, in_queue=10009, util=31.49%
  sdp: ios=864/866, merge=0/0, ticks=737/8948, in_queue=9684, util=30.70%
  sdq: ios=864/865, merge=0/0, ticks=794/8985, in_queue=9789, util=31.03%
  sdr: ios=864/865, merge=0/0, ticks=640/9645, in_queue=10294, util=32.80%
  sds: ios=864/865, merge=0/0, ticks=696/10969, in_queue=11671, util=34.56%
  sdt: ios=864/865, merge=0/0, ticks=662/9036, in_queue=9703, util=30.79%
  sdu: ios=864/865, merge=0/0, ticks=660/8930, in_queue=9589, util=30.43%
  sdv: ios=864/865, merge=0/0, ticks=712/9084, in_queue=9796, util=31.00%
  sdw: ios=864/865, merge=0/0, ticks=681/9148, in_queue=9828, util=31.18%
  sdx: ios=864/865, merge=0/0, ticks=785/9170, in_queue=9954, util=31.25%
  sdy: ios=864/865, merge=0/0, ticks=797/8822, in_queue=9618, util=30.16%

Obviously this works well for the 24 drives connected here. I'll need to
check our 84-drive configuration, as the value might have to be increased
further to cope with that number of devices.

I'm assuming that as I increase the backlog, fio will use more memory.
Are there limits on the values that can go in this field, or am I only
limited by the host?

Thanks,
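
For reference, a sketch of the adjustment described above, with
verify_backlog raised to 256; scaling the value further for an 84-drive
setup is only an assumption here, not something fio works out on its own:

[global]
ioengine=libaio
iodepth=8
direct=1
runtime=30s
time_based
bs=64K
verify=meta
; a larger backlog lets the interleaved verify reads reach more of the files;
; 256 was enough for 24 drives, an 84-drive setup may need a larger value still
verify_backlog=256
verify_state_save=0
stonewall
norandommap
atomic=1

filename=/dev/sdb
filename=/dev/sdc
; ... remaining devices as in the original job file

[RUN01]
rw=randwrite
do_verify=1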


