* how to understand latency when rate is set?
@ 2018-04-19  3:08 shadow_lin
  2018-04-19  8:16 ` Sitsofe Wheeler
  0 siblings, 1 reply; 4+ messages in thread
From: shadow_lin @ 2018-04-19  3:08 UTC (permalink / raw)
  To: fio

Hi list,
    I am using the rate parameter to limit each job's speed.
    The fio setting is:
[4m]
description="4m-seq-write"
direct=1
ioengine=libaio
directory=data
numjobs=60
iodepth=4
group_reporting
rw=write
bs=4M
size=150G
rate=10M

The result is:
4m: (groupid=0, jobs=60): err= 0: pid=6421: Mon Apr 16 03:01:29 2018
  Description : ["ceph rbd 4m-seq-write"]
  write: io=9000.0GB, bw=614407KB/s, iops=150, runt=15359834msec
    slat (usec): min=118, max=2339.4K, avg=736.32, stdev=13419.04
    clat (msec): min=84, max=20170, avg=258.32, stdev=133.97
     lat (msec): min=84, max=20171, avg=259.06, stdev=135.98
    clat percentiles (msec):
     | 1.00th=[ 122], 5.00th=[ 167], 10.00th=[ 176], 20.00th=[ 190],
     | 30.00th=[ 206], 40.00th=[ 223], 50.00th=[ 239], 60.00th=[ 258],
     | 70.00th=[ 281], 80.00th=[ 314], 90.00th=[ 367], 95.00th=[ 416],
     | 99.00th=[ 529], 99.50th=[ 586], 99.90th=[ 791], 99.95th=[ 1004],
     | 99.99th=[ 1958]
    bw (KB /s): min= 204, max=72624, per=1.79%, avg=10979.20, stdev=5690.79
    lat (msec) : 100=0.11%, 250=56.13%, 500=42.24%, 750=1.40%, 1000=0.07%
    lat (msec) : 2000=0.04%, >=2000=0.01%
  cpu : usr=0.08%, sys=0.03%, ctx=4263613, majf=0, minf=570
  IO depths : 1=84.9%, 2=9.7%, 4=5.4%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued : total=r=0/w=2304000/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
  WRITE: io=9000.0GB, aggrb=614406KB/s, minb=614406KB/s, maxb=614406KB/s, mint=15359834msec, maxt=15359834msec
Disk stats (read/write):
  rbd0: ios=0/2305605, merge=0/6913029, ticks=0/594137632, in_queue=594308692, util=100.00%

Then I did the same test without the rate parameter.
[4m]
description="4m-seq-write"
direct=1
ioengine=libaio
directory=data
numjobs=60
iodepth=4
group_reporting
rw=write
bs=4M
size=150G

The result is:
4m: (groupid=0, jobs=60): err= 0: pid=30154: Tue Apr 17 03:13:55 2018
  Description : ["ceph rbd 4m-seq-write"]
  write: io=9000.0GB, bw=1048.1MB/s, iops=262, runt=8785724msec
    slat (usec): min=113, max=16389K, avg=159607.76, stdev=284760.52
    clat (msec): min=139, max=33403, avg=748.97, stdev=548.35
     lat (msec): min=148, max=33842, avg=908.58, stdev=671.56
    clat percentiles (msec):
     | 1.00th=[ 212], 5.00th=[ 265], 10.00th=[ 306], 20.00th=[ 371],
     | 30.00th=[ 429], 40.00th=[ 498], 50.00th=[ 578], 60.00th=[ 685],
     | 70.00th=[ 840], 80.00th=[ 1057], 90.00th=[ 1434], 95.00th=[ 1778],
     | 99.00th=[ 2540], 99.50th=[ 2835], 99.90th=[ 3589], 99.95th=[ 3916],
     | 99.99th=[ 5997]
    bw (KB /s): min= 135, max=103595, per=1.90%, avg=20420.87, stdev=11560.88
    lat (msec) : 250=3.56%, 500=36.90%, 750=24.35%, 1000=12.77%, 2000=19.23%
    lat (msec) : >=2000=3.19%
  cpu : usr=0.24%, sys=0.08%, ctx=3888021, majf=0, minf=509
  IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued : total=r=0/w=2304000/d=0, short=r=0/w=0/d=0
Run status group 0 (all jobs):
  WRITE: io=9000.0GB, aggrb=1048.1MB/s, minb=1048.1MB/s, maxb=1048.1MB/s, mint=8785724msec, maxt=8785724msec
Disk stats (read/write):
  rbd0: ios=0/2304660, merge=0/6912635, ticks=0/1107332792, in_queue=1107640692, util=100.00%


The bandwidth of the second test is much higher than the first one, which is expected, but the latency info is confusing. I was thinking that with the same job number and io depth but higher bandwidth, the latency should be much lower.
The first test, with rate limiting the speed, shows an avg latency of 259.06ms.
The second test, without rate limiting the speed, shows an avg latency of 908.58ms.

I use 4M/latency * job_number * io_depth to estimate the bandwidth.
The second test matches the formula, but the first test is way off. I think it is because the method used to calculate the latency is different.
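
As a rough check of that formula against the averages above (a plain Python sketch, not fio output; I take 614407 KB/s as roughly 600 MiB/s):

block_mib, jobs, iodepth = 4, 60, 4
runs = [
    ("rate=10M", 259.06, 614407 / 1024.0),   # avg lat (ms), reported bw (MiB/s)
    ("no rate",  908.58, 1048.1),
]
for name, lat_ms, reported in runs:
    predicted = block_mib / (lat_ms / 1000.0) * jobs * iodepth
    print(f"{name:8s}: formula {predicted:7.1f} MiB/s vs reported {reported:7.1f} MiB/s")
# rate=10M: formula  3705.7 MiB/s vs reported   600.0 MiB/s
# no rate : formula  1056.6 MiB/s vs reported  1048.1 MiB/s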

How should I understand the latency stats when the rate parameter is set?

Thanks.


2018-04-19
shadow_lin 


* Re: how to understand latency when rate is set?
  2018-04-19  3:08 how to understand latency when rate is set? shadow_lin
@ 2018-04-19  8:16 ` Sitsofe Wheeler
  2018-04-19 14:45   ` shadow_lin
  0 siblings, 1 reply; 4+ messages in thread
From: Sitsofe Wheeler @ 2018-04-19  8:16 UTC (permalink / raw)
  To: shadow_lin; +Cc: fio

Hi,

On 19 April 2018 at 04:08, shadow_lin <shadow_lin@163.com> wrote:
> Hi list,
>     I am using the rate parameter to limit each job's speed.
>     The fio setting is:
> [4m]
> description="4m-seq-write"
> direct=1
> ioengine=libaio
> directory=data
> numjobs=60
> iodepth=4
> group_reporting
> rw=write
> bs=4M
> size=150G
> rate=10M
>
> The result is:
> 4m: (groupid=0, jobs=60): err= 0: pid=6421: Mon Apr 16 03:01:29 2018
>   Description : ["ceph rbd 4m-seq-write"]
>   write: io=9000.0GB, bw=614407KB/s, iops=150, runt=15359834msec
>     slat (usec): min=118, max=2339.4K, avg=736.32, stdev=13419.04
>     clat (msec): min=84, max=20170, avg=258.32, stdev=133.97
>      lat (msec): min=84, max=20171, avg=259.06, stdev=135.98
>     clat percentiles (msec):
>      | 1.00th=[ 122], 5.00th=[ 167], 10.00th=[ 176], 20.00th=[ 190],
>      | 30.00th=[ 206], 40.00th=[ 223], 50.00th=[ 239], 60.00th=[ 258],
>      | 70.00th=[ 281], 80.00th=[ 314], 90.00th=[ 367], 95.00th=[ 416],
>      | 99.00th=[ 529], 99.50th=[ 586], 99.90th=[ 791], 99.95th=[ 1004],
>      | 99.99th=[ 1958]
>     bw (KB /s): min= 204, max=72624, per=1.79%, avg=10979.20, stdev=5690.79
>     lat (msec) : 100=0.11%, 250=56.13%, 500=42.24%, 750=1.40%, 1000=0.07%
>     lat (msec) : 2000=0.04%, >=2000=0.01%
>   cpu : usr=0.08%, sys=0.03%, ctx=4263613, majf=0, minf=570
>   IO depths : 1=84.9%, 2=9.7%, 4=5.4%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued : total=r=0/w=2304000/d=0, short=r=0/w=0/d=0
> Run status group 0 (all jobs):
>   WRITE: io=9000.0GB, aggrb=614406KB/s, minb=614406KB/s, maxb=614406KB/s, mint=15359834msec, maxt=15359834msec
> Disk stats (read/write):
>   rbd0: ios=0/2305605, merge=0/6913029, ticks=0/594137632, in_queue=594308692, util=100.00%
>
> Then I did the same test without the rate parameter.
> [4m]
> description="4m-seq-write"
> direct=1
> ioengine=libaio
> directory=data
> numjobs=60
> iodepth=4
> group_reporting
> rw=write
> bs=4M
> size=150G
>
> The  result is:
> 4m: (groupid=0, jobs=60): err= 0: pid=30154: Tue Apr 17 03:13:55 2018
>   Description : ["ceph rbd 4m-seq-write"]
>   write: io=9000.0GB, bw=1048.1MB/s, iops=262, runt=8785724msec
>     slat (usec): min=113, max=16389K, avg=159607.76, stdev=284760.52
>     clat (msec): min=139, max=33403, avg=748.97, stdev=548.35
>      lat (msec): min=148, max=33842, avg=908.58, stdev=671.56
>     clat percentiles (msec):
>      | 1.00th=[ 212], 5.00th=[ 265], 10.00th=[ 306], 20.00th=[ 371],
>      | 30.00th=[ 429], 40.00th=[ 498], 50.00th=[ 578], 60.00th=[ 685],
>      | 70.00th=[ 840], 80.00th=[ 1057], 90.00th=[ 1434], 95.00th=[ 1778],
>      | 99.00th=[ 2540], 99.50th=[ 2835], 99.90th=[ 3589], 99.95th=[ 3916],
>      | 99.99th=[ 5997]
>     bw (KB /s): min= 135, max=103595, per=1.90%, avg=20420.87, stdev=11560.88
>     lat (msec) : 250=3.56%, 500=36.90%, 750=24.35%, 1000=12.77%, 2000=19.23%
>     lat (msec) : >=2000=3.19%
>   cpu : usr=0.24%, sys=0.08%, ctx=3888021, majf=0, minf=509
>   IO depths : 1=0.1%, 2=0.1%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued : total=r=0/w=2304000/d=0, short=r=0/w=0/d=0
> Run status group 0 (all jobs):
>   WRITE: io=9000.0GB, aggrb=1048.1MB/s, minb=1048.1MB/s, maxb=1048.1MB/s, mint=8785724msec, maxt=8785724msec
> Disk stats (read/write):
>   rbd0: ios=0/2304660, merge=0/6912635, ticks=0/1107332792, in_queue=1107640692, util=100.00%
>
>
> The bandwidth of the second test is much higher than the first one, which is expected,

Yes.

> but the latency info is confusing. I was thinking that with the same job number and io depth but higher bandwidth, the latency should be much lower.

The relationship between throughput and latency is more complicated
than that (higher throughput on the exact same system rarely means
lower latency). Normally it is only when the number of things you are
sending simultaneously goes above a threshold that their latency
starts increasing. Imagine I have a wire four wide - I can put up to
four things down it simultaneously (say I have to send them and wait
for them to come back). If I now want to send four things and then
ANOTHER four immediately, I have to wait for slots to become free
before I can send the last four. My throughput may be higher (I sent
eight things in total instead of four) but my latency has gone up due
to contention. Put another way, you can't infer latency from
throughput when the throughput is not being maxed out.
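
To put toy numbers on that picture, here is a small sketch (plain Python, with made-up service times) of a device that can work on at most four requests at once: paced submissions only ever see the service time, while a burst of submissions queues up and sees higher latency alongside higher throughput.

def simulate(arrivals_ms, width=4, service_ms=100.0):
    lanes = [0.0] * width                  # when each "lane" frees up
    latencies = []
    for t in sorted(arrivals_ms):
        i = min(range(width), key=lambda k: lanes[k])
        start = max(t, lanes[i])           # queue if all lanes are busy
        lanes[i] = start + service_ms
        latencies.append(lanes[i] - t)
    span_s = (max(lanes) - min(arrivals_ms)) / 1000.0
    return sum(latencies) / len(latencies), len(latencies) / span_s

n = 40
paced = [50.0 * i for i in range(n)]       # throttled: one every 50 ms
burst = [0.0] * n                          # unthrottled: all at once
for name, arrivals in (("paced", paced), ("burst", burst)):
    lat, iops = simulate(arrivals)
    print(f"{name}: avg latency {lat:6.1f} ms, throughput {iops:4.1f} IO/s")
# paced: avg latency  100.0 ms, throughput 19.5 IO/s
# burst: avg latency  550.0 ms, throughput 40.0 IO/s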

> The first test, with rate limiting the speed, shows an avg latency of 259.06ms.
> The second test, without rate limiting the speed, shows an avg latency of 908.58ms.
>
> I use 4M/latency * job_number * io_depth to estimate the bandwidth.
> The second test matches the formula, but the first test is way off. I think it is because the method used to calculate the latency is different.
>
> How should I understand the latency stats when the rate parameter is set?

When you used rate you were limiting contention. Without it, everything
tries to go as fast as it can, creating contention. Looking at your job,
are you sure your Ceph disk can really sustain a depth of 240 (4 * 60)
I/Os simultaneously all the way? Notice how your disk utilization hit
100%, suggesting the system was waiting on your Ceph disk to complete
I/Os and hinting that the disk was overloaded.
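
As a rough cross-check (assuming fio's in_queue disk stat is the weighted time-in-queue in milliseconds, as in /proc/diskstats), dividing it by the run time estimates how many requests rbd0 had queued on average:

for name, in_queue_ms, runtime_ms in (
    ("rate=10M", 594308692,  15359834),
    ("no rate",  1107640692, 8785724),
):
    print(f"{name:8s}: ~{in_queue_ms / runtime_ms:.0f} requests in flight on average")
# rate=10M: ~39 requests in flight on average
# no rate : ~126 requests in flight on average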

-- 
Sitsofe | http://sucs.org/~sits/


* Re:  Re: how to understand latency when rate is set?
  2018-04-19  8:16 ` Sitsofe Wheeler
@ 2018-04-19 14:45   ` shadow_lin
  2018-04-19 15:36     ` Sitsofe Wheeler
  0 siblings, 1 reply; 4+ messages in thread
From: shadow_lin @ 2018-04-19 14:45 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: fio


Hi Sitsofe,
	Thank you for your insight.
	I think lat (along with slat and clat) in the fio result means how long it takes to complete the whole io operation.
So with a 4m block it means how long it takes to submit the request and get the confirmation that the 4m block is
written to the disk. So I think with higher bandwidth the lat should be lower. Is my understanding correct? Without the rate
parameter, when I use 4M/lat * job_number * io_depth to calculate the bandwidth, the result matches the fio bandwidth result.


2018-04-19 

shadowlin

From: Sitsofe Wheeler <sitsofe@gmail.com>
Sent: 2018-04-19 16:16
Subject: Re: how to understand latency when rate is set?
To: "shadow_lin"<shadow_lin@163.com>
Cc: "fio"<fio@vger.kernel.org>



* Re: Re: how to understand latency when rate is set?
  2018-04-19 14:45   ` shadow_lin
@ 2018-04-19 15:36     ` Sitsofe Wheeler
  0 siblings, 0 replies; 4+ messages in thread
From: Sitsofe Wheeler @ 2018-04-19 15:36 UTC (permalink / raw)
  To: shadow_lin; +Cc: fio

Hi,

On 19 April 2018 at 15:45, shadow_lin <shadow_lin@163.com> wrote:
>
> Hi Sitsofe,
>         Thank you for your insight.
>         I think lat (along with slat and clat) in the fio result means how long it takes to complete the whole io operation.

slat is submission latency alone. clat is completion latency alone
(measured from submission to completion). lat is the total latency,
from the time fio created the I/O to the time it came back completed.
See the sections in
http://fio.readthedocs.io/en/latest/fio_doc.html#interpreting-the-output
for descriptions.
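
As a quick sanity check against the averages in your original mail (slat is reported there in usec, clat and lat in msec), lat should come out as roughly slat + clat:

runs = {
    "rate=10M": (736.32,    258.32, 259.06),   # slat (us), clat (ms), lat (ms)
    "no rate":  (159607.76, 748.97, 908.58),
}
for name, (slat_us, clat_ms, lat_ms) in runs.items():
    print(f"{name:8s}: slat+clat = {slat_us / 1000.0 + clat_ms:.2f} ms, reported lat = {lat_ms:.2f} ms")
# rate=10M: slat+clat = 259.06 ms, reported lat = 259.06 ms
# no rate : slat+clat = 908.58 ms, reported lat = 908.58 ms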

> So with a 4m block it means how long it takes to submit the request and get the confirmation that the 4m block is
> written to the disk. So I think with higher bandwidth the lat should be lower. Is my understanding correct? Without the rate

I don't think your understanding is completely correct.

Something to be aware of: when you start using giant block sizes, your
I/Os may have to be split up due to device (or system) constraints.
Generally speaking, disks don't normally accept huge (say, bigger than
2MByte) I/Os, and if something tries to send them it is up to the
kernel to split them up (generating extra work). Typically there's an
optimal block size that the disk likes best, and when you go bigger
than that you often go past the point of diminishing returns. I
mention this because if your 4MByte I/O is split up into 64 x 64KByte
pieces, its latency is now the time for ALL of the 64KByte pieces to
come back, because they all have to complete. You may also find this
effect means the depth of I/Os being sent down to the disk is
DIFFERENT to what fio is submitting. Take a look at the iostat command
output while your fio is running to see what size and depth of I/Os
the kernel is sending down.
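
For illustration, a sketch of that splitting arithmetic (the limit below is a made-up example; the real value is in /sys/block/rbd0/queue/max_sectors_kb on your test machine):

bs_kb = 4 * 1024                 # fio bs=4M
max_io_kb = 1024                 # hypothetical device/request-size limit
splits = -(-bs_kb // max_io_kb)  # ceiling division
print(f"each 4M write is issued as ~{splits} pieces of <= {max_io_kb} KiB")

Incidentally, the rbd0 disk stats in your first mail (ios=0/2305605 plus merge=0/6913029, about 9.2M submitted pieces for 2.3M fio writes) work out to roughly four pieces per 4M write, which is consistent with the writes being split up and re-merged on the way down.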

Re throughput and latency: imagine the case where the system only has
one thing to do (one tiny I/O). It could be the case that this is
"easier" to complete than when 100 things have to all be done at once
(e.g. due to the overhead of switching between them all). You might
get a higher throughput doing 100 I/Os at once (all spare resources
are kept occupied) but at the cost that each I/O operation completes a
little bit slower - higher throughput but worse latency. Latency does
not have to go down just because throughput goes up...

-- 
Sitsofe | http://sucs.org/~sits/

