All of lore.kernel.org
 help / color / mirror / Atom feed
* Differences in "BW" and "bw" results
@ 2017-02-03 16:57 Frank Steinberg
  2017-02-07 16:37 ` Sitsofe Wheeler
  0 siblings, 1 reply; 3+ messages in thread
From: Frank Steinberg @ 2017-02-03 16:57 UTC (permalink / raw)
  To: fio

Hi,

I am quite new to fio, and I cannot interpret bandwidth results, yet. I found some articles on the interpretation of fio results, but no answer for my specific question.

When I do *not* generate log files, the results seem clear to me. But with the "--write_bw_log" option I see a new "bw" section (which also reflects the numbers written to the logfile) and a discrepancy between these bandwidth numbers:

=========================================
root@knorkator:~/fio# ./fio/fio --numjobs=1 --iodepth=1 --direct=1 --runtime=13 --time_based --size=50M --directory=/opt/tmp --ioengine=libaio --rw=read --bs=4k --name=local-libaio-read-4k --write_bw_log=log/local-libaio-read-4k 
local-libaio-read-4k: (g=0): rw=read, bs=4096B-4096B,4096B-4096B,4096B-4096B, ioengine=libaio, iodepth=1
fio-2.17-33-g3dd2e
Starting 1 process
Jobs: 1 (f=1): [R(1)][100.0%][r=91.2MiB/s,w=0KiB/s][r=23.6k,w=0 IOPS][eta 00m:00s]
local-libaio-read-4k: (groupid=0, jobs=1): err= 0: pid=13599: Fri Feb  3 17:26:41 2017
   read: IOPS=23.5k, BW=91.5MiB/s (95.9MB/s)(1189MiB/13001msec)
    slat (usec): min=5, max=127, avg= 7.56, stdev= 2.02
    clat (usec): min=2, max=1058, avg=32.63, stdev= 8.33
     lat (usec): min=33, max=1065, avg=40.19, stdev= 8.90
    clat percentiles (usec):
     |  1.00th=[   29],  5.00th=[   29], 10.00th=[   29], 20.00th=[   30],
     | 30.00th=[   30], 40.00th=[   30], 50.00th=[   30], 60.00th=[   31],
     | 70.00th=[   32], 80.00th=[   34], 90.00th=[   39], 95.00th=[   41],
     | 99.00th=[   58], 99.50th=[   91], 99.90th=[  126], 99.95th=[  153],
     | 99.99th=[  213]
   bw (  KiB/s): min= 3871, max=2048000, per=0.13%, avg=128690.23, stdev=16014.17
    lat (usec) : 4=0.01%, 20=0.01%, 50=98.87%, 100=0.78%, 250=0.34%
    lat (usec) : 500=0.01%, 750=0.01%
    lat (msec) : 2=0.01%
  cpu          : usr=10.15%, sys=26.03%, ctx=304964, majf=0, minf=1802
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued rwt: total=304292,0,0, short=0,0,0, dropped=0,0,0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
   READ: bw=91.5MiB/s (95.9MB/s), 91.5MiB/s-91.5MiB/s (95.9MB/s-95.9MB/s), io=1189MiB (1246MB), run=13001-13001msec

Disk stats (read/write):
  sda: ios=301861/41, merge=0/25, ticks=8612/24, in_queue=8660, util=66.55%
=========================================

The bandwidth numbers on the "read: ..." line and close to the bottom claim that I measured a total average bandwidth of 91.5 MiB/s. That seems reasonable.

But there is also a "bw" section that claims an avg of 128690.23 KiB/s. What is the difference between these numbers?

When I modify iodepth or numjobs parameters, the discrepancy changes, e.g. with iodepth=8 the first number goes up to 278 MiB/s, while the "bw" number goes down to 42921.43 KiB/s.

Unfortunately, the "bw" numbers are written to the logfile, so that my plots do not show what I would expect.


Thanks in advance for any help,

 -frank








^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Differences in "BW" and "bw" results
  2017-02-03 16:57 Differences in "BW" and "bw" results Frank Steinberg
@ 2017-02-07 16:37 ` Sitsofe Wheeler
  2017-02-07 22:02   ` Sitsofe Wheeler
  0 siblings, 1 reply; 3+ messages in thread
From: Sitsofe Wheeler @ 2017-02-07 16:37 UTC (permalink / raw)
  To: Frank Steinberg; +Cc: fio

Hi,

On 3 February 2017 at 16:57, Frank Steinberg <steinberg@ibr.cs.tu-bs.de> wrote:
>
> local-libaio-read-4k: (groupid=0, jobs=1): err= 0: pid=13599: Fri Feb  3 17:26:41 2017
>    read: IOPS=23.5k, BW=91.5MiB/s (95.9MB/s)(1189MiB/13001msec)
[...]
>    bw (  KiB/s): min= 3871, max=2048000, per=0.13%, avg=128690.23, stdev=16014.17
[...]
> The bandwidth numbers on the "read: ..." line and close to the bottom claim that I measured a total average bandwidth of 91.5 MiB/s. That seems reasonable.
>
> But there is also a "bw" section that claims an avg of 128690.23 KiB/s. What is the difference between these numbers?
>
> When I modify iodepth or numjobs parameters, the discrepancy changes, e.g. with iodepth=8 the first number goes up to 278 MiB/s, while the "bw" number goes down to 42921.43 KiB/s.

BW is an average per second. bw is an average of averages...

BW is calculated by summing all the I/O you did (1189MiB) and dividing
by time in seconds so (1189*1024.0)/(13001/1000.0)/1024 ~ 91.5

bw is calculated by summing all the "values" in the bw log and
dividing by the number of those values. Each value is itself some sort
of average but you don't know what period it is over. This can lead to
a big discrepancy.

Imagine I have a job that run for two seconds. In the first second 10
MBytes is done over two I/Os. In the second second 100MByte is done
over two I/Os. BW will be
120 / 2 = 60MBytes/s. However imagine the bw values are like this
0. 10240, 0, 0
0, 10240, 0, 1024
1, 102400, 0, 2048
1, 102400, 0, 3072

(10240 * 2 + 102400 * 2)/4.0/1024.0 = 55.0

The period that each bw entry is being averaged over is not
necessarily known and if an entry is being averaged over time other
than a second the "average of averages" won't be per second either. I
wonder if that (  KiB/s) is actually misleading...

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 3+ messages in thread

* Re: Differences in "BW" and "bw" results
  2017-02-07 16:37 ` Sitsofe Wheeler
@ 2017-02-07 22:02   ` Sitsofe Wheeler
  0 siblings, 0 replies; 3+ messages in thread
From: Sitsofe Wheeler @ 2017-02-07 22:02 UTC (permalink / raw)
  To: Frank Steinberg; +Cc: fio

The original reply had a few errors:

On 7 February 2017 at 16:37, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
> Hi,
>
> On 3 February 2017 at 16:57, Frank Steinberg <steinberg@ibr.cs.tu-bs.de> wrote:
>>
>> local-libaio-read-4k: (groupid=0, jobs=1): err= 0: pid=13599: Fri Feb  3 17:26:41 2017
>>    read: IOPS=23.5k, BW=91.5MiB/s (95.9MB/s)(1189MiB/13001msec)
> [...]
>>    bw (  KiB/s): min= 3871, max=2048000, per=0.13%, avg=128690.23, stdev=16014.17
> [...]
>> The bandwidth numbers on the "read: ..." line and close to the bottom claim that I measured a total average bandwidth of 91.5 MiB/s. That seems reasonable.
>>
>> But there is also a "bw" section that claims an avg of 128690.23 KiB/s. What is the difference between these numbers?
>>
>> When I modify iodepth or numjobs parameters, the discrepancy changes, e.g. with iodepth=8 the first number goes up to 278 MiB/s, while the "bw" number goes down to 42921.43 KiB/s.
>
> BW is an average per second. bw is an average of averages...
>
> BW is calculated by summing all the I/O you did (1189MiB) and dividing
> by time in seconds so (1189*1024.0)/(13001/1000.0)/1024 ~ 91.5
>
> bw is calculated by summing all the "values" in the bw log and
> dividing by the number of those values. Each value is itself some sort
> of average but you don't know what period it is over. This can lead to
> a big discrepancy.
>
> Imagine I have a job that run for two seconds. In the first second 10
> MBytes is done over two I/Os. In the second second 100MByte is done
> over two I/Os. BW will be
> 120 / 2 = 60MBytes/s. However imagine the bw values are like this
> 0. 10240, 0, 0
> 0, 10240, 0, 1024
> 1, 102400, 0, 2048
> 1, 102400, 0, 3072

By default write_bw_log logs per each I/O. It's going to be tough to
reach speeds of 100MBytes/s with only a few I/Os so let's ratchet the
numbers down to 1KByte/s and 4KBytes. Also the first column is
milliseconds so we need larger numbers there.

Imagine we do 1KByte/s in the first second and 2KByte/s in the second.
Also imagine we have a fixed block size of 512 bytes.

0. 1100, 0, 512
1, 1100, 0, 512
1000, 1100, 0, 512
1250, 1100, 0, 512
1500, 1100, 0, 512
1990, 1100, 0, 512

The last column gives us the size of each I/O done so 512 * 6 is the
total bytes we did and we were going for two seconds so: 512 * 6  / 2
= 1536 making the BW the job achieved 1.5KByte/s. On the other hand
each individual I/O is reporting a bandwidth of 1100KBytes/s but for
whatever reason in the first second we only managed to do two 512Byte
I/Os and in the second we manage 4 (perhaps there's throttling in
place which means they process fast but have big spaces between them).
1100 * 6 / 6 = 1100 KBytes/s so bw is 1100 KByte/s - radically
different from BW.

With a higher iodepth perhaps each individual I/O finishes slower
making per I/O average bandwidth (bw) go down but so many complete in
parallel that the total bandwidth achieved (BW) goes up.

-- 
Sitsofe | http://sucs.org/~sits/

^ permalink raw reply	[flat|nested] 3+ messages in thread

end of thread, other threads:[~2017-02-07 22:03 UTC | newest]

Thread overview: 3+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-02-03 16:57 Differences in "BW" and "bw" results Frank Steinberg
2017-02-07 16:37 ` Sitsofe Wheeler
2017-02-07 22:02   ` Sitsofe Wheeler

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.