* Measuring IOPS
@ 2011-07-29 15:37 Martin Steigerwald
2011-07-29 16:14 ` Martin Steigerwald
2011-08-03 19:31 ` Measuring IOPS Martin Steigerwald
0 siblings, 2 replies; 20+ messages in thread
From: Martin Steigerwald @ 2011-07-29 15:37 UTC (permalink / raw)
To: fio, Jens Axboe
Hi!
I am currently writing an article about fio for a German print magazine,
after having packaged it for Debian and used it in performance analysis &
tuning trainings.
After introducing the concepts of fio with some basic job files, I'd like
to show how to do meaningful IOPS measurements that also work with SSDs
that compress.
For some first tests I came up with:
martin@merkaba:~[…]> cat iops.job
[global]
size=2G
bsrange=2-16k
filename=iops1
numjobs=1
iodepth=1
# Random data for SSDs that compress
refill_buffers=1
[zufälligschreiben]
rw=randwrite
stonewall
[sequentiellschreiben]
rw=write
stonewall
[zufälliglesen]
rw=randread
stonewall
[sequentielllesen]
rw=read
(small German dictionary:
- zufällig => random
- lesen => read
- schreiben => write ;)
This takes the following into account:
- It is recommended to use just one process. Why, actually? Why not fill
the device with as many requests as possible and see what it can handle?
- I instruct fio to write random data, even refilling the buffer with
different random data for each write - that's for compressing SSDs, i.e.
those with newer SandForce chips.
- I let it do sync I/O because I want to measure the device, not the cache
speed. I considered direct I/O, but at least with the sync I/O engine it
does not work on Linux 3.0 with Ext4 on an LVM: invalid request. This may
or may not be expected. I am wondering whether direct I/O is only for
complete devices, not for filesystems.
Things I didn't consider:
- I do not use the complete device, for obvious reasons: I tested on an
SSD that I use for production work as well ;).
- Thus for a harddisk this might not be realistic enough, because a
harddisk has different speeds at different cylinders. I think for 2-16 KB
requests it shouldn't matter, though.
- I am considering a read test on the complete device.
- The test does not go directly to the device, so there might be some Ext4
/ LVM overhead. On the ThinkPad T520 with an Intel i5 Sandy Bridge
dual-core CPU I think this is negligible.
- 2 GB might not be enough for reliable measurements.
Do you think the above job file could give realistic results? Any
suggestions?
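One variant that would address the "as many requests as possible" question is an asynchronous job with a higher queue depth. This is only a sketch, assuming the libaio engine is available; as noted above, direct=1 may be rejected on some filesystem/LVM setups:

```ini
[global]
size=2G
bsrange=2-16k
filename=iops1
refill_buffers=1
# async engine, so iodepth > 1 actually keeps requests in flight
ioengine=libaio
# bypass the page cache; may fail with "invalid request" on some setups
direct=1
iodepth=32

[zufälligschreiben-queued]
rw=randwrite
```

With the default sync engine, iodepth stays effectively 1, so only an async engine makes the depth setting meaningful.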
I got these results:
martin@merkaba:~[…]> ./fio iops.job
zufälligschreiben: (g=0): rw=randwrite, bs=2-16K/2-16K, ioengine=sync,
iodepth=1
sequentiellschreiben: (g=1): rw=write, bs=2-16K/2-16K, ioengine=sync,
iodepth=1
zufälliglesen: (g=2): rw=randread, bs=2-16K/2-16K, ioengine=sync,
iodepth=1
sequentielllesen: (g=2): rw=read, bs=2-16K/2-16K, ioengine=sync, iodepth=1
fio 1.57
Starting 4 processes
Jobs: 1 (f=1): [__r_] [100.0% done] [561.9M/0K /s] [339K/0 iops] [eta
00m:00s]
zufälligschreiben: (groupid=0, jobs=1): err= 0: pid=23221
write: io=2048.0MB, bw=16971KB/s, iops=5190 , runt=123573msec
clat (usec): min=0 , max=275675 , avg=183.76, stdev=989.34
lat (usec): min=0 , max=275675 , avg=184.02, stdev=989.36
bw (KB/s) : min= 353, max=94417, per=99.87%, avg=16947.64,
stdev=11562.05
cpu : usr=5.39%, sys=14.47%, ctx=344861, majf=0, minf=30
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=0/641383/0, short=0/0/0
lat (usec): 2=4.48%, 4=23.03%, 10=27.46%, 20=5.22%, 50=2.17%
lat (usec): 100=0.08%, 250=10.16%, 500=21.35%, 750=4.79%, 1000=0.06%
lat (msec): 2=0.13%, 4=0.64%, 10=0.40%, 20=0.01%, 50=0.01%
lat (msec): 100=0.01%, 250=0.01%, 500=0.01%
sequentiellschreiben: (groupid=1, jobs=1): err= 0: pid=23227
write: io=2048.0MB, bw=49431KB/s, iops=6172 , runt= 42426msec
clat (usec): min=0 , max=83105 , avg=134.18, stdev=1286.14
lat (usec): min=0 , max=83105 , avg=134.53, stdev=1286.14
bw (KB/s) : min= 0, max=73767, per=109.57%, avg=54162.16,
stdev=22989.92
cpu : usr=10.29%, sys=22.17%, ctx=232818, majf=0, minf=33
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=0/261869/0, short=0/0/0
lat (usec): 2=0.10%, 4=1.16%, 10=9.97%, 20=1.14%, 50=0.09%
lat (usec): 100=27.31%, 250=59.37%, 500=0.61%, 750=0.04%, 1000=0.02%
lat (msec): 2=0.04%, 4=0.06%, 10=0.01%, 20=0.06%, 50=0.01%
lat (msec): 100=0.03%
zufälliglesen: (groupid=2, jobs=1): err= 0: pid=23564
read : io=2048.0MB, bw=198312KB/s, iops=60635 , runt= 10575msec
clat (usec): min=0 , max=103758 , avg=14.46, stdev=1058.66
lat (usec): min=0 , max=103758 , avg=14.50, stdev=1058.66
bw (KB/s) : min= 98, max=1996998, per=54.76%, avg=217197.79,
stdev=563543.94
cpu : usr=11.20%, sys=8.25%, ctx=513, majf=0, minf=28
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=641220/0/0, short=0/0/0
lat (usec): 2=77.54%, 4=21.11%, 10=1.19%, 20=0.09%, 50=0.01%
lat (usec): 100=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%
lat (msec): 2=0.01%, 4=0.03%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec): 100=0.01%, 250=0.01%
sequentielllesen: (groupid=2, jobs=1): err= 0: pid=23565
read : io=2048.0MB, bw=235953KB/s, iops=29458 , runt= 8888msec
clat (usec): min=0 , max=71904 , avg=30.61, stdev=278.25
lat (usec): min=0 , max=71904 , avg=30.71, stdev=278.25
bw (KB/s) : min= 2, max=266240, per=59.04%, avg=234162.53,
stdev=63283.64
cpu : usr=3.42%, sys=16.70%, ctx=8326, majf=0, minf=28
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=261826/0/0, short=0/0/0
lat (usec): 2=28.95%, 4=46.75%, 10=19.20%, 20=1.80%, 50=0.17%
lat (usec): 100=0.11%, 250=0.05%, 500=0.05%, 750=0.24%, 1000=2.44%
lat (msec): 2=0.15%, 4=0.08%, 10=0.01%, 20=0.01%, 100=0.01%
Run status group 0 (all jobs):
WRITE: io=2048.0MB, aggrb=16970KB/s, minb=17378KB/s, maxb=17378KB/s,
mint=123573msec, maxt=123573msec
Run status group 1 (all jobs):
WRITE: io=2048.0MB, aggrb=49430KB/s, minb=50617KB/s, maxb=50617KB/s,
mint=42426msec, maxt=42426msec
Run status group 2 (all jobs):
READ: io=4096.0MB, aggrb=396624KB/s, minb=203071KB/s, maxb=241616KB/s,
mint=8888msec, maxt=10575msec
Disk stats (read/write):
dm-2: ios=577687/390944, merge=0/0, ticks=141180/6046100,
in_queue=6187964, util=76.63%, aggrios=577469/390258, aggrmerge=216/761,
aggrticks=140576/6004336, aggrin_queue=6144016, aggrutil=76.38%
sda: ios=577469/390258, merge=216/761, ticks=140576/6004336,
in_queue=6144016, util=76.38%
Which looks quite fine, I believe ;). I didn't run this test on a
harddisk yet.
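As a side note, the reported figures for the random-write job are internally consistent; this is just arithmetic on the numbers fio printed (io=2048.0MB, issued=641383, runt=123573msec), not additional measurement:

```python
# Cross-check the random-write figures reported above.
io_kb = 2048.0 * 1024      # total data written, in KB
issued = 641383            # write requests issued
runt_s = 123.573           # runtime in seconds

avg_request_kb = io_kb / issued   # average size per issued request
iops = issued / runt_s            # requests per second
bw_kb_s = io_kb / runt_s          # bandwidth

print(f"avg request: {avg_request_kb:.2f} KB")  # ~3.27 KB, within bsrange=2-16k
print(f"iops: {iops:.0f}")                      # ~5190, matching fio's report
print(f"bw: {bw_kb_s:.0f} KB/s")                # ~16971, matching fio's report
```

Bandwidth divided by IOPS gives the average request size, so any one of the three numbers can be sanity-checked from the other two.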
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Measuring IOPS
2011-07-29 15:37 Measuring IOPS Martin Steigerwald
@ 2011-07-29 16:14 ` Martin Steigerwald
2011-08-02 14:32 ` Measuring IOPS (solved, I think) Martin Steigerwald
2011-08-03 19:31 ` Measuring IOPS Martin Steigerwald
1 sibling, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-07-29 16:14 UTC (permalink / raw)
To: fio; +Cc: Jens Axboe
On Friday, 29 July 2011, Martin Steigerwald wrote:
> Hi!
>
> I am currently writing an article about fio for a German print
> magazine, after having packaged it for Debian and used it in
> performance analysis & tuning trainings.
>
> After introducing the concepts of fio with some basic job files,
> I'd like to show how to do meaningful IOPS measurements that also work
> with SSDs that compress.
>
> For some first tests I came up with:
>
> martin@merkaba:~[…]> cat iops.job
> [global]
> size=2G
> bsrange=2-16k
> filename=iops1
> numjobs=1
> iodepth=1
> # Random data for SSDs that compress
> refill_buffers=1
>
> [zufälligschreiben]
> rw=randwrite
> stonewall
> [sequentiellschreiben]
> rw=write
> stonewall
>
> [zufälliglesen]
> rw=randread
> stonewall
> [sequentielllesen]
> rw=read
>
> (small German dictionary:
> - zufällig => random
> - lesen => read
> - schreiben => write ;)
[...]
> Do you think the above job file could give realistic results? Any
> suggestions?
>
>
> I got these results:
With a simpler read job I have different results that puzzle me:
martin@merkaba:~/Artikel/LinuxNewMedia/fio/Recherche/fio> cat zweierlei-
lesen-2gb-variable-blockgrößen.job
[global]
rw=randread
size=2g
bsrange=2-16k
[zufälliglesen]
stonewall
[sequentielllesen]
rw=read
martin@merkaba:~[...]> ./fio zweierlei-lesen-2gb-variable-blockgrößen.job
zufälliglesen: (g=0): rw=randread, bs=2-16K/2-16K, ioengine=sync,
iodepth=1
sequentielllesen: (g=0): rw=read, bs=2-16K/2-16K, ioengine=sync, iodepth=1
fio 1.57
Starting 2 processes
Jobs: 1 (f=1): [r_] [100.0% done] [96146K/0K /s] [88.3K/0 iops] [eta
00m:00s]
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=29273
read : io=2048.0MB, bw=20915KB/s, iops=6389 , runt=100269msec
clat (usec): min=0 , max=103772 , avg=150.09, stdev=1042.77
lat (usec): min=0 , max=103772 , avg=150.34, stdev=1042.79
bw (KB/s) : min= 131, max=112571, per=50.31%, avg=21045.54,
stdev=13225.53
cpu : usr=4.66%, sys=11.24%, ctx=262203, majf=0, minf=26
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=640622/0/0, short=0/0/0
lat (usec): 2=23.94%, 4=26.21%, 10=7.86%, 20=1.39%, 50=0.15%
lat (usec): 100=0.01%, 250=14.76%, 500=21.53%, 750=3.77%, 1000=0.10%
lat (msec): 2=0.16%, 4=0.09%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec): 100=0.01%, 250=0.01%
sequentielllesen: (groupid=0, jobs=1): err= 0: pid=29274
read : io=2048.0MB, bw=254108KB/s, iops=31748 , runt= 8253msec
clat (usec): min=0 , max=4773 , avg=30.44, stdev=173.41
lat (usec): min=0 , max=4773 , avg=30.54, stdev=173.41
bw (KB/s) : min=229329, max=265720, per=607.79%, avg=254236.81,
stdev=8940.36
cpu : usr=4.02%, sys=16.97%, ctx=8407, majf=0, minf=28
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=262021/0/0, short=0/0/0
lat (usec): 2=30.07%, 4=46.83%, 10=17.84%, 20=1.91%, 50=0.19%
lat (usec): 100=0.12%, 250=0.02%, 500=0.02%, 750=0.21%, 1000=2.52%
lat (msec): 2=0.16%, 4=0.10%, 10=0.01%
Run status group 0 (all jobs):
READ: io=4096.0MB, aggrb=41830KB/s, minb=21417KB/s, maxb=260206KB/s,
mint=8253msec, maxt=100269msec
Disk stats (read/write):
dm-2: ios=267216/204, merge=0/0, ticks=95188/36, in_queue=95240,
util=80.52%, aggrios=266989/191, aggrmerge=267/175, aggrticks=94712/44,
aggrin_queue=94312, aggrutil=80.18%
sda: ios=266989/191, merge=267/175, ticks=94712/44, in_queue=94312,
util=80.18%
What's going on here? Where does the difference between 6389 IOPS for this
simpler read job file and 60635 IOPS for the IOPS job file come from?
These results do not make sense to me compared to the results from the
IOPS job. Is it just random data versus zeros? Which values are more
realistic? I didn't think that on an SSD random I/O versus sequential I/O
should cause such a big difference.
Files are laid out as follows:
martin@merkaba:~[…]> sudo filefrag zufälliglesen.1.0 sequentielllesen.2.0
iops1
zufälliglesen.1.0: 17 extents found
sequentielllesen.2.0: 17 extents found
iops1: 258 extents found
Not that it should matter much on an SSD.
This is on a ThinkPad T520 with an Intel i5 Sandy Bridge dual-core CPU,
8 GB of RAM and said Intel SSD 320, on Ext4 on LVM with the Linux 3.0.0
Debian package.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Measuring IOPS (solved, I think)
2011-07-29 16:14 ` Martin Steigerwald
@ 2011-08-02 14:32 ` Martin Steigerwald
2011-08-02 19:48 ` Jens Axboe
0 siblings, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-02 14:32 UTC (permalink / raw)
To: fio; +Cc: Jens Axboe
[-- Attachment #1: Type: Text/Plain, Size: 12575 bytes --]
On Friday, 29 July 2011, Martin Steigerwald wrote:
> On Friday, 29 July 2011, Martin Steigerwald wrote:
> > Hi!
> >
> > I am currently writing an article about fio for a German print
> > magazine, after having packaged it for Debian and used it in
> > performance analysis & tuning trainings.
> >
> > After introducing the concepts of fio with some basic job files,
> > I'd like to show how to do meaningful IOPS measurements that also
> > work with SSDs that compress.
> >
> > For some first tests I came up with:
> >
> > martin@merkaba:~[…]> cat iops.job
> > [global]
> > size=2G
> > bsrange=2-16k
> > filename=iops1
> > numjobs=1
> > iodepth=1
> > # Random data for SSDs that compress
> > refill_buffers=1
> >
> > [zufälligschreiben]
> > rw=randwrite
> > stonewall
> > [sequentiellschreiben]
> > rw=write
> > stonewall
> >
> > [zufälliglesen]
> > rw=randread
> > stonewall
> > [sequentielllesen]
> > rw=read
> >
> > (small German dictionary:
> > - zufällig => random
> > - lesen => read
> > - schreiben => write ;)
>
> [...]
>
> > Do you think the above job file could give realistic results? Any
> > suggestions?
>
> > I got these results:
> With a simpler read job I have different results that puzzle me:
>
> martin@merkaba:~/Artikel/LinuxNewMedia/fio/Recherche/fio> cat
> zweierlei- lesen-2gb-variable-blockgrößen.job
> [global]
> rw=randread
> size=2g
> bsrange=2-16k
>
> [zufälliglesen]
> stonewall
> [sequentielllesen]
> rw=read
[...]
> What's going on here? Where does the difference between 6389 IOPS for
> this simpler read job file and 60635 IOPS for the IOPS job file
> come from? These results do not make sense to me compared to the
> results from the IOPS job. Is it just random data versus zeros? Which
> values are more realistic? I didn't think that on an SSD random I/O
> versus sequential I/O should cause such a big difference.
>
> Files are laid out as follows:
>
> martin@merkaba:~[…]> sudo filefrag zufälliglesen.1.0
> sequentielllesen.2.0 iops1
> zufälliglesen.1.0: 17 extents found
> sequentielllesen.2.0: 17 extents found
> iops1: 258 extents found
>
> Not that it should matter much on an SSD.
>
> This is on a ThinkPad T520 with an Intel i5 Sandy Bridge dual-core
> CPU, 8 GB of RAM and said Intel SSD 320, on Ext4 on LVM with the
> Linux 3.0.0 Debian package.
I think I found it.
It depends on whether I specify a filename explicitly *and* globally or
not. When I specify it, the job runs fast - no matter whether there are
zeros or random data in the file, as expected for this SSD. When I do not
specify it, or specify it in a section, the job runs slow.
Now follows the complete investigation and explanation of why this was so:
The only difference between job files is:
martin@merkaba:~[...]> cat zweierlei-lesen-2gb-variable-blockgrößen-
jobfile-given.job
[global]
rw=randread
size=2g
bsrange=2-16k
filename=zufälliglesen.1.0
[zufälliglesen]
stonewall
[sequentielllesen]
rw=read
The only difference in the implicit-jobfile job is that I commented out
the filename option. But since the filename matches what fio would choose
by itself, fio should use *the same* file in both cases.
Steps to (hopefully) reproduce it:
1. do the following once to have the job files created: fio zweierlei-
lesen-2gb-variable-blockgrößen-jobfile-given.job
2. do
su -c "echo 3 > /proc/sys/vm/drop_caches" ; fio zweierlei-lesen-2gb-
variable-blockgrößen-jobfile-implicit.job > zweierlei-lesen-2gb-variable-
blockgrößen-jobfile-implicit.results
fio runs slow.
3. do
su -c "echo 3 > /proc/sys/vm/drop_caches" ; fio zweierlei-lesen-2gb-
variable-blockgrößen-jobfile-given.job > zweierlei-lesen-2gb-variable-
blockgrößen-jobfile-given.results
fio runs fast.
4. Use kompare, vimdiff or other side by side diff to compare the results.
I think that echo 3 > /proc/sys/vm/drop_caches is not needed; as far as I
understand, fio clears caches if not told otherwise.
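(If I read the HOWTO correctly, this cache clearing corresponds to the invalidate option, which defaults to on; the option name here is from my reading of the docs, so treat it as an assumption. Making it explicit would look like:)

```ini
[global]
# invalidate defaults to 1: fio invalidates the cached pages for the
# file before starting I/O, which is why drop_caches should be redundant
invalidate=1
```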
Aside from the speed difference, I found only one difference that might
explain it. This fast
cpu : usr=11.54%, sys=9.26%, ctx=3392, majf=0, minf=28
versus this slow
cpu : usr=4.87%, sys=11.20%, ctx=261968, majf=0, minf=26
Any hints why giving or not giving the filename makes such a big difference?
I also tested whether this might be a UTF-8 issue and rewrote the job
files not to use any umlauts. That didn't make any difference.
I narrowed it down even further:
martin@merkaba:~[...]> diff -u zweierlei-lesen-2gb-variable-blockgroessen-
jobfile-given-in-section-no-utf8.job zweierlei-lesen-2gb-variable-
blockgroessen-jobfile-given-no-utf8.job
--- zweierlei-lesen-2gb-variable-blockgroessen-jobfile-given-in-section-no-
utf8.job 2011-08-02 15:35:41.246226877 +0200
+++ zweierlei-lesen-2gb-variable-blockgroessen-jobfile-given-no-utf8.job
2011-08-02 15:50:00.073095677 +0200
@@ -2,9 +2,9 @@
rw=randread
size=2g
bsrange=2-16k
+filename=zufaelliglesen.1.0
[zufaelliglesen]
-filename=zufaelliglesen.1.0
stonewall
[sequentielllesen]
rw=read
makes the difference.
Okay, and now I understand it:
From the progress display I see that fio runs the second job first. When
the filename is in the global section, both jobs use the same file. And
with the stonewall option missing from the second job section, the
sequential read job even runs in parallel. No wonder I saw [rR] there.
Okay, then I know:
When I want multiple jobs to run one after another, I need a stonewall
option in *each* job - *also the last one*, because in fio's notion there
is no last job, as fio sets up each job before it starts job execution.
And it seems that fio runs everything that has no stonewall option right
away, even if an earlier defined job has a stonewall option. Fio only
thinks sequentially for the jobs that have a stonewall option, which
might be a disadvantage if I want to run groups of jobs one after
another:
[readjob]
blabla
[parallelwritejob]
blabla
stonewall
[randomreadjob]
blabla
[sequentialreadjob]
blabla
stonewall
As far as I understand it, fio would then run both jobs without a
stonewall option straight away, while it also starts the first job with a
stonewall option. Then - whether the jobs without the stonewall option
are still running or not - fio starts the second job with the stonewall
option.
So I understand it now. Personally I would prefer it if fio ran the first
two jobs in parallel, then did the stonewall, and then the second two
jobs in parallel. Is this possible somehow?
With an additional "stonewall" for the last job in the iops.job file I
made up, it also works when the filename is specified globally.
So here we go:
martin@merkaba:~[...]> fio iops-done-right.job
zufälligschreiben: (g=0): rw=randwrite, bs=2-16K/2-16K, ioengine=sync,
iodepth=1
sequentiellschreiben: (g=1): rw=write, bs=2-16K/2-16K, ioengine=sync,
iodepth=1
zufälliglesen: (g=2): rw=randread, bs=2-16K/2-16K, ioengine=sync,
iodepth=1
sequentielllesen: (g=3): rw=read, bs=2-16K/2-16K, ioengine=sync, iodepth=1
fio 1.57
Starting 4 processes
Jobs: 1 (f=1): [___R] [100.0% done] [268.3M/0K /s] [33.6K/0 iops] [eta
00m:00s]
zufälligschreiben: (groupid=0, jobs=1): err= 0: pid=20474
write: io=2048.0MB, bw=16686KB/s, iops=5096 , runt=125687msec
clat (usec): min=0 , max=292792 , avg=188.16, stdev=933.01
lat (usec): min=0 , max=292792 , avg=188.42, stdev=933.04
bw (KB/s) : min= 320, max=63259, per=100.00%, avg=16684.86,
stdev=9052.82
cpu : usr=4.60%, sys=13.58%, ctx=344489, majf=0, minf=31
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=0/640575/0, short=0/0/0
lat (usec): 2=4.55%, 4=23.30%, 10=27.49%, 20=5.00%, 50=2.07%
lat (usec): 100=0.09%, 250=12.02%, 500=20.41%, 750=3.46%, 1000=0.11%
lat (msec): 2=0.31%, 4=0.78%, 10=0.40%, 20=0.01%, 50=0.01%
lat (msec): 100=0.01%, 250=0.01%, 500=0.01%
sequentiellschreiben: (groupid=1, jobs=1): err= 0: pid=20482
write: io=2048.0MB, bw=49401KB/s, iops=6176 , runt= 42452msec
clat (usec): min=0 , max=213632 , avg=132.32, stdev=1355.16
lat (usec): min=1 , max=213632 , avg=132.65, stdev=1355.16
bw (KB/s) : min= 2, max=79902, per=110.53%, avg=54600.95,
stdev=24636.03
cpu : usr=10.83%, sys=21.02%, ctx=232933, majf=0, minf=34
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=0/262197/0, short=0/0/0
lat (usec): 2=0.09%, 4=1.07%, 10=10.43%, 20=0.89%, 50=0.07%
lat (usec): 100=35.03%, 250=51.57%, 500=0.57%, 750=0.03%, 1000=0.01%
lat (msec): 2=0.04%, 4=0.08%, 10=0.02%, 20=0.06%, 50=0.01%
lat (msec): 100=0.03%, 250=0.01%
zufälliglesen: (groupid=2, jobs=1): err= 0: pid=20484
read : io=2048.0MB, bw=23151KB/s, iops=7050 , runt= 90584msec
clat (usec): min=0 , max=70235 , avg=134.92, stdev=212.70
lat (usec): min=0 , max=70236 , avg=135.16, stdev=212.79
bw (KB/s) : min= 5, max=118959, per=100.36%, avg=23233.61,
stdev=12885.69
cpu : usr=4.55%, sys=13.30%, ctx=259109, majf=0, minf=27
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=638666/0/0, short=0/0/0
lat (usec): 2=21.65%, 4=30.12%, 10=7.42%, 20=0.69%, 50=0.06%
lat (usec): 100=0.01%, 250=14.58%, 500=21.58%, 750=3.85%, 1000=0.01%
lat (msec): 2=0.02%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec): 100=0.01%
sequentielllesen: (groupid=3, jobs=1): err= 0: pid=20820
read : io=2048.0MB, bw=267392KB/s, iops=33432 , runt= 7843msec
clat (usec): min=0 , max=4098 , avg=28.40, stdev=143.49
lat (usec): min=0 , max=4098 , avg=28.51, stdev=143.49
bw (KB/s) : min=176584, max=275993, per=100.04%, avg=267511.13,
stdev=25285.86
cpu : usr=4.18%, sys=21.27%, ctx=8616, majf=0, minf=29
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%,
>=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%,
>=64=0.0%
issued r/w/d: total=262210/0/0, short=0/0/0
lat (usec): 2=26.04%, 4=47.22%, 10=21.85%, 20=1.61%, 50=0.13%
lat (usec): 100=0.02%, 250=0.01%, 500=0.01%, 750=0.14%, 1000=2.96%
lat (msec): 2=0.01%, 4=0.01%, 10=0.01%
Run status group 0 (all jobs):
WRITE: io=2048.0MB, aggrb=16685KB/s, minb=17085KB/s, maxb=17085KB/s,
mint=125687msec, maxt=125687msec
Run status group 1 (all jobs):
WRITE: io=2048.0MB, aggrb=49400KB/s, minb=50586KB/s, maxb=50586KB/s,
mint=42452msec, maxt=42452msec
Run status group 2 (all jobs):
READ: io=2048.0MB, aggrb=23151KB/s, minb=23707KB/s, maxb=23707KB/s,
mint=90584msec, maxt=90584msec
Run status group 3 (all jobs):
READ: io=2048.0MB, aggrb=267391KB/s, minb=273808KB/s, maxb=273808KB/s,
mint=7843msec, maxt=7843msec
Disk stats (read/write):
dm-2: ios=832862/416089, merge=0/0, ticks=206456/6699696,
in_queue=6907728, util=78.36%, aggrios=833046/418069, aggrmerge=95/663,
aggrticks=206032/6668768, aggrin_queue=6873712, aggrutil=77.80%
sda: ios=833046/418069, merge=95/663, ticks=206032/6668768,
in_queue=6873712, util=77.80%
martin@merkaba:~[...]> diff -u iops.job iops-done-right.job
--- iops.job 2011-07-29 16:40:41.776809061 +0200
+++ iops-done-right.job 2011-08-02 16:15:06.055626894 +0200
@@ -19,4 +19,5 @@
stonewall
[sequentielllesen]
rw=read
+stonewall
So always question results that don't make sense.
May this serve as a pointer should anyone stumble upon something like
this ;)
Ciao,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: zweierlei-lesen-2gb-variable-blockgrößen-jobfile-given.job --]
[-- Type: text/plain, Size: 136 bytes --]
[global]
rw=randread
size=2g
bsrange=2-16k
filename=zufälliglesen.1.0
[zufälliglesen]
stonewall
[sequentielllesen]
rw=read
[-- Attachment #3: zweierlei-lesen-2gb-variable-blockgrößen-jobfile-given.results --]
[-- Type: text/plain, Size: 2428 bytes --]
zufälliglesen: (g=0): rw=randread, bs=2-16K/2-16K, ioengine=sync, iodepth=1
sequentielllesen: (g=0): rw=read, bs=2-16K/2-16K, ioengine=sync, iodepth=1
fio 1.57
Starting 2 processes
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=14723
read : io=2048.0MB, bw=186066KB/s, iops=56794 , runt= 11271msec
clat (usec): min=0 , max=103559 , avg=15.57, stdev=1069.35
lat (usec): min=0 , max=103559 , avg=15.62, stdev=1069.35
bw (KB/s) : min= 243, max=843245, per=53.37%, avg=198607.29, stdev=327979.47
cpu : usr=11.54%, sys=9.26%, ctx=3392, majf=0, minf=28
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=640132/0/0, short=0/0/0
lat (usec): 2=67.88%, 4=25.43%, 10=5.79%, 20=0.36%, 50=0.05%
lat (usec): 100=0.01%, 250=0.15%, 500=0.26%, 750=0.04%, 1000=0.01%
lat (msec): 2=0.01%, 4=0.03%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec): 100=0.01%, 250=0.01%
sequentielllesen: (groupid=0, jobs=1): err= 0: pid=14724
read : io=2048.0MB, bw=255813KB/s, iops=31989 , runt= 8198msec
clat (usec): min=0 , max=22658 , avg=30.46, stdev=171.40
lat (usec): min=0 , max=22658 , avg=30.56, stdev=171.39
bw (KB/s) : min=240308, max=264524, per=68.85%, avg=256220.81, stdev=6556.93
cpu : usr=4.39%, sys=17.18%, ctx=8630, majf=0, minf=28
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=262251/0/0, short=0/0/0
lat (usec): 2=26.88%, 4=48.21%, 10=19.94%, 20=1.59%, 50=0.16%
lat (usec): 100=0.09%, 250=0.04%, 500=0.05%, 750=0.16%, 1000=2.60%
lat (msec): 2=0.21%, 4=0.06%, 50=0.01%
Run status group 0 (all jobs):
READ: io=4096.0MB, aggrb=372132KB/s, minb=190531KB/s, maxb=261952KB/s, mint=8198msec, maxt=11271msec
Disk stats (read/write):
dm-2: ios=11567/15, merge=0/0, ticks=23592/508, in_queue=24100, util=79.23%, aggrios=11473/353, aggrmerge=156/14, aggrticks=24556/2332, aggrin_queue=26864, aggrutil=77.77%
sda: ios=11473/353, merge=156/14, ticks=24556/2332, in_queue=26864, util=77.77%
[-- Attachment #4: zweierlei-lesen-2gb-variable-blockgrößen-jobfile-implicit.job --]
[-- Type: text/plain, Size: 137 bytes --]
[global]
rw=randread
size=2g
bsrange=2-16k
#filename=zufälliglesen.1.0
[zufälliglesen]
stonewall
[sequentielllesen]
rw=read
[-- Attachment #5: zweierlei-lesen-2gb-variable-blockgrößen-jobfile-implicit.results --]
[-- Type: text/plain, Size: 2410 bytes --]
zufälliglesen: (g=0): rw=randread, bs=2-16K/2-16K, ioengine=sync, iodepth=1
sequentielllesen: (g=0): rw=read, bs=2-16K/2-16K, ioengine=sync, iodepth=1
fio 1.57
Starting 2 processes
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=14745
read : io=2048.0MB, bw=21516KB/s, iops=6567 , runt= 97469msec
clat (usec): min=0 , max=103566 , avg=146.45, stdev=1099.54
lat (usec): min=0 , max=103566 , avg=146.68, stdev=1099.56
bw (KB/s) : min= 132, max=118508, per=49.82%, avg=21437.43, stdev=13479.42
cpu : usr=4.87%, sys=11.20%, ctx=261968, majf=0, minf=26
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=640153/0/0, short=0/0/0
lat (usec): 2=21.55%, 4=30.22%, 10=7.16%, 20=0.58%, 50=0.06%
lat (usec): 100=0.01%, 250=15.55%, 500=21.14%, 750=3.68%, 1000=0.01%
lat (msec): 2=0.01%, 4=0.03%, 20=0.01%, 50=0.01%, 100=0.01%
lat (msec): 250=0.01%
sequentielllesen: (groupid=0, jobs=1): err= 0: pid=14746
read : io=2048.0MB, bw=256909KB/s, iops=32115 , runt= 8163msec
clat (usec): min=0 , max=3979 , avg=30.33, stdev=166.59
lat (usec): min=0 , max=3979 , avg=30.44, stdev=166.58
bw (KB/s) : min=245618, max=267008, per=597.43%, avg=257087.19, stdev=6569.89
cpu : usr=3.97%, sys=17.30%, ctx=8475, majf=0, minf=28
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=262162/0/0, short=0/0/0
lat (usec): 2=28.30%, 4=47.36%, 10=19.41%, 20=1.64%, 50=0.15%
lat (usec): 100=0.06%, 250=0.01%, 500=0.01%, 750=0.11%, 1000=2.69%
lat (msec): 2=0.20%, 4=0.06%
Run status group 0 (all jobs):
READ: io=4096.0MB, aggrb=43032KB/s, minb=22032KB/s, maxb=263075KB/s, mint=8163msec, maxt=97469msec
Disk stats (read/write):
dm-2: ios=267158/84, merge=0/0, ticks=94852/676, in_queue=95528, util=82.25%, aggrios=267040/962, aggrmerge=164/37, aggrticks=94356/5816, aggrin_queue=99820, aggrutil=81.77%
sda: ios=267040/962, merge=164/37, ticks=94356/5816, in_queue=99820, util=81.77%
* Re: Measuring IOPS (solved, I think)
2011-08-02 14:32 ` Measuring IOPS (solved, I think) Martin Steigerwald
@ 2011-08-02 19:48 ` Jens Axboe
2011-08-02 21:28 ` Martin Steigerwald
0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2011-08-02 19:48 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: fio
That's a long email! The stonewall should be put in the job section that
has to wait for previous jobs. So, ala:
[job1]
something
[job2]
stonewall # will wait for job1 to finish
something
[job3]
something # will run in parallel with job2
[job4]
stonewall # will run when job2+3 are finished
something
If that's not the case, something is broken. A quick test here seems to
show that it works.
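A concrete sketch of that layout as an actual job file - the workload lines are placeholders assumed just for illustration:

```ini
[job1]
rw=read

[job2]
# waits for job1 to finish
stonewall
rw=randread

[job3]
# runs in parallel with job2
rw=read

[job4]
# waits for job2 and job3 to finish
stonewall
rw=randread
```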
--
Jens Axboe
* Re: Measuring IOPS (solved, I think)
2011-08-02 19:48 ` Jens Axboe
@ 2011-08-02 21:28 ` Martin Steigerwald
2011-08-03 7:17 ` Jens Axboe
0 siblings, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-02 21:28 UTC (permalink / raw)
To: Jens Axboe; +Cc: fio
On Tuesday, 2 August 2011, you wrote:
> That's a long email! The stonewall should be put in the job section
> that has to wait for previous jobs. So, ala:
>
> [job1]
> something
>
> [job2]
> stonewall # will wait for job1 to finish
> something
>
> [job3]
> something # will run in parallel with job2
>
> [job4]
> stonewall # will run when job2+3 are finished
> something
>
> If that's not the case, something is broken. A quick test here seems to
> show that it works.
It's documented. From the manpage, which I have read several times by now:
Wait for preceding jobs in the job file to exit before starting this one.
stonewall implies new_group.
Somehow, despite my reading of the manpage, README and HOWTO, I came to
think that it tells fio to wait for the current job to finish; thus I had
the stonewall options misordered.
I expect that it works exactly as you said and will try it this way.
Instead of omitting the last stonewall option in my iops job file, I
could omit the one for the first job, because the first job does not need
to wait for a previous job.
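With that reading, a sketch of the reworked iops job file - stonewall on every job except the first, so each job waits for its predecessor:

```ini
[global]
size=2G
bsrange=2-16k
filename=iops1
numjobs=1
iodepth=1
# random data for SSDs that compress
refill_buffers=1

[zufälligschreiben]
rw=randwrite

[sequentiellschreiben]
# wait for the random write job
stonewall
rw=write

[zufälliglesen]
stonewall
rw=randread

[sequentielllesen]
stonewall
rw=read
```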
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Measuring IOPS (solved, I think)
2011-08-02 21:28 ` Martin Steigerwald
@ 2011-08-03 7:17 ` Jens Axboe
2011-08-03 9:03 ` Martin Steigerwald
0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2011-08-03 7:17 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: fio
On 2011-08-02 23:28, Martin Steigerwald wrote:
> On Tuesday, 2 August 2011, you wrote:
>> That's a long email! The stonewall should be put in the job section
>> that has to wait for previous jobs. So, ala:
>>
>> [job1]
>> something
>>
>> [job2]
>> stonewall # will wait for job1 to finish
>> something
>>
>> [job3]
>> something # will run in parallel with job2
>>
>> [job4]
>> stonewall # will run when job2+3 are finished
>> something
>>
>> If that's not the case, something is broken. A quick test here seems to
>> show that it works.
>
> It's documented. From the manpage, which I have read several times by now:
>
> Wait for preceding jobs in the job file to exit before starting this one.
> stonewall implies new_group.
>
>
> Somehow, despite my reading of the manpage, README, and HOWTO, I came away
> thinking that it tells fio to wait for the current job to finish, so I had
> the stonewall options misordered.
>
> I expect it works exactly as you said and will try it this way. Instead of
> omitting the last stonewall option in my iops job file, I could omit the
> first one, since the first job does not need to wait for a previous job.
Good, that makes me feel a little better :-)
Perhaps the name isn't that great? I'll gladly put in an alias for that
option, "wait_for_previous" or "barrier" or something like that. Fence?
--
Jens Axboe
* Re: Measuring IOPS (solved, I think)
2011-08-03 7:17 ` Jens Axboe
@ 2011-08-03 9:03 ` Martin Steigerwald
2011-08-03 10:34 ` Jens Axboe
0 siblings, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-03 9:03 UTC (permalink / raw)
To: Jens Axboe; +Cc: fio
On Wednesday, 3 August 2011, you wrote:
> On 2011-08-02 23:28, Martin Steigerwald wrote:
> > On Tuesday, 2 August 2011, you wrote:
> >> That's a long email! The stonewall should be put in the job section
> >> that has to wait for previous jobs. So, ala:
> >>
> >> [job1]
> >> something
> >>
> >> [job2]
> >> stonewall # will wait for job1 to finish
> >> something
> >>
> >> [job3]
> >> something # will run in parallel with job2
> >>
> >> [job4]
> >> stonewall # will run when job2+3 are finished
> >> something
> >>
> >> If that's not the case, something is broken. A quick test here seems
> >> to show that it works.
> >
> > It's documented. From the manpage, which I have read several times by now:
> >
> > Wait for preceding jobs in the job file to exit before starting this
> > one. stonewall implies new_group.
> >
> >
> > Somehow, despite my reading of the manpage, README, and HOWTO, I came
> > away thinking that it tells fio to wait for the current job to finish,
> > so I had the stonewall options misordered.
> >
> > I expect it works exactly as you said and will try it this way.
> > Instead of omitting the last stonewall option in my iops job file I
> > could omit the first one, since the first job does not need to wait
> > for a previous job.
>
> Good, that makes me feel a little better :-)
What did you feel bad about? I didn't intend to trigger bad feelings.
There was nothing wrong with fio. The behavior was documented.
> Perhaps the name isn't that great? I'll gladly put in an alias for that
> option, "wait_for_previous" or "barrier" or something like that. Fence?
wait_before? But then "wait_for_previous" might be the clearest
description. "wait_before" would only make sense alongside a "wait_after"
that waits for the job's own completion afterwards, but two options for
basically the same thing might complicate matters even more.
So "wait_for_previous", or maybe "finish_previous_first" or just
"finish_previous", would be fine with me.
Note, though, that this doesn't imply that fio does a cache flush. But that
could be documented in the manpage with an additional hint on this option.
I will think about it and possibly provide a patch.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Measuring IOPS (solved, I think)
2011-08-03 9:03 ` Martin Steigerwald
@ 2011-08-03 10:34 ` Jens Axboe
0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2011-08-03 10:34 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: fio
On 2011-08-03 11:03, Martin Steigerwald wrote:
>> Perhaps the name isn't that great? I'll gladly put in an alias for that
>> option, "wait_for_previous" or "barrier" or something like that. Fence?
>
> wait_before? But then "wait_for_previous" might be the clearest
> description. "wait_before" would make sense with a "wait_after" that
> waits after the job for its completion. But two options for basically the
> same thing might complicate matters even more.
Yes, I'm not going to add another option where only the placement of it
would make a difference. I'll add wait_for_previous.
> So "wait_for_previous" or maybe "finish_previous_first" or just
> "finish_previous" would be fine with me.
>
> But then this doesn't imply that fio does a cache flush. But that could be
> documented in the manpage with an additional hint on this option. I will
> think about it and possibly provide a patch.
Not really impacted by that, those are controlled on a job by job basis
anyway.
--
Jens Axboe
* Re: Measuring IOPS
2011-07-29 15:37 Measuring IOPS Martin Steigerwald
2011-07-29 16:14 ` Martin Steigerwald
@ 2011-08-03 19:31 ` Martin Steigerwald
2011-08-03 20:22 ` Jeff Moyer
1 sibling, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-03 19:31 UTC (permalink / raw)
To: fio
[-- Attachment #1: Type: Text/Plain, Size: 1661 bytes --]
On Friday, 29 July 2011, you wrote:
> Hi!
>
> I am currently writing an article about fio for a german print magazine
> after having packaged it for Debian and using it in performance
> analysis & tuning trainings.
>
> After introducing the concepts of fio with some basic job files, I'd
> like to show how to do meaningful IOPS measurements that also work with
> SSDs that compress.
>
> For some first tests I came up with:
>
> martin@merkaba:~[…]> cat iops.job
> [global]
> size=2G
> bsrange=2-16k
> filename=iops1
> numjobs=1
> iodepth=1
> # Random data for SSDs that compress
> refill_buffers=1
>
> [zufälligschreiben]
> rw=randwrite
> stonewall
> [sequentiellschreiben]
> rw=write
> stonewall
>
> [zufälliglesen]
> rw=randread
> stonewall
> [sequentielllesen]
> rw=read
Even with the additional stonewall this still isn't accurate. I found this
out by getting completely bogus values with a software RAID 1 on two SAS
disks.
It needs the following additional changes:
- ioengine=libaio
- direct=1
- and then, due to the direct I/O alignment requirement: bsrange=2k-16k
So I now also fully understand that ioengine=sync just refers to the
synchronous nature of the system calls used, not to whether the I/Os are
issued synchronously via sync=1 or bypass the page cache via direct=1.
Attached are results that bring the read IOPS down drastically! I first let
sequentiell.job write out the complete 2 GB with random data and then ran
the iops.job.
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: iops.job --]
[-- Type: text/plain, Size: 523 bytes --]
[global]
ioengine=libaio
direct=1
# For random data, run the sequentiell job
# beforehand
# Important for SSDs that compress
filename=testdatei
size=2G
bsrange=2k-16k
# What is written here should of course
# be random again as well
refill_buffers=1
[zufälliglesen]
stonewall
rw=randread
runtime=60
[sequentielllesen]
stonewall
rw=read
runtime=60
[zufälligschreiben]
stonewall
rw=randwrite
runtime=60
[sequentiellschreiben]
stonewall
rw=write
runtime=60
[-- Attachment #3: iops.log --]
[-- Type: text/x-log, Size: 4940 bytes --]
zufälliglesen: (g=0): rw=randread, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
sequentielllesen: (g=1): rw=read, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
zufälligschreiben: (g=2): rw=randwrite, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
sequentiellschreiben: (g=3): rw=write, bs=2K-16K/2K-16K, ioengine=libaio, iodepth=1
fio 1.57
Starting 4 processes
zufälliglesen: (groupid=0, jobs=1): err= 0: pid=6954
read : io=1322.9MB, bw=22563KB/s, iops=3194 , runt= 60001msec
slat (usec): min=6 , max=1763 , avg=29.52, stdev=12.62
clat (usec): min=2 , max=7206 , avg=274.52, stdev=114.08
lat (usec): min=128 , max=7246 , avg=304.68, stdev=116.81
bw (KB/s) : min=18844, max=25304, per=100.15%, avg=22596.20, stdev=1740.26
cpu : usr=4.15%, sys=10.50%, ctx=193490, majf=0, minf=23
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=191664/0/0, short=0/0/0
lat (usec): 4=0.01%, 10=0.01%, 50=0.01%, 100=0.01%, 250=49.32%
lat (usec): 500=48.57%, 750=1.98%, 1000=0.05%
lat (msec): 2=0.05%, 4=0.02%, 10=0.01%
sequentielllesen: (groupid=1, jobs=1): err= 0: pid=6956
read : io=2048.0MB, bw=72598KB/s, iops=8066 , runt= 28887msec
slat (usec): min=5 , max=1909 , avg=26.76, stdev= 8.98
clat (usec): min=1 , max=4631 , avg=91.18, stdev=36.03
lat (usec): min=40 , max=4644 , avg=118.51, stdev=37.86
bw (KB/s) : min=70224, max=77412, per=100.09%, avg=72663.79, stdev=1589.19
cpu : usr=6.47%, sys=24.83%, ctx=234568, majf=0, minf=25
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=233021/0/0, short=0/0/0
lat (usec): 2=0.01%, 4=0.01%, 10=0.01%, 50=0.71%, 100=65.76%
lat (usec): 250=33.16%, 500=0.29%, 750=0.05%, 1000=0.02%
lat (msec): 2=0.01%, 4=0.01%, 10=0.01%
zufälligschreiben: (groupid=2, jobs=1): err= 0: pid=6958
write: io=2048.0MB, bw=36083KB/s, iops=6594 , runt= 58121msec
slat (usec): min=6 , max=1952 , avg=31.79, stdev= 9.51
clat (usec): min=0 , max=19882 , avg=113.47, stdev=216.71
lat (usec): min=44 , max=19949 , avg=145.84, stdev=217.32
bw (KB/s) : min=14000, max=58580, per=100.12%, avg=36125.51, stdev=10544.88
cpu : usr=5.66%, sys=23.66%, ctx=386270, majf=0, minf=17
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/383305/0, short=0/0/0
lat (usec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=1.92%
lat (usec): 100=61.91%, 250=30.73%, 500=5.22%, 750=0.12%, 1000=0.04%
lat (msec): 2=0.03%, 4=0.01%, 10=0.01%, 20=0.01%
sequentiellschreiben: (groupid=3, jobs=1): err= 0: pid=6959
write: io=2048.0MB, bw=63465KB/s, iops=7050 , runt= 33044msec
slat (usec): min=6 , max=2854 , avg=30.54, stdev=11.23
clat (usec): min=1 , max=19371 , avg=104.68, stdev=190.45
lat (usec): min=43 , max=19417 , avg=135.81, stdev=191.17
bw (KB/s) : min=22984, max=68224, per=100.07%, avg=63511.21, stdev=5443.51
cpu : usr=6.16%, sys=24.62%, ctx=234687, majf=0, minf=19
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/232969/0, short=0/0/0
lat (usec): 2=0.01%, 4=0.01%, 10=0.01%, 50=0.68%, 100=51.92%
lat (usec): 250=46.97%, 500=0.22%, 750=0.07%, 1000=0.07%
lat (msec): 2=0.04%, 4=0.01%, 10=0.01%, 20=0.01%
Run status group 0 (all jobs):
READ: io=1322.9MB, aggrb=22563KB/s, minb=23104KB/s, maxb=23104KB/s, mint=60001msec, maxt=60001msec
Run status group 1 (all jobs):
READ: io=2048.0MB, aggrb=72598KB/s, minb=74340KB/s, maxb=74340KB/s, mint=28887msec, maxt=28887msec
Run status group 2 (all jobs):
WRITE: io=2048.0MB, aggrb=36082KB/s, minb=36948KB/s, maxb=36948KB/s, mint=58121msec, maxt=58121msec
Run status group 3 (all jobs):
WRITE: io=2048.0MB, aggrb=63465KB/s, minb=64988KB/s, maxb=64988KB/s, mint=33044msec, maxt=33044msec
Disk stats (read/write):
dm-2: ios=424704/615629, merge=0/0, ticks=70028/59768, in_queue=129796, util=71.90%, aggrios=424704/616498, aggrmerge=0/60, aggrticks=69568/60584, aggrin_queue=128920, aggrutil=71.33%
sda: ios=424704/616498, merge=0/60, ticks=69568/60584, in_queue=128920, util=71.33%
[-- Attachment #4: sequentiell.job --]
[-- Type: text/plain, Size: 221 bytes --]
[global]
ioengine=libaio
direct=1
filename=testdatei
size=2g
bs=4m
# Fully random data for SSDs that compress
refill_buffers=1
[schreiben]
stonewall
rw=write
[lesen]
stonewall
rw=read
[-- Attachment #5: sequentiell.log --]
[-- Type: text/x-log, Size: 2432 bytes --]
[global]
ioengine=libaio
direct=1
filename=testdatei
size=2g
bs=4m
[schreiben]
stonewall
rw=write
[lesen]
stonewall
rw=read
schreiben: (g=0): rw=write, bs=4M-4M/4M-4M, ioengine=libaio, iodepth=1
lesen: (g=1): rw=read, bs=4M-4M/4M-4M, ioengine=libaio, iodepth=1
fio 1.57
Starting 2 processes
schreiben: Laying out IO file(s) (1 file(s) / 2048MB)
schreiben: (groupid=0, jobs=1): err= 0: pid=5855
write: io=2048.0MB, bw=220150KB/s, iops=53 , runt= 9526msec
slat (usec): min=239 , max=1328 , avg=452.88, stdev=182.05
clat (msec): min=17 , max=22 , avg=18.14, stdev= 1.10
lat (msec): min=17 , max=23 , avg=18.59, stdev= 1.12
bw (KB/s) : min=216422, max=223128, per=100.08%, avg=220331.94, stdev=2205.18
cpu : usr=0.17%, sys=2.44%, ctx=557, majf=0, minf=19
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/512/0, short=0/0/0
lat (msec): 20=93.55%, 50=6.45%
lesen: (groupid=1, jobs=1): err= 0: pid=5856
read : io=2048.0MB, bw=267460KB/s, iops=65 , runt= 7841msec
slat (usec): min=251 , max=4071 , avg=581.06, stdev=300.62
clat (usec): min=14517 , max=17700 , avg=14724.38, stdev=340.74
lat (usec): min=14906 , max=20094 , avg=15306.37, stdev=451.23
bw (KB/s) : min=264000, max=270336, per=100.07%, avg=267634.87, stdev=1787.07
cpu : usr=0.10%, sys=3.78%, ctx=569, majf=0, minf=1045
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=512/0/0, short=0/0/0
lat (msec): 20=100.00%
Run status group 0 (all jobs):
WRITE: io=2048.0MB, aggrb=220150KB/s, minb=225433KB/s, maxb=225433KB/s, mint=9526msec, maxt=9526msec
Run status group 1 (all jobs):
READ: io=2048.0MB, aggrb=267459KB/s, minb=273878KB/s, maxb=273878KB/s, mint=7841msec, maxt=7841msec
Disk stats (read/write):
dm-2: ios=3991/4196, merge=0/0, ticks=33880/43220, in_queue=77124, util=96.78%, aggrios=4112/4143, aggrmerge=0/56, aggrticks=34944/42968, aggrin_queue=77904, aggrutil=96.79%
sda: ios=4112/4143, merge=0/56, ticks=34944/42968, in_queue=77904, util=96.79%
* Re: Measuring IOPS
2011-08-03 19:31 ` Measuring IOPS Martin Steigerwald
@ 2011-08-03 20:22 ` Jeff Moyer
2011-08-03 20:33 ` Martin Steigerwald
2011-08-03 20:42 ` Martin Steigerwald
0 siblings, 2 replies; 20+ messages in thread
From: Jeff Moyer @ 2011-08-03 20:22 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: fio
Martin Steigerwald <Martin@lichtvoll.de> writes:
> - ioengine=libaio
> - direct=1
> - and then due to direct I/O alignment requirement: bsrange=2k-16k
>
> So I now also fully understand that ioengine=sync just refers to the
> synchronous nature of the system calls used, not on whether the I/Os are
> issued synchronously via sync=1 or by circumventing the page cache via
> direct=1
>
> Attached are results that bring down IOPS on read drastically! I first let
> sequentiell.job write out the complete 2 gb with random data and then ran
> the iops.job.
If you want to measure the maximum iops, then you should consider
driving iodepths > 1. Assuming you are testing a sata ssd, try using a
depth of 64 (twice the NCQ depth).
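Applied to the job file posted earlier in the thread, that suggestion would amount to something like this sketch (the depth of 64 assumes the NCQ depth of 32; libaio with direct=1 is needed on Linux for the asynchronous depth to actually be reached):

```ini
[global]
ioengine=libaio
direct=1
iodepth=64
filename=testdatei
size=2G
bsrange=2k-16k
refill_buffers=1

[zufälliglesen]
stonewall
rw=randread
runtime=60
```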
Cheers,
Jeff
* Re: Measuring IOPS
2011-08-03 20:22 ` Jeff Moyer
@ 2011-08-03 20:33 ` Martin Steigerwald
2011-08-04 7:50 ` Jens Axboe
2011-08-03 20:42 ` Martin Steigerwald
1 sibling, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-03 20:33 UTC (permalink / raw)
To: Jeff Moyer; +Cc: fio
On Wednesday, 3 August 2011, Jeff Moyer wrote:
> Martin Steigerwald <Martin@lichtvoll.de> writes:
> > - ioengine=libaio
> > - direct=1
> > - and then due to direct I/O alignment requirement: bsrange=2k-16k
> >
> > So I now also fully understand that ioengine=sync just refers to the
> > synchronous nature of the system calls used, not on whether the I/Os
> > are issued synchronously via sync=1 or by circumventing the page
> > cache via direct=1
> >
> > Attached are results that bring down IOPS on read drastically! I
> > first let sequentiell.job write out the complete 2 gb with random
> > data and then ran the iops.job.
>
> If you want to measure the maximum iops, then you should consider
> driving iodepths > 1. Assuming you are testing a sata ssd, try using a
> depth of 64 (twice the NCQ depth).
Yes, I thought about that too, but then I also read about the
"recommendation" to use an iodepth of one in a post here:
http://www.spinics.net/lists/fio/msg00502.html
What depth will be used in regular workloads - say a Linux desktop on an
SSD? I would bet that Linux uses whatever it can get? What about server
workloads like mail processing on SAS disks or a fileserver on SATA disks
and the like?
Twice of
merkaba:~> hdparm -I /dev/sda | grep -i queue
Queue depth: 32
* Native Command Queueing (NCQ)
?
Why twice?
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Measuring IOPS
2011-08-03 20:22 ` Jeff Moyer
2011-08-03 20:33 ` Martin Steigerwald
@ 2011-08-03 20:42 ` Martin Steigerwald
2011-08-03 20:50 ` Martin Steigerwald
1 sibling, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-03 20:42 UTC (permalink / raw)
To: Jeff Moyer; +Cc: fio
On Wednesday, 3 August 2011, you wrote:
> Martin Steigerwald <Martin@lichtvoll.de> writes:
> > - ioengine=libaio
> > - direct=1
> > - and then due to direct I/O alignment requirement: bsrange=2k-16k
> >
> > So I now also fully understand that ioengine=sync just refers to the
> > synchronous nature of the system calls used, not on whether the I/Os
> > are issued synchronously via sync=1 or by circumventing the page
> > cache via direct=1
> >
> > Attached are results that bring down IOPS on read drastically! I
> > first let sequentiell.job write out the complete 2 gb with random
> > data and then ran the iops.job.
>
> If you want to measure the maximum iops, then you should consider
> driving iodepths > 1. Assuming you are testing a sata ssd, try using a
> depth of 64 (twice the NCQ depth).
And additionally?
Does using iodepth > 1 need ioengine=libaio? Let's see the manpage:
iodepth=int
Number of I/O units to keep in flight against the
file. Note that increasing iodepth beyond 1 will
not affect synchronous ioengines (except for small
degress when verify_async is in use). Even async
engines my impose OS restrictions causing the
desired depth not to be achieved. This may happen
on Linux when using libaio and not setting
direct=1, since buffered IO is not async on that
OS. Keep an eye on the IO depth distribution in
the fio output to verify that the achieved depth
is as expected. Default: 1.
Okay, yes, it does. I'm starting to get the hang of it. It's a bit
puzzling to have two concepts of synchronous I/O around:
1) synchronous system call interfaces, i.e. the fio I/O engine
2) synchronous I/O requests, i.e. O_SYNC
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Measuring IOPS
2011-08-03 20:42 ` Martin Steigerwald
@ 2011-08-03 20:50 ` Martin Steigerwald
2011-08-04 8:51 ` Martin Steigerwald
0 siblings, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-03 20:50 UTC (permalink / raw)
To: Jeff Moyer; +Cc: fio
On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> On Wednesday, 3 August 2011, you wrote:
> > Martin Steigerwald <Martin@lichtvoll.de> writes:
[...]
> Does using iodepth > 1 need ioengine=libaio? Let's see the manpage:
>
> iodepth=int
> Number of I/O units to keep in flight against the
> file. Note that increasing iodepth beyond 1 will
> not affect synchronous ioengines (except for small
> degress when verify_async is in use). Even async
> engines my impose OS restrictions causing the
> desired depth not to be achieved. This may happen
> on Linux when using libaio and not setting
> direct=1, since buffered IO is not async on that
> OS. Keep an eye on the IO depth distribution in
> the fio output to verify that the achieved depth
> is as expected. Default: 1.
>
> Okay, yes, it does. I'm starting to get the hang of it. It's a bit puzzling to
> have two concepts of synchronous I/O around:
>
> 1) synchronous system call interfaces aka fio I/O engine
>
> 2) synchronous I/O requests aka O_SYNC
But isn't this a case for iodepth=1, if buffered I/O on Linux is
synchronous? I bet most regular applications, except some databases, use
buffered I/O.
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Measuring IOPS
2011-08-03 20:33 ` Martin Steigerwald
@ 2011-08-04 7:50 ` Jens Axboe
0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2011-08-04 7:50 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Jeff Moyer, fio
On 2011-08-03 22:33, Martin Steigerwald wrote:
> On Wednesday, 3 August 2011, Jeff Moyer wrote:
>> Martin Steigerwald <Martin@lichtvoll.de> writes:
>>> - ioengine=libaio
>>> - direct=1
>>> - and then due to direct I/O alignment requirement: bsrange=2k-16k
>>>
>>> So I now also fully understand that ioengine=sync just refers to the
>>> synchronous nature of the system calls used, not on whether the I/Os
>>> are issued synchronously via sync=1 or by circumventing the page
>>> cache via direct=1
>>>
>>> Attached are results that bring down IOPS on read drastically! I
>>> first let sequentiell.job write out the complete 2 gb with random
>>> data and then ran the iops.job.
>>
>> If you want to measure the maximum iops, then you should consider
>> driving iodepths > 1. Assuming you are testing a sata ssd, try using a
>> depth of 64 (twice the NCQ depth).
>
> Yes, I thought about that too, but then also read about the
> "recommendation" to use an iodepth of one in a post here:
>
> http://www.spinics.net/lists/fio/msg00502.html
>
> What will be used in regular workloads - say Linux desktop on an SSD here?
> I would bet that Linux uses what it can get? What about server workloads
> like mail processing on SAS disks or fileserver on SATA disks and such
> like?
>
>
> Twice of
>
> merkaba:~> hdparm -I /dev/sda | grep -i queue
> Queue depth: 32
> * Native Command Queueing (NCQ)
>
> ?
>
> Why twice?
Twice is a good rule of thumb: it gives the drive some freedom in
scheduling to reduce rotational latencies, while also allowing the OS to
work on a larger range of requests. This is beneficial mostly for merging
of sequential requests, but also for scheduling purposes.
So use at least depth + a few; 2*depth is a good default.
--
Jens Axboe
* Re: Measuring IOPS
2011-08-03 20:50 ` Martin Steigerwald
@ 2011-08-04 8:51 ` Martin Steigerwald
2011-08-04 8:58 ` Jens Axboe
0 siblings, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-04 8:51 UTC (permalink / raw)
To: Jeff Moyer; +Cc: fio, Jens Axboe
On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> > On Wednesday, 3 August 2011, you wrote:
> > > Martin Steigerwald <Martin@lichtvoll.de> writes:
> [...]
>
> > Does using iodepth > 1 need ioengine=libaio? Let´s see the manpage:
> > iodepth=int
> >
> > Number of I/O units to keep in flight against the
> > file. Note that increasing iodepth beyond 1 will
> > not affect synchronous ioengines (except for small
> > degress when verify_async is in use). Even async
> > engines my impose OS restrictions causing the
> > desired depth not to be achieved. This may happen
> > on Linux when using libaio and not setting
> > direct=1, since buffered IO is not async on that
> > OS. Keep an eye on the IO depth distribution in
> > the fio output to verify that the achieved depth
> > is as expected. Default: 1.
> >
> > Okay, yes, it does. I start getting a hang on it. Its a bit puzzling
> > to have two concepts of synchronous I/O around:
> >
> > 1) synchronous system call interfaces aka fio I/O engine
> >
> > 2) synchronous I/O requests aka O_SYNC
>
> But isn´t this a case for iodepth=1 if buffered I/O on Linux is
> synchronous? I bet most regular applications except some databases use
> buffered I/O.
Thanks a lot for your answers, Jens, Jeff, DongJin.
Now what about the above one?
In what cases is iodepth > 1 relevant, when Linux buffered I/O is
synchronous? For multiple threads or processes?
One process / thread can only submit one I/O at a time with synchronous
system call I/O, but the call returns as soon as the data is in the page
cache. So first, why can't Linux use iodepth > 1 when there is lots of
data in the page cache waiting to be written out? That should help the
single-process case.
In the multiple-process/thread case, Linux gets several I/O requests from
multiple processes/threads, and thus iodepth > 1 does make sense?
Maybe it helps to get clear where in the stack iodepth is located. Is it
process / thread
systemcall
pagecache
blocklayer
iodepth
device driver
device
? If so, why can't Linux make use of iodepth > 1 with synchronous
system call I/O? Or is it further up, at the system call level? But then
what sense would it make there, when the system calls used are
asynchronous already?
(Is that ordering above correct at all?)
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Measuring IOPS
2011-08-04 8:51 ` Martin Steigerwald
@ 2011-08-04 8:58 ` Jens Axboe
2011-08-04 9:34 ` Martin Steigerwald
0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2011-08-04 8:58 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Jeff Moyer, fio
On 2011-08-04 10:51, Martin Steigerwald wrote:
> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
>> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
>>> On Wednesday, 3 August 2011, you wrote:
>>>> Martin Steigerwald <Martin@lichtvoll.de> writes:
>> [...]
>>
>>> Does using iodepth > 1 need ioengine=libaio? Let's see the manpage:
>>> iodepth=int
>>>
>>> Number of I/O units to keep in flight against the
>>> file. Note that increasing iodepth beyond 1 will
>>> not affect synchronous ioengines (except for small
>>> degress when verify_async is in use). Even async
>>> engines my impose OS restrictions causing the
>>> desired depth not to be achieved. This may happen
>>> on Linux when using libaio and not setting
>>> direct=1, since buffered IO is not async on that
>>> OS. Keep an eye on the IO depth distribution in
>>> the fio output to verify that the achieved depth
>>> is as expected. Default: 1.
>>>
>>> Okay, yes, it does. I start getting a hang on it. Its a bit puzzling
>>> to have two concepts of synchronous I/O around:
>>>
>>> 1) synchronous system call interfaces aka fio I/O engine
>>>
>>> 2) synchronous I/O requests aka O_SYNC
>>
>> But isn't this a case for iodepth=1 if buffered I/O on Linux is
>> synchronous? I bet most regular applications except some databases use
>> buffered I/O.
>
> Thanks a lot for your answers, Jens, Jeff, DongJin.
>
> Now what about the above one?
>
> In what cases is iodepth > 1 relevant, when Linux buffered I/O is
> synchronous? For mutiple threads or processes?
iodepth controls what depth fio operates at, not the OS. You are right
in that with iodepth=1, for buffered writes you could be seeing a much
higher depth on the device side.
So think of iodepth as how many IO units fio can have in flight, nothing
else.
> One process / thread can only submit one I/O at a time with synchronous
> system call I/O, but the function returns when the stuff is in the page
> cache. So first why can't Linux use iodepth > 1 when there is lots of stuff
> in the page cache to be written out? That should help the single process
> case.
Since the IO unit is done when the system call returns, you can never
have more than the one in flight for a sync engine. So iodepth > 1 makes
no sense for a sync engine.
> On the multiple process/threads case Linux gets several I/O requests from
> multiple processes/threads and thus iodepth > 1 does make sense?
No.
> Maybe it helps getting clear where in the stack iodepth is located at, is
> it
>
> process / thread
> systemcall
> pagecache
> blocklayer
> iodepth
> device driver
> device
>
> ? If so, why can't Linux make use of iodepth > 1 with synchronous
> system call I/O? Or is it further up on the system call level? But then
Because it is sync. The very nature of the sync system calls is that
submission and completion are one event. For libaio, you could submit a
bunch of requests before retrieving or waiting for completion of any one
of them.
The only example where a sync engine could drive a higher queue depth on
the device side is buffered writes. For any other case (reads, direct
writes), you need async submission to build up a higher queue depth.
> what sense would it make there, when using system calls that are
> asynchronous already?
> (Is that ordering above correct at all?)
Your ordering looks OK. Now consider where and how you end up waiting
for issued IO, that should tell you where queue depth could build up or
not.
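The difference can be illustrated with a small Python sketch (conceptual only, not fio code; the function names here are made up for illustration). A sync engine completes each I/O inside the submitting call, so its in-flight count never exceeds 1; an async engine can submit up to iodepth requests before reaping a completion.

```python
def sync_engine(n_ios):
    """Submission and completion are one event: depth never exceeds 1."""
    max_in_flight = 0
    in_flight = 0
    for _ in range(n_ios):
        in_flight += 1                  # submit (e.g. read()/write())
        max_in_flight = max(max_in_flight, in_flight)
        in_flight -= 1                  # the call only returns on completion
    return max_in_flight

def async_engine(n_ios, iodepth):
    """Submit up to iodepth requests before reaping any completion."""
    max_in_flight = 0
    pending = []
    for i in range(n_ios):
        pending.append(i)               # like io_submit()
        max_in_flight = max(max_in_flight, len(pending))
        if len(pending) >= iodepth:
            pending.pop(0)              # like io_getevents() reaping one
    return max_in_flight

print(sync_engine(100))        # 1
print(async_engine(100, 64))   # 64
```

This mirrors why the fio manpage says iodepth beyond 1 does not affect synchronous ioengines.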
--
Jens Axboe
* Re: Measuring IOPS
2011-08-04 8:58 ` Jens Axboe
@ 2011-08-04 9:34 ` Martin Steigerwald
2011-08-04 10:02 ` Jens Axboe
0 siblings, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-04 9:34 UTC (permalink / raw)
To: Jens Axboe; +Cc: Jeff Moyer, fio
On Thursday, 4 August 2011, Jens Axboe wrote:
> On 2011-08-04 10:51, Martin Steigerwald wrote:
> > On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
> >>> On Wednesday, 3 August 2011, you wrote:
> >>>> Martin Steigerwald <Martin@lichtvoll.de> writes:
> >> [...]
> >>
> >>> Does using iodepth > 1 need ioengine=libaio? Let's see the manpage:
> >>> iodepth=int
> >>>
> >>> Number of I/O units to keep in flight against the
> >>> file. Note that increasing iodepth beyond 1 will
> >>> not affect synchronous ioengines (except for small
> >>> degress when verify_async is in use). Even async
> >>> engines my impose OS restrictions causing the
> >>> desired depth not to be achieved. This may happen
> >>> on Linux when using libaio and not setting
> >>> direct=1, since buffered IO is not async on that
> >>> OS. Keep an eye on the IO depth distribution in
> >>> the fio output to verify that the achieved depth
> >>> is as expected. Default: 1.
> >>>
> >>> Okay, yes, it does. I start getting a hang on it. Its a bit
> >>> puzzling to have two concepts of synchronous I/O around:
> >>>
> >>> 1) synchronous system call interfaces aka fio I/O engine
> >>>
> >>> 2) synchronous I/O requests aka O_SYNC
> >>
> >> But isn´t this a case for iodepth=1 if buffered I/O on Linux is
> >> synchronous? I bet most regular applications except some databases
> >> use buffered I/O.
> >
> > Thanks a lot for your answers, Jens, Jeff, DongJin.
> >
> > Now what about the above one?
> >
> > In what cases is iodepth > 1 relevant, when Linux buffered I/O is
> > synchronous? For multiple threads or processes?
>
> iodepth controls what depth fio operates at, not the OS. You are right
> in that with iodepth=1, for buffered writes you could be seeing a much
> higher depth on the device side.
>
> So think of iodepth as how many IO units fio can have in flight,
> nothing else.
Ah okay. So when using iodepth=64 and ioengine=libaio with fio then fio
issues 64 I/O requests at once before it bothers waiting for I/O requests
to complete. And as the block layer completes I/O requests fio fills up the
64 I/O requests queue. Right?
Now when I do have two jobs running at once and iodepth=64, will each
process submit 64 I/O requests before waiting thus having at most 128 I/O
requests in flight? Or will each process use 32 I/O requests? My bet is
that iodepth is per job, per process.
> > One process / thread can only submit one I/O at a time with
> > synchronous system call I/O, but the function returns when the stuff
> > is in the page cache. So first why can´t Linux use iodepth > 1 when
> > there is lots of stuff in the page cache to be written out? That
> > should help the single process case.
>
> Since the IO unit is done when the system call returns, you can never
> have more than the one in flight for a sync engine. So iodepth > 1
> makes no sense for a sync engine.
Makes perfect sense now that I understand that the iodepth option relates to
what the fio processes do.
> > In the multiple processes/threads case Linux gets several I/O requests
> > from multiple processes/threads and thus iodepth > 1 does make sense?
>
> No.
Since each fio job doing synchronous system call I/O still submits one I/O
at a time...
> > Maybe it helps getting clear where in the stack iodepth is located
> > at, is it
> >
> > process / thread
> > systemcall
> > pagecache
> > blocklayer
> > iodepth
> > device driver
> > device
> >
> > ? If so, why can´t Linux make use of iodepth > 1 with
> > synchronous system call I/O? Or is it further up on the system call
> > level? But then
>
> Because it is sync. The very nature of the sync system calls is that
> submission and completion are one event. For libaio, you could submit a
> bunch of requests before retrieving or waiting for completion of any
> one of them.
>
> The only example where a sync engine could drive a higher queue depth
> on the device side is buffered writes. For any other case (reads,
> direct writes), you need async submission to build up a higher queue
> depth.
Great! I think that makes it pretty clear.
Thus when I want to read subsequent blocks 1, 2, 3, 4, 5, 6, 7, 8, 9 and
10 from a file at once and only then wait, I need async I/O. Blocks might be
of arbitrary size.
What if I use 10 processes, each reading one of these blocks at once?
Couldn´t this fill up the queue at the device level? But then different
processes usually read different files...
... my question hints at how I/O depths might accumulate at the device
level, when several processes are issuing read and/or write requests at
once.
> > what sense would it make there, when using system calls that are
> > asynchronous already?
> > (Is that ordering above correct at all?)
>
> Your ordering looks OK. Now consider where and how you end up waiting
> for issued IO, that should tell you where queue depth could build up or
> not.
So we have several levels of queue depth.
- queue depth at the system call level
- queue depth at device level
=== sync I/O engines ===
queue depth at the system call level = 1
== reads ==
queue depth at the device level = 1
since read() returns when the data is in RAM and thus is synchronous I/O
on the lower level by nature
page cache will be used unless direct=1, so one might be measuring RAM /
read ahead performance, especially when several read jobs are running
concurrently.
writes might not hit the device unless direct=1, and thus one should use a
larger-than-RAM file size.
== writes ==
queue depth at the device level = depending on the workload, up to what the
device supports
unless direct=1, because then write() is doing synchronous I/O on the lower
level and only returns when the data is at least in the drive cache
=== libaio ===
queue depth at the system call level = iodepth option of fio
as long as direct=1, since libaio falls back to synchronous system calls
with buffered writes
queue depth at the device level = same
fio submits as many I/Os as specified by iodepth and only then waits. As the
block layer completes I/Os, fio fills up the queue.
conclusion:
Thus when I want to measure higher I/O depths for reads I need libaio and
direct=1. But then I am measuring something that does not have any
practical effect on processes that use synchronous system call I/O.
So for regular applications ioengine=sync + iodepth=64 gives more
realistic results - even when it´s then just I/O depth 1 for reads - and
for databases that use direct I/O, ioengine=libaio makes sense and will
cause higher I/O depths on the device side if the device supports it.
Anything without direct=1 (or the slower sync=1) is potentially measuring
RAM performance. direct=1 bypasses the page cache. sync=1 basically disables
caching on the device / controller side as well.
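A job file along these lines could be sketched like this (option values such
as size, runtime and iodepth are arbitrary examples, not recommendations):

```ini
; Sketch: measure device-level queue depth for reads.
[deep-randread]
; libaio is only really asynchronous with direct=1 on Linux
ioengine=libaio
direct=1
rw=randread
bs=4k
; up to 64 I/O units in flight from fio
iodepth=64
size=1g
runtime=60
```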
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
* Re: Measuring IOPS
2011-08-04 9:34 ` Martin Steigerwald
@ 2011-08-04 10:02 ` Jens Axboe
2011-08-04 10:23 ` Martin Steigerwald
0 siblings, 1 reply; 20+ messages in thread
From: Jens Axboe @ 2011-08-04 10:02 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Jeff Moyer, fio
On 2011-08-04 11:34, Martin Steigerwald wrote:
> On Thursday, 4 August 2011, Jens Axboe wrote:
>> On 2011-08-04 10:51, Martin Steigerwald wrote:
>>> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
>>>> On Wednesday, 3 August 2011, Martin Steigerwald wrote:
>>>>> On Wednesday, 3 August 2011, you wrote:
>>>>>> Martin Steigerwald <Martin@lichtvoll.de> writes:
>>>> [...]
>>>>
>>>>> Does using iodepth > 1 need ioengine=libaio? Let´s see the manpage:
>>>>> iodepth=int
>>>>>
>>>>> Number of I/O units to keep in flight against the
>>>>> file. Note that increasing iodepth beyond 1 will
>>>>> not affect synchronous ioengines (except for small
>>>>> degrees when verify_async is in use). Even async
>>>>> engines may impose OS restrictions causing the
>>>>> desired depth not to be achieved. This may happen
>>>>> on Linux when using libaio and not setting
>>>>> direct=1, since buffered IO is not async on that
>>>>> OS. Keep an eye on the IO depth distribution in
>>>>> the fio output to verify that the achieved depth
>>>>> is as expected. Default: 1.
>>>>>
>>>>> Okay, yes, it does. I am starting to get the hang of it. It´s a bit
>>>>> puzzling to have two concepts of synchronous I/O around:
>>>>>
>>>>> 1) synchronous system call interfaces aka fio I/O engine
>>>>>
>>>>> 2) synchronous I/O requests aka O_SYNC
>>>>
>>>> But isn´t this a case for iodepth=1 if buffered I/O on Linux is
>>>> synchronous? I bet most regular applications except some databases
>>>> use buffered I/O.
>>>
>>> Thanks a lot for your answers, Jens, Jeff, DongJin.
>>>
>>> Now what about the above one?
>>>
>>> In what cases is iodepth > 1 relevant, when Linux buffered I/O is
>>> synchronous? For multiple threads or processes?
>>
>> iodepth controls what depth fio operates at, not the OS. You are right
>> in that with iodepth=1, for buffered writes you could be seeing a much
>> higher depth on the device side.
>>
>> So think of iodepth as how many IO units fio can have in flight,
>> nothing else.
>
> Ah okay. So when using iodepth=64 and ioengine=libaio with fio then fio
> issues 64 I/O requests at once before it bothers waiting for I/O requests
> to complete. And as the block layer completes I/O requests fio fills up the
> 64 I/O requests queue. Right?
Not quite right: iodepth=64 means that fio can have 64 I/Os
_pending_, not that it necessarily submits or retrieves that many at a
time. The latter two are controlled by the iodepth_batch (and
iodepth_batch_*) settings.
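In job file form the distinction might be sketched like this (the batch
values are arbitrary examples):

```ini
; Sketch: iodepth bounds what can be pending inside fio; the
; batch options control how many I/Os move in one go.
[batched-randread]
ioengine=libaio
direct=1
rw=randread
bs=4k
; up to 64 I/O units pending
iodepth=64
; submit at most 16 per io_submit() call
iodepth_batch=16
; retrieve up to 8 completions at once
iodepth_batch_complete=8
```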
> Now when I do have two jobs running at once and iodepth=64, will each
> process submit 64 I/O requests before waiting thus having at most 128 I/O
> requests in flight? Or will each process use 32 I/O requests? My bet is
> that iodepth is per job, per process.
iodepth is per job/process/thread. So each will have 64 requests.
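As a sketch, such a two-job setup (values arbitrary) would be:

```ini
; Sketch: iodepth is per job, so two jobs can have
; up to 2 x 64 = 128 I/O requests in flight in total.
[global]
ioengine=libaio
direct=1
rw=randread
bs=4k
iodepth=64

[job1]
[job2]
```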
>>> One process / thread can only submit one I/O at a time with
>>> synchronous system call I/O, but the function returns when the stuff
>>> is in the page cache. So first why can´t Linux use iodepth > 1 when
>>> there is lots of stuff in the page cache to be written out? That
>>> should help the single process case.
>>
>> Since the IO unit is done when the system call returns, you can never
>> have more than the one in flight for a sync engine. So iodepth > 1
>> makes no sense for a sync engine.
>
> Makes perfect sense now that I understand that the iodepth option relates
> to what the fio processes do.
>
>>> In the multiple processes/threads case Linux gets several I/O requests
>>> from multiple processes/threads and thus iodepth > 1 does make sense?
>>
>> No.
>
> Since each fio job doing synchronous system call I/O still submits one I/O
> at a time...
Because each sync system call returns with the IO completed already, not
just queued for completion.
>>> Maybe it helps getting clear where in the stack iodepth is located
>>> at, is it
>>>
>>> process / thread
>>> systemcall
>>> pagecache
>>> blocklayer
>>> iodepth
>>> device driver
>>> device
>>>
>>> ? If so, why can´t Linux make use of iodepth > 1 with
>>> synchronous system call I/O? Or is it further up on the system call
>>> level? But then
>>
>> Because it is sync. The very nature of the sync system calls is that
>> submission and completion are one event. For libaio, you could submit a
>> bunch of requests before retrieving or waiting for completion of any
>> one of them.
>>
>> The only example where a sync engine could drive a higher queue depth
>> on the device side is buffered writes. For any other case (reads,
>> direct writes), you need async submission to build up a higher queue
>> depth.
>
> Great! I think that makes it pretty clear.
>
> Thus when I want to read subsequent blocks 1, 2, 3, 4, 5, 6, 7, 8, 9 and
> 10 from a file at once and only then wait, I need async I/O. Blocks might
> be of arbitrary size.
>
> What if I use 10 processes, each reading one of these blocks at once?
> Couldn´t this fill up the queue at the device level? But then different
> processes usually read different files...
Yes, you could get the same IO on the device side with just more
processes instead of using async IO. It would not be as efficient,
though.
> ... my question hints at how I/O depths might accumulate at the device
> level, when several processes are issuing read and/or write requests at
> once.
Various things can impact that, ultimately the IO scheduler decides when
to dispatch more requests to the driver.
>>> what sense would it make there, when using system calls that are
>>> asynchronous already?
>>> (Is that ordering above correct at all?)
>>
>> Your ordering looks OK. Now consider where and how you end up waiting
>> for issued IO, that should tell you where queue depth could build up or
>> not.
>
> So we have several levels of queue depth.
>
> - queue depth at the system call level
> - queue depth at device level
Not sure I like the 'system call level' title, but yes. Let's call it
application and device level.
> === sync I/O engines ===
> queue depth at the system call level = 1
>
> == reads ==
> queue depth at the device level = 1
> since read() returns when the data is in RAM and thus is synchronous I/O
> on the lower level by nature
>
> page cache will be used unless direct=1, so one might be measuring RAM /
> read ahead performance, especially when several read jobs are running
> concurrently.
>
> writes might not hit the device unless direct=1, and thus one should use a
> larger-than-RAM file size.
>
> == writes ==
> queue depth at the device level = depending on the workload, up to what the
> device supports
>
> unless direct=1, because then write() is doing synchronous I/O on the lower
> level and only returns when the data is at least in the drive cache
Correct, or unless O_SYNC is used.
> === libaio ===
> queue depth at the system call level = iodepth option of fio
>
> as long as direct=1, since libaio falls back to synchronous system calls
> with buffered writes
>
> queue depth at the device level = same
Not necessarily the same, up to the same.
> fio submits as many I/Os as specified by iodepth and only then waits. As the
> block layer completes I/Os, fio fills up the queue.
That's not true, see earlier comment on what controls how many IOs are
submitted in one go and completed in one go.
> conclusion:
>
> Thus when I want to measure higher I/O depths for reads I need libaio and
> direct=1. But then I am measuring something that does not have any
> practical effect on processes that use synchronous system call I/O.
>
> So for regular applications ioengine=sync + iodepth=64 gives more
> realistic results - even when it´s then just I/O depth 1 for reads - and
> for databases that use direct I/O, ioengine=libaio makes sense and will
> cause higher I/O depths on the device side if the device supports it.
iodepth > 1 makes no sense for sync engines...
> Anything without direct=1 (or the slower sync=1) is potentially measuring
> RAM performance. direct=1 bypasses the page cache. sync=1 basically disables
> caching on the device / controller side as well.
Not quite measuring RAM (or copy) performance: at some point fio will be
blocked by the OS and prevented from dirtying more memory. At that point
it'll either just wait, or participate in flushing out dirty data. For
any buffered write workload, it'll quickly degenerate into that.
--
Jens Axboe
* Re: Measuring IOPS
2011-08-04 10:02 ` Jens Axboe
@ 2011-08-04 10:23 ` Martin Steigerwald
2011-08-05 7:28 ` Jens Axboe
0 siblings, 1 reply; 20+ messages in thread
From: Martin Steigerwald @ 2011-08-04 10:23 UTC (permalink / raw)
To: Jens Axboe; +Cc: Jeff Moyer, fio
[-- Attachment #1: Type: Text/Plain, Size: 3798 bytes --]
On Thursday, 4 August 2011, Jens Axboe wrote:
> On 2011-08-04 11:34, Martin Steigerwald wrote:
> > On Thursday, 4 August 2011, Jens Axboe wrote:
> >> On 2011-08-04 10:51, Martin Steigerwald wrote:
[...]
> >> The only example where a sync engine could drive a higher queue
> >> depth on the device side is buffered writes. For any other case
> >> (reads, direct writes), you need async submission to build up a
> >> higher queue depth.
> >
> > Great! I think that makes it pretty clear.
> >
> > Thus when I want to read subsequent blocks 1, 2, 3, 4, 5, 6, 7, 8, 9
> > and 10 from a file at once and only then wait, I need async I/O. Blocks
> > might be of arbitrary size.
> >
> > What if I use 10 processes, each reading one of these blocks at
> > once? Couldn´t this fill up the queue at the device level? But then
> > different processes usually read different files...
>
> Yes, you could get the same IO on the device side with just more
> processes instead of using async IO. It would not be as efficient,
> though.
Just tried this. See attached file.
> > ... my question hints at how I/O depths might accumulate at the
> > device level, when several processes are issuing read and/or write
> > requests at once.
>
> Various things can impact that, ultimately the IO scheduler decides
> when to dispatch more requests to the driver.
Okay. I think "not as efficient as asynchronous I/O on the application
level" will do as an answer for now ;).
> > conclusion:
> >
> > Thus when I want to measure higher I/O depths for reads I need libaio
> > and direct=1. But then I am measuring something that does not have
> > any practical effect on processes that use synchronous system call
> > I/O.
> >
> > So for regular applications ioengine=sync + iodepth=64 gives more
> > realistic results - even when it´s then just I/O depth 1 for reads -
> > and for databases that use direct I/O, ioengine=libaio makes sense
> > and will cause higher I/O depths on the device side if the device
> > supports it.
>
> iodepth > 1 makes no sense for sync engines...
I mixed it up again, sorry.
Yes, so a regular application using sync (which implies iodepth=1) might
still give me a higher iodepth at the device level for buffered writes.
Otherwise a higher iodepth on the device side is only possible by having
more than one process with a sync engine running at the same time, but
this would not be as efficient as asynchronous I/O.
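As a job file, that multi-process sync approach might be sketched like this
(the numjobs and size values are arbitrary examples):

```ini
; Sketch: several sync processes so the device can see more
; than one read at a time, at the cost of efficiency.
[parallel-sync-read]
ioengine=sync
direct=1
rw=randread
bs=4k
; eight processes, each with I/O depth 1
numjobs=8
group_reporting
size=1g
```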
> > Anything without direct=1 (or the slower sync=1) is potentially
> > measuring RAM performance. direct=1 bypasses the page cache. sync=1
> > basically disables caching on the device / controller side as well.
>
> Not quite measuring RAM (or copy) performance: at some point fio will
> be blocked by the OS and prevented from dirtying more memory. At that
> point it'll either just wait, or participate in flushing out dirty
> data. For any buffered write workload, it'll quickly degenerate into
> that.
Which depends on the size of the job, because I bet for 1 GB/s with 250000
IOPS I need some PCI Express based SSD solution - a SATA-300 SSD like the
Intel SSD 320 in use here can´t reach this (see attached file). It seems
with 8 GB of RAM I need more than one GB to write in order to get
meaningful results (related to raw SSD performance). With Ext4 delayed
allocation a subsequent rm might even cause the file to not be written at
all.
For the application side of things it can make perfect sense to measure
buffered writes. But one should go with a large enough data set in order to
get meaningful results. At least when the application uses a large dataset
too ;).
Thanks,
--
Martin 'Helios' Steigerwald - http://www.Lichtvoll.de
GPG: 03B0 0D6C 0040 0710 4AFA B82F 991B EAAC A599 84C7
[-- Attachment #2: parallel-lesen.txt --]
[-- Type: text/plain, Size: 5110 bytes --]
martin@merkaba:~/Zeit> fio --name massiveparallelreads --ioengine=sync --direct=1 --rw=randread --size=1g --filename=testfile --group_reporting --numjobs=1 --runtime=60
massiveparallelreads: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.57
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [17316K/0K /s] [4227 /0 iops] [eta 00m:00s]
massiveparallelreads: (groupid=0, jobs=1): err= 0: pid=19234
read : io=987.22MB, bw=16848KB/s, iops=4212 , runt= 60001msec
clat (usec): min=170 , max=8073 , avg=225.79, stdev=66.71
lat (usec): min=170 , max=8073 , avg=226.21, stdev=66.73
bw (KB/s) : min=15992, max=17248, per=100.02%, avg=16852.15, stdev=184.38
cpu : usr=5.29%, sys=11.79%, ctx=254656, majf=0, minf=24
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=252725/0/0, short=0/0/0
lat (usec): 250=88.02%, 500=11.85%, 750=0.04%, 1000=0.02%
lat (msec): 2=0.04%, 4=0.03%, 10=0.01%
Run status group 0 (all jobs):
READ: io=987.22MB, aggrb=16848KB/s, minb=17252KB/s, maxb=17252KB/s, mint=60001msec, maxt=60001msec
Disk stats (read/write):
dm-2: ios=252641/34, merge=0/0, ticks=47496/0, in_queue=47492, util=79.16%, aggrios=252725/26, aggrmerge=0/15, aggrticks=47152/8, aggrin_queue=46888, aggrutil=78.09%
sda: ios=252725/26, merge=0/15, ticks=47152/8, in_queue=46888, util=78.09%
martin@merkaba:~/Zeit> fio --name massiveparallelreads --ioengine=sync --direct=1 --rw=randread --size=1g --filename=testfile --group_reporting --numjobs=8 --runtime=60
massiveparallelreads: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
...
massiveparallelreads: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.57
Starting 8 processes
Jobs: 8 (f=8): [rrrrrrrr] [100.0% done] [101.9M/0K /s] [25.5K/0 iops] [eta 00m:00s]
massiveparallelreads: (groupid=0, jobs=8): err= 0: pid=19237
read : io=5855.1MB, bw=99925KB/s, iops=24981 , runt= 60001msec
clat (usec): min=171 , max=83857 , avg=313.75, stdev=73.37
lat (usec): min=171 , max=83858 , avg=314.15, stdev=73.36
bw (KB/s) : min=10008, max=13504, per=12.49%, avg=12485.41, stdev=94.90
cpu : usr=1.81%, sys=10.25%, ctx=1514639, majf=0, minf=214
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=1498905/0/0, short=0/0/0
lat (usec): 250=16.99%, 500=81.51%, 750=1.28%, 1000=0.09%
lat (msec): 2=0.09%, 4=0.02%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec): 100=0.01%
Run status group 0 (all jobs):
READ: io=5855.1MB, aggrb=99925KB/s, minb=102323KB/s, maxb=102323KB/s, mint=60001msec, maxt=60001msec
Disk stats (read/write):
dm-2: ios=1497855/17, merge=0/0, ticks=405216/448, in_queue=405812, util=99.81%, aggrios=1498905/30, aggrmerge=0/11, aggrticks=405948/468, aggrin_queue=405140, aggrutil=99.77%
sda: ios=1498905/30, merge=0/11, ticks=405948/468, in_queue=405140, util=99.77%
martin@merkaba:~/Zeit> fio --name massiveparallelreads --ioengine=sync --direct=1 --rw=randread --size=1g --filename=testfile --group_reporting --numjobs=32 --runtime=60
massiveparallelreads: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
...
massiveparallelreads: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.57
Starting 32 processes
Jobs: 32 (f=32): [rrrrrrrrrrrrrrrrrrrrrrrrrrrrrrrr] [100.0% done] [157.1M/0K /s] [39.5K/0 iops] [eta 00m:00s]
massiveparallelreads: (groupid=0, jobs=32): err= 0: pid=19583
read : io=9199.6MB, bw=157003KB/s, iops=39250 , runt= 60001msec
clat (usec): min=173 , max=1016.5K, avg=818.37, stdev=326.10
lat (usec): min=174 , max=1016.5K, avg=818.60, stdev=326.10
bw (KB/s) : min= 3, max= 9464, per=3.11%, avg=4885.51, stdev=82.96
cpu : usr=0.51%, sys=2.77%, ctx=2399581, majf=0, minf=878
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=2355087/0/0, short=0/0/0
lat (usec): 250=0.03%, 500=0.97%, 750=33.84%, 1000=59.32%
lat (msec): 2=5.79%, 4=0.03%, 10=0.01%, 20=0.01%, 50=0.01%
lat (msec): 250=0.01%, 500=0.01%, 750=0.01%, 1000=0.01%, 2000=0.01%
Run status group 0 (all jobs):
READ: io=9199.6MB, aggrb=157003KB/s, minb=160771KB/s, maxb=160771KB/s, mint=60001msec, maxt=60001msec
Disk stats (read/write):
dm-2: ios=2352589/34, merge=0/0, ticks=1820480/101988, in_queue=1924564, util=99.91%, aggrios=2355082/52, aggrmerge=0/24, aggrticks=1822620/98852, aggrin_queue=1920740, aggrutil=99.87%
sda: ios=2355082/52, merge=0/24, ticks=1822620/98852, in_queue=1920740, util=99.87%
martin@merkaba:~/Zeit>
[-- Attachment #3: cachedwrite.txt --]
[-- Type: text/plain, Size: 7381 bytes --]
martin@merkaba:~/Zeit> fio -name cachedwrite --ioengine=sync --buffered=1 --rw write --size=100
cachedwrite: (g=0): rw=write, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.57
Starting 1 process
cachedwrite: Laying out IO file(s) (1 file(s) / 0MB)
Run status group 0 (all jobs):
Disk stats (read/write):
dm-2: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
martin@merkaba:~/Zeit> fio -name cachedwrite --ioengine=sync --buffered=1 --rw write --size=100m
cachedwrite: (g=0): rw=write, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.57
Starting 1 process
cachedwrite: Laying out IO file(s) (1 file(s) / 100MB)
cachedwrite: (groupid=0, jobs=1): err= 0: pid=20543
write: io=102400KB, bw=1000.0MB/s, iops=256000 , runt= 100msec
clat (usec): min=2 , max=75 , avg= 3.33, stdev= 1.44
lat (usec): min=2 , max=76 , avg= 3.42, stdev= 1.53
cpu : usr=20.20%, sys=72.73%, ctx=9, majf=0, minf=24
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/25600/0, short=0/0/0
lat (usec): 4=82.50%, 10=15.28%, 20=2.21%, 50=0.01%, 100=0.01%
Run status group 0 (all jobs):
WRITE: io=102400KB, aggrb=1000.0MB/s, minb=1024.0MB/s, maxb=1024.0MB/s, mint=100msec, maxt=100msec
Disk stats (read/write):
dm-2: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
martin@merkaba:~/Zeit> fio -name cachedwrite --ioengine=sync --buffered=1 --rw write --size=100m
cachedwrite: (g=0): rw=write, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.57
Starting 1 process
cachedwrite: (groupid=0, jobs=1): err= 0: pid=20552
write: io=102400KB, bw=966038KB/s, iops=241509 , runt= 106msec
clat (usec): min=2 , max=36 , avg= 2.88, stdev= 0.74
lat (usec): min=2 , max=36 , avg= 2.94, stdev= 0.72
cpu : usr=15.24%, sys=80.00%, ctx=11, majf=0, minf=24
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/25600/0, short=0/0/0
lat (usec): 4=92.35%, 10=7.57%, 20=0.05%, 50=0.02%
Run status group 0 (all jobs):
WRITE: io=102400KB, aggrb=966037KB/s, minb=989222KB/s, maxb=989222KB/s, mint=106msec, maxt=106msec
Disk stats (read/write):
dm-2: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=-nan%, aggrios=0/0, aggrmerge=0/0, aggrticks=0/0, aggrin_queue=0, aggrutil=0.00%
sda: ios=0/0, merge=0/0, ticks=0/0, in_queue=0, util=0.00%
martin@merkaba:~/Zeit> fio -name cachedwrite --ioengine=sync --buffered=1 --rw write --size=100m
cachedwrite: (g=0): rw=write, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.57
Starting 1 process
cachedwrite: (groupid=0, jobs=1): err= 0: pid=20555
write: io=102400KB, bw=223581KB/s, iops=55895 , runt= 458msec
clat (usec): min=2 , max=74675 , avg=10.57, stdev=474.70
lat (usec): min=2 , max=74675 , avg=10.67, stdev=474.70
cpu : usr=3.50%, sys=30.63%, ctx=77, majf=0, minf=24
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/25600/0, short=0/0/0
lat (usec): 4=55.96%, 10=43.44%, 20=0.18%, 50=0.05%, 100=0.12%
lat (usec): 250=0.02%
lat (msec): 2=0.21%, 4=0.01%, 10=0.01%, 100=0.01%
Run status group 0 (all jobs):
WRITE: io=102400KB, aggrb=223580KB/s, minb=228946KB/s, maxb=228946KB/s, mint=458msec, maxt=458msec
Disk stats (read/write):
dm-2: ios=0/200, merge=0/0, ticks=0/31632, in_queue=39816, util=78.23%, aggrios=0/200, aggrmerge=0/0, aggrticks=0/40580, aggrin_queue=40580, aggrutil=78.05%
sda: ios=0/200, merge=0/0, ticks=0/40580, in_queue=40580, util=78.05%
martin@merkaba:~/Zeit> fio -name cachedwrite --ioengine=sync --buffered=1 --rw write --size=1000m
cachedwrite: (g=0): rw=write, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.57
Starting 1 process
cachedwrite: Laying out IO file(s) (1 file(s) / 1000MB)
Jobs: 1 (f=1)
cachedwrite: (groupid=0, jobs=1): err= 0: pid=20558
write: io=1000.0MB, bw=581488KB/s, iops=145371 , runt= 1761msec
clat (usec): min=2 , max=28078 , avg= 6.29, stdev=236.66
lat (usec): min=2 , max=28078 , avg= 6.37, stdev=236.66
bw (KB/s) : min=214976, max=1004912, per=110.42%, avg=642049.67, stdev=398863.44
cpu : usr=12.05%, sys=50.23%, ctx=206, majf=0, minf=24
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/256000/0, short=0/0/0
lat (usec): 4=62.17%, 10=36.27%, 20=1.44%, 50=0.10%, 100=0.01%
lat (usec): 250=0.01%
lat (msec): 2=0.01%, 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%
Run status group 0 (all jobs):
WRITE: io=1000.0MB, aggrb=581487KB/s, minb=595443KB/s, maxb=595443KB/s, mint=1761msec, maxt=1761msec
Disk stats (read/write):
dm-2: ios=0/714, merge=0/0, ticks=0/152156, in_queue=173448, util=71.91%, aggrios=1/631, aggrmerge=0/1, aggrticks=76/174312, aggrin_queue=196172, aggrutil=73.97%
sda: ios=1/631, merge=0/1, ticks=76/174312, in_queue=196172, util=73.97%
martin@merkaba:~/Zeit> fio -name cachedwrite --ioengine=sync --buffered=1 --rw write --size=1000m
cachedwrite: (g=0): rw=write, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
fio 1.57
Starting 1 process
Jobs: 1 (f=1): [W] [-.-% done] [0K/659.1M /s] [0 /165K iops] [eta 00m:00s]
cachedwrite: (groupid=0, jobs=1): err= 0: pid=20561
write: io=1000.0MB, bw=316440KB/s, iops=79110 , runt= 3236msec
clat (usec): min=1 , max=120823 , avg= 6.54, stdev=342.44
lat (usec): min=1 , max=120823 , avg= 6.61, stdev=342.44
bw (KB/s) : min= 2, max=1111080, per=154.15%, avg=487795.00, stdev=488926.65
cpu : usr=6.18%, sys=30.79%, ctx=236, majf=0, minf=24
IO depths : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
submit : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
complete : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
issued r/w/d: total=0/256000/0, short=0/0/0
lat (usec): 2=0.17%, 4=77.44%, 10=21.17%, 20=1.00%, 50=0.09%
lat (usec): 100=0.10%, 250=0.01%, 500=0.01%, 750=0.01%
lat (msec): 4=0.01%, 10=0.01%, 20=0.01%, 50=0.01%, 250=0.01%
Run status group 0 (all jobs):
WRITE: io=1000.0MB, aggrb=316440KB/s, minb=324034KB/s, maxb=324034KB/s, mint=3236msec, maxt=3236msec
Disk stats (read/write):
dm-2: ios=0/1463, merge=0/0, ticks=0/378048, in_queue=411168, util=93.56%, aggrios=0/1348, aggrmerge=0/0, aggrticks=0/392324, aggrin_queue=423772, aggrutil=93.79%
sda: ios=0/1348, merge=0/0, ticks=0/392324, in_queue=423772, util=93.79%
martin@merkaba:~/Zeit>
* Re: Measuring IOPS
2011-08-04 10:23 ` Martin Steigerwald
@ 2011-08-05 7:28 ` Jens Axboe
0 siblings, 0 replies; 20+ messages in thread
From: Jens Axboe @ 2011-08-05 7:28 UTC (permalink / raw)
To: Martin Steigerwald; +Cc: Jeff Moyer, fio
On 2011-08-04 12:23, Martin Steigerwald wrote:
>> Not quite measuring RAM (or copy) performance: at some point fio will
>> be blocked by the OS and prevented from dirtying more memory. At that
>> point it'll either just wait, or participate in flushing out dirty
>> data. For any buffered write workload, it'll quickly degenerate into
>> that.
>
> Which depends on the size of the job, because I bet for 1 GB/s with 250000
> IOPS I need some PCI Express based SSD solution - a SATA-300 SSD like the
> Intel SSD 320 in use here can´t reach this (see attached file). It seems
Right, you'll need something state-of-the-art to reach those numbers,
and nothing on a SATA/SAS bus will be able to do that. Latencies and
transport overhead are just too large.
> with 8 GB of RAM I need more than one GB to write in order to get
> meaningful results (related to raw SSD performance). With Ext4 delayed
> allocation a subsequent rm might even cause the file to not be written at
> all.
Depending on the kernel, some percentage of total memory dirty will kick
off background writing. Some higher percentage will kick off direct
reclaim. So yes, the usual rule of thumb for buffered write performance
is that the job size should be at least twice that of RAM to yield
usable results.
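Following that rule of thumb, a buffered write job for the 8 GB machine in
this thread might look like this (a sketch; 16g is simply twice the RAM
size, the other values are arbitrary):

```ini
; Sketch: buffered write test sized at twice RAM (8 GB here)
; so results are not dominated by the page cache.
[buffered-write]
ioengine=sync
buffered=1
rw=write
bs=4k
size=16g
```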
> For the application side of things it can make perfect sense to measure
> buffered writes. But one should go with a large enough data set in order to
> get meaningful results. At least when the application uses a large dataset
> too ;).
Indeed.
--
Jens Axboe
end of thread, other threads:[~2011-08-05 7:28 UTC | newest]
Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-07-29 15:37 Measuring IOPS Martin Steigerwald
2011-07-29 16:14 ` Martin Steigerwald
2011-08-02 14:32 ` Measuring IOPS (solved, I think) Martin Steigerwald
2011-08-02 19:48 ` Jens Axboe
2011-08-02 21:28 ` Martin Steigerwald
2011-08-03 7:17 ` Jens Axboe
2011-08-03 9:03 ` Martin Steigerwald
2011-08-03 10:34 ` Jens Axboe
2011-08-03 19:31 ` Measuring IOPS Martin Steigerwald
2011-08-03 20:22 ` Jeff Moyer
2011-08-03 20:33 ` Martin Steigerwald
2011-08-04 7:50 ` Jens Axboe
2011-08-03 20:42 ` Martin Steigerwald
2011-08-03 20:50 ` Martin Steigerwald
2011-08-04 8:51 ` Martin Steigerwald
2011-08-04 8:58 ` Jens Axboe
2011-08-04 9:34 ` Martin Steigerwald
2011-08-04 10:02 ` Jens Axboe
2011-08-04 10:23 ` Martin Steigerwald
2011-08-05 7:28 ` Jens Axboe