* IOPS higher than expected on randwrite, direct=1 tests
@ 2010-11-09 18:28 Sebastian Kayser
  2010-11-10  6:57 ` John Cagle
  0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-09 18:28 UTC (permalink / raw)
  To: fio

Greetings,

running fio 1.41 randwrite, direct=1 tests on Ubuntu against an
iSCSI-connected disk yields IOPS figures higher than I would expect, so
now I am trying to understand what's going on. Any pointers or
enlightenment appreciated, e.g. where's a caching effect that I am
currently missing (and if there's one, how can I deactivate it)?

Setup:
* iSCSI storage exports a 10GB LUN which sits on a single 7.2K SATA disk [1]
  Storage system write-cache is disabled (write-through)
* Client: Ubuntu 8.04 (kernel 2.6.24-28) w/ open-iscsi, 512MB RAM
* LUN mkfs.ext3'd and mounted on /mnt
* fio is configured to operate in O_DIRECT, O_SYNC mode
* Random write IOPS expected: <80
* Random write IOPS observed: ~300 (confirmed in multiple runs)

root@ubuntu-804-x64:~# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             9.7G  151M  9.1G   2% /mnt

root@ubuntu-804-x64:~# cat patterns.fio 
[global]
size=9g
runtime=120
direct=1
sync=1

[mnt-randwrite]
directory=/mnt
rw=randwrite

root@ubuntu-804-x64:~# ./fio --section=mnt-randwrite patterns.fio 
mnt-randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
Starting 1 process
mnt-randwrite: Laying out IO file(s) (1 file(s) / 9216MB)
Jobs: 1 (f=1): [w] [100.0% done] [0K/1243K /s] [0/303 iops] [eta 00m:00s]
mnt-randwrite: (groupid=0, jobs=1): err= 0: pid=9398
  write: io=126796KB, bw=1057KB/s, iops=264, runt=120010msec
    clat (usec): min=486, max=332821, avg=3782.24, stdev=5744.57
    bw (KB/s) : min=  187, max= 1641, per=100.27%, avg=1058.85, stdev=260.24
  cpu          : usr=0.02%, sys=0.19%, ctx=33866, majf=0, minf=241
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued r/w: total=0/31699, short=0/0
     lat (usec): 500=0.35%, 750=11.91%, 1000=1.97%
     lat (msec): 2=4.43%, 4=66.01%, 10=9.15%, 20=4.74%, 50=1.36%
     lat (msec): 100=0.06%, 250=0.02%, 500=0.01%

Run status group 0 (all jobs):
  WRITE: io=126796KB, aggrb=1056KB/s, minb=1081KB/s, maxb=1081KB/s, mint=120010msec, maxt=120010msec

Disk stats (read/write):
  sdb: ios=2091/31879, merge=0/33, ticks=30360/118040, in_queue=143320, util=99.30%


Unfortunately, this system as well as all the other Ubuntu systems which
I currently have access to are virtual machines, so I can't simply
compare the findings to a simple local disk.

Sebastian

[1] http://www.hitachigst.com/internal-drives/desktop/deskstar/deskstar-7k2000

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-09 18:28 IOPS higher than expected on randwrite, direct=1 tests Sebastian Kayser
@ 2010-11-10  6:57 ` John Cagle
  2010-11-10  8:22   ` Sebastian Kayser
  0 siblings, 1 reply; 23+ messages in thread
From: John Cagle @ 2010-11-10  6:57 UTC (permalink / raw)
  To: Sebastian Kayser; +Cc: fio

Hi Sebastian,

Is your iSCSI target also in a virtual machine?  If so, maybe the
hypervisor (vmware? kvm?) is caching the 10GB volume that is being
used by the iSCSI target?

John

On Tue, Nov 9, 2010 at 11:28 AM, Sebastian Kayser <sebastian@skayser.de> wrote:
> Greetings,
>
> running fio 1.41 randwrite, direct=1 tests on Ubuntu against an
> iSCSI-connected disk yields IOPS figures higher than I would expect, so
> now I am trying to understand what's going on. Any pointers or
> enlightenment appreciated, e.g. where's a caching effect that I am
> currently missing (and if there's one, how can I deactivate it)?
>
> Setup:
> * iSCSI storage exports a 10GB LUN which sits on a single 7.2K SATA disk [1]
>   Storage system write-cache is disabled (write-through)
> * Client: Ubuntu 8.04 (kernel 2.6.24-28) w/ open-iscsi, 512MB RAM
> * LUN mkfs.ext3'd and mounted on /mnt
> * fio is configured to operate in O_DIRECT, O_SYNC mode
> * Random write IOPS expected: <80
> * Random write IOPS observed: ~300 (confirmed in multiple runs)
>
> root@ubuntu-804-x64:~# df -h /mnt
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sdb1             9.7G  151M  9.1G   2% /mnt
>
> root@ubuntu-804-x64:~# cat patterns.fio
> [global]
> size=9g
> runtime=120
> direct=1
> sync=1
>
> [mnt-randwrite]
> directory=/mnt
> rw=randwrite
>
> root@ubuntu-804-x64:~# ./fio --section=mnt-randwrite patterns.fio
> mnt-randwrite: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> Starting 1 process
> mnt-randwrite: Laying out IO file(s) (1 file(s) / 9216MB)
> Jobs: 1 (f=1): [w] [100.0% done] [0K/1243K /s] [0/303 iops] [eta 00m:00s]
> mnt-randwrite: (groupid=0, jobs=1): err= 0: pid=9398
>   write: io=126796KB, bw=1057KB/s, iops=264, runt=120010msec
>     clat (usec): min=486, max=332821, avg=3782.24, stdev=5744.57
>     bw (KB/s) : min=  187, max= 1641, per=100.27%, avg=1058.85, stdev=260.24
>   cpu          : usr=0.02%, sys=0.19%, ctx=33866, majf=0, minf=241
>   IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
>      submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
>      issued r/w: total=0/31699, short=0/0
>      lat (usec): 500=0.35%, 750=11.91%, 1000=1.97%
>      lat (msec): 2=4.43%, 4=66.01%, 10=9.15%, 20=4.74%, 50=1.36%
>      lat (msec): 100=0.06%, 250=0.02%, 500=0.01%
>
> Run status group 0 (all jobs):
>   WRITE: io=126796KB, aggrb=1056KB/s, minb=1081KB/s, maxb=1081KB/s, mint=120010msec, maxt=120010msec
>
> Disk stats (read/write):
>   sdb: ios=2091/31879, merge=0/33, ticks=30360/118040, in_queue=143320, util=99.30%
>
>
> Unfortunately, this system as well as all the other Ubuntu systems which
> I currently have access to are virtual machines, so I can't simply
> compare the findings to a simple local disk.
>
> Sebastian
>
> [1] http://www.hitachigst.com/internal-drives/desktop/deskstar/deskstar-7k2000

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10  6:57 ` John Cagle
@ 2010-11-10  8:22   ` Sebastian Kayser
  2010-11-10 14:09     ` Jens Axboe
  2010-11-11 16:22     ` Sebastian Kayser
  0 siblings, 2 replies; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-10  8:22 UTC (permalink / raw)
  To: fio

Hi John,

* John Cagle <jcagle@gmail.com> wrote:
> Is your iSCSI target also in a virtual machine?  If so, maybe the
> hypervisor (vmware? kvm?) is caching the 10GB volume that is being
> used by the iSCSI target?

thanks for answering. No hypervisor involved on the iSCSI target, it's
an oldish Infortrend storage [1] for which I am trying to determine the
performance profile. Anything else along the stack that might skew the
test results?

Just to make sure my understanding is correct:
- direct=1 should mitigate (disable?) OS caching effects
- sync=1, iodepth=1 should make sure that an I/O has really made it to
  disk before the next one is issued, i.e. should de-facto disable
  I/O coalescing or device caching

Are these sane/valid assumptions?
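
For what it's worth, here is how I would double-check which flags fio
actually passes to open() - just a sketch, and grepping for /mnt assumes
the test file ends up there:

  # trace the open() calls and confirm O_DIRECT and O_SYNC show up in the flags
  strace -f -e trace=open ./fio --section=mnt-randwrite patterns.fio 2>&1 | grep /mnt/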

Sebastian

[1] http://www.infortrend.com/main/2_product/es_a16e-g2130-4.asp

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10  8:22   ` Sebastian Kayser
@ 2010-11-10 14:09     ` Jens Axboe
  2010-11-10 17:18       ` Sebastian Kayser
  2010-11-11 16:22     ` Sebastian Kayser
  1 sibling, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2010-11-10 14:09 UTC (permalink / raw)
  To: Sebastian Kayser; +Cc: fio

On 2010-11-10 09:22, Sebastian Kayser wrote:
> Hi John,
> 
> * John Cagle <jcagle@gmail.com> wrote:
>> Is your iSCSI target also in a virtual machine?  If so, maybe the
>> hypervisor (vmware? kvm?) is caching the 10GB volume that is being
>> used by the iSCSI target?
> 
> thanks for answering. No hypervisor involved on the iSCSI target, it's
> an oldish Infortrend storage [1] for which I am trying to determine the
> performance profile. Anything else along the stack that might skew the
> test results?
> 
> Just to make sure my understanding is correct:
> - direct=1 should mitigate (disable?) OS caching effects
> - sync=1, iodepth=1 should make sure that an I/O has really made it to
>   disk before the next one is issued, i.e. should de-facto disable
>   I/O coalescing or device caching
> 
> Are these sane/valid assumptions?

Yes, your assumptions are valid. I think the issue here is that you give
randwrite, but don't also specify that you want overwrites. So what
happens is that the file will be truncated and then randomly written,
allowing the file system to place randomly chosen file blocks together.
If you remove the test file and re-run with overwrite=1 added, then see
if those results are more in line with what you expect.
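
For illustration, that would be the job file from your first mail with
just one line added to the global section:

  [global]
  size=9g
  runtime=120
  direct=1
  sync=1
  overwrite=1

  [mnt-randwrite]
  directory=/mnt
  rw=randwrite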

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 14:09     ` Jens Axboe
@ 2010-11-10 17:18       ` Sebastian Kayser
  2010-11-10 18:58         ` Sebastian Kayser
  2010-11-10 19:48         ` Jens Axboe
  0 siblings, 2 replies; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-10 17:18 UTC (permalink / raw)
  To: fio

* Jens Axboe <jaxboe@fusionio.com> wrote:
> On 2010-11-10 09:22, Sebastian Kayser wrote:
> > Just to make sure my understanding is correct:
> > - direct=1 should mitigate (disable?) OS caching effects
> > - sync=1, iodepth=1 should make sure that an I/O has really made it to
> >   disk before the next one is issued, i.e. should de-facto disable
> >   I/O coalescing or device caching
> > 
> > Are these sane/valid assumptions?
> 
> Yes, your assumptions are valid. I think the issue here is that you give
> randwrite, but don't also specify that you want overwrites. So what
> happens is that the file will be truncated and then randomly written,
> allowing the file system to place randomly chosen file blocks together.
> If you remove the test file and re-run with overwrite=1 added, then see
> if those results are more in line with what you expect.

Thanks for the additional aspect! So the possible pitfall is that w/o
overwrite=1 the test file could be sparse and the file system then
re-arrange random file addresses to fall onto consecutive device blocks?

(If so, wouldn't that be evil / deliberate data fragmentation?)

Tested it again with overwrite=1, same IOPS figures. Test file size as
measured with du -ks is the same in both cases btw. (which combined with
the limited runtime=120 doesn't seem to indicate sparseness in case of
overwrite=0, right?).

What I will do now is to export the whole 2TB of the disk (instead of
just 10GB) and increase size= to see whether that makes any difference
(hopefully). Other than that, further ideas?

Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 17:18       ` Sebastian Kayser
@ 2010-11-10 18:58         ` Sebastian Kayser
  2010-11-10 19:50           ` John Cagle
  2010-11-10 19:51           ` Jens Axboe
  2010-11-10 19:48         ` Jens Axboe
  1 sibling, 2 replies; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-10 18:58 UTC (permalink / raw)
  To: fio

* Sebastian Kayser <sebastian@skayser.de> wrote:
> What I will do now is to export the whole 2TB of the disk (instead of
> just 10GB) and increase size= to see whether that makes any difference
> (hopefully). Other than that, further ideas?

Interim update. Exported the whole 2TB disk as a LUN, mkfs.ext3'd it and
set size=100g in fio's configuration. Also set runtime=1800, re-started
the test and could observe ~80 IOPS ... my dear heart was jumping with
joy :)

However, a few minutes into the test, IOPS started to increase steadily
and by now have again reached (non-bursty) regions that don't seem
plausible for a single 7.2K SATA disk.

root@ubuntu-804-x64:~# ./fio --section=iscsi patterns.fio 
iscsi: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
Starting 1 process
iscsi: Laying out IO file(s) (1 file(s) / 102400MB)
Jobs: 1 (f=1): [w] [48.4% done] [0K/986K /s] [0/240 iops] [eta 15m:28s] 

root@ubuntu-804-x64:~# cat patterns.fio 
[global]
size=100g
runtime=1800
direct=1
sync=1
overwrite=1

[iscsi]
directory=/mnt
rw=randwrite

root@ubuntu-804-x64:~# df -h /mnt
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdb1             1.9T  101G  1.7T   6% /mnt

Well, I am so used to seeing fewer IOPS than hoped for ... but obviously not
this time, and it's driving me crazy :) Any further thoughts greatly
appreciated.

Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 17:18       ` Sebastian Kayser
  2010-11-10 18:58         ` Sebastian Kayser
@ 2010-11-10 19:48         ` Jens Axboe
  2010-11-10 21:32           ` Udi.S.Karni
  2010-11-11 17:43           ` Sebastian Kayser
  1 sibling, 2 replies; 23+ messages in thread
From: Jens Axboe @ 2010-11-10 19:48 UTC (permalink / raw)
  To: Sebastian Kayser; +Cc: fio

On 2010-11-10 18:18, Sebastian Kayser wrote:
> * Jens Axboe <jaxboe@fusionio.com> wrote:
>> On 2010-11-10 09:22, Sebastian Kayser wrote:
>>> Just to make sure my understanding is correct:
>>> - direct=1 should mitigate (disable?) OS caching effects
>>> - sync=1, iodepth=1 should make sure that an I/O has really made it to
>>>   disk before the next one is issued, i.e. should de-facto disable
>>>   I/O coalescing or device caching
>>>
>>> Are these sane/valid assumptions?
>>
>> Yes, your assumptions are valid. I think the issue here is that you give
>> randwrite, but don't also specify that you want overwrites. So what
>> happens is that the file will be truncated and then randomly written,
>> allowing the file system to place randomly chosen file blocks together.
>> If you remove the test file and re-run with overwrite=1 added, then see
>> if those results are more in line with what you expect.
> 
> Thanks for the additional aspect! So the possible pitfall is that w/o
> overwrite=1 the test file could be sparse and the file system then
> re-arrange random file addresses to fall onto consecutive device blocks?
> 
> (If so, wouldn't that be evil / deliberate data fragmentation?)

Depends on whether you care about the write performance or the read-back
performance :-)

> Tested it again with overwrite=1, same IOPS figures. Test file size as
> measured with du -ks is the same in both cases btw. (which combined with
> the limited runtime=120 doesn't seem to indicate sparseness in case of
> overwrite=0, right?).

Was the file pre-created? I'm pretty sure that fio will not lay it out
first with a write workload, even for random writes (I could be wrong
though, and if the file is indeed non-sparse and the test stopped before
all IO was written, then it would have been sparse for this case).

> What I will do now is to export the whole 2TB of the disk (instead of
> just 10GB) and increase size= to see whether that makes any difference
> (hopefully). Other than that, further ideas?

Let's see what that yields.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 18:58         ` Sebastian Kayser
@ 2010-11-10 19:50           ` John Cagle
  2010-11-10 19:52             ` Sebastian Kayser
  2010-11-10 19:52             ` Jens Axboe
  2010-11-10 19:51           ` Jens Axboe
  1 sibling, 2 replies; 23+ messages in thread
From: John Cagle @ 2010-11-10 19:50 UTC (permalink / raw)
  To: Sebastian Kayser; +Cc: fio

If the disk is 2TB, then your 100GB test is only using 5% of it-- thus your observed IOPS will be a lot better than expected due to short-stroking.  Right?

On Nov 10, 2010, at 11:58 AM, Sebastian Kayser <sebastian@skayser.de> wrote:

> * Sebastian Kayser <sebastian@skayser.de> wrote:
>> What I will do now is to export the whole 2TB of the disk (instead of
>> just 10GB) and increase size= to see whether that makes any difference
>> (hopefully). Other than that, further ideas?
> 
> Interim update. Exported the whole 2TB disk as a LUN, mkfs.ext3'd it and
> set size=100g in fio's configuration. Also set runtime=1800, re-started
> the test and could observe ~80 IOPS ... my dear heart was jumping with
> joy :)
> 
> However, a few minutes into the test, IOPS started to increase steadily
> and by now have again reached (non-bursty) regions that don't seem
> plausible for a single 7.2K SATA disk.
> 
> root@ubuntu-804-x64:~# ./fio --section=iscsi patterns.fio 
> iscsi: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> Starting 1 process
> iscsi: Laying out IO file(s) (1 file(s) / 102400MB)
> Jobs: 1 (f=1): [w] [48.4% done] [0K/986K /s] [0/240 iops] [eta 15m:28s] 
> 
> root@ubuntu-804-x64:~# cat patterns.fio 
> [global]
> size=100g
> runtime=1800
> direct=1
> sync=1
> overwrite=1
> 
> [iscsi]
> directory=/mnt
> rw=randwrite
> 
> root@ubuntu-804-x64:~# df -h /mnt
> Filesystem            Size  Used Avail Use% Mounted on
> /dev/sdb1             1.9T  101G  1.7T   6% /mnt
> 
> Well, I am so used to seeing fewer IOPS than hoped for ... but obviously not
> this time, and it's driving me crazy :) Any further thoughts greatly
> appreciated.
> 
> Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 18:58         ` Sebastian Kayser
  2010-11-10 19:50           ` John Cagle
@ 2010-11-10 19:51           ` Jens Axboe
  2010-11-10 20:35             ` Sebastian Kayser
  1 sibling, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2010-11-10 19:51 UTC (permalink / raw)
  To: Sebastian Kayser; +Cc: fio

On 2010-11-10 19:58, Sebastian Kayser wrote:
> * Sebastian Kayser <sebastian@skayser.de> wrote:
>> What I will do now is to export the whole 2TB of the disk (instead of
>> just 10GB) and increase size= to see whether that makes any difference
>> (hopefully). Other than that, further ideas?
> 
> Interim update. Exported the whole 2TB disk as a LUN, mkfs.ext3'd it and
> set size=100g in fio's configuration. Also set runtime=1800, re-started
> the test and could observe ~80 IOPS ... my dear heart was jumping with
> joy :)
> 
> However, a few minutes into the test, IOPS started to increase steadily
> and by now have again reached (non-bursty) regions that don't seem
> plausible for a single 7.2K SATA disk.
> 
> root@ubuntu-804-x64:~# ./fio --section=iscsi patterns.fio 
> iscsi: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> Starting 1 process
> iscsi: Laying out IO file(s) (1 file(s) / 102400MB)
> Jobs: 1 (f=1): [w] [48.4% done] [0K/986K /s] [0/240 iops] [eta 15m:28s] 

A 7200RPM drive spins around 120 times per second, which means a full
revolution takes about 8.3 msecs. For truly random IO, rotational
latency will dominate the seek, and the average wait-for-platter-spin
will be half that, so 4.17 msecs. That gives us about 240 IOPS.
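
Spelled out, the arithmetic is simply:

  7200 rpm / 60      = 120 revolutions per second
  1000 ms / 120      = ~8.3 ms per revolution
  8.3 ms / 2         = ~4.17 ms average rotational latency
  1000 ms / 4.17 ms  = ~240 IOPS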

So your results don't seem all that out of whack. What are your reasons
for expecting ~80 IOPS?

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 19:50           ` John Cagle
@ 2010-11-10 19:52             ` Sebastian Kayser
  2010-11-10 20:04               ` Jens Axboe
  2010-11-10 19:52             ` Jens Axboe
  1 sibling, 1 reply; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-10 19:52 UTC (permalink / raw)
  To: fio

* John Cagle <jcagle@gmail.com> wrote:
> If the disk is 2TB, then your 100GB test is only using 5% of it-- thus
> your observed IOPS will be a lot better than expected due to
> short-stroking.  Right?

I don't know. The initial 80 IOPS (observed over about 2 full minutes)
made me believe that 100GB would have covered a high enough percentage
to at least eliminate track-to-track seeks. Are short-stroked seeks also
that much faster compared to average seek times? And where would the
steady increase in IOPS during the test come from?

But you are definitely right when it comes to the test setup. I just
started a test with size=1800g. Looking forward to what that will show.

Thanks!

Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 19:50           ` John Cagle
  2010-11-10 19:52             ` Sebastian Kayser
@ 2010-11-10 19:52             ` Jens Axboe
  1 sibling, 0 replies; 23+ messages in thread
From: Jens Axboe @ 2010-11-10 19:52 UTC (permalink / raw)
  To: John Cagle; +Cc: Sebastian Kayser, fio

On 2010-11-10 20:50, John Cagle wrote:
> If the disk is 2TB, then your 100GB test is only using 5% of it-- thus
> your observed IOPS will be a lot better than expected due to
> short-stroking.  Right?

That is also true, but probably not that much. See my previous email,
rotational latency should dominate here.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 19:52             ` Sebastian Kayser
@ 2010-11-10 20:04               ` Jens Axboe
  2010-11-12 14:38                 ` Sebastian Kayser
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2010-11-10 20:04 UTC (permalink / raw)
  To: Sebastian Kayser; +Cc: fio

On 2010-11-10 20:52, Sebastian Kayser wrote:
> * John Cagle <jcagle@gmail.com> wrote:
>> If the disk is 2TB, then your 100GB test is only using 5% of it-- thus
>> your observed IOPS will be a lot better than expected due to
>> short-stroking.  Right?
> 
> I don't know. The initial 80 IOPS (observed over about 2 full minutes)
> made me believe that 100GB would have covered a high enough percentage
> to at least eliminate track-to-track seeks. Are short-stroked seeks also
> that much faster compared to average seek times? And where would the
> steady increase in IOPS during the test come from?
> 
> But you are definitely right when it comes to the test setup. I just
> started a test with size=1800g. Looking forward to what that will show.

If you have the full device, you could just test on that instead of
using a filesystem and file. Just to get more 'raw' performance.
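
For example, a job section along these lines should do it (a sketch;
/dev/sdb is taken from your df output, and writing to it will of course
clobber the ext3 filesystem on it):

  [iscsi-raw]
  filename=/dev/sdb
  rw=randwrite

With filename= pointing at the block device and size= dropped from (or
raised to the full 2TB in) the global section, fio exercises the whole
disk.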

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 19:51           ` Jens Axboe
@ 2010-11-10 20:35             ` Sebastian Kayser
  0 siblings, 0 replies; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-10 20:35 UTC (permalink / raw)
  To: fio

* Jens Axboe <jaxboe@fusionio.com> wrote:
> On 2010-11-10 19:58, Sebastian Kayser wrote:
> > Interim update. Exported the whole 2TB disk as a LUN, mkfs.ext3'd it and
> > set size=100g in fio's configuration. Also set runtime=1800, re-started
> > the test and could observe ~80 IOPS ... my dear heart was jumping with
> > joy :)
> > 
> > However, a few minutes into the test, IOPS started to increase steadily
> > and by now have again reached (non-bursty) regions that don't seem
> > plausible for a single 7.2K SATA disk.
> > 
> > root@ubuntu-804-x64:~# ./fio --section=iscsi patterns.fio 
> > iscsi: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> > Starting 1 process
> > iscsi: Laying out IO file(s) (1 file(s) / 102400MB)
> > Jobs: 1 (f=1): [w] [48.4% done] [0K/986K /s] [0/240 iops] [eta 15m:28s] 
> 
> A 7200RPM drive spins around 120 times per second, which means a full
> revolution takes about 8.3 msecs. For truly random IO, rotational
> latency will dominate the seek, and the average wait-for-platter-spin
> will be half that, so 4.17 msecs. That gives us about 240 IOPS.
> 
> So your results don't seem all that out of whack. What are your reasons
> for expecting ~80 IOPS?

I always learnt that:

  disk latency = avg. seek time + (rotational delay / 2)
                 + negligible amount of transfer time

Where "seek time" is not neglible. The Hitachi deskstar which is built
into our storage box [1] doesn't come with seek time specs, but taking a
similar enterprise model [2] Hitachi specifies an average seek time of
8.2ms - about twice as much as the average rotational delay component of
4.17ms.  Adding the two together gives me 8.2 + 4.17 = 12.37ms per IO.
Which in turn gives me 1000 / 12.37 = 80.84 IOPS.

Sebastian

[1] http://www.hitachigst.com/deskstar-7k2000
[2] http://www.hitachigst.com/ultrastar-a7k2000

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 19:48         ` Jens Axboe
@ 2010-11-10 21:32           ` Udi.S.Karni
  2010-11-11 17:43           ` Sebastian Kayser
  1 sibling, 0 replies; 23+ messages in thread
From: Udi.S.Karni @ 2010-11-10 21:32 UTC (permalink / raw)
  To: Jens Axboe; +Cc: fio, fio-owner, sebastian

Guys - if top IOPS is really crucial and you have the luxury of not needing too much space - I heard of a credit card company that only used 1 cylinder on every disk.

So -

(1) no seek time - the disk arm is always above the "one" cylinder
(2) no actual search time - the contents of a single cylinder can easily be completely cached in the cache buffer of the disk
(3) writes still need to be committed to the platter from the cache - but with only 1 cylinder - it's not a problem keeping up
(4) so Random IO isn't any more onerous than Sequential IO

In other words - SSD performance on disk. If you need more space you can use 2 or 3 cylinders.... however many can fit in the disk cache. The seek time to move a couple of cylinders over is pretty negligible.

Regards,

Udi Karni
DBA
Pharmacy Analytical Services
Kaiser Pharmacy
desk: 562.401.4229 / 8-345
cell: 310.995.4104
email: udi.s.karni@kp.org






Jens Axboe <jaxboe@fusionio.com>
Sent by: fio-owner@vger.kernel.org

11/10/2010 11:48 AM
To
Sebastian Kayser <sebastian@skayser.de>
cc
<fio@vger.kernel.org>
Subject
Re: IOPS higher than expected on randwrite, direct=1 tests





On 2010-11-10 18:18, Sebastian Kayser wrote:
> * Jens Axboe <jaxboe@fusionio.com> wrote:
>> On 2010-11-10 09:22, Sebastian Kayser wrote:
>>> Just to make sure my understanding is correct:
>>> - direct=1 should mitigate (disable?) OS caching effects
>>> - sync=1, iodepth=1 should make sure that an I/O has really made it to
>>>   disk before the next one is issued, i.e. should de-facto disable
>>>   I/O coalescing or device caching
>>>
>>> Are these sane/valid assumptions?
>>
>> Yes, your assumptions are valid. I think the issue here is that you give
>> randwrite, but don't also specify that you want overwrites. So what
>> happens is that the file will be truncated and then randomly written,
>> allowing the file system to place randomly chosen file blocks together.
>> If you remove the test file and re-run with overwrite=1 added, then see
>> if those results are more in line with what you expect.
>
> Thanks for the additional aspect! So the possible pitfall is that w/o
> overwrite=1 the test file could be sparse and the file system then
> re-arrange random file addresses to fall onto consecutive device blocks?
>
> (If so, wouldn't that be evil / deliberate data fragmentation?)

Depends on whether you care about the write performance or the read-back
performance :-)

> Tested it again with overwrite=1, same IOPS figures. Test file size as
> measured with du -ks is the same in both cases btw. (which combined with
> the limited runtime=120 doesn't seem to indicate sparseness in case of
> overwrite=0, right?).

Was the file pre-created? I'm pretty sure that fio will not lay it out
first with a write workload, even for random writes (I could be wrong
though, and if the file is indeed non-sparse and the test stopped before
all IO was written, then it would have been sparse for this case).

> What I will do now is to export the whole 2TB of the disk (instead of
> just 10GB) and increase size= to see whether that makes any difference
> (hopefully). Other than that, further ideas?

Let's see what that yields.

--
Jens Axboe



^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10  8:22   ` Sebastian Kayser
  2010-11-10 14:09     ` Jens Axboe
@ 2010-11-11 16:22     ` Sebastian Kayser
  2010-11-11 21:25       ` Shawn Lewis
  1 sibling, 1 reply; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-11 16:22 UTC (permalink / raw)
  To: fio

* Sebastian Kayser <sebastian@skayser.de> wrote:
> Just to make sure my understanding is correct:
> - direct=1 should mitigate (disable?) OS caching effects
> - sync=1, iodepth=1 should make sure that an I/O has really made it to
>   disk before the next one is issued, i.e. should de-facto disable
>   I/O coalescing or device caching
> 
> Are these sane/valid assumptions?

Shawn Lewis sent me an email off-list and suggested that the disk itself
will still do write-caching (despite O_SYNC) unless this has been
explicitly disabled. I dug into the storage configuration and found an
option which sounded promising ("Drive Delayed Write", not to be confused
with the RAID controller write cache which was disabled from the start).
Disabled the delayed writes and the IOPS went down from ~240 to

* ~100 for a filesystem test (size=100g out of 2TB)
* ~110 for a raw device test

Still a bit more than I would expect [1], but definitely closer. I am not
sure whether this can be brought down any further, i.e. whether there's
another caching effect hidden somewhere or if these figures now actually
reflect the "raw" harddisk performance.

Either way, I am surprised how much gain in IOPS a simple hard drive cache
delivers ... and I am also very grateful for all the suggestions so far!
Thanks guys!

Sebastian

[1] http://www.spinics.net/lists/fio/msg00558.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 19:48         ` Jens Axboe
  2010-11-10 21:32           ` Udi.S.Karni
@ 2010-11-11 17:43           ` Sebastian Kayser
  1 sibling, 0 replies; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-11 17:43 UTC (permalink / raw)
  To: fio

* Jens Axboe <jaxboe@fusionio.com> wrote:
> On 2010-11-10 18:18, Sebastian Kayser wrote:
> > Thanks for the additional aspect! So the possible pitfall is that w/o
> > overwrite=1 the test file could be sparse and the file system then
> > re-arrange random file addresses to fall onto consecutive device blocks?
> > 
> > (If so, wouldn't that be evil / deliberate data fragmentation?)
> 
> Depends on whether you care about the write performance or the read-back
> performance :-)

Point taken. :) Out of curiosity, do file systems actually do this?
Sounds like write-once-read-seldom, or an assumption that random data
being written during the same time will likely be read back
simultaneously at another point in time.

Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-11 16:22     ` Sebastian Kayser
@ 2010-11-11 21:25       ` Shawn Lewis
  2010-11-12 13:43         ` Sebastian Kayser
  0 siblings, 1 reply; 23+ messages in thread
From: Shawn Lewis @ 2010-11-11 21:25 UTC (permalink / raw)
  To: Sebastian Kayser; +Cc: fio

On Thu, Nov 11, 2010 at 8:22 AM, Sebastian Kayser <sebastian@skayser.de> wrote:
> * Sebastian Kayser <sebastian@skayser.de> wrote:
>> Just to make sure my understanding is correct:
>> - direct=1 should mitigate (disable?) OS caching effects
>> - sync=1, iodepth=1 should make sure that an I/O has really made it to
>>   disk before the next one is issued, i.e. should de-facto disable
>>   I/O coalescing or device caching
>>
>> Are these sane/valid assumptions?
>
> Shawn Lewis sent me an email off-list and suggested that the disk itself
> will still do write-caching (despite O_SYNC) unless this has been
> explicitly disabled. I dug into the storage configuration and found an
> option which sounded promising ("Drive Delayed Write", not to be confused
> with the RAID controller write cache which was disabled from the start).
> Disabled the delayed writes and the IOPS went down from ~240 to
>
> * ~100 for a filesystem test (size=100g out of 2TB)
> * ~110 for a raw device test
>
> Still a bit more than I would expect [1], but definitely closer. I am not
> sure whether this can be brought down any further, i.e. whether there's
> another caching effect hidden somewhere or if these figures now actually
> reflect the "raw" harddisk performance.
Sorry, I didn't mean to send that off list. There shouldn't be any other caches.

Did any of your tests use the full device or were you still setting
the size= parameter? Set size=2T or get rid of the size= parameter
entirely for the raw device test and you should see more like 70 or 80
iops.

I believe part of the reason the gain is so huge with the drive cache
is that you were also only using the outer diameter of the disk. If
you use the full disk but enable drive cache I think you'll see about
140 or 150iops.
>
> Either way, I am surprised how much gain in IOPS a simple hard drive cache
> delivers ... and I am also very grateful for all the suggestions so far!
> Thanks guys!
>
> Sebastian
>
> [1] http://www.spinics.net/lists/fio/msg00558.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-11 21:25       ` Shawn Lewis
@ 2010-11-12 13:43         ` Sebastian Kayser
  2010-11-12 18:00           ` Shawn Lewis
  0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-12 13:43 UTC (permalink / raw)
  To: fio

* Shawn Lewis <shawnlewis@google.com> wrote:
> On Thu, Nov 11, 2010 at 8:22 AM, Sebastian Kayser <sebastian@skayser.de> wrote:
> > Shawn Lewis sent me an email off-list and suggested that the disk itself
> > will still do write-caching (despite O_SYNC) unless this has been
> > explicitly disabled. I dug into the storage configuration and found an
> > option which sounded promising ("Drive Delayed Write", not to be confused
> > with the RAID controller write cache which was disabled from the start).
> > Disabled the delayed writes and the IOPS went down from ~240 to
> >
> > * ~100 for a filesystem test (size=100g out of 2TB)
> > * ~110 for a raw device test
> 
> Did any of your tests use the full device or were you still setting
> the size= parameter? Set size=2T or get rid of the size= parameter
> entirely for the raw device test and you should see more like 70 or 80
> iops.

Shibby. :) Raw device tests without size= deliver an average of 71 IOPS,
thus the disk actually delivers less than what the theoretical datasheet
calculation [1] suggests. Or the unpublished specs of our desktop-range
hard disk are simply a bit inferior.

root@ubuntu-804-x64:~# ./fio --section=iscsi-raw patterns.fio
iscsi-raw: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
Starting 1 process
Jobs: 1 (f=1): [w] [100.0% done] [0K/311K /s] [0/76 iops] [eta 00m:00s]
iscsi-raw: (groupid=0, jobs=1): err= 0: pid=5309
  write: io=515496KB, bw=293258B/s, iops=71, runt=1800008msec
...

> I believe part of the reason the gain is so huge with the drive cache
> is that you were also only using the outer diameter of the disk. If
> you use the full disk but enable drive cache I think you'll see about
> 140 or 150iops.

That's gold! Enabled the disk write cache again, tested without size=
and got 160-170 IOPS. Less caching effect than with size=100g.

So the moral of the story is "IOPS tests which don't utilize the full
device won't show its worst case IOPS performance and are pretty much
misleading (unless the application which will later be put onto the
device also only uses a similar subset of the device)"?!

I wonder how many times I saw someone - including myself - fire up
bonnie++ or iozone with a rather small test file compared to the full
disk size ... Thanks very much everyone! This was an (overdue) eye
opener.

Sebastian

[1] http://www.spinics.net/lists/fio/msg00558.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-10 20:04               ` Jens Axboe
@ 2010-11-12 14:38                 ` Sebastian Kayser
  2010-11-12 17:59                   ` Shawn Lewis
  0 siblings, 1 reply; 23+ messages in thread
From: Sebastian Kayser @ 2010-11-12 14:38 UTC (permalink / raw)
  To: fio

* Jens Axboe <jaxboe@fusionio.com> wrote:
> On 2010-11-10 20:52, Sebastian Kayser wrote:
> > * John Cagle <jcagle@gmail.com> wrote:
> >> If the disk is 2TB, then your 100GB test is only using 5% of it-- thus
> >> your observed IOPS will be a lot better than expected due to
> >> short-stroking.  Right?
> > 
> > > I don't know. The initial 80 IOPS (observed over about 2 full minutes)
> > made me believe that 100GB would have covered a high enough percentage
> > to at least eliminate track-to-track seeks. Are short-stroked seeks also
> > that much faster compared to average seek times? And where would the
> > steady increase in IOPS during the test come from?
> > 
> > > But you are definitely right when it comes to the test setup. I just
> > > started a test with size=1800g. Looking forward to what that will show.
> 
> If you have the full device, you could just test on that instead of
> using a filesystem and file. Just to get more 'raw' performance.

Thanks. This also turned out to be much more feasible for "quick" test
turn-around. As soon as I ran the tests with size=1800g it took ages to
prepare the test file (~30 MB/s towards the iSCSI target) and one disk
even quit on me overnight.

When staying within the realm of file based testing, can fio determine
and display the progress (think: progress bar or percentage) when it
prepares the test file?

Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-12 14:38                 ` Sebastian Kayser
@ 2010-11-12 17:59                   ` Shawn Lewis
  0 siblings, 0 replies; 23+ messages in thread
From: Shawn Lewis @ 2010-11-12 17:59 UTC (permalink / raw)
  To: Sebastian Kayser; +Cc: fio

On Fri, Nov 12, 2010 at 6:38 AM, Sebastian Kayser <sebastian@skayser.de> wrote:
>
> * Jens Axboe <jaxboe@fusionio.com> wrote:
> > On 2010-11-10 20:52, Sebastian Kayser wrote:
> > > * John Cagle <jcagle@gmail.com> wrote:
> > >> If the disk is 2TB, then your 100GB test is only using 5% of it-- thus
> > >> your observed IOPS will be a lot better than expected due to
> > >> short-stroking.  Right?
> > >
> > > I don't know. The initial 80 IOPS (observed over about 2 full minutes)
> > > made me believe that 100GB would have covered a high enough percentage
> > > to at least eliminate track-to-track seeks. Are short-stroked seeks also
> > > that much faster compared to average seek times? And where would the
> > > steady increase in IOPS during the test come from?
> > >
> > > But you are definitely right when it comes to the test setup. I just
> > > started a test with size=1800g. Looking forward to what that will show.
> >
> > If you have the full device, you could just test on that instead of
> > using a filesystem and file. Just to get more 'raw' performance.
>
> Thanks. This also turned out to be much more feasible for "quick" test
> turn-around. As soon as I ran the tests with size=1800g it took ages to
> prepare the test file (~30 MB/s towards the iSCSI target) and one disk
> even quit on me overnight.
>
> When staying within the realm of file based testing, can fio determine
> and display the progress (think: progress bar or percentage) when it
> prepares the test file?
If you use a sequential write fio job to lay the file out, you should
see the progress bar.
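
Something along these lines, for example (a sketch; the job and file
names are arbitrary, the size is the one from your earlier mails):

  [layout]
  directory=/mnt
  filename=fio-testfile
  rw=write
  bs=1M
  size=1800g

The randwrite job can then point at the same filename= so fio finds the
file already laid out and skips the layout phase.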
>
> Sebastian

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-12 13:43         ` Sebastian Kayser
@ 2010-11-12 18:00           ` Shawn Lewis
  2010-11-13 20:02             ` Jens Axboe
  0 siblings, 1 reply; 23+ messages in thread
From: Shawn Lewis @ 2010-11-12 18:00 UTC (permalink / raw)
  To: Sebastian Kayser; +Cc: fio

On Fri, Nov 12, 2010 at 5:43 AM, Sebastian Kayser <sebastian@skayser.de> wrote:
> * Shawn Lewis <shawnlewis@google.com> wrote:
>> On Thu, Nov 11, 2010 at 8:22 AM, Sebastian Kayser <sebastian@skayser.de> wrote:
>> > Shawn Lewis sent me an email off-list and suggested that the disk itself
>> > will still do write-caching (despite O_SYNC) unless this has been
>> > explicitly disabled. I dug into the storage configuration and found an
>> > option which sounded promising ("Drive Delayed Write", not to be confused
>> > with the RAID controller write cache which was disabled from the start).
>> > Disabled the delayed writes and the IOPS went down from ~240 to
>> >
>> > * ~100 for a filesystem test (size=100g out of 2TB)
>> > * ~110 for a raw device test
>>
>> Did any of your tests use the full device or were you still setting
>> the size= parameter? Set size=2T or get rid of the size= parameter
>> entirely for the raw device test and you should see more like 70 or 80
>> iops.
>
> Shibby. :) Raw device tests without size= deliver an average of 71 IOPS,
> thus the disk actually delivers less than what the theoretical datasheet
> calculation [1] suggests. Or the unpublished specs of our desktop-range
> hard disk are simply a bit inferior.
>
> root@ubuntu-804-x64:~# ./fio --section=iscsi-raw patterns.fio
> iscsi-raw: (g=0): rw=randwrite, bs=4K-4K/4K-4K, ioengine=sync, iodepth=1
> Starting 1 process
> Jobs: 1 (f=1): [w] [100.0% done] [0K/311K /s] [0/76 iops] [eta 00m:00s]
> iscsi-raw: (groupid=0, jobs=1): err= 0: pid=5309
>   write: io=515496KB, bw=293258B/s, iops=71, runt=1800008msec
> ...
>
>> I believe part of the reason the gain is so huge with the drive cache
>> is that you were also only using the outer diameter of the disk. If
>> you use the full disk but enable drive cache I think you'll see about
>> 140 or 150iops.
>
> That's gold! Enabled the disk write cache again, tested without size=
> and got 160-170 IOPS. Less caching effect than with size=100g.
>
> So the moral of the story is "IOPS tests which don't utilize the full
> device won't show its worst case IOPS performance and are pretty much
> misleading (unless the application which will later be put onto the
> device also only uses a similar subset of the device)"?!
>
> I wonder how many times I saw someone - including myself - fire up
> bonnie++ or iozone with a rather small test file compared to the full
> disk size ... Thanks very much everyone! This was an (overdue) eye
> opener.
In addition, testing with a file is dependent on the state of the file
system. For example, if you're using a 100GB file and there's not enough
contiguous free space to lay it out in one chunk you could have regions on
different parts of the disk. Further, even if there is enough space, that space
could be near the inner or outer diameter which would also affect
performance.

There is a way to get a file address to lba map (which would give you some
insight into how a file is laid out). I think there's a syscall. Jens should
know.
>
> Sebastian
>
> [1] http://www.spinics.net/lists/fio/msg00558.html

^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-12 18:00           ` Shawn Lewis
@ 2010-11-13 20:02             ` Jens Axboe
  2010-11-15 13:36               ` Jeff Moyer
  0 siblings, 1 reply; 23+ messages in thread
From: Jens Axboe @ 2010-11-13 20:02 UTC (permalink / raw)
  To: Shawn Lewis; +Cc: Sebastian Kayser, fio

>> I wonder how many times I saw someone - including myself - fire up
>> bonnie++ or iozone with a rather small test file compared to the full
>> disk size ... Thanks very much everyone! This was an (overdue) eye
>> opener.
> In addition, testing with a file is dependent on the state of the file
> system. For example, if you're using a 100GB file and there's not enough
> contiguous free space to lay it out in one chunk you could have regions on
> different parts of the disk. Further, even if there is enough space, that space
> could be near the inner or outer diameter which would also affect
> performance.
> 
> There is a way to get a file address to lba map (which would give you some
> insight into how a file is laid out). I think there's a syscall. Jens should
> know.

Yep, you can use FIBMAP/FIEMAP to walk the extents of the file and get a
full map of how it's laid out on disk. Not sure if there are tools
that'll do this for you; if not, it's pretty trivial to write one.

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 23+ messages in thread

* Re: IOPS higher than expected on randwrite, direct=1 tests
  2010-11-13 20:02             ` Jens Axboe
@ 2010-11-15 13:36               ` Jeff Moyer
  0 siblings, 0 replies; 23+ messages in thread
From: Jeff Moyer @ 2010-11-15 13:36 UTC (permalink / raw)
  To: Jens Axboe; +Cc: Shawn Lewis, Sebastian Kayser, fio

Jens Axboe <jaxboe@fusionio.com> writes:

>>> I wonder how many times I saw someone - including myself - fire up
>>> bonnie++ or iozone with a rather small test file compared to the full
>>> disk size ... Thanks very much everyone! This was an (overdue) eye
>>> opener.
>> In addition, testing with a file is dependent on the state of the file
>> system. For example, if you're using a 100GB file and there's not enough
>> contiguous free space to lay it out in one chunk you could have regions on
>> different parts of the disk. Further, even if there is enough space, that space
>> could be near the inner or outer diameter which would also affect
>> performance.
>> 
>> There is a way to get a file address to lba map (which would give you some
>> insight into how a file is laid out). I think there's a syscall. Jens should
>> know.
>
> Yep, you can use FIBMAP/FIEMAP to walk the extents of the file and get a
> full map of how it's laid out on disk. Not sure if there are tools
> that'll do this for you; if not, it's pretty trivial to write one.

man 8 filefrag; it's part of e2fsprogs.  I'm sure xfs has its own
utility to do this, too.
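
For example (the path is just an assumption for wherever the fio test
file ended up):

  filefrag -v /mnt/fio-testfile
  # -v prints one line per extent with its logical and physical block
  # ranges, which shows how contiguously the file is laid out on disk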

Cheers,
Jeff


^ permalink raw reply	[flat|nested] 23+ messages in thread

end of thread

Thread overview: 23+ messages
2010-11-09 18:28 IOPS higher than expected on randwrite, direct=1 tests Sebastian Kayser
2010-11-10  6:57 ` John Cagle
2010-11-10  8:22   ` Sebastian Kayser
2010-11-10 14:09     ` Jens Axboe
2010-11-10 17:18       ` Sebastian Kayser
2010-11-10 18:58         ` Sebastian Kayser
2010-11-10 19:50           ` John Cagle
2010-11-10 19:52             ` Sebastian Kayser
2010-11-10 20:04               ` Jens Axboe
2010-11-12 14:38                 ` Sebastian Kayser
2010-11-12 17:59                   ` Shawn Lewis
2010-11-10 19:52             ` Jens Axboe
2010-11-10 19:51           ` Jens Axboe
2010-11-10 20:35             ` Sebastian Kayser
2010-11-10 19:48         ` Jens Axboe
2010-11-10 21:32           ` Udi.S.Karni
2010-11-11 17:43           ` Sebastian Kayser
2010-11-11 16:22     ` Sebastian Kayser
2010-11-11 21:25       ` Shawn Lewis
2010-11-12 13:43         ` Sebastian Kayser
2010-11-12 18:00           ` Shawn Lewis
2010-11-13 20:02             ` Jens Axboe
2010-11-15 13:36               ` Jeff Moyer
