Re: fio file test patterns

From: Brian Fallik <bfallik@bamboom.com>
To: Jens Axboe <JAxboe@fusionio.com>
Cc: Jeff Moyer <jmoyer@redhat.com>,
	"fio@vger.kernel.org" <fio@vger.kernel.org>
Subject: Re: fio file test patterns
Date: Wed, 31 Aug 2011 14:34:22 -0600
Message-ID: <CAOYTo+3JyNsxMf+UO5C3N=rrtkGR999XM9Guo5-0TXFqSuUr+Q@mail.gmail.com>
In-Reply-To: <4E5E91D9.4000702@fusionio.com>

Hi,

Hmm.  I'm still unable to generate random test data using fio.   My
new job control file is:
  [foo0]
  rate=,200k
  rw=write
  refill_buffers=1
  size=4m

  [foo1]
  rate=,200k
  rw=write
  refill_buffers=1
  size=4m
But:
  $ diff foo0.1.0 foo1.2.0
reports they're identical.  I also tried enabling data verification by
adding "verify=crc32c-intel" but that had no effect.  Am I missing
something obvious or is this potentially broken in fio 1.57?
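
As a cross-check on the diff result, here is a minimal Python sketch that compares two output files by digest and also reports how many distinct blocks a single file contains.  The helper names are made up for illustration, and the 4096-byte block size is an assumption (the job file above does not set a block size); adjust it to whatever the job actually uses.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def files_identical(path_a, path_b):
    """True if both files have the same SHA-256 digest."""
    return file_digest(path_a) == file_digest(path_b)


def unique_block_ratio(path, block_size=4096):
    """Fraction of block_size-sized blocks in the file that are distinct.

    block_size=4096 is an assumption, not something taken from the job file.
    A value near 1.0 means almost every block differs; a value near 0.0
    means the same buffer was written over and over.
    """
    seen = set()
    total = 0
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            seen.add(hashlib.sha256(block).digest())
            total += 1
    return len(seen) / total if total else 0.0
```

Calling files_identical("foo0.1.0", "foo1.2.0") should agree with diff, and unique_block_ratio("foo0.1.0") gives a rough idea of whether the buffer is being refilled between writes within one file.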

I also have another (maybe related?) question.  Apologies if this
belongs in a separate thread, but are there any notes explaining why
fio lays out the files before starting sequential writes?  The
workload I was hoping to simulate is sustained, sequential writes to
disk.  I'm trying to answer the question "How many simultaneous
200kBps writers can we support?"  Using my current jobs file, fio
starts by creating the files (e.g. "foo0: Laying out IO file(s) (1
file(s) / 4MB)") before it starts processing.  However, creating the
files in advance accounts for a chunk of performance that doesn't seem
to be measured by fio.  Am I misunderstanding how to configure fio or
its intended usage?
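
To scale the number of writers without editing the jobs file by hand, something like the following Python sketch could generate it.  It only emits the options already used in the job file above (rate, rw, refill_buffers, size); the function names are hypothetical, and the aggregate figure is simply N times the per-writer rate, not a measurement.

```python
def make_job_file(num_writers, rate="200k", size="4m"):
    """Build fio job-file text with one sequential-write job per writer.

    Uses only the options from the job file quoted in this message.
    """
    lines = []
    for i in range(num_writers):
        lines += [
            f"[foo{i}]",
            f"rate=,{rate}",      # same rate setting as in the job file above
            "rw=write",           # sequential writes
            "refill_buffers=1",   # fresh buffer contents on every submit
            f"size={size}",
            "",
        ]
    return "\n".join(lines)


def aggregate_rate_kb(num_writers, per_writer_kb=200):
    """Nominal combined write rate in kB/s if every writer hits its cap."""
    return num_writers * per_writer_kb


if __name__ == "__main__":
    # Print a 4-writer job file; redirect to a file and pass it to fio.
    print(make_job_file(4))
```

For example, 50 writers capped at 200k each would nominally need 10000 kB/s of sustained write bandwidth.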

Thanks,
brian

On Wed, Aug 31, 2011 at 3:56 PM, Jens Axboe <jaxboe@fusionio.com> wrote:
> On 2011-08-31 12:30, Jeff Moyer wrote:
>> Brian Fallik <bfallik@bamboom.com> writes:
>>
>>> Hi,
>>>
>>> Apologies if this is documented somewhere else but I couldn't find it
>>> in the fio man page, example job files, or list archives.
>>>
>>> I'm exploring fio as a testing tool and it seems very well suited for
>>> my needs.  I'm currently running experiments with N sequential writers
>>> all writing at 200k.  The jobs file is very simple:
>>>     [global]
>>>     size=10m
>>>     directory=.
>>>
>>>     [foo1]
>>>     rw=write
>>>     rate=200k
>>>
>>>     [foo2]
>>>     ...
>>> fio creates various foo* files as part of its test but they all seem
>>> to contain the same content.  I would have expected fio to generate
>>> random data in each file to avoid potential optimizations like
>>> deduplication.   Am I missing the flag to generate random test
>>> patterns or is this behavior intentional?
>>
>>        refill_buffers
>>               If this option is given, fio will refill the IO buffers on every
>>               submit. The default is to only fill it at init time and reuse
>>               that data. Only makes sense if zero_buffers isn't specified,
>>               naturally. If data verification is enabled, refill_buffers is
>>               also automatically enabled.
>
> Yes. Fio does use random data by default, but to avoid slowing down
> too much, it also defaults to reusing the same random data all the time.
> If you set the above option, you get fully fresh random data for every
> write, thus fully defeating any de-dupe/compression attempts on the
> target.
>
> --
> Jens Axboe
>
>