* random writes with different patterns
@ 2016-04-08 20:01 Foley, Robert
  2016-04-09  5:04 ` Sitsofe Wheeler
       [not found] ` <CALjAwximeyPand181Q71B9f0CJgXcFHNGTBheTk2MGQkaf5+dQ@mail.gmail.com>
  0 siblings, 2 replies; 5+ messages in thread
From: Foley, Robert @ 2016-04-08 20:01 UTC (permalink / raw)
  To: fio

Hello all,
We use fio and it more than meets our needs in the majority of cases.  There is one use case where we would like to write to randomly selected blocks with one data pattern, and then overwrite that same set of blocks with a different data pattern.  We did not find the fio parameters that would allow this.

Our use case is one where we initially write randomly to a portion of a test area. Here is an example of our job file parameters:
verify=crc32c 
rw=randwrite
bs=4k
size=128G
io_size=32M 
randseed=42

Our understanding is that this job allows us to write 32 MiB of data to random areas in the 128 GiB region.  Since we are using randseed, the sequence of offsets is reproducible, which allows us to later use a read job to verify the data on these random blocks.
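
For illustration, the write pass and the later read-back described above might be laid out as two sections of one job file. This is only a sketch: the filename is a placeholder, the section names are made up, and the use of do_verify=0 and stonewall to keep the two phases apart is an assumption rather than something taken from the parameters listed above.

```ini
; Sketch only. /dev/sdX is a placeholder; writing to a real device destroys its contents.
[global]
filename=/dev/sdX
bs=4k
size=128G
io_size=32M
; same seed in both sections so the read pass revisits the offsets the write pass used
randseed=42
verify=crc32c

[write-pass]
rw=randwrite
; skip the built-in verify phase; checking is left to the read section
do_verify=0

[verify-pass]
; wait for the preceding section to finish if both are run in one invocation
stonewall
; with verify set, a random read job checks the headers and checksums it reads back
rw=randread
```

Each phase can be run on its own with fio's --section option, for example `fio --section=write-pass job.fio` at time A and `fio --section=verify-pass job.fio` later (job.fio being a hypothetical file name).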

Suppose at time A we wrote out a pattern using the parameters above.  Later at time B we would like to write the same blocks (offsets) of the test area but with a different pattern of data bytes.  Our understanding is that randseed will seed the generation of both a) the pattern of I/O blocks that we generate as well as b) the pattern of data in the buffers we are writing.  In other words since the randseed controls both a) and b), there is no way for us to get a different data pattern written to the same pattern of blocks (offsets).

Does anyone know how to accomplish this with fio?

We had some ideas around how to solve this if it is not currently supported.   It might be useful to have an optional "verify_io_stamp" parameter, which would specify a simple 32 or 64 bit integer that could get added to and verified with the verify_header.  

At time A in the example above, the job would specify one value for the verify_io_stamp (A)  and at time B the job would specify a different value (B).  Since reads would verify the verify_io_stamp in the header, we could distinguish between the (A) and the (B) data and solve the use case we mentioned above.

Another benefit of this verify_io_stamp is that if we ever received incorrect data, it might help us determine when that data was written.  For example, if we wrote with a different verify_io_stamp at times A, B, and C, and we received data pattern "A" at time "C", that would give us potentially valuable information about the failure, which might help us debug it faster.

Does this seem useful to you?  We are willing to contribute this if it seems beneficial to the community. 

Thank you!
-Rob Foley




* Re: random writes with different patterns
  2016-04-08 20:01 random writes with different patterns Foley, Robert
@ 2016-04-09  5:04 ` Sitsofe Wheeler
       [not found] ` <CALjAwximeyPand181Q71B9f0CJgXcFHNGTBheTk2MGQkaf5+dQ@mail.gmail.com>
  1 sibling, 0 replies; 5+ messages in thread
From: Sitsofe Wheeler @ 2016-04-09  5:04 UTC (permalink / raw)
  To: Foley, Robert; +Cc: fio

Hi,

On 8 April 2016 at 21:01, Foley, Robert <robert.foley@emc.com> wrote:
>
> Suppose at time A we wrote out a pattern using the parameters above.  Later at time B we would like to write the same blocks (offsets) of the test area but with a different pattern of data bytes.  Our understanding is that randseed will seed the generation of both a) the pattern of I/O blocks that we generate as well as b) the pattern of data in the buffers we are writing.  In other words since the randseed controls both a) and b), there is no way for us to get a different data pattern written to the same pattern of blocks (offsets).

You could always force a particular pattern to be put into every block
by using verify_pattern
(https://github.com/axboe/fio/blob/fio-2.8/HOWTO#L1407 ) and use its
%o option for generating more sophisticated patterns. Another choice
would be to change the block size; otherwise you're going to struggle
to work out what "run" the pattern in the block came from.
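
A minimal sketch of the verify_pattern suggestion, assuming the same job parameters as the original post. The filename and section names are placeholders, and the 0xaa and 0xbb tag bytes are arbitrary values chosen only to tell the two runs apart.

```ini
; Sketch only. /dev/sdX is a placeholder; writing to a real device destroys its contents.
[global]
filename=/dev/sdX
rw=randwrite
bs=4k
size=128G
io_size=32M
; same seed for both passes so they target the same offsets
randseed=42
verify=crc32c

[pass-A]
; time A: tag byte 0xaa followed by the block offset (%o)
verify_pattern=0xaa%o

[pass-B]
; run only after pass-A has finished
stonewall
; time B: same offsets, different tag byte
verify_pattern=0xbb%o
```

Because the seed and geometry are shared, both sections should visit the same offsets, while the leading tag byte indicates which run last wrote a given block.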

> Does anyone know how to accomplish this with fio?
>
> We had some ideas around how to solve this if it is not currently supported.   It might be useful to have an optional "verify_io_stamp" parameter, which would specify a simple 32 or 64 bit integer that could get added to and verified with the verify_header.

Might you be able to do this by creating a particular verify pattern
and specifying a custom format using %o?

-- 
Sitsofe | http://sucs.org/~sits/


* RE: random writes with different patterns
       [not found] ` <CALjAwximeyPand181Q71B9f0CJgXcFHNGTBheTk2MGQkaf5+dQ@mail.gmail.com>
@ 2016-04-11 13:54   ` Foley, Robert
  2016-04-13  5:38     ` Sitsofe Wheeler
  0 siblings, 1 reply; 5+ messages in thread
From: Foley, Robert @ 2016-04-11 13:54 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: fio

Hi Sitsofe,
Thanks for the information about the %o option.   

On Saturday, April 09, 2016 1:00 AM, Sitsofe Wheeler [mailto:sitsofe@gmail.com] wrote:
>>  You could always force a particular pattern to be put into every block by using verify_pattern (https://github.com/axboe/fio/blob/fio-2.8/HOWTO#L1407) and use its %o option for generating more sophisticated patterns. Another choice would be to change the block size; otherwise you're going to struggle to work out what "run" the pattern in the block came from.

I apologize for not mentioning before that we have use cases with compression where we cannot use a repeating pattern.  The %o seems to allow the pattern to vary between blocks, but within the block the pattern repeats.   We really appreciate the use of the randomly generated data bytes by fio.  

>>We had some ideas around how to solve this if it is not currently supported.   It might be useful to have an optional "verify_io_stamp" parameter, which would specify a simple 32 or 64 bit integer that could get added to and verified with the verify_header.

>Might you be able to do this by creating a particular verify pattern and specifying a custom header format?

We could create a new verify pattern with a custom header format.  That would give us the ability to save and validate this verify_io_stamp parameter.   But it seems we would lose the ability to specify the different validation types (md5, crc64, crc32, etc).

We really appreciate the flexibility of fio in being able to use different validation types (md5, crc64, crc32, etc) along with randomly generated data, and we would like to leverage those options along with the ability to specify a new verify_io_stamp.  Adding a new parameter for verify_io_stamp seems like the best option to achieve this, but we would appreciate more thoughts or ideas here.

Thanks!
-Rob

* Re: random writes with different patterns
  2016-04-11 13:54   ` Foley, Robert
@ 2016-04-13  5:38     ` Sitsofe Wheeler
  2016-04-13 20:08       ` Foley, Robert
  0 siblings, 1 reply; 5+ messages in thread
From: Sitsofe Wheeler @ 2016-04-13  5:38 UTC (permalink / raw)
  To: Foley, Robert; +Cc: fio

On 11 April 2016 at 14:54, Foley, Robert <robert.foley@emc.com> wrote:
>
> Thanks for the information about the %o option.
>
> On Saturday, April 09, 2016 1:00 AM, Sitsofe Wheeler [mailto:sitsofe@gmail.com] wrote:
>>>  You could always force a particular pattern to be put into every block by using verify_pattern (https://github.com/axboe/fio/blob/fio-2.8/HOWTO#L1407) and use its %o option for generating more sophisticated patterns. Another choice would be to change the block size; otherwise you're going to struggle to work out what "run" the pattern in the block came from.
>
> I apologize for not mentioning before that we have use cases with compression where we cannot use a repeating pattern.  The %o seems to allow the pattern to vary between blocks, but within the block the pattern repeats.   We really appreciate the use of the randomly generated data bytes by fio.
>
>>>We had some ideas around how to solve this if it is not currently supported.   It might be useful to have an optional "verify_io_stamp" parameter, which would specify a simple 32 or 64 bit integer that could get added to and verified with the verify_header.
>
>>Might you be able to do this by creating a particular verify pattern and specifying a custom header format?
>
> We could create a new verify pattern with a custom header format.  That would give us the ability to save and validate this verify_io_stamp parameter.   But it seems we would lose the ability to specify the different validation types (md5, crc64, crc32, etc).
>
> We really appreciate the flexibility of fio in being able to use different validation types (md5, crc64, crc32, etc) along with randomly generated data, and we would like to leverage those options along with the ability to specify a new verify_io_stamp.  Adding a new parameter for verify_io_stamp seems like the best option to achieve this, but we would appreciate more thoughts or ideas here.

Some sort of "generation" id in the header would allow better
verification when using something like loops (assuming it was being
incremented on a per iteration basis). Perhaps what is needed is some
sort of verify_header_extra_pattern that allows a few extra bytes to
be set in the header and verified later...

However this doesn't seem to solve your entire problem as I understood
it: given the same randseed you want the ability for the same blocks
to be written in the same order but with different pseudorandom data
contents? Further this data must be verifiable in a separate job?
Would changing the buffer_compress_percentage option do?

-- 
Sitsofe | http://sucs.org/~sits/


* RE: random writes with different patterns
  2016-04-13  5:38     ` Sitsofe Wheeler
@ 2016-04-13 20:08       ` Foley, Robert
  0 siblings, 0 replies; 5+ messages in thread
From: Foley, Robert @ 2016-04-13 20:08 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: fio

> On Wednesday, April 13, 2016 1:39 AM, Sitsofe Wheeler [mailto:sitsofe@gmail.com] wrote:
> Some sort of "generation" id in the header would allow better verification when using something like
> loops (assuming it was being incremented on a per iteration basis). Perhaps what is needed is some
> sort of verify_header_extra_pattern that allows a few extra bytes to be set in the header and verified later...

This is a good point.  The verify_header_extra_pattern parameter alone would solve one of our use cases where we want to write a sequence of random blocks with a pattern and then overwrite the same sequence with patterns that vary just by that extra pattern.   With use of randseed, we will be able to write to the same set of blocks with random data.  But in between runs it would be enough for only this extra_pattern in the header to vary.  Also, the naming that you suggested seems just right here.  We can start putting this together soon, and will contribute it when it is ready.

> However this doesn't seem to solve your entire problem as I understood
> it: given the same randseed you want the ability for the same blocks to be written in the same order but with different
> pseudorandom data contents? Further this data must be verifiable in a separate job?
> Would changing the buffer_compress_percentage option do?

You bring up a good point: we do have a use case where we want the data in the block to also vary between runs over the same sequence of blocks.  In that use case we write a set of random blocks with random data, and later we want to be able to verify that the data is unchanged.  But we also want to overwrite that same sequence of blocks with a different data pattern.  We looked at the buffer_compress_percentage option and we believe it does not help us, since we want to be able to write blocks that are completely different from the blocks written in the prior run.  We may be testing a case where we do not want the data to be compressible or dedupable, so it would be better for the entire block to vary.

It seems that when we use the randseed option, this single seed will effectively seed everything including the offset generation (I/O pattern) and the data pattern generation.  It seems like a new parameter (rand_verify_seed) that allows us to provide the random seed for the data pattern alone would be quite useful in general and would help us solve this case.  It would allow us in this case to specify the same randseed so that the I/O pattern is the same, but then use a different verify seed so that the data pattern is different. 

This potential new rand_verify_seed parameter seems like a good option, but as always we would like to hear thoughts and ideas.  We would be willing to contribute this if it seems useful.

Thanks!
-Rob
