* Fio Checksum tracking and enhanced trim workloads
@ 2017-05-08  3:54 paul houlihan
  2017-05-08 14:18 ` Fwd: " Jens Axboe
  0 siblings, 1 reply; 8+ messages in thread
From: paul houlihan @ 2017-05-08  3:54 UTC (permalink / raw)
  To: fio, Jens Axboe


I have a submission for fio that enhances its data corruption detection and
diagnosis capabilities, taking fio from pretty good corruption detection to
absolute guarantees. I would like these changes on the tracking branch????
to be reviewed and considered for inclusion in fio. A quick review would be
helpful as I am losing access to test systems shortly.


These changes were used by a virtual machine caching company to assure data
integrity. Most testing was on 64-bit Linux and 32/64-bit Windows. The
Windows build still had an issue with compile-time asserts in libfio.c that
I worked around by commenting out the asserts, as this looked like a
performance restriction; this should be researched more. The initial
development was on fio 2.2.10 sources and I just ported the changes to the
latest fio sources and tested on Linux, but haven't yet tested on Windows.
No testing was done on the other fio-supported OSes, although the changes
are almost exclusively to OS-independent code.


The absolute guarantees are brought about by tracking checksums, to prevent
a stale but intact prior version of a block from being returned, and by
verifying all reads. I was surprised to learn how many times fio performed
concurrent I/O to the same blocks, which yields indeterminate results that
prevent data integrity verification. Thus a number of options are not
supported when tracking is enabled.


Finally, I have enhanced the usage of trims and am able to verify the data
integrity of these operations in an integrated fashion.


Here is a list of changes in this submission:

 * Fixed a bug where the expected version of a verify_interval block was
not generated correctly; the dummy io_u was not set up correctly.

 * Fixed a bug where the unknown option header_interval was referenced in
HOWTO; also fixed a bunch of typos.

 * Fixed a bug where fio hangs in nanosleep on Windows 7.

 * Also, the stonewall= option does not seem to work on Windows 7; it seems
fixed in later releases, so I painfully worked around it by having separate
init and run fio scripts. No change was made here; just mentioning this in
passing.

 * Fixed a bug where FD_IO logging was garbled in io_c.h. Here is an
example of the logging problem:

io       2212  io complete: io_u 0x787280: off=1048576/len=2097152/ddir=0io       2212  /b.datio       2212

io       2212  fill_io_u: io_u 0x787280: off=3145728/len=2097152/ddir=1io       2212  /b.datio       2212

io       2212  prep: io_u 0x787280: off=3145728/len=2097152/ddir=1io       2212  /b.datio       2212

io       2212  ->prep(0x787280)=0

io       2212  queue: io_u 0x787280: off=3145728/len=2097152/ddir=1io       2212  /b.datio       2212

 * In order to make fio into a superb data integrity test tool, a number
of shortcomings were addressed. The new verify_track switch enables
in-memory tracking of checksums within each fio job, preventing a block
from rolling back to a prior version. The in-memory checksums can be
written to a tracking log file to provide absolute checksum guarantees
between fio jobs or between fio runs. Verification of trim operations is
supported in an integrated fashion. See the HOWTO descriptions of
verify_track, verify_track_log, verify_track_required, verify_track_dir
and verify_track_trim_zero.
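As a purely illustrative sketch (the option values, device name and layout
here are my own, not from the patch), a job file enabling these tracking
options might look like:

```ini
; sketch of a verify job using the proposed tracking options
[global]
filename=/dev/sdb          ; device under test (example only)
direct=1
verify=crc32c
verify_interval=4096

[tracked-writes]
rw=randrw
bs=4k
size=1g
verify_track=1             ; in-memory checksum tracking
verify_track_log=1         ; persist checksums across jobs/runs
verify_track_dir=/var/tmp  ; keep logs on a more trusted device
```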

 * Enhanced the description surrounding corruption in HOWTO, as well as
providing some corruption analysis tools.

 * A bad header will now dump the received buffer into *.received before
giving you an error message.

 * If verify_interval is less than the block size, fio will now always dump
the complete buffer in an additional file called *.complete. Seeing the
whole buffer can reveal more about the corruption pattern.

 * Changed the printing of the hex checksum to display in MSB-to-LSB order
to facilitate comparison with memory dumps and debug logging.

 * Added a dump of the complete return buffer on trim write verification
failure.

 * Debug logging was being truncated at the end of a job, so you could not
see the full set of debug log messages; added a log flush at the end of
each job when the debug= switch is used.

 * rw=readwrite seems to have independent last_pos read/write pointers as
you sequentially access the file. If the mix is 50/50 then you could have
fio reading and writing the same block as the read and write pointers cross
each other, which is not reliably verifiable. The result of this pattern is
chaos; it contradicts all the other sequential patterns and even randrw.
Overlapping I/O makes little sense and is usually a sign of a broken
application. Moreover, the readwrite workload would not complete a
sequential pass over the entire file, which everyone I spoke to assumed it
was doing. So a change was made to the existing read/write workload
functionality: now the max of the file's last_pos pointers for DDIR_READ
and DDIR_WRITE is used for selecting the next offset as we sequentially
scan a file. If the old behavior is somehow useful then an option can be
added to preserve it; if preserved, it should never be the default and
should disable verification.


My changes revolve around maintaining the last_pos array in a special way.
When multiple operations (read/write/trim) are requested by a workload,
then as the last position is changed the change is reflected in all three
entries in the array. This way a randomly selected next operation always
uses the right last_pos. However, the old behavior is retained for
single-operation workloads and for trimwrite, which operates like a
single-operation workload.
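The mirrored-update scheme can be modeled in a few lines (a toy Python
sketch of my own; the names echo fio's internals but none of this is the
actual fio code):

```python
DDIR_READ, DDIR_WRITE, DDIR_TRIM = 0, 1, 2

class FileState:
    """Toy stand-in for fio's per-file position state (not the real struct)."""
    def __init__(self, multi_op):
        self.last_pos = [0, 0, 0]   # one sequential position per direction
        self.multi_op = multi_op    # workload mixes read/write/trim?

    def update_last_pos(self, ddir, end):
        # In a mixed workload the new position is mirrored into all three
        # entries, so the next randomly chosen direction continues from it.
        if self.multi_op:
            self.last_pos = [end] * 3
        else:
            # Single-operation workloads (and trimwrite) keep old behavior.
            self.last_pos[ddir] = end

f = FileState(multi_op=True)
f.update_last_pos(DDIR_WRITE, 4096)
print(f.last_pos)   # [4096, 4096, 4096]: the write advanced all pointers
```

This is what prevents the read and write pointers from crossing and issuing
overlapping I/O to the same block in mixed sequential workloads.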

 * Synchronous Trim I/O completions were not updating bytes_issued in
backend.c and thus trimwrite was actually making 2 passes of the file.

 * I kept the new verify_tracking verification entirely separate from the
experimental_verify code. These new tracking changes provides fully
persistent verification of trims integrated into standard verify, so we
might want to consider deprecating support for experimental_verify. Note
that verify_track and experimental_verify cannot both be enabled.

 * With the wide adoption of thin LUN datastores and recently expanded OS
support for trim operations to reclaim unused space, testing trim
operations in a wide variety of contexts has become a necessity. Added some
new trim I/O workloads to the existing trim workloads; these require the
verify_track option in order to verify:

trim Sequential trims

readtrim Sequential mixed reads and trims

writetrim Sequential mixed writes and trims.

Each block will be trimmed or written.

readwritetrim Sequential mixed reads/writes/trims

randtrim Random trims

randreadtrim Random mixed reads and trims

randwritetrim Random mixed writes and trims

randrwt Random mixed reads/writes/trims
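For example (a job file of my own construction, not from the patch), one of
the new random mixed workloads might be driven like this:

```ini
; exercise the proposed randrwt workload with tracking-based verify
[trim-mix]
filename=/dev/sdb        ; block device (trims need device support)
rw=randrwt               ; random mixed reads/writes/trims
rwtmix=40,40,20          ; 40% reads, 40% writes, 20% trims
bs=1m                    ; VMware reportedly wants >= 1 MB aligned trims
verify=crc32c
verify_track=1           ; required to verify trim workloads
```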

 * A second change to existing fio functionality involves an inconsistency
in counting read verification bytes against the size= argument. Some rw=
workloads count read verification I/Os or bytes against size= values (like
readwrite and randrw) and some do not (like write, trim and trimwrite).
Counting read verification bytes makes it hard to predict the number of
bytes or I/Os that will be performed in the readwrite workload, and the new
rw= workloads increase the unpredictability with even more read
verifications in a readwritetrim workload. Normally I expect that fio
should process all the bytes in a file pass, but when the bytes from read
verifies count towards the total bytes to process in size=, only part of
the file is processed. So I made it consistent for size and io_limit by not
counting read verify bytes. One could argue that number_ios= could also be
similarly changed, but I left this alone and it still uses raw I/O counts
which include read verification I/Os. Another justification is that
this_io_bytes never records verification reads for the dry_run, and we need
dry_run and do_io to be in sync. Note this explains why I removed code to
add extra bytes to total_bytes in do_io for verify_backlog.

 * It seems the processing of TD_F_VER_NONE is backwards from its name: if
verify != VERIFY_NONE then the bit is set, but the name implies it should
be clear. So now the bit is set only if verify == VERIFY_NONE, to avoid
this very confusing state.

 * Added a sync and invalidate after the close in iolog.c ipo_special().
This is needed if you capture checksums in the tracking log and there is a
close followed immediately by an open. The close is not immediate if you
have iodepth set to a large number: the file is still marked "open" but
"closing" on return from the close, and will close only after the last I/O
completes. The sync avoids the assert on trying to open an already-open
file which has a close pending.

 * --read_iolog does not support trims at this time.

 * io_u.c get_next_seq_offset() seems to suggest that ddir_seq_add can be
negative, but there are a number of unhandled cases with such a setting;
added TODOs to document the issues. I have a number of reservations about
the correctness of get_next_seq_offset(). Note that whenever I saw a
possible problem in the code but did not have time to research it, I added
a TODO comment.

 * io_u.c get_next_seq_offset() has a problem where it uses absolute values
when relative values are what is being manipulated, so this code:

    if (pos >= f->real_file_size)
            pos = f->file_offset;

should be:

    if (pos >= f->io_size)
            pos = 0;

 * Given there are a couple of changes to existing fio workload behavior,
you might want to consider going to a V3.0.




Here are two new sections on Verification Tracking and Data Corruption
Troubleshooting from HOWTO:


Verification Tracking

---------------------


An absolute data integrity guarantee is the primary mission of a storage
software/hardware subsystem. Fio is good at detecting data corruption, but
there are gaps. Currently, workload reads are verified only when the rw
option is set to a read-only pattern. It is desirable to validate all reads
in addition to writes, to protect against data rolling back to earlier
versions.


With the addition of the block's offset to the header in recent fio
releases, block data returned for another block will be flagged as corrupt.
However, a limitation of the fio header and data embedded checksums is that
fio cannot detect if a prior intact version of a block was returned on a
read: if the header and data checksum match, the block is declared valid.


These limitations can be addressed by setting the verify_track option,
which allocates a memory array to track the header and data checksums so
that the data integrity assurance is absolute. The array starts out empty
at the beginning of each fio job and is filled in as reads or writes occur;
once defined, the checksums from succeeding I/Os must all match. This
option extends checksum verification to all reads in all workloads, not
just the read-only workloads.


However, use of verify_track requires that fio avoid overlapping,
concurrent reads and writes to the same block. Reading and writing a block
at the same time yields indeterminate results and makes guaranteeing data
integrity impossible. So some fio options where this is a risk are disabled
when using verify_track; see the verify_track argument for the list of
restrictions.


Even better verification would validate data more persistently. You would
like to track checksums persistently between fio jobs or between runs of
fio, which could be after a shutdown/restart of the system or on a
different system that shares storage. Proving seamless data integrity from
the application perspective over complex failover and recovery situations,
like reverting a virtual machine to a prior snapshot, is quite valuable.


Also, the popularity of thin LUNs in the storage world has caused problems
when unused disk space is not reclaimed by use of trims. So we would like
the ability to mix and match trims with reads and writes. The rw option now
supports a full set of combinations, and the rwtmix=read%,write%,trim%
option allows specifying the mix percentages of all three types of I/O in
one argument. However, trims do have special requirements, as documented
under the rw option. Finally, we would like to verify trim operations: if
you read a trimmed block before re-writing it, it should return a block of
zeroes.


The verify_track_log option permits persistent checksum tracking and
verification of trims by saving the tracking array to a tracking log on the
close of a data file at the end of a fio job, and reading it back in at the
next start. A clean shutdown of fio is needed for the tracking log to be
persistent. When no errors occur, checksum context is automatically
preserved between fio jobs and fio runs. On revert of a virtual machine
snapshot, if the tracking log is restored from the time of the snapshot
then checksum context is again preserved. There is a tracking log for each
data file.


The tracking log filename format is: [dir]/[filename].tracking.log

where:

   filename - the name of the file system file, or the block device name
         like "sdb"

   dir - the log directory, which defaults to the directory of the data
         file. For block devices, dir defaults to the process's current
         working directory.


The tracking log is plain text. It contains data from when it was first
created: the name of the data file it is tracking, the size of the data
file, the starting file offset for I/Os, and its verify_interval option
setting. From the last save of the log it has: the timestamp of the last
save and a checksum of the tracking log contents. For checksum entries,
bit 0 = 1 defines a valid checksum; bit 0 = 0 signifies special-case
entries (dddddddc indicates a trimmed block and 0 indicates an undefined
entry).


Tracking Log Example with "--" comments added:


$ cat xxx.tracking.log

Fio-tracking-log-version: 1

DataFileName: xxx

DataFileSize: 2048

DataFileOffset: 0

DataFileVerifyInterval: 512

TrackingLogSaveTimestamp: 2017-02-23T14:25:32.446981

TrackingLogChecksum: cae34cd8

VerifyIntervalChecksums:

4028ab33    -- Checksums from read or write of 3 blocks, Bit 0 = 1

a450bffb

81858a3

dddddddc    -- Means trimmed block, Bit 0 = 0

0           -- Means undefined entry never been accessed, Bit 0 = 0

$
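The bit-0 convention can be expressed as a small classifier (illustrative
Python of my own, mirroring only the special values documented above):

```python
TRIMMED = 0xdddddddc   # special entry: block was trimmed (bit 0 = 0)
UNDEFINED = 0x0        # special entry: block never accessed (bit 0 = 0)

def classify_entry(entry: int) -> str:
    """Classify a 32-bit tracking-array entry per the bit-0 convention."""
    if entry & 1:
        return "checksum"    # bit 0 = 1: a valid tracked checksum
    if entry == TRIMMED:
        return "trimmed"
    if entry == UNDEFINED:
        return "undefined"
    return "invalid"         # bit 0 = 0 but not a known special value

# The entries from the example log above classify as expected:
for e in (0x4028ab33, 0xa450bffb, 0x81858a3, 0xdddddddc, 0x0):
    print(f"{e:x}: {classify_entry(e)}")
```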


Tracking arguments are:

verify_track=bool - enables checksum tracking in memory

verify_track_log=bool - enables saving and restoring of the tracking log

verify_track_required=bool - By default fio will create a log on the fly.
    If a log is found at the start, it is read and then the log file is
    deleted. If any error occurs during the fio run then the tracking log
    is not written on close, so compromised logs do not cause false
    failures. However, testing requiring absolute data integrity guarantees
    will want to use this option to require that the tracking log always be
    present between fio jobs or at the start of a new fio run.

verify_track_dir=str - Specifies the directory in which to place all
    tracking logs. When evaluating the data integrity of a device, it is
    advisable to place the tracking log on a different, more trusted
    device.

verify_track_trim_zero=bool - When no tracking array entry exists, this
    option allows a zeroed block from a prior fio run to be treated as
    previously trimmed instead of as data corruption. Once the array entry
    for a block is defined, this option is no longer used, as the array
    entry determines the required verification.

debug=chksum - a new debug option that allows tracing of all checksum
    entry additions/changes to the tracking array, and entry use in
    verification

There are a couple of considerations to be aware of when using the tracking
log. The tracking log is sticky: if you change options that make the
tracking log no longer match the data layout (the size=, offset= or
verify_interval= options) then you will receive a persistent error until
the tracking log is recreated. You do get a friendly error indicating which
tracking log file to delete to start with a fresh tracking log. Note that
if a fio run fails with other errors, the tracking log is discarded so that
stale checksums do not cause false failures on subsequent runs.


The tracking log uses 4 bytes for tracking each verify_interval block in
the data file or block device, as given by 4*(size/verify_interval), so
there are scaling implications for memory usage and log file size. However,
blocks are only tracked for the active I/O range from offset to
(offset+size-1).
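To get a feel for the 4*(size/verify_interval) scaling, a quick worked
calculation (the example sizes are my own, not from the HOWTO):

```python
def tracking_bytes(size: int, verify_interval: int) -> int:
    """Memory/log footprint: 4 bytes per verify_interval block."""
    return 4 * (size // verify_interval)

GiB = 1 << 30
# A 100 GiB device tracked at a 4 KiB verify_interval:
print(tracking_bytes(100 * GiB, 4096))   # 104857600 bytes (100 MiB)
# The same device at a 512-byte verify_interval costs 8x more:
print(tracking_bytes(100 * GiB, 512))    # 838860800 bytes (800 MiB)
```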


The performance impact of the few extra I/Os to read and write the tracking
log between fio jobs and fio runs is negligible, since one is not usually
verifying data when doing performance studies. There is no overhead when
verify tracking is disabled, and no extra I/Os when verify_track_log is
disabled.



Data Corruption Troubleshooting

-------------------------------


When a corruption occurs, immediate analysis can reveal many clues as to
the source of the corruption. Is the corruption persistent? In memory and
on disk? The exact pattern of the corruption is often revealing: at the
beginning of an I/O block? Sector aligned? All zeroes or garbage? What is
the exact range of the corruption? Is the corruption a stale but intact
prior version of the block?


When a corruption is detected, three possible corrupt data files are
created:

*.received - the corrupt data, which is possibly a verify_interval block
              within the full block used in the I/O
*.complete - the full block used in the I/O
*.expected - if the block's header is intact, the expected data pattern
              for the *.received block can be generated


Two scripts exist in the analyze directory to assist in analysis:

corruption_triage.sh - a bash script that contains a sequence of
              diagnostic steps
fio_header.py - a python script that displays the contents of the block
              header in a corrupt data file.




Here are the related parameter descriptions from HOWTO:


option verify_track=bool

Fio normally verifies data within a verify_interval with checksums and file
offsets embedded in the data. However, a prior version of a block could be
returned and verified successfully. When verify_track is enabled, the
checksum for every verify_interval in the file is stored in a table and all
read data must match the checksums in the table. The tracking table is
sized as (size / verify_interval) * 4 bytes; for very large size= option
settings, such a large memory allocation may impact testing. Reads assume
that the entire file has been previously written with a verification format
using the same verify_interval. When verify_track is enabled, all reads are
verified, whether writes are present in the workload or not. Sharing files
between threads within a job is supported, but not between jobs running
concurrently, so use the stonewall option when more than one non-global job
is present. Verification of trimmed blocks is described under the
verify_track_trim_zero option. When disabled, fio falls back on the
verification described under the verify option. The restrictions when
enabling the verify_track option are:

- randommap is required
- softrandommap is not supported
- the lfsr random generator is not supported when using multiple block sizes
- the stonewall option is required when more than one job is present
- file size must be an even multiple of the block size when iodepth > 1
- verify_backlog is not supported when iodepth > 1
- verify_async is not supported
- file sharing between concurrent jobs is not supported
- numjobs must be 1
- io_submit_mode must be set to "inline"
- verify=null and verify=pattern are not supported
- verify_only is not supported
- supplying a sequence number with the rw option is not supported
- experimental_verify is not supported

Defaults to off.


You can enable verify_track for individual jobs; each job will start with
an empty table which is filled in as each block is initially read or
written, and enforced on subsequent reads within the job. For persistent
tracking of checksums between jobs or fio runs, see verify_track_log.
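A hypothetical two-job run (a sketch of my own, with made-up sizes and
paths) could pass checksum context from one job to the next like this:

```ini
; job 1 writes and records checksums; job 2 re-reads and enforces them
[global]
filename=/data/fio.dat
verify=crc32c
verify_track=1
verify_track_log=1       ; persist the table across jobs

[seed]
rw=write
bs=4k
size=256m

[check]
stonewall                ; required: no concurrent jobs with tracking
rw=randread
bs=4k
size=256m
```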


option verify_track_log=bool

If set when verify_track is set, then on a clean shutdown fio writes the
checksum for each data block that has been read or written to a log named
(datafilename).tracking.log. If set when fio reopens this data file and a
tracking log exists, then the checksums are read into the tracking table
and used to validate every subsequent read. This allows rigorous validation
of data integrity as data files are passed between fio jobs, or over the
termination of fio and restart on the same system or on another system, or
after an OS reboot. Reverting a virtual machine to a snapshot can be tested
by saving the tracking log after a successful fio run and later restoring
the saved log after reverting the virtual machine. The log is deleted after
being read in, so on abnormal termination no stale checksums can be used.
This option, the data file size and the verify_interval parameters should
not change between jobs in the same run or on restart of fio. Defaults to
off. verify_track_dir defines the tracking log's directory.


option verify_track_required=bool

If set when verify_track_log is set, then the tracking log for each file
must exist at the start of a fio job or an error is returned. Defaults to
off, which is the case for the first job in a new fio run; subsequent jobs
in that run can require use of the tracking log. If set to off then any
tracking log found will be used, otherwise an empty tracking table is used.
If a prior fio run created a tracking log for the data file then all jobs
can require use of the tracking log.


option verify_track_dir=str

If verify_track_log is set then this defines the single directory for all
tracking logs. The default is to use the same directory where each data
file resides. When filename points to a block device or pipe, the directory
defaults to the process's current working directory. To assure the data
integrity of the tracking log, each tracking log also contains its own
checksum. However, when checking a device for data integrity it is
advisable to place tracking logs containing checksums on a different, more
trusted device.


option verify_track_trim_zero=bool

Typically a read of a trimmed block that has not been re-written will
return a block of zeros. If set with verify_track enabled, then all zeroed
blocks with no tracking information are assumed to have resulted from a
trim; if clear, zeroed blocks are treated as corruption. If your device
does not return zeroed blocks for reads after a trim then it cannot
participate in tracking verification. Fio sets this to 1 if trims are
present in the rw argument and defaults to 0 otherwise. You would only use
this when verify_track is enabled, trims are not specified in the rw
argument, and a prior fio job or run had performed trims.


option readwrite=str, rw=str


Type of I/O pattern. Accepted values are:


read

Sequential reads.

write

Sequential writes.

randwrite

Random writes.

randread

Random reads.

rw,readwrite

Sequential mixed reads and writes.

randrw

Random mixed reads and writes.


Trim I/O has several requirements:

- File system and OS support varies, but Linux block devices accept trims.
  You need privilege to write to a Linux block device. See the example fio
  job file: track-mem.fio
- A minimum block size is often required. Linux on VMware requires trims of
  at least 1 MB in size, aligned on a 1 MB boundary.
- VMware requires a minimum VM OS hardware level of 11.
- Verifying trim I/Os requires verify_track.


Trim I/O patterns are:


trim

Sequential trims.

readtrim

Sequential mixed reads and trims.

trimwrite

Sequential mixed trims then writes. Each block will be trimmed first,
then written to.

writetrim

Sequential mixed writes and trims. Each block will be trimmed or written.

rwt,readwritetrim

Sequential mixed reads/writes/trims.

randtrim

Random trims.

randreadtrim

Random mixed reads and trims.

randwritetrim

Random mixed writes and trims.

randrwt

Random mixed reads/writes/trims.


Fio defaults to read if the option is not specified. For the mixed I/O
types, the default is to split them 50/50. For certain types of I/O the
result may still be skewed a bit, since the speed may be different. It is
possible to specify a number of I/Os to do before getting a new offset;
this is done by appending a ``:[nr]`` to the end of the string given. For a
random read, it would look like ``rw=randread:8`` for passing in an offset
modifier with a value of 8. If the suffix is used with a sequential I/O
pattern, then the value specified will be added to the generated offset for
each I/O. For instance, using ``rw=write:4k`` will skip 4k for every write,
turning sequential I/O into sequential I/O with holes. See the
:option:`rw_sequencer` option. Storage array vendors often require trims to
use a minimum block size.


option rwtmix=int[,int][,int]

When trims along with reads and/or writes are specified in the rw option,
this is the preferred argument for specifying mix percentages. The argument
is of the form read,write,trim and the percentages must total 100. Note
that any field may be empty to leave that value at its default from the
rwmix* arguments of 50,50,0. If a trailing comma isn't given, the remainder
will inherit the last value set.
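For instance (percentages of my own choosing), a 60/30/10 split over the
combined random pattern might be requested as:

```ini
[mix-example]
rw=randrwt
rwtmix=60,30,10   ; 60% reads, 30% writes, 10% trims (must total 100)
```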



* Fwd: Fio Checksum tracking and enhanced trim workloads
  2017-05-08  3:54 Fio Checksum tracking and enhanced trim workloads paul houlihan
@ 2017-05-08 14:18 ` Jens Axboe
  2017-05-08 20:05   ` Sitsofe Wheeler
  0 siblings, 1 reply; 8+ messages in thread
From: Jens Axboe @ 2017-05-08 14:18 UTC (permalink / raw)
  To: fio; +Cc: paul houlihan

Paul is having trouble sending this to the reflector, let's see
if this works.


-------- Forwarded Message --------
Subject: 	Fio Checksum tracking and enhanced trim workloads
Date: 	Sun, 7 May 2017 23:54:16 -0400
From: 	paul houlihan <phoulihan9@gmail.com>
To: 	fio@vger.kernel.org, Jens Axboe <axboe@kernel.dk>



I have a submission for fio that enhances the data corruption detection and diagnosis capabilities taking fio from pretty good corruption detection to absolute guarantees. I would like these changes on the tracking branch???? to be reviewed and considered for inclusion to fio. A quick review would be helpful as I am losing access to test systems shortly.


These changes were used by a Virtual Machine caching company to assure data integrity. Most testing was on Linux 64 bits and windows 32/64 bits. The windows build still had an issue with compile time asserts in libfio.c that I worked around by commenting out the asserts as this looked like a performance restriction. This should be researched more. The initial development was on version fio 2.2.10 sources and I just ported the changes to fio latest sources and tested on linux but haven’t yet test on windows. No testing on all other fio supported OSes was done, although the changes are almost exclusively to OS independent code.


The absolute guarantees are brought about by tracking checksums to prevent a stale but intact prior version of a block being returned and by verifying all reads. I was surprised to learn about the number of times fio performed concurrent I/O to the same blocks which yields indeterminate results that prevent data integrity verification. Thus a number of options are not supported when tracking is enabled. 


Finally I have enhanced the usage of trims and am able to verify data integrity of these operations in an integrated fashion.


Here is a list of changes in this submission:

 * Bug where expected version of verify_interval is not generated correctly, dummy io_u not setup correctly

 * Bug where unknown header_interval referenced in HOWTO, fixed a bunch of typos.

 * Bug where windows hangs on nano sleep in windows 7.

 * Also stonewall= option does not seem to work on windows 7, seems fixed in later releases so painfully worked around this by having separate init and run fio scripts. No change was made here but just mentioning this in passing.

 * Fixed bug where FD_IO logging was screwed up in io_c.h. Here is example of logging problem:

io       2212  io complete: io_u 0x787280: off=1048576/len=2097152/ddir=0io       2212  /b.datio       2212  

io       2212  fill_io_u: io_u 0x787280: off=3145728/len=2097152/ddir=1io       2212  /b.datio       2212  

io       2212  prep: io_u 0x787280: off=3145728/len=2097152/ddir=1io       2212  /b.datio       2212  

io       2212  ->prep(0x787280)=0

io       2212  queue: io_u 0x787280: off=3145728/len=2097152/ddir=1io       2212  /b.datio       2212  

 * In order to make fio into an superb data integrity test tool, a number of shortcomings were addressed. New verify_track switch enables in memory tracking of checksums within each fio job, preventing a block from rolling back to prior version. The in memory checksums can be written to a tracking log file to provide an absolute checksum guarantees between fio jobs or between fio runs. Verification of trim operations is supported in an integrated fashion. See HOWTO description of verify_tracking. verify_tacking_log, verify_tracking_required, verify_tracking_dir, verify_trim_zero

 * Enhanced description surrounding corruption added to HOWTO as well as providing some corruption analyze tools.

 * Bad header will dump received buffer into *.received before you gave you an error message 

 * If verify_interval is less than the block size, fio will now always dump the complete buffer in an additional file called *.complete. Seeing whole buffer can reveal more about the corruption pattern.

 * Changed the printing of the hex checksum to display in MSB to LSB order to facilitate compares to memory dumps and debug logging

 * Added a dump of the complete return buffer on trim write verification failure. 

 * Debug logging was being truncated at the end of a job so you could not see the full set of debug log messages, so added a log flush at the end of each job if debug= switch is used.

 * rw=readwrite seems to have independent last_pos read/write pointers as you sequentially access the file. If the mix is 50/50 then you could have fio reading and writing the same block as the read and write pointer cross each other which is not reliably verifiable. This pattern result is chaos and contradicts all the other sequential patterns and even randrw. Overlapping I/O makes little sense and is usually a sign of a broken application. Moreover readwrite workload would not complete a sequential pass over the entire file which everyone I spoke to assumed it was doing. So a change was made to the existing read/write workload functionality. Now the max of the file’s last_pos pointers for DDIR_READ and DDIR_WRITE are used for selecting the next offset as we sequentially scan a file. If the old behavior is somehow useful then an option can be added to preserve it. If preserved, it should never be the default and should disable verification.


My changes revolve around maintaining the last_pos array in a special way. When multiple operations (read/write/trim) are requested by a workload then as the last position is changed, the changes are reflected in all three entries in the array. This way a randomly selected next operation always use the right last_pos. However we retained the old behavior for single operation workloads and for trimwrite which operates like a single operation workload.

 * Synchronous Trim I/O completions were not updating bytes_issued in backend.c and thus trimwrite was actually making 2 passes of the file.

 * I kept the new verify_tracking verification entirely separate from the experimental_verify code. These new tracking changes provides fully persistent verification of trims integrated into standard verify, so we might want to consider deprecating support for experimental_verify. Note that verify_track and experimental_verify cannot both be enabled.

 * With the wide adoption of thin LUN datastores and recently expanded OS support for trim operations to reclaim unused space, testing trims in a wide variety of contexts has become a necessity. Some new trim I/O workloads were added to the existing ones; they require the verify_track option to verify:

trim
Sequential trims.

readtrim
Sequential mixed reads and trims.

writetrim
Sequential mixed writes and trims. Each block will be trimmed or written.

readwritetrim
Sequential mixed reads/writes/trims.

randtrim
Random trims.

randreadtrim
Random mixed reads and trims.

randwritetrim
Random mixed writes and trims.

randrwt
Random mixed reads/writes/trims.

 * A second change to existing fio functionality involves an inconsistency in counting read verification bytes against the size= argument. Some rw= workloads count read verification I/Os or bytes against size= (like readwrite and randrw) and some do not (like write, trim and trimwrite). Counting read verification bytes makes it hard to predict the number of bytes or I/Os a readwrite workload will perform, and the new rw= workloads increase the unpredictability with even more read verifications in a readwritetrim workload. Normally I expect fio to process all the bytes in a file pass, but when the bytes from verification reads count towards the size= total, only part of the file is processed. So I made size and io_limit consistent by not counting verification read bytes. One could argue that number_ios= could be similarly changed, but I left it alone; it still uses raw I/O counts, which include verification reads.
Another justification is that this_io_bytes never records verification reads during the dry run, and we need dry_run and do_io to stay in sync. This explains why I removed the code in do_io that added extra bytes to total_bytes for verify_backlog.

 * The processing of TD_F_VER_NONE seems backwards from its name: the bit was set if verify != VERIFY_NONE, but the name implies it should be clear in that case. The bit is now set only if verify == VERIFY_NONE, avoiding this very confusing state.

 * Added a sync and invalidate after the close in iolog.c ipo_special(). This is needed if you capture checksums in the tracking log and a close is followed immediately by an open. The close is not immediate if iodepth is set to a large number: the file is still marked "open" but "closing" on return from the close, and will only close after the last I/O completes. The sync avoids the assert hit when trying to open an already open file with a close pending.

 * --read_iolog does not support trims at this time.

 * io_u.c get_next_seq_offset() seems to suggest that ddir_seq_add can be negative, but there are a number of unhandled cases with such a setting, so I added TODOs to document the issues. I have a number of reservations about the correctness of get_next_seq_offset(). Note that wherever I saw a possible problem in the code but did not have time to research it, I added a TODO comment.

 * io_u.c get_next_seq_offset() has a problem where it uses absolute values while relative values are being manipulated, so this code:

	if (pos >= f->real_file_size)
		pos = f->file_offset;

should be:

	if (pos >= f->io_size)
		pos = 0;
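
The bug can be seen with a toy model of the wraparound (illustrative Python, not fio code): pos here is file-relative, so the wrap target must be 0, not the absolute file_offset.

```python
def next_seq_offset(pos, add, io_size):
    """Advance a file-relative position by `add`, wrapping within io_size.

    Illustrative model only: because `pos` is relative to the job's
    file_offset, comparing against real_file_size and resetting to
    file_offset (the buggy version) mixes absolute and relative values.
    """
    pos += add
    if pos >= io_size:
        pos = 0  # relative wrap: back to the start of the I/O range
    return pos

# With io_size=1024 and a 256-byte stride, positions cycle 0,256,512,768,0,...
positions, pos = [], 0
for _ in range(5):
    positions.append(pos)
    pos = next_seq_offset(pos, 256, 1024)
print(positions)  # [0, 256, 512, 768, 0]
```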

 * Given there are a couple of changes to existing fio workload behavior, you might want to consider going to V3.0.




Here are two new sections on Verification Tracking and Data Corruption Troubleshooting from HOWTO:


Verification Tracking

---------------------


Absolute data integrity guarantees are the primary mission of a storage
software/hardware subsystem. Fio is good at detecting data corruption but
there are gaps. Currently workload reads are verified only when the rw
option specifies a read-only workload. It is desirable to validate all
reads in addition to writes, to protect against data rolling back to
earlier versions.


With the addition of the block's offset to the header in recent fio
releases, block data returned for another block will be flagged as corrupt.
However, a limitation of the header and data embedded checksums is that fio
cannot detect if a prior intact version of a block was returned on a read:
if the header and data checksums match, the block is declared valid.


These limitations can be addressed by setting the verify_track option,
which allocates a memory array tracking the header and data checksums so
that data integrity is absolutely assured. The array starts out empty at
the beginning of each fio job and is filled in as reads or writes occur;
once defined, the checksums from succeeding I/Os must all match. This
option extends checksum verification to all reads in all workloads, not
just the read-only workloads.


However, use of verify_track requires that fio avoid overlapping,
concurrent reads and writes to the same block. Reading and writing a block
at the same time yields indeterminate results and makes guaranteeing data
integrity impossible. So some fio options where this is a risk are disabled
when verify_track is used. See the verify_track argument for the list of
restrictions.


Even better verification would validate data more persistently. You would
like to track checksums between fio jobs, or between runs of fio that may
follow a shutdown/restart of the system or occur on a different system that
shares storage. Proving seamless data integrity from the application
perspective over complex failover and recovery situations, like reverting a
virtual machine to a prior snapshot, is quite valuable.


Also, the popularity of thin LUNs in the storage world has caused problems
when unused disk space is not reclaimed by trims. So we would like the
ability to mix and match trims with reads and writes. The rw option now
supports a full set of combinations, and the rwtmix=read%,write%,trim%
option allows specifying the mix percentages of all three types of I/O in
one argument. However, trims do have special requirements, as documented
under the rw option. Finally, we would like to verify trim operations: if
you read a trimmed block before re-writing it, it should return a block of
zeroes.


The verify_track_log option permits persistent checksum tracking and
verification of trims by saving the tracking array to a tracking log when a
data file is closed at the end of a fio job, and reading it back in at the
next start. A clean shutdown of fio is needed for the tracking log to be
persistent. When no errors occur, checksum context is automatically
preserved between fio jobs and fio runs. On revert of a virtual machine
snapshot, if the tracking log is restored from the time of the snapshot,
checksum context is again preserved. There is a tracking log for each data
file.


Tracking log filename format is: [dir]/[filename].tracking.log
where:
   filename - name of the file system file or block device (like "sdb")
   dir      - log directory, defaulting to the directory of the data file.
              For block devices, dir defaults to the process's current
              working directory.


The tracking log is plain text. From when it was first created it contains:
the name of the data file it is tracking, the size of the data file, the
starting file offset for I/Os, and its verify_interval option setting. From
the last save of the log it has: the timestamp of the last save and a
checksum of the tracking log contents. For checksum entries, bit 0 = 1
denotes a valid checksum; bit 0 = 0 signifies a special-case entry
(dddddddc indicates a trimmed block and 0 indicates an undefined entry).


Tracking log example with "--" comments added:

$ cat xxx.tracking.log
Fio-tracking-log-version: 1
DataFileName: xxx
DataFileSize: 2048
DataFileOffset: 0
DataFileVerifyInterval: 512
TrackingLogSaveTimestamp: 2017-02-23T14:25:32.446981
TrackingLogChecksum: cae34cd8
VerifyIntervalChecksums:
4028ab33    -- Checksums from read or write of 3 blocks, bit 0 = 1
a450bffb
81858a3
dddddddc    -- Trimmed block, bit 0 = 0
0           -- Undefined entry, never accessed, bit 0 = 0
$
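
To make the format concrete, here is a small illustrative parser for a log like the one above (the field names come from the example and the entry semantics from the documented bit-0 rule; the code itself is a sketch, not part of fio):

```python
def parse_tracking_log(text):
    """Split a tracking log into its header fields and checksum entries.

    Entries follow the documented convention: bit 0 set = valid checksum,
    0xdddddddc = trimmed block, 0 = never-accessed block.
    """
    header, entries = {}, []
    lines = iter(text.strip().splitlines())
    for line in lines:
        if line.startswith("VerifyIntervalChecksums:"):
            break
        key, _, value = line.partition(": ")
        header[key] = value
    for line in lines:
        val = int(line, 16)
        if val == 0:
            entries.append(("undefined", val))
        elif val == 0xdddddddc:
            entries.append(("trimmed", val))
        elif val & 1:
            entries.append(("checksum", val))
        else:
            entries.append(("invalid", val))
    return header, entries

log = """Fio-tracking-log-version: 1
DataFileName: xxx
DataFileSize: 2048
DataFileOffset: 0
DataFileVerifyInterval: 512
TrackingLogSaveTimestamp: 2017-02-23T14:25:32.446981
TrackingLogChecksum: cae34cd8
VerifyIntervalChecksums:
4028ab33
a450bffb
81858a3
dddddddc
0"""
header, entries = parse_tracking_log(log)
print([kind for kind, _ in entries])
# ['checksum', 'checksum', 'checksum', 'trimmed', 'undefined']
```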


Tracking arguments are:

verify_track=bool - enables checksum tracking in memory.
verify_track_log=bool - enables saving and restoring of the tracking log.
verify_track_required=bool - By default fio will create a log on the fly.
    If a log is found at the start it is read in and then the log file is
    deleted. If any error occurs during the fio run, the tracking log is
    not written on close, so compromised logs do not cause false failures.
    However, testing that requires absolute data integrity guarantees will
    want to use this option to require that the tracking log always be
    present between fio jobs or at the start of a new fio run.
verify_track_dir=str - Specifies the directory to hold all tracking logs.
    When evaluating the data integrity of a device it is advisable to place
    the tracking logs on a different, more trusted device.
verify_track_trim_zero=bool - When no tracking array entry exists, this
    option allows a zeroed block from a prior fio run to be treated as
    previously trimmed instead of as data corruption. Once the array entry
    for a block is defined, this option is no longer used, as the array
    entry determines the required verification.
debug=chksum - a new debug option that traces all checksum entry
    additions/changes to the tracking array and entry use in verification.


There are a couple of considerations to be aware of when using the tracking
log. The tracking log is sticky: if you change options that make the
tracking log no longer match the data layout (size=, offset= or
verify_interval=), you will receive a persistent error until the tracking
log is recreated. You do get a friendly error indicating which tracking log
file to delete to start with a fresh log. Note that if a fio run fails with
other errors, the tracking log is discarded so that stale checksums do not
cause false failures on subsequent runs.


The tracking log uses 4 bytes to track each verify_interval block in the
data file or block device, i.e. 4*(size/verify_interval) bytes in total, so
there are scaling implications for memory usage and log file size. However,
blocks are only tracked for the active I/O range from offset to
(offset+size-1).
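
The scaling of that 4-bytes-per-interval table can be estimated directly; this is just arithmetic on the formula above, not fio code, and the 4 KiB verify_interval in the example is an assumed setting:

```python
def tracking_table_bytes(size, verify_interval):
    """Memory/log-size estimate for the tracking array: one 4-byte
    checksum slot per verify_interval block in the active I/O range."""
    if size % verify_interval:
        raise ValueError("size should be a multiple of verify_interval")
    return 4 * (size // verify_interval)

# Tracking a 1 TiB device at an assumed 4 KiB verify_interval needs 1 GiB:
TiB, KiB = 1 << 40, 1 << 10
print(tracking_table_bytes(TiB, 4 * KiB) // (1 << 20), "MiB")  # 1024 MiB
```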


The performance impact of the few extra I/Os to read and write the tracking
log between fio jobs and fio runs is negligible, since one is not usually
verifying data when doing performance studies. There is no overhead when
verify tracking is disabled and no extra I/Os when verify_track_log is
disabled.



Data Corruption Troubleshooting

-------------------------------


When a corruption occurs, immediate analysis can reveal many clues as to
the source of the corruption. Is the corruption persistent? In memory and
on disk? The exact pattern of the corruption is often revealing: at the
beginning of an I/O block? Sector aligned? All zeroes or garbage? What is
the exact range of the corruption? Is the corruption a stale but intact
prior version of the block?


When a corruption is detected, three files of corrupt data are created:

*.received - the corrupt data, which may be a verify_interval block within
             the full block used in the I/O.
*.complete - the full block used in the I/O.
*.expected - if the block's header is intact, the expected data pattern for
             the *.received block can be generated.


Two scripts exist in the analyze directory to assist in analysis:

corruption_triage.sh - a bash script containing a sequence of diagnostic
             steps.
fio_header.py - a python script that displays the contents of the block
             header in a corrupt data file.




Here are the related parameter descriptions from HOWTO:


option verify_track=bool


Fio normally verifies data within a verify_interval with checksums and file
offsets embedded in the data. However, a prior version of a block could be
returned and verified successfully. When verify_track is enabled, the
checksum for every verify_interval in the file is stored in a table, and
all read data must match the checksums in the table. The tracking table is
sized as (size / verify_interval) * 4 bytes; for very large size= option
settings, such a large memory allocation may impact testing. Reads assume
that the entire file has been previously written with a verification format
using the same verify_interval. When verify_track is enabled, all reads are
verified, whether writes are present in the workload or not. Sharing files
between threads within a job is supported, but not between jobs running
concurrently, so use the stonewall option when more than one non-global job
is present. Verification of trimmed blocks is described under the
verify_track_trim_zero option. When disabled, fio falls back on the
verification described under the verify option. The restrictions when
enabling the verify_track option are:

- randommap is required
- softrandommap is not supported
- lfsr random generator is not supported when using multiple block sizes
- stonewall option is required when more than one job is present
- file size must be an even multiple of the block size when iodepth > 1
- verify_backlog is not supported when iodepth > 1
- verify_async is not supported
- file sharing between concurrent jobs is not supported
- numjobs must be 1
- io_submit_mode must be set to "inline"
- verify=null and verify=pattern are not supported
- verify_only is not supported
- supplying a sequence number with the rw option is not supported
- experimental_verify is not supported

Defaults to off.


You can enable verify_track for individual jobs; each job starts with an
empty table which is filled in as each block is first read or written, then
enforced on subsequent reads within the job. For persistent tracking of
checksums between jobs or fio runs, see verify_track_log.


option verify_track_log=bool


If set when verify_track is set, then on a clean shutdown fio writes the
checksum for each data block that has been read or written to a log named
(datafilename).tracking.log. If set when fio reopens this data file and a
tracking log exists, the checksums are read into the tracking table and
used to validate every subsequent read. This allows rigorous validation of
data integrity as data files are passed between fio jobs, or across a
termination and restart of fio on the same system or on another system, or
after an OS reboot. Reverting a virtual machine to a snapshot can be tested
by saving the tracking log after a successful fio run and later restoring
the saved log after reverting the virtual machine. The log is deleted after
being read in, so on abnormal termination no stale checksums can be used.
This option, the data file size and the verify_interval parameter should
not change between jobs in the same run or on restart of fio. Defaults to
off. verify_track_dir defines the tracking log's directory.


option verify_track_required=bool


If set when verify_track_log is set, the tracking log for each file must
exist at the start of a fio job or an error is returned. Defaults to off,
which is appropriate for the first job in a new fio run; subsequent jobs in
that run can require use of the tracking log. If set to off, any tracking
log found will be used, otherwise an empty tracking table is used. If a
prior fio run created a tracking log for the data file, then all jobs can
require use of the tracking log.


option verify_track_dir=str


If verify_track_log is set, this defines a single directory for all
tracking logs. The default is to use the directory where each data file
resides; when filename points to a block device or pipe, the directory
defaults to the process's current working directory. To assure the data
integrity of the tracking log itself, each tracking log contains its own
checksum. However, when checking a device for data integrity it is
advisable to place the tracking logs on a different, more trusted device.


option verify_track_trim_zero=bool


Typically a read of a trimmed block that has not been re-written returns a
block of zeros. If set with verify_track enabled, all zeroed blocks with no
tracking information are assumed to have resulted from a trim; if clear,
zeroed blocks are treated as corruption. If your device does not return
zeroed blocks for reads after a trim then it cannot participate in tracking
verification. Fio sets this to 1 if trims are present in the rw argument
and defaults to 0 otherwise. You would only set this manually when
verify_track is enabled, trims are not specified in the rw argument, and a
prior fio job or run performed trims.


option readwrite=str, rw=str


Type of I/O pattern. Accepted values are:

read
Sequential reads.

write
Sequential writes.

randwrite
Random writes.

randread
Random reads.

rw,readwrite
Sequential mixed reads or writes.

randrw
Random mixed reads or writes.


Trim I/O has several requirements:
- File system and OS support varies, but Linux block devices accept trims.
  You need privilege to write to a Linux block device. See the example fio
  job file: track-mem.fio
- A minimum block size is often required. Linux on VMware requires trims of
  at least 1 MB, aligned on a 1 MB boundary.
- VMware requires a minimum VM hardware level of 11.
- Verifying trim I/Os requires verify_track.


Trim I/O patterns are:

trim
Sequential trims.

readtrim
Sequential mixed reads or trims.

trimwrite
Sequential mixed trim then write. Each block will be trimmed first, then
written to.

writetrim
Sequential mixed writes or trims. Each block will be trimmed or written.

rwt,readwritetrim
Sequential mixed reads/writes/trims.

randtrim
Random trims.

randreadtrim
Random mixed reads or trims.

randwritetrim
Random mixed writes or trims.

randrwt
Random mixed reads/writes/trims.


Fio defaults to read if the option is not specified. For the mixed I/O
types, the default is to split them 50/50. For certain types of I/O the
result may still be skewed a bit, since the speed may be different. It is
possible to specify a number of I/Os to do before getting a new offset;
this is done by appending a ``:[nr]`` to the end of the string given. For a
random read, it would look like ``rw=randread:8`` for passing in an offset
modifier with a value of 8. If the suffix is used with a sequential I/O
pattern, the value specified will be added to the generated offset for each
I/O. For instance, using ``rw=write:4k`` will skip 4k for every write,
turning sequential I/O into sequential I/O with holes. See the
:option:`rw_sequencer` option. Storage array vendors often require trims to
use a minimum block size.


option rwtmix=int[,int][,int]


When trims are specified along with reads and/or writes in the rw option,
this is the preferred argument for specifying the mix percentages. The
argument is of the form read,write,trim, and the percentages must total
100. Any field may be left empty to keep that value at its default of
50,50,0 (from the rwmix* arguments). If trailing fields are omitted, they
inherit the last value set.
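
A sketch of how such an argument might be interpreted (the 50/50/0 defaults and the trailing-inheritance rule come from the text above; the parsing code itself is illustrative, not fio's):

```python
def parse_rwtmix(arg, defaults=(50, 50, 0)):
    """Parse a read%,write%,trim% mix string.

    Empty fields keep their defaults; if fewer than three fields are
    given, the remaining positions inherit the last value set. The three
    percentages must total 100.
    """
    fields = arg.split(",")
    mix = list(defaults)
    last = None
    for i in range(3):
        if i < len(fields) and fields[i] != "":
            last = int(fields[i])
            mix[i] = last
        elif i >= len(fields) and last is not None:
            mix[i] = last  # trailing fields inherit the last value set
    if sum(mix) != 100:
        raise ValueError("rwtmix percentages must total 100: %r" % (mix,))
    return tuple(mix)

print(parse_rwtmix("40,40,20"))  # (40, 40, 20)
print(parse_rwtmix("20,40"))     # (20, 40, 40): trim inherits the last value
```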



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fio Checksum tracking and enhanced trim workloads
  2017-05-08 14:18 ` Fwd: " Jens Axboe
@ 2017-05-08 20:05   ` Sitsofe Wheeler
  2017-05-09  1:01     ` paul houlihan
  0 siblings, 1 reply; 8+ messages in thread
From: Sitsofe Wheeler @ 2017-05-08 20:05 UTC (permalink / raw)
  To: paul houlihan; +Cc: Jens Axboe, fio

> -------- Forwarded Message --------
> Subject:        Fio Checksum tracking and enhanced trim workloads
> Date:   Sun, 7 May 2017 23:54:16 -0400
> From:   paul houlihan <phoulihan9@gmail.com>
> To:     fio@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
>
> I have a submission for fio that enhances the data corruption detection and diagnosis capabilities taking fio from pretty good corruption detection to absolute guarantees. I would like these changes on the tracking branch???? to be reviewed and considered for inclusion to fio. A quick review would be helpful as I am losing access to test systems shortly.
>
>
> These changes were used by a Virtual Machine caching company to assure data integrity. Most testing was on Linux 64 bits and windows 32/64 bits. The windows build still had an issue with compile time asserts in libfio.c that I worked around by commenting out the asserts as this looked like a performance restriction. This should be researched more. The initial development was on version fio 2.2.10 sources and I just ported the changes to fio latest sources and tested on linux but haven’t yet test on windows. No testing on all other fio supported OSes was done, although the changes are almost exclusively to OS independent code.
>
>
> The absolute guarantees are brought about by tracking checksums to prevent a stale but intact prior version of a block being returned and by verifying all reads. I was surprised to learn about the number of times fio performed concurrent I/O to the same blocks which yields indeterminate results that prevent data integrity verification. Thus a number of options are not supported when tracking is enabled.

Sounds interesting! Are the patches available on Github or otherwise published?

I have a couple of patches related to overlapping I/Os (see
https://github.com/axboe/fio/pull/343 and
https://github.com/sitsofe/fio/commit/6b4cfeb95fca2d75a291c54ca20162470c837a38
) because I can't afford for such I/O to be sent to the storage.
Perhaps it would be possible to amend your work to detect the overlaps
in a more efficient manner?

How does this work play with varying blocksizes (bsrange etc)?

> These limitations can be addressed by setting the verify_track option which
> allocates a memory array to track the header and data checksums to assure
> data integrity is absolute. The array starts out empty at the beginning of
> each fio job and is filled in as reads or writes occur, once defined the
> checksums from succeeding I/Os must all match. This option extends checksum
> verification to all reads in all workloads, not just the read-only workloads.

Are you sure? I thought readwrite verifying workloads caused the reads
to be verifying if the block had been written in the same workload?

> fio_header.py - a python script that displays the contents of the block header
>
>               in a corrupt data file.

Many would find this useful. Perhaps it could be ported to C like
t/verify-state.c ?

> Here are the related parameter descriptions from HOWTO:
>
>
> option verify_track=bool
>
>
> Fio normally verifies data within a verify_interval with checksums and file
> offsets embedded in the data. However a prior version of a block could be
> returned and verified successfully. When verify_track is enabled the checksum
> for every verify_interval in the file is stored in a table and all read data
> must match the checksums in the table. The tracking table is sized as
> (size / verify_interval) * 4 bytes. For very large size= option settings,
> such a large memory allocation may impact testing. Reads assume that the
> entire file has been previously written with a verification format using the
> same verify_interval. When verify_track is enabled, all reads are verified,
> whether writes are present in the workload or not. Sharing files by threads
> within a job is supported but not between jobs running concurrently so use
> the stonewall option when more than one non-global job is present. Verify of
> trimmed blocks is described for the verify_track_trim_zero option. When
> disabled, fio falls back on verification described under the verify option.
> The restrictions when enabling the verify_track option are:
>
> - randommap is required
> - softrandommap is not supported
> - lfsr random generator not supported when using multiple block sizes
> - stonewall option required when more than one job present
> - file size must be an even multiple of the block size when iodepth > 1
> - verify_backlog not supported when iodepth > 1
> - verify_async is not supported
> - file sharing between concurrent jobs not supported
> - numjobs must be 1
> - io_submit_mode must be set to "inline"

Yeah I've noticed a number of issues trying to verify with
io_submit_mode=offload ...

> option verify_track_log=bool
>
> If set when verify_track is set then on a clean shutdown, fio writes the
> checksum for each data block that has been read or written to a log named
> (datafilename).tracking.log. If set when fio reopens this data file and a
> tracking log exists then the checksums are read into the tracking table and
> used to validate every subsequent read. This allows rigorous validation of
> data integrity as data files are passed between fio jobs or over the
> termination of fio and restart on the same system or on another system or
> after an OS reboot. Reverting a virtual machine to a snapshot can be tested
> by saving the tracking log after a successful fio run and later restoring
> the saved log after reverting the virtual machine. The log is deleted after
> being read in, so on abnormal termination no stale checksums can be used.
> This option, the data file size and verify_interval parameters should not
> change between jobs in the same run or on restart of fio. Defaults to off.
> verify_track_dir defines the tracking log's directory.

How does this interact with verify_state_*
(http://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-arg-verify-state-load
) ?

-- 
Sitsofe | http://sucs.org/~sits/


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fio Checksum tracking and enhanced trim workloads
  2017-05-08 20:05   ` Sitsofe Wheeler
@ 2017-05-09  1:01     ` paul houlihan
  2017-05-09  1:51       ` paul houlihan
  0 siblings, 1 reply; 8+ messages in thread
From: paul houlihan @ 2017-05-09  1:01 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, fio


The second sentence was incorrect.

It should read: I would like the changes at
https://github.com/phoulihan9/fio/pull/1/commits to be reviewed and
considered for inclusion in fio.


[-- Attachment #2: Type: text/html, Size: 8757 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fio Checksum tracking and enhanced trim workloads
  2017-05-09  1:01     ` paul houlihan
@ 2017-05-09  1:51       ` paul houlihan
  2017-05-10 18:38         ` Sitsofe Wheeler
  0 siblings, 1 reply; 8+ messages in thread
From: paul houlihan @ 2017-05-09  1:51 UTC (permalink / raw)
  To: Sitsofe Wheeler; +Cc: Jens Axboe, fio

[-- Attachment #1: Type: text/plain, Size: 9549 bytes --]

Let me know if the changes are not accessible. I am new to github and not
sure I am doing it right. They are reviewable for me using the
aforementioned link.

I can look at Sitsofe's overlapping changes but I don't have a lot of time
for major rework. What is here works fairly robustly and is quite useful,
and that is what I am offering. I realize that there is a huge number of
fio arguments and there might still be contradictions I don't use. I can
rework it a bit if there are small issues to tackle.

I have not spent a lot of time using bsrange. I am usually interested in
stressing specific block sizes. I think it will all just work as long as no
overlapping I/O results.

> Are you sure? I thought readwrite verifying workloads caused the reads
> to be verifying if the block had been written in the same workload?

Yes, for a read/write workload all writes do result in a read verification.
However that still leaves unverified those blocks that were only read and
never written in this workload. That's not acceptable if your requirement
is absolute data integrity guarantees. My goal was that all reads (and for
that matter trims) in all workloads be verified including those blocks
written by a prior fio job or even a prior fio run on a different virtual
machine. The current fio verifies all read blocks only for read-only
workloads where it assumes a prior job has initialized the blocks.

> Many would find this useful. Perhaps it could be ported to C like
> t/verify-state.c ?
Yes I wish I had done it in C and there would be no extra dependency on
python for troubleshooting. Also the C could use the definition in verify.h
in case the header ever changes. However python is ubiquitous so it seems a
minor dependency. I could consider this if the list of requested changes is
doable.

> How does this interact with verify_state_*
Again, something I have not stressed. Bear in mind that if fio aborts on an
error the verify log is not created, as this can lead to false corruption
detection. Note that control-C is a normal exit and the tracking log is
preserved in this case. However it should just work if the tracking log is
preserved. I thought about tracking checksums within verify state but it
was not easily extended to track all checksums for all blocks.

paul

On Mon, May 8, 2017 at 9:01 PM, paul houlihan <phoulihan9@gmail.com> wrote:

> The second sentence was incorrect.
>
> It should read: I would like the changes at https://github.com/
> phoulihan9/fio/pull/1/commits  to be reviewed and considered for
> inclusion to fio.
>
> On Mon, May 8, 2017 at 4:05 PM, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
>
>> > -------- Forwarded Message --------
>> > Subject:        Fio Checksum tracking and enhanced trim workloads
>> > Date:   Sun, 7 May 2017 23:54:16 -0400
>> > From:   paul houlihan <phoulihan9@gmail.com>
>> > To:     fio@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
>> >
>> > I have a submission for fio that enhances the data corruption detection
>> and diagnosis capabilities taking fio from pretty good corruption detection
>> to absolute guarantees. I would like these changes on the tracking
>> branch???? to be reviewed and considered for inclusion to fio. A quick
>> review would be helpful as I am losing access to test systems shortly.
>> >
>> >
>> > These changes were used by a Virtual Machine caching company to assure
>> data integrity. Most testing was on Linux 64 bits and windows 32/64 bits.
>> The windows build still had an issue with compile time asserts in libfio.c
>> that I worked around by commenting out the asserts as this looked like a
>> performance restriction. This should be researched more. The initial
>> development was on version fio 2.2.10 sources and I just ported the changes
>> to fio latest sources and tested on linux but haven’t yet test on windows.
>> No testing on all other fio supported OSes was done, although the changes
>> are almost exclusively to OS independent code.
>> >
>> >
>> > The absolute guarantees are brought about by tracking checksums to
>> prevent a stale but intact prior version of a block being returned and by
>> verifying all reads. I was surprised to learn about the number of times fio
>> performed concurrent I/O to the same blocks which yields indeterminate
>> results that prevent data integrity verification. Thus a number of options
>> are not supported when tracking is enabled.
>>
>> Sounds interesting! Are the patches available on Github or otherwise
>> published?
>>
>> I have a couple of patches related to overlapping I/Os (see
>> https://github.com/axboe/fio/pull/343 and
>> https://github.com/sitsofe/fio/commit/6b4cfeb95fca2d75a291c5
>> 4ca20162470c837a38
>> ) because I can't afford for such I/O to be sent to the storage.
>> Perhaps it would be possible to amend your work to detect the overlaps
>> in a more efficient manner?
>>
>> How does this work play with varying blocksizes (bsrange etc)?
>>
>> > These limitations can be addressed by setting the verify_track option
>> which
>> >
>> > allocates a memory array to track the header and data checksums to
>> assure
>> >
>> > data integrity is absolute. The array starts out empty at the beginning
>> of
>> >
>> > each fio job and is filled in as reads or writes occur, once defined the
>> >
>> > checksums from succeeding I/Os must all match. This option extends
>> checksum
>> >
>> > verification to all reads in all workloads, not just the read-only
>> workloads.
>>
>> Are you sure? I thought readwrite verifying workloads caused the reads
>> to be verifying if the block had been written in the same workload?
>>
>> > fio_header.py - a python script that displays the contents of the block
>> header
>> >
>> >               in a corrupt data file.
>>
>> Many would find this useful. Perhaps it could be ported to C like
>> t/verify-state.c ?
>>
>> > Here are the related parameter descriptions from HOWTO:
>> >
>> >
>> > option verify_track=bool
>> >
>> >
>> > Fio normally verifies data within a verify_intervalwith checksums and
>> file
>> >
>> > offsets embedded in the data. However a prior version of a block could
>> be
>> >
>> > returned and verified successfully. When verify_track is enabled the
>> checksum
>> >
>> > for every verify_interval in the file is stored in a table and all read
>> data
>> >
>> > must match the checksums in the table. The tracking table is sized as
>> >
>> > (size / verify_interval) * 4 bytes. For very large size= option
>> settings,
>> >
>> > such a large memory allocation may impact testing. Reads assume that
>> the entire
>> >
>> > file has been previously written with a verification format using the
>> same
>> >
>> > verify_interval. When verify_track is enabled, all reads are verified,
>> whether
>> >
>> > writes are present in the workload or not. Sharing files by threads
>> within a job
>> >
>> > is supported but not between jobs running concurrently so use the
>> stonewall
>> >
>> > option when more than one non-global job is present. Verify of trimmed
>> blocks
>> >
>> > is described for the verify_track_trim_zero option. When disabled, fio
>> falls
>> >
>> > back on verification described under the verify option. The
>> restrictions when
>> >
>> > enabling the verify_track option are:
>> >
>> > - randommap is required
>> >
>> > - softrandommap is not supported
>> >
>> > - lfsr random generator not supported when using multiple block sizes
>> >
>> > - stonewall option required when more than one job present
>> >
>> > - file size must be an even multiple of the block size when iodepth > 1
>> >
>> > - verify_backlog not supported when iodepth > 1
>> >
>> > - verify_async is not supported
>> >
>> > - file sharing between concurrent jobs not supported
>> >
>> > - numjobs must be 1
>> >
>> > - io_submit_mode must be set to "inline"
>>
>> Yeah I've noticed a number of issues trying to verify with
>> io_submit_mode=offload ...
>>
>> > option verify_track_log=bool
>> >
>> >
>> > If set when verify_track is set then on a clean shutdown, fio writes
>> the checksum
>> >
>> > for each data block that has been read or written to a log named
>> >
>> > (datafilename).tracking.log. If set when fio reopens this data file and
>> a tracking
>> >
>> > log exists then the checksums are read into the tracking table and used
>> to validate
>> >
>> > every subsequent read. This allows rigorous validation of data
>> integrity as data
>> >
>> > files are passed between fio jobs or over the termination of fio and
>> restart on
>> >
>> > the same system or on another system or after an OS reboot. Reverting a
>> virtual
>> >
>> > machine to a snapshot can be tested by saving the tracking log after a
>> successful
>> >
>> > fio run and later restoring the saved log after reverting the virtual
>> machine.
>> >
>> > The log is deleted after being read in, so on abnormal termination no
>> stale
>> >
>> > checksums can be used. This option, the data file size and
>> verify_interval
>> >
>> > parameters should not change between jobs in the same run or on restart
>> of fio.
>> >
>> > Defaults to off. verify_track_dir defines the tracking log's directory.
>>
>> How does this interact with verify_state_*
>> (http://fio.readthedocs.io/en/latest/fio_doc.html#cmdoption-
>> arg-verify-state-load
>> ) ?
>>
>> --
>> Sitsofe | http://sucs.org/~sits/
>>
>
>

[-- Attachment #2: Type: text/html, Size: 12087 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Fio Checksum tracking and enhanced trim workloads
  2017-05-09  1:51       ` paul houlihan
@ 2017-05-10 18:38         ` Sitsofe Wheeler
  0 siblings, 0 replies; 8+ messages in thread
From: Sitsofe Wheeler @ 2017-05-10 18:38 UTC (permalink / raw)
  To: paul houlihan; +Cc: Jens Axboe, fio

For some reason Paul's emails don't seem to be going to the list -
Paul are you using a non-gmail SMTP server but using a gmail address?

On 9 May 2017 at 02:01, paul houlihan <phoulihan9@gmail.com> wrote:
> The second sentence was incorrect.
>
> It should read: I would like the changes at
> https://github.com/phoulihan9/fio/pull/1/commits  to be reviewed and
> considered for inclusion to fio.
>
> On Mon, May 8, 2017 at 4:05 PM, Sitsofe Wheeler <sitsofe@gmail.com> wrote:
>>
>> > -------- Forwarded Message --------
>> > Subject:        Fio Checksum tracking and enhanced trim workloads
>> > Date:   Sun, 7 May 2017 23:54:16 -0400
>> > From:   paul houlihan <phoulihan9@gmail.com>
>> > To:     fio@vger.kernel.org, Jens Axboe <axboe@kernel.dk>
>> >
>> > I have a submission for fio that enhances the data corruption detection
>> > and diagnosis capabilities taking fio from pretty good corruption detection
>> > to absolute guarantees. I would like these changes on the tracking
>> > branch???? to be reviewed and considered for inclusion to fio. A quick
>> > review would be helpful as I am losing access to test systems shortly.

On 9 May 2017 at 02:51, paul houlihan <phoulihan9@gmail.com> wrote:
> Let me know if the changes are not accessible. I am new to github and not
> sure I am doing it right. They are reviewable for me using the
> aforementioned link..

I've posted some comments up on Github. Are they visible to you? I
think the key thing is it would help if the commit was to be split up
into smaller commits...

> I can look at Sitsofe overlapping changes but I don't have a lot of time for
> major rework. What is here works fairly robustly, is quite useful and that
> is what I am offering. Although I realize that there is huge number of fio
> arguments and there might still be contradictions I don't use. I can rework
> it a bit if there are small issues to tackle.
>
> I have not spent a lot of time using bsrange. I am usually interested in
> stressing specific block sizes. I think it will all just work as long as no
> overlapping I/O results.

Sadly overlapping I/O often turns up with random block sizes.

>> Are you sure? I thought readwrite verifying workloads caused the reads
>> to be verifying if the block had been written in the same workload?
>
> Yes for read/write workload, all writes do result in a read verification.
> However that still leaves unverified those blocks that were only read and
> never written in this workload. That's not acceptable if your requirement is
> absolute data integrity guarantees. My goal was that all reads (and for that
> matter trims) in all workloads be verified including those blocks written by
> a prior fio job or even a prior fio run on a different virtual machine. The
> current fio verifies all read blocks only for read-only workloads where it
> assumes a prior job has initialized the blocks.

>> Many would find this useful. Perhaps it could be ported to C like
>> t/verify-state.c ?
> Yes I wish I had done it in C and there would be no extra dependency on
> python for troubleshooting. Also the C could use the definition in verify.h
> in case the header ever changes. However python is ubiquitous so it seems a
> minor dependency. I could consider this if the list of requested changes is
> doable.

OK.

>> How does this interact with verify_state_*
> Again something I have not stress. Bear in mind that if fio aborts on an
> error the verify log is not created as this can lead to false corruption
> detection. Note control-c is a normal exit and the tracking log is preserved
> in this case. However it should just work if the tracking log is preserved.
> I thought about tracking checksums within verify state but it was not easily
> extended to track all checksums for all blocks.

Fair enough.

-- 
Sitsofe | http://sucs.org/~sits/


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Fio Checksum tracking and enhanced trim workloads
@ 2017-05-08  1:49 paul houlihan
  0 siblings, 0 replies; 8+ messages in thread
From: paul houlihan @ 2017-05-08  1:49 UTC (permalink / raw)
  To: fio, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 26344 bytes --]

I have a submission for fio that enhances the data corruption detection and
diagnosis capabilities, taking fio from pretty good corruption detection to
absolute guarantees. I would like the changes at
https://github.com/phoulihan9/fio/pull/1/commits to be reviewed and
considered for inclusion in fio. A quick review would be helpful as I am
losing access to test systems shortly.


These changes were used by a Virtual Machine caching company to assure data
integrity. Most testing was on 64-bit Linux and 32/64-bit Windows. The
Windows build still had an issue with compile-time asserts in libfio.c that
I worked around by commenting out the asserts, as this looked like a
performance restriction. This should be researched more. The initial
development was on fio 2.2.10 sources; I just ported the changes to the
latest fio sources and tested on Linux but haven't yet tested on Windows.
No testing was done on the other fio-supported OSes, although the changes
are almost exclusively to OS-independent code.



The absolute guarantees are brought about by tracking checksums, to prevent
a stale but intact prior version of a block from being returned, and by
verifying all reads. I was surprised to learn how often fio performed
concurrent I/O to the same blocks, which yields indeterminate results that
prevent data integrity verification. Thus a number of options are not
supported when tracking is enabled.


Finally I have enhanced the usage of trims and am able to verify data
integrity of these operations in an integrated fashion.


Here is a list of changes in this submission:

• Bug where the expected version of a verify_interval is not generated
correctly; the dummy io_u was not set up correctly.

• Bug where an unknown header_interval was referenced in HOWTO; fixed a
bunch of typos.

• Bug where Windows hangs on nanosleep on Windows 7.

• Also, the stonewall= option does not seem to work on Windows 7. This
seems fixed in later releases, so I painfully worked around it by having
separate init and run fio scripts. No change was made here; just mentioning
this in passing.

• Fixed a bug where FD_IO logging was garbled in io_c.h. Here is an example
of the logging problem:

• io       2212  io complete: io_u 0x787280:
off=1048576/len=2097152/ddir=0io       2212  /b.datio       2212

• io       2212  fill_io_u: io_u 0x787280: off=3145728/len=2097152/ddir=1io
      2212  /b.datio       2212

• io       2212  prep: io_u 0x787280: off=3145728/len=2097152/ddir=1io
  2212  /b.datio       2212

• io       2212  ->prep(0x787280)=0

• io       2212  queue: io_u 0x787280: off=3145728/len=2097152/ddir=1io
  2212  /b.datio       2212

• In order to make fio into a superb data integrity test tool, a number of
shortcomings were addressed. The new verify_track switch enables in-memory
tracking of checksums within each fio job, preventing a block from rolling
back to a prior version. The in-memory checksums can be written to a
tracking log file to provide absolute checksum guarantees between fio jobs
or between fio runs. Verification of trim operations is supported in an
integrated fashion. See the HOWTO descriptions of verify_track,
verify_track_log, verify_track_required, verify_track_dir and
verify_track_trim_zero.

• Enhanced description surrounding corruption was added to HOWTO, along
with some corruption analysis tools.

• A bad header will dump the received buffer into *.received before giving
you an error message.

• If verify_interval is less than the block size, fio will now always dump
the complete buffer in an additional file called *.complete. Seeing the
whole buffer can reveal more about the corruption pattern.

• Changed the printing of the hex checksum to display in MSB-to-LSB order
to facilitate comparison with memory dumps and debug logging.

• Added a dump of the complete return buffer on trim write verification
failure.

• Debug logging was being truncated at the end of a job so you could not
see the full set of debug log messages; a log flush was added at the end of
each job when the debug= switch is used.

• rw=readwrite seems to have independent last_pos read/write pointers as
you sequentially access the file. If the mix is 50/50 then you could have
fio reading and writing the same block as the read and write pointers cross
each other, which is not reliably verifiable. This pattern results in chaos
and contradicts all the other sequential patterns and even randrw.
Overlapping I/O makes little sense and is usually a sign of a broken
application. Moreover, the readwrite workload would not complete a
sequential pass over the entire file, which everyone I spoke to assumed it
was doing. So a change was made to the existing read/write workload
functionality. Now the max of the file's last_pos pointers for DDIR_READ
and DDIR_WRITE is used for selecting the next offset as we sequentially
scan a file. If the old behavior is somehow useful then an option can be
added to preserve it. If preserved, it should never be the default and
should disable verification.


My changes revolve around maintaining the last_pos array in a special way.
When multiple operations (read/write/trim) are requested by a workload,
then as the last position is changed, the change is reflected in all three
entries in the array. This way a randomly selected next operation always
uses the right last_pos. However, the old behavior is retained for single
operation workloads and for trimwrite, which operates like a single
operation workload.
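As a minimal sketch of the policy described above (the names TrackedFile
and next_seq_offset are hypothetical illustrations, not the actual fio
io_u.c code):

```python
# Sketch of the shared last_pos policy: in a mixed workload, advancing
# the position for one direction mirrors into all three entries so a
# randomly chosen next operation continues one sequential pass.
READ, WRITE, TRIM = 0, 1, 2

class TrackedFile:
    def __init__(self, multi_op):
        self.last_pos = [0, 0, 0]   # per-ddir sequential position
        self.multi_op = multi_op    # workload mixes read/write/trim?

    def next_seq_offset(self, ddir, block_size, io_size):
        pos = self.last_pos[ddir]
        if pos >= io_size:          # wrap relative to the I/O region
            pos = 0
        new_pos = pos + block_size
        if self.multi_op:
            # mirror the advance into all three entries
            self.last_pos = [new_pos] * 3
        else:
            self.last_pos[ddir] = new_pos
        return pos
```

With multi_op set, alternating directions never hand out the same offset
twice within a pass; with it clear, each direction keeps its own pointer
as before.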

• Synchronous Trim I/O completions were not updating bytes_issued in
backend.c and thus trimwrite was actually making 2 passes of the file.

• I kept the new verify_track verification entirely separate from the
experimental_verify code. These new tracking changes provide fully
persistent verification of trims integrated into standard verify, so we
might want to consider deprecating support for experimental_verify. Note
that verify_track and experimental_verify cannot both be enabled.

• With the wide adoption of thin LUN datastores and recent expanded OS
support for trim operations to reclaim unused space, testing trim
operations in a wide variety of contexts has been a necessity. Added some
new trim I/O workloads to the existing trim workloads, which require use of
the verify_track option to verify:

• trim            Sequential trims

• readtrim        Sequential mixed reads and trims

• writetrim       Sequential mixed writes and trims.
                  Each block will be trimmed or written.

• readwritetrim   Sequential mixed reads/writes/trims

• randtrim        Random trims

• randreadtrim    Random mixed reads and trims

• randwritetrim   Random mixed writes and trims

• randrwt         Random mixed reads/writes/trims

• A second change to existing fio functionality involves an inconsistency
in counting read verification bytes against the size= argument. Some rw=
workloads count read verification I/Os or bytes against size= values (like
readwrite and randrw) and some, like write, trim and trimwrite, do not.
Counting read verification bytes makes it hard to predict the number of
bytes or I/Os that will be performed in the readwrite workload, and the new
rw= workloads increase the unpredictability with even more read
verifications in a readwritetrim workload. Normally I expect that fio
should process all the bytes in a file pass, but when the bytes from read
verifies count towards the total bytes to process in size=, only part of
the file is processed. So I made it consistent for size and io_limit by not
counting read verify bytes. One could argue that number_ios= could also be
similarly changed, but I left this alone and it still uses raw I/O counts
which include read verification I/Os. Another justification is that
this_io_bytes never records verification reads for the dry_run, and we need
dry_run and do_io to be in sync. Note this explains why I removed code to
add extra bytes to total_bytes in do_io for verify_backlog.

• It seems the processing of TD_F_VER_NONE is backwards from its name. If
verify != VERIFY_NONE then the bit is set, but the name implies it should
be clear. So now the bit is set only if verify == VERIFY_NONE, to avoid
this very confusing state.

• Added a sync and invalidate after the close in iolog.c ipo_special().
This is needed if you capture checksums in the tracking log and there is a
close followed immediately by an open. The close is not immediate if you
have iodepth set to a large number. The file is still marked "open" but
"closing" on return from the close, and will close only after the last I/O
completes. The sync avoids the assert on trying to open an already open
file which has a close pending.

• --read_iolog does not support trims at this time.

• io_u.c get_next_seq_offset() seems to suggest that ddir_seq_add can be
negative, but there are a number of unhandled cases with such a setting;
TODOs were added to document the issues. I have a number of reservations
about the correctness of get_next_seq_offset(). Note that whenever I saw a
possible problem in the code but did not have time to research it, I added
a TODO comment.

• io_u.c get_next_seq_offset() has a problem where it uses absolute values
when relative values are what is being manipulated, so this code:

      if (pos >= f->real_file_size)
              pos = f->file_offset;

  should be:

      if (pos >= f->io_size)
              pos = 0;
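A toy python model of that fix (standalone functions for illustration
only, not fio code) shows why the absolute-value form never wraps when the
file starts at a non-zero offset:

```python
# Positions here are relative to the file's starting offset, so the wrap
# test must compare against the relative io_size, not the absolute
# real_file_size.
def next_pos_buggy(pos, file_offset, real_file_size):
    if pos >= real_file_size:   # relative pos vs absolute size: wrong
        pos = file_offset       # and resets to an absolute offset
    return pos

def next_pos_fixed(pos, io_size):
    if pos >= io_size:          # relative vs relative: correct
        pos = 0
    return pos

# A file mapped at offset 1 MiB with a 4 MiB I/O region: a relative
# position of 4 MiB should wrap to 0, but the buggy form never wraps
# because 4 MiB < real_file_size (5 MiB).
MiB = 1 << 20
assert next_pos_buggy(4 * MiB, 1 * MiB, 5 * MiB) == 4 * MiB  # no wrap
assert next_pos_fixed(4 * MiB, 4 * MiB) == 0                 # wraps
```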

• Given there are a couple of changes to existing fio workload behavior,
you might want to consider going to a V3.0.




Here are two new sections on Verification Tracking and Data Corruption
Troubleshooting from HOWTO:


Verification Tracking

---------------------


Absolute data integrity guarantee is the primary mission of a storage

software/hardware subsystem. Fio is good at detecting data corruption but

there are gaps. Currently workload reads are verified only when the rw

option is set to read-only. It is desirable to validate all reads in addition

to writes to protect against data rolling back to earlier versions.


With the addition of the block's offset to the header in recent fio
releases,

block data returned for another block will be flagged as corrupt. However

a limitation of the fio header and data embedded checksums is that fio
cannot

detect if a prior intact version of a block was returned on a read. If the

header and data checksum match the block is declared valid.


These limitations can be addressed by setting the verify_track option which

allocates a memory array to track the header and data checksums to assure

data integrity is absolute. The array starts out empty at the beginning of

each fio job and is filled in as reads or writes occur, once defined the

checksums from succeeding I/Os must all match. This option extends checksum

verification to all reads in all workloads, not just the read-only
workloads.


However use of verify_track requires that fio avoid overlapping, concurrent

reads and writes to the same block. Reading and writing a block at the same

time yields indeterminate results and makes guaranteeing data integrity

impossible. So some fio options where this is a risk are disabled when using

verify_track. See verify_track argument for list of restrictions.


Even better verification would validate data more persistently. You would

like to track checksums persistently between fio jobs or between runs of fio

which could be after a shutdown/restart of the system or on a different
system

that shares storage. Proving seamless data integrity from the application

perspective over complex failover and recovery situations like reverting a

virtual machine to a prior snapshot is quite valuable.


Also the popularity of thin LUNs in the storage world has caused problems

if the unused disk space is not reclaimed by use of trims. So we would like

to have the ability to mix and match trims with reads and writes. The rw
option

now supports a full set of combinations and the rwtmix=read%,write%,trim%
option

allows specifying the mix percentages of all three types of I/O in one
argument.
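For illustration, a sketch of how an rwtmix=read%,write%,trim% split could
drive the per-I/O direction choice (a hypothetical helper, not fio's
internal implementation):

```python
import random

def pick_ddir(read_pct, write_pct, trim_pct, rng=random):
    """Pick a direction for the next I/O from a percentage mix."""
    assert read_pct + write_pct + trim_pct == 100
    roll = rng.randrange(100)       # uniform in 0..99
    if roll < read_pct:
        return "read"
    if roll < read_pct + write_pct:
        return "write"
    return "trim"
```

A degenerate mix like 100,0,0 always yields reads, which is how the
single-operation workloads fall out of the same selection.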

However trims do have special requirements as documented under the rw
option.

Finally, we would like to verify trim operations. If you read a trimmed
block

before re-writing the block, it should return a block of zeroes.


The verify_track_log option permits persistent checksum tracking and

verification of trims by enabling the saving of the tracking array to a
tracking

log on the close of a data file at the end of a fio job and reading it back
in

at the next start. A clean shutdown of fio is needed for tracking log to be

persistent. When no errors occur checksum context is automatically preserved

between fio jobs and fio runs. On revert of a virtual machine snapshot if

the tracking log is restored from the time of the snapshot then checksum

context is again preserved. There is a tracking log for each data file.


Tracking log filename format is: [dir] / [filename].tracking.log

where:

   filename - is name of file system file or block device name like “sdb”

   dir - is log directory that defaults to directory of data file.

         For block devices, dir defaults to the process current default

         directory.


The tracking log is plain text and contains data from when it was first
created:

the data file name it is tracking, the size of the data file, the starting

file offset for I/Os, its verify_interval option setting. From the last

save of the log it has: timestamp of last save and a checksum of the

tracking log contents. For checksums, Bit 0 = 1 defines a valid checksum.

Bit 0 = 0 signifies special case entries (dddddddc indicates a trimmed block

and 0 indicates an undefined entry).


Tracking Log Example with "--" comments added:


$ cat xxx.tracking.log

Fio-tracking-log-version: 1

DataFileName: xxx

DataFileSize: 2048

DataFileOffset: 0

DataFileVerifyInterval: 512

TrackingLogSaveTimestamp: 2017-02-23T14:25:32.446981

TrackingLogChecksum: cae34cd8

VerifyIntervalChecksums:

4028ab33    -- Checksums from read or write of 3 blocks, Bit 0 = 1

a450bffb

81858a3

dddddddc    -- Means trimmed block, Bit 0 = 0

0           -- Means undefined entry never been accessed, Bit 0 = 0

$
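Since the log is plain text, reading it back is straightforward. A rough
python sketch that parses the format shown above and classifies entries by
the Bit 0 convention (a hypothetical helper, mirroring only what the
example shows):

```python
def parse_tracking_log(text):
    """Split a tracking log into its header fields and checksum list."""
    header, checksums = {}, []
    in_sums = False
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line == "VerifyIntervalChecksums:":
            in_sums = True
        elif in_sums:
            checksums.append(int(line, 16))   # one hex value per line
        else:
            key, _, value = line.partition(":")
            header[key] = value.strip()
    return header, checksums

def classify(entry):
    # Bit 0 = 1 marks a valid checksum; dddddddc marks a trimmed block
    # and 0 an entry that has never been accessed.
    if entry == 0xdddddddc:
        return "trimmed"
    if entry == 0:
        return "undefined"
    return "checksum" if entry & 1 else "special"
```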


Tracking arguments are:


verify_track=bool - enables checksum tracking in memory

verify_track_log=bool - enables saving and restoring of the tracking log

verify_track_required=bool - By default fio will create a log on the fly.

    If a log is found at the start it is read and then the log file is
deleted.

    If any error occurs during the fio run then the tracking log is not

    written on close so compromised logs do not cause false failures.
However

    testing requiring absolute data integrity guarantees will want to use
this

    option to require that the tracking log always be present between fio
jobs

    or at the start of a new fio run.

verify_track_dir=str - Specifies a dir to place all tracking logs. It is

    advisable, when evaluating the data integrity of a device, to place the

    tracking log on a different, more trusted device.

verify_track_trim_zero=bool - When no tracking array entry exists, this
option

    allows a zeroed block from prior fio run to be treated as previously
trimmed

    instead of as data corruption. Once the array entry for a block is
defined,

    this option is no longer used as the array entry determines the required

    verification.

debug=chksum - a new debug option allows tracing of all checksum entry

    additions/changes to the tracking array or entry use in verification


There are a couple of considerations to be aware of when using the tracking

log. The tracking log is sticky: if you change options so that it no longer

matches the data layout (the size=, offset= or verify_interval= options),

you will receive a persistent error until the tracking log is recreated.

You do get a friendly error indicating which tracking log file to delete to

start with a fresh tracking log. Note that if a fio run fails with other

errors, the tracking log is discarded so that stale checksums do not cause

false failures on subsequent runs.


The tracking log uses 4 bytes for tracking each verify_interval block

in the data file or block device as specified by 4*(size/verify_interval).

So there are scaling implications for memory usage and log file size.

However blocks are only tracked for the active I/O range from:

offset - (offset+size-1).
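Plugging numbers into that sizing rule (a quick python back-of-envelope):

```python
def tracking_table_bytes(size, verify_interval):
    # One 4-byte checksum slot per verify_interval block in the active
    # I/O range, per the 4 * (size / verify_interval) rule above.
    return 4 * (size // verify_interval)

# e.g. a 1 TiB device verified at 4 KiB granularity needs a 1 GiB table
TiB, KiB = 1 << 40, 1 << 10
assert tracking_table_bytes(1 * TiB, 4 * KiB) == 1 << 30
```

This is where the "very large size= settings may impact testing" caveat
comes from: the table grows linearly with size and shrinks linearly with
verify_interval.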


The performance impact of the few extra I/Os to read and write the tracking
log

between fio jobs and fio runs is negligible since one is not usually
verifying

data when doing performance studies. There is no overhead when verify
tracking

is disabled and no extra I/Os when verify_track_log is disabled.



Data Corruption Troubleshooting

-------------------------------


When a corruption occurs, immediate analysis can reveal many clues as to
the source of the corruption. Is the corruption persistent? In memory and
on disk? The exact pattern of the corruption is often revealing: is it at
the beginning of an I/O block? Sector aligned? All zeroes or garbage? What
is the exact range of the corruption? Is the corruption a stale but intact
prior version of the block?


When a corruption is detected, up to three corrupt data files are created:

*.received - the corrupt data, which is possibly a verify_interval block
             within the full block used in the I/O
*.complete - the full block used in the I/O
*.expected - if the block's header is intact, the expected data pattern for
             the *.received block can be generated


Two scripts exist in the analyze directory to assist in analysis:

corruption_triage.sh - a bash script that contains a sequence of diagnostic
                       steps
fio_header.py - a python script that displays the contents of the block
                header in a corrupt data file
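
Several of the questions above can also be answered with standard tools.
For instance, ``cmp -l`` reports the exact byte range that differs between
the dumped files; the sketch below fabricates small stand-ins for the
*.expected and *.received files so the comparison has something to find:

```shell
# Fabricate a 16-byte expected pattern and a copy with 4 corrupt bytes
# (stand-ins for the *.expected and *.received files fio dumps).
printf 'fio-verify-data!' > expected.bin
printf 'fio-veXXXX-data!' > received.bin

# cmp -l lists every differing byte (1-based offset, octal values),
# giving the exact extent of the corruption.
cmp -l received.bin expected.bin |
    awk 'NR==1{first=$1} {last=$1} END{print "corrupt byte range:", first"-"last}'
# prints: corrupt byte range: 7-10
```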




Here are the related parameter descriptions from HOWTO:


.. option:: verify_track=bool


Fio normally verifies data within a verify_interval with checksums and file
offsets embedded in the data. However, a prior version of a block could be
returned and still verify successfully. When verify_track is enabled, the
checksum for every verify_interval in the file is stored in a table and all
read data must match the checksums in the table. The tracking table is
sized as (size / verify_interval) * 4 bytes. For very large size= option
settings, such a large memory allocation may impact testing. Reads assume
that the entire file has been previously written with a verification format
using the same verify_interval. When verify_track is enabled, all reads are
verified, whether writes are present in the workload or not. Sharing files
by threads within a job is supported, but not between jobs running
concurrently, so use the stonewall option when more than one non-global job
is present. Verification of trimmed blocks is described under the
verify_track_trim_zero option. When disabled, fio falls back on the
verification described under the verify option. The restrictions when
enabling the verify_track option are:

- randommap is required
- softrandommap is not supported
- lfsr random generator is not supported when using multiple block sizes
- stonewall option is required when more than one job is present
- file size must be an even multiple of the block size when iodepth > 1
- verify_backlog is not supported when iodepth > 1
- verify_async is not supported
- file sharing between concurrent jobs is not supported
- numjobs must be 1
- io_submit_mode must be set to "inline"
- verify=null and verify=pattern are not supported
- verify_only is not supported
- supplying a sequence number with the rw option is not supported
- experimental_verify is not supported

Defaults to off.


You can enable verify_track for individual jobs; each job will start with
an empty table which is filled in as each block is initially read or
written, and enforced on subsequent reads within the job. For persistent
tracking of checksums between jobs or fio runs, see verify_track_log.


.. option:: verify_track_log=bool


If set when verify_track is set, then on a clean shutdown fio writes the
checksum for each data block that has been read or written to a log named
(datafilename).tracking.log. If set when fio reopens this data file and a
tracking log exists, then the checksums are read into the tracking table
and used to validate every subsequent read. This allows rigorous validation
of data integrity as data files are passed between fio jobs, or across the
termination and restart of fio on the same system or on another system, or
after an OS reboot. Reverting a virtual machine to a snapshot can be tested
by saving the tracking log after a successful fio run and later restoring
the saved log after reverting the virtual machine. The log is deleted after
being read in, so on abnormal termination no stale checksums can be used.
This option, the data file size and the verify_interval parameter should
not change between jobs in the same run or on restart of fio. Defaults to
off. verify_track_dir defines the tracking log's directory.
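
As a sketch, checksum context can be carried from one job to the next in a
single run by combining verify_track_log with stonewall (file name and
sizes below are examples):

```ini
; Sketch: [fill] writes /data/fio.bin and saves its checksums to
; /data/fio.bin.tracking.log on clean shutdown; [reread] then requires
; and replays that log to verify every read.
[global]
filename=/data/fio.bin
verify=crc32c
verify_interval=4096
verify_track=1
verify_track_log=1

[fill]
rw=write
bs=4k
size=256m

[reread]
; stonewall is required: concurrent jobs may not share the file
stonewall
rw=randread
bs=4k
size=256m
verify_track_required=1
```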


.. option:: verify_track_required=bool


If set when verify_track_log is set, then the tracking log for each file
must exist at the start of a fio job or an error is returned. Defaults to
off, which is appropriate for the first job in a new fio run; subsequent
jobs in that run can then require use of the tracking log. If set to off,
any tracking log found will be used, otherwise an empty tracking table is
used. If a prior fio run created a tracking log for the data file, then
all jobs can require use of the tracking log.


.. option:: verify_track_dir=str


If verify_track_log is set, this defines the single directory for all
tracking logs. The default is to use the same directory where each data
file resides. When filename points to a block device or pipe, the directory
defaults to the current process default directory. To assure data integrity
of the tracking log, each tracking log also contains its own checksum.
However, when checking a device for data integrity it is advisable to place
tracking logs containing checksums on a different, more trusted device.


.. option:: verify_track_trim_zero=bool


Typically a read of a trimmed block that has not been re-written will
return a block of zeros. If set with verify_track enabled, then all zeroed
blocks with no tracking information are assumed to have resulted from a
trim. If clear, zeroed blocks are treated as corruption. If your device
does not return zeroed blocks for reads after a trim, then it cannot
participate in tracking verification. Fio sets this option to 1 if trims
are present in the rw argument and defaults it to 0 otherwise. You would
only set it explicitly when verify_track is enabled, trims are not
specified in the rw argument, and a prior fio job or run had performed
trims.
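
For example, a follow-up read-only verification pass over a device that a
previous run trimmed contains no trims in rw, so the option must be set
explicitly (a sketch; the device name is illustrative):

```ini
; Sketch: rw=read contains no trims, so fio would default
; verify_track_trim_zero to 0; set it so zeroed, untracked blocks
; from the earlier run's trims are not flagged as corruption.
[verify-after-trims]
filename=/dev/sdb
rw=read
verify=crc32c
verify_interval=4096
verify_track=1
verify_track_log=1
verify_track_trim_zero=1
```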


.. option:: readwrite=str, rw=str


Type of I/O pattern. Accepted values are:

**read**
    Sequential reads.
**write**
    Sequential writes.
**randwrite**
    Random writes.
**randread**
    Random reads.
**rw,readwrite**
    Sequential mixed reads and writes.
**randrw**
    Random mixed reads and writes.


Trim I/O has several requirements:

- File system and OS support varies, but Linux block devices accept trims.
  You need privilege to write to a Linux block device. See the example fio
  job file: track-mem.fio
- A minimum block size is often required. Linux on VMware requires trims of
  at least 1 MB in size, aligned on a 1 MB boundary.
- VMware requires a minimum VM OS hardware level of 11.
- Verifying trim I/Os requires verify_track.
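
A trim workload meeting the requirements above might be sketched as
follows (the 1 MB block size follows the VMware note; the device name is
illustrative):

```ini
; Sketch: mixed random writes and trims against a raw block device
; (requires write privilege); 1 MB blocks satisfy the VMware
; size/alignment requirement.
[trim-and-verify]
filename=/dev/sdb
rw=randwritetrim
bs=1m
verify=crc32c
verify_interval=4096
; verify_track is required to verify the trimmed blocks
verify_track=1
```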


Trim I/O patterns are:


**trim**
    Sequential trims.
**readtrim**
    Sequential mixed reads or trims.
**trimwrite**
    Sequential mixed trim then write. Each block will be trimmed first,
    then written to.
**writetrim**
    Sequential mixed writes or trims. Each block will be trimmed or
    written.
**rwt,readwritetrim**
    Sequential mixed reads/writes/trims.
**randtrim**
    Random trims.
**randreadtrim**
    Random mixed reads or trims.
**randwritetrim**
    Random mixed writes or trims.
**randrwt**
    Random mixed reads/writes/trims.


Fio defaults to read if the option is not specified.  For the mixed I/O
types, the default is to split them 50/50.  For certain types of I/O the
result may still be skewed a bit, since the speed may be different. It is
possible to specify a number of I/O's to do before getting a new offset;
this is done by appending a ``:[nr]`` to the end of the string given.  For
a random read, it would look like ``rw=randread:8`` for passing in an
offset modifier with a value of 8. If the suffix is used with a sequential
I/O pattern, then the value specified will be added to the generated offset
for each I/O.  For instance, using ``rw=write:4k`` will skip 4k for every
write, turning sequential I/O into sequential I/O with holes.  See the
:option:`rw_sequencer` option. Storage array vendors often require trims to
use a minimum block size.
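
To illustrate the sequential suffix, here is a sketch of the offsets
``rw=write:4k`` with ``bs=4k`` would generate, under the assumption that
the modifier is added when advancing from one I/O to the next (function
name is illustrative):

```python
def seq_offsets_with_skip(bs: int, skip: int, count: int, start: int = 0):
    """Offsets for a sequential pattern like rw=write:4k (sketch).

    Assumes each I/O lands bs+skip bytes past the previous one, i.e.
    the suffix value is added to every generated sequential offset."""
    return [start + i * (bs + skip) for i in range(count)]

# bs=4k with a 4k modifier: every other 4k block is written, leaving holes.
print(seq_offsets_with_skip(4096, 4096, 4))   # [0, 8192, 16384, 24576]
```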


.. option:: rwtmix=int[,int][,int]


When trims along with reads and/or writes are specified in the rw option,
this is the preferred argument for specifying mix percentages. The argument
is of the form read,write,trim and the percentages must total 100.  Any
argument may be left empty to keep that value at its default from the
rwmix* arguments of 50,50,0. If a trailing comma isn't given, the remainder
will inherit the last value set.
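
For instance, a three-way mix might be requested as follows (percentages
and file name are illustrative):

```ini
; Sketch: 60% reads, 30% writes, 10% trims (must total 100).
[mixed]
filename=/data/fio.bin
rw=randrwt
rwtmix=60,30,10
bs=1m
verify=crc32c
; verify_track is needed to verify the trim portion
verify_track=1
```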

[-- Attachment #2: Type: text/html, Size: 59903 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Fio Checksum tracking and enhanced trim workloads
@ 2017-05-08  1:26 paul houlihan
  0 siblings, 0 replies; 8+ messages in thread
From: paul houlihan @ 2017-05-08  1:26 UTC (permalink / raw)
  To: fio, Jens Axboe

[-- Attachment #1: Type: text/plain, Size: 26330 bytes --]

I have a submission for fio that enhances the data corruption detection and
diagnosis capabilities taking fio from pretty good corruption detection to
absolute guarantees. I would like the changes at
https://github.com/phoulihan9/fio/pull/1/commits to be reviewed and
considered for inclusion to fio. A quick review would be helpful as I am
losing access to test systems shortly.

These changes were used by a Virtual Machine caching company to assure data
integrity. Most testing was on Linux 64 bits and windows 32/64 bits. The
windows build still had an issue with compile time asserts in libfio.c that
I worked around by commenting out the asserts as this looked like a
performance restriction. This should be researched more. The initial
development was on version fio 2.2.10 sources and I just ported the changes
to fio latest sources and tested on linux but haven’t yet test on windows.
No testing on all other fio supported OSes was done, although the changes
are almost exclusively to OS independent code.

The absolute guarantees are brought about by tracking checksums to prevent
a stale but intact prior version of a block being returned and by verifying
all reads. I was surprised to learn about the number of times fio performed
concurrent I/O to the same blocks which yields indeterminate results that
prevent data integrity verification. Thus a number of options are not
supported when tracking is enabled.

Finally I have enhanced the usage of trims and am able to verify data
integrity of these operations in an integrated fashion.


Here is a list of changes in this submission:

• Bug where expected version of verify_interval is not generated correctly,
dummy io_u not setup correctly

• Bug where unknown header_interval referenced in HOWTO, fixed a bunch of
typos.

• Bug where windows hangs on nano sleep in windows 7.

• Also stonewall= option does not seem to work on windows 7, seems fixed in
later releases so painfully worked around this by having separate init and
run fio scripts. No change was made here but just mentioning this in
passing.

• Fixed bug where FD_IO logging was screwed up in io_c.h. Here is example
of logging problem:

• io       2212  io complete: io_u 0x787280:
off=1048576/len=2097152/ddir=0io       2212  /b.datio       2212

• io       2212  fill_io_u: io_u 0x787280: off=3145728/len=2097152/ddir=1io
      2212  /b.datio       2212

• io       2212  prep: io_u 0x787280: off=3145728/len=2097152/ddir=1io
  2212  /b.datio       2212

• io       2212  ->prep(0x787280)=0

• io       2212  queue: io_u 0x787280: off=3145728/len=2097152/ddir=1io
  2212  /b.datio       2212

• In order to make fio into an superb data integrity test tool, a number of
shortcomings were addressed. New verify_track switch enables in memory
tracking of checksums within each fio job, preventing a block from rolling
back to prior version. The in memory checksums can be written to a tracking
log file to provide an absolute checksum guarantees between fio jobs or
between fio runs. Verification of trim operations is supported in an
integrated fashion. See HOWTO description of verify_tracking.
verify_tacking_log, verify_tracking_required, verify_tracking_dir,
verify_trim_zero

• Enhanced description surrounding corruption added to HOWTO as well as
providing some corruption analyze tools.

• Bad header will dump received buffer into *.received before you gave you
an error message

• If verify_interval is less than the block size, fio will now always dump
the complete buffer in an additional file called *.complete. Seeing whole
buffer can reveal more about the corruption pattern.

• Changed the printing of the hex checksum to display in MSB to LSB order
to facilitate compares to memory dumps and debug logging

• Added a dump of the complete return buffer on trim write verification
failure.

• Debug logging was being truncated at the end of a job so you could not
see the full set of debug log messages, so added a log flush at the end of
each job if debug= switch is used.

• rw=readwrite seems to have independent last_pos read/write pointers as
you sequentially access the file. If the mix is 50/50 then you could have
fio reading and writing the same block as the read and write pointer cross
each other which is not reliably verifiable. This pattern result is chaos
and contradicts all the other sequential patterns and even randrw.
Overlapping I/O makes little sense and is usually a sign of a broken
application. Moreover readwrite workload would not complete a sequential
pass over the entire file which everyone I spoke to assumed it was doing.
So a change was made to the existing read/write workload functionality. Now
the max of the file’s last_pos pointers for DDIR_READ and DDIR_WRITE are
used for selecting the next offset as we sequentially scan a file. If the
old behavior is somehow useful then an option can be added to preserve it.
If preserved, it should never be the default and should disable
verification.


My changes revolve around maintaining the last_pos array in a special way.
When multiple operations (read/write/trim) are requested by a workload then
as the last position is changed, the changes are reflected in all three
entries in the array. This way a randomly selected next operation always
use the right last_pos. However we retained the old behavior for single
operation workloads and for trimwrite which operates like a single
operation workload.

• Synchronous Trim I/O completions were not updating bytes_issued in
backend.c and thus trimwrite was actually making 2 passes of the file.

• I kept the new verify_tracking verification entirely separate from the
experimental_verify code. These new tracking changes provides fully
persistent verification of trims integrated into standard verify, so we
might want to consider deprecating support for experimental_verify. Note
that verify_track and experimental_verify cannot both be enabled.

• With the wide adoption of thin LUN datastores and recent expanded OS
support for trim operations to reclaim unused space, testing trim
operations in a wide variety of contexts has been a necessity. Added some
new trim I/O workloads to the existing trim workloads, that require use of
verify_tracking option to verify:

• trim                Sequential trims

• readtrim         Sequential mixed reads and trims

• writetrim        Sequential mixed writes and trims.

•                       Each block will be trimmed or written.

• readwritetrim Sequential mixed reads/writes/trims

• randtrim         Random trims

• randreadtrim  Random mixed reads and trims

• randwritetrim Random mixed writes and trims

• randrwt          Random mixed reads/writes/trims

• A second change to existing fio functionality involves an inconsistency
of counting read verification bytes against the size= argument. Some rw=
workloads count read verification I/Os or bytes against size= values (like
readwrite and randrw) and some do not  like write, trim and trimwrite.
Counting read verifications bytes makes it hard to predict the number of
bytes or I/Os that will be performed in the readwrite workload and the new
rw= workloads increases the unpredictability with even more read
verifications in a readwritetrim workload. Normally I expect that fio
should process all the bytes in a file pass but when the bytes from read
verifies count towards the total bytes to process in size=, only part of
the file is processed. So I made it consistent for size and io_limit by not
counting read verify bytes. One could argue that number_os= could also be
similarly changed but I left this alone and it still uses raw I/O counts
which include read verification I/Os. Another justification is that
this_io_bytes never records verification reads for the dry_run and we need
dry_run and do_io to be in synch. Note this explains why I removed code to
add extra bytes to total_bytes in do_io for verify_backlog.

• Seems like the processing of TD_F_VER_NONE is backwards from its name. If
verify != VERIFY_NONE then the bit is set but the name implies it should be
clear. So now it sets the bit only if verify == VERIFY_NONE to avoid this
very confusing state.

• Added a sync and invalidate after the close in iolog.c ipo_special().
This is needed if you capture checksums in the tracking log and there is a
close followed immediately by an open. The close is not immediate if you
have iodepth set to a large number. The file is still marked “open” but
“closing” on return from the close  and will close only after the last I/O
completes. The sync avoids the assert on trying to open an already open
file which has a close pending.

• —read_iolog does not support trims at this time.

• io_u.c get_next_seq_offset() seems to suggest that ddir_seq_add can be
negative but there are a number of unhandled cases with such a setting. Add
TODOs to document issues. I have a number of reservations about the
correctness of get_next_seq_offset(). Note whenever I saw a possible
problem in the code but did not have time to research it, I added a TODO
comment.

• io_u.c get_next_seq_offset() has a problem when it uses fixed value when
relative values are what is being manipulated, so this code:

• if (pos >= f->real_file_size)

• pos = f->file_offset;

• should be:

• if (pos >= f->io_size)

• pos = 0;

• Given there are a couple of changes to existing fio workload behavior,
you might want to consider going to a V3.0.




Here are two new sections on Verification Tracking and Data Corruption
Troubleshooting from HOWTO:


Verification Tracking

---------------------


Absolute data integrity guarantee is the primary mission of a storage

software/hardware subsystem. Fio is good at detecting data corruption but

there are gaps. Currently only when rw option is set to read only are

workload reads verified. It is desirable to validate all reads in addition

to writes to protect against data rolling back to earlier versions.


With the addition of the block's offset to the header in recent fio
releases,

block data returned for another block will be flagged as corrupt. However

a limitation of the fio header and data embedded checksums is that fio
cannot

detect if a prior intact version of a block was returned on a read. If the

header and data checksum match the block is declared valid.


These limitations can be addressed by setting the verify_track option which

allocates a memory array to track the header and data checksums to assure

data integrity is absolute. The array starts out empty at the beginning of

each fio job and is filled in as reads or writes occur, once defined the

checksums from succeeding I/Os must all match. This option extends checksum

verification to all reads in all workloads, not just the read-only
workloads.


However use of verify_track requires that fio avoid overlapping, concurrent

reads and writes to the same block. Reading and writing a block at the same

time yields indeterminate results and making guaranteeing data integrity

impossible. So some fio options where this is a risk are disabled when using

verify_track. See verify_track argument for list of restrictions.


Even better verification would validate data more persistently. You would

like to track checksums persistently between fio jobs or between runs of fio

which could be after a shutdown/restart of the system or on a different
system

that shares storage. Proving seamless data integrity from the application

perspective over complex failover and recovery situations like reverting a

virtual machine to a prior snapshot is quite valuable.


Also the popularity of thin LUNs in the storage world has caused problems

if the unused disk space is not reclaimed by use of trims. So we would like

to have the ability to mix and match trims with reads and writes. The rw
option

now supports a full set of combinations and the rwtmix=read%,write%,trim%
option

allows specifying the mix percentages of all three types of I/O in one
argument.

However trims do have special requirements as documented under the rw
option.

Finally we would like to verify trims operations. If you read a trimmed
block

before re-writing the block, it should return a block of zeroes.


The verify_track_log option permits persistent checksum tracking and

verification of trims by enabling the saving of the tracking array to a
tracking

log on the close of a data file at the end of a fio job and reading it back
in

at the next start. A clean shutdown of fio is needed for tracking log to be

persistent. When no errors occur checksum context is automatically preserved

between fio jobs and fio runs. On revert of a virtual machine snapshot if

the tracking log is restored from the time of the snapshot then checksum

context is again preserved. There is a tracking log for each data file.


Tracking log filename format is: <dir>/<filename>.tracking.log

where:

   filename - is name of file system file or block device name like “sdb”

   dir - is log directory that defaults to directory of data file.

         For block devices, dir defaults to the process current default

         directory.


The tracking log is plain text and contains data from when it was first
created:

the data file name it is tracking, the size of the data file, the starting

file offset for I/Os, its verify_interval option setting. From the last

save of the log it has: timestamp of last save and a checksum of the

tracking log contents. For checksums, Bit 0 = 1 defines a valid checksum.

Bit 0 = 0 signifies special case entries (dddddddc indicates a trimmed block

and 0 indicates an undefined entry).


Tracking Log Example with "--" comments added:


$ cat xxx.tracking.log

Fio-tracking-log-version: 1

DataFileName: xxx

DataFileSize: 2048

DataFileOffset: 0

DataFileVerifyInterval: 512

TrackingLogSaveTimestamp: 2017-02-23T14:25:32.446981

TrackingLogChecksum: cae34cd8

VerifyIntervalChecksums:

4028ab33    -- Checksums from read or write of 3 blocks, Bit 0 = 1

a450bffb

81858a3

dddddddc    -- Means trimmed block, Bit 0 = 0

0           -- Means undefined entry never been accessed, Bit 0 = 0

$


Tracking arguments are:


verify_track=bool - enables checksum tracking in memory

verify_track_log=bool - enable savings and restoring of tracking log

verify_track_required=bool - By default fio will create a log on the fly.

    If a log is found at the start it is read and then the log file is
deleted.

    If any error occurs during the fio run then the tracking log is not

    written on close so compromised logs do not cause false failures.
However

    testing requiring absolute data integrity guarantees will want to use
this

    option to require that the tracking log always be present between fio
jobs

    or at the start of a new fio run.

verify_track_dir=str - Specifies dir to place all tracking logs. It is
advisable

    when evaluating the data integrity of device to place the tracking log
on a

    different, more trusted device.

verify_track_trim_zero=bool - When no tracking array entry exists, this
option

    allows a zeroed block from prior fio run to be treated as previously
trimmed

    instead of as data corruption. Once the array entry for a block is
defined,

    this option is no longer used as the array entry determines the required

    verification.

debug=chksum - a new debug option allows tracing of all checksum entry

    additions/changes to the tracking array or entry use in verification


There are a couple considerations to be aware of when using tracking log.

Tracking log is sticky. If you change the following options that make

the tracking log no longer match the data layout then you will receive

a persistent error until the tracking log is recreated: size= or offset=

or verify_interval= options. You do get a friendly error indicating

what tracking log file to delete to start with a fresh tracking log. Note

if a fio run an fails with other errors, the tracking log is discarded so
that

stale checksums do not cause false failures on subsequent runs.


The tracking log uses 4 bytes for tracking each verify_interval block

in the data file or block device as specified by 4*(size/verify_interval).

So there are scaling implications for memory usage and log file size.

However blocks are only tracked for the active I/O range from:

offset-<offset+size-1>.


The performance impact of the few extra I/Os to read and write the tracking
log

between fio jobs and fio runs is negligible since one is not usually
verifying

data when doing performance studies. There is no overhead when verify
tracking

is disabled and no extra I/Os when verify_track_log is disabled.



Data Corruption Troubleshooting

-------------------------------

When a corruption occurs immediate analysis can reveal many clues as to the

source of the corruption. Is the corruption persistent? In memory and on
disk?

The exact pattern of the corruption is often revealing: At the beginning of

an I/O block? Sector aligned? All zeroes or garbage? What is the exact range

of the corruption? Is corruption a stale but intact prior version of the

block?


When a corruption is detected, three possible corrupt data files are
created:


*.received - the corrupt data which is possibly a verify_interval block
within

              the full block used in the I/O.

*. complete - the full block used in the I/O

*. expected - if the block's header is intact, the expected data pattern for

              the *.received block can be generated


Two scripts exist in the analyze directory to assist in analysis:


corruption_triage.sh - a bash script that contains a sequence of diagnostic

              steps

fio_header.py - a python script that displays the contents of the block
header

              in a corrupt data file.




Here are the related parameter descriptions from HOWTO:


.. option:: verify_track=bool


Fio normally verifies data within a verify_interval with checksums and file

offsets embedded in the data. However a prior version of a block could be

returned and verified successfully. When verify_track is enabled the
checksum

for every verify_interval in the file is stored in a table and all read data

must match the checksums in the table. The tracking table is sized as

(size / verify_interval) * 4 bytes. For very large size= option settings,

such a large memory allocation may impact testing. Reads assume that the
entire

file has been previously written with a verification format using the same

verify_interval. When verify_track is enabled, all reads are verified,
whether

writes are present in the workload or not. Sharing files by threads within
a job

is supported but not between jobs running concurrently so use the stonewall

option when more than one non-global job is present. Verify of trimmed
blocks

is described for the verify_track_trim_zero option. When disabled, fio falls

back on verification described under the verify option. The restrictions
when

enabling the verify_track option are:

- randommap is required

- softrandommap is not supported

- lfsr random generator not supported when using multiple block sizes

- stonewall option required when more than one job present

- file size must be an even multiple of the block size when iodepth > 1

- verify_backlog not supported when iodepth > 1

- verify_async is not supported

- file sharing between concurrent jobs not supported

- numjobs must be 1

- io_submit_mode must be set to "inline"

- verify=null or pattern are not supported

- verify_only is not supported

- io_submit_mode must be set to 'inline'

- supplying a sequence number with rw option is not supported

- experimental_verify is not supported

Defaults to off.


You can enable verify_track for individual jobs and each job will start with

a empty table which is filled in as each block is initially read or written
and

enforced on subsequent reads within the job. For persistent tracking of
checksums

between jobs or fio runs, see verify_track_log.


.. option:: verify_track_log=bool


If set when verify_track is set then on a clean shutdown, fio writes the
checksum

for each data block that has been read or written to a log named

<datafilename>.tracking.log. If set when fio reopens this data file and a
tracking

log exists then the checksums are read into the tracking table and used to
validate

every subsequent read. This allows rigorous validation of data integrity as
data

files are passed between fio jobs or over the termination of fio and
restart on

the same system or on another system or after an OS reboot. Reverting a
virtual

machine to a snapshot can be tested by saving the tracking log after a
successful

fio run and later restoring the saved log after reverting the virtual
machine.

The log is deleted after being read in, so on abnormal termination no stale

checksums can be used. This option, the data file size and verify_interval

parameters should not change between jobs in the same run or on restart of
fio.

Defaults to off. verify_track_dir defines the tracking log's directory.


.. option:: verify_track_required=bool


If set when verify_track_log is set then the tracking log for each file
must exist

at the start of a fio job or an error is returned. Defaults to off which is

the case for the first job in a new fio run. Subsequent jobs in this run can

require use of the tracking log. If set to off then any tracking log found
will be

used otherwise an empty tracking table is used. If a prior fio run created a

tracking log for the data file then all jobs can require use of the
tracking log.


.. option:: verify_track_dir=str


If verify_track_log is set then this defines the single directory for all
tracking

logs. The default is to use the same directory where each data file resides.

When filename points to a block device or pipe then the directory defaults
to the

current process default directory. To assure data integrity of the tracking
log,

each tracking log also contains its own checksum. However when checking a
device

for data integrity it is advisable to place tracking logs containing
checksums on

a different, more trusted device.


.. option:: verify_track_trim_zero=bool


Typically a read of a trimmed block that has not been re-written will
return a block

of zeros. If set with verify_tracking enabled then all zeroed blocks with
no tracking

information are assumed to have resulted from a trim. If clear zeroed
blocks are

treated as corruption. If your device does not return zeroed blocks for
reads after

a trim then it cannot participate in tracking verification. Fio sets to 1
if trims

are present in the rw argument and defaults 0 otherwise. You would only use
this when

verify_tracking is enabled, trims are not specified in the rw argument and
a prior

fio job or run had performed trims.


.. option:: readwrite=str, rw=str


Type of I/O pattern. Accepted values are:


**read**
    Sequential reads.
**write**
    Sequential writes.
**randwrite**
    Random writes.
**randread**
    Random reads.
**rw,readwrite**
    Sequential mixed reads or writes.
**randrw**
    Random mixed reads or writes.


Trim I/O has several requirements:

- File system and OS support varies, but Linux block devices accept trims.
  You need privilege to write to a Linux block device. See the example job
  file track-mem.fio.
- A minimum block size is often required. Linux on VMware requires trims at
  least 1 MB in size, aligned on a 1 MB boundary.
- VMware requires a minimum VM OS hardware level of 11.
- Verifying the trim I/Os requires verify_track.


Trim I/O patterns are:

**trim**
    Sequential trims.
**readtrim**
    Sequential mixed reads or trims.
**trimwrite**
    Sequential mixed trim then write. Each block will be trimmed first,
    then written to.
**writetrim**
    Sequential mixed writes or trims. Each block will be trimmed or
    written.
**rwt,readwritetrim**
    Sequential mixed reads/writes/trims.
**randtrim**
    Random trims.
**randreadtrim**
    Random mixed reads or trims.
**randwritetrim**
    Random mixed writes or trims.
**randrwt**
    Random mixed reads/writes/trims.


Fio defaults to read if the option is not specified. For the mixed I/O
types, the default is to split them 50/50. For certain types of I/O the
result may still be skewed a bit, since the speed may be different. It is
possible to specify a number of I/Os to do before getting a new offset; this
is done by appending a ``:<nr>`` to the end of the string given. For a
random read, it would look like ``rw=randread:8`` for passing in an offset
modifier with a value of 8. If the suffix is used with a sequential I/O
pattern, then the value specified will be added to the generated offset for
each I/O. For instance, using ``rw=write:4k`` will skip 4k for every write,
turning sequential I/O into sequential I/O with holes. See the
:option:`rw_sequencer` option. Storage array vendors often require trims to
use a minimum block size.
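
As a sketch of the proposed trim-capable patterns (option names and the
device path are assumptions from this patch), a random mixed
read/write/trim workload against a block device might look like:

```ini
; Hypothetical sketch: random mixed reads/writes/trims with trims verified
; via the proposed tracking log. The 1 MB block size and alignment follow
; the VMware minimum mentioned in the trim requirements above.
[mixed-trim]
filename=/dev/sdb
rw=randrwt
bs=1m
verify=crc32c
verify_track_log=1
```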


.. option:: rwtmix=int[,int][,int]

When trims along with reads and/or writes are specified in the :option:`rw`
option, then this is the preferred argument for specifying mix percentages.
The argument is of the form read,write,trim, and the percentages must total
100. Note that any value may be left empty to keep its default from the
rwmix* arguments of 50,50,0. If a trailing comma isn't given, the remainder
will inherit the last value set.
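
A sketch of an uneven mix (the option names are assumptions from this
patch):

```ini
; Hypothetical sketch: 60% reads, 30% writes, 10% trims (must total 100).
[weighted-mix]
rw=randrwt
rwtmix=60,30,10
bs=1m
verify=crc32c
verify_track_log=1
```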
