fio.vger.kernel.org archive mirror
* Adding stale-data detection to verify logic
From: Adam Horshack @ 2023-02-03 19:08 UTC (permalink / raw)
  To: fio

As written, the verify function doesn't appear to have logic for
detecting dropped writes (stale data) for data at rest. Only two
temporally variant fields are presently used in the verify pattern:

verify_header.rand_seed
verify_header.numberio

These fields are verified during read+write invocations but not during
read-only invocations. This means that if the most recent write to a
given block is dropped, the loss won't be detected: the block still
holds an older, internally consistent pattern, so all the fields that
are not temporally variant pass verification. This is particularly
problematic when reusing a device for separate fio invocations during a
series of tests, as there will be valid but stale data at rest from
previous invocations.

For example, if a user does the following after previous fio invocations:

1) Performs a write workload, without verify. When complete, runs a
subsequent invocation with a read/verify-only workload against the same
dataset.

2) Performs a write workload and uses a trigger to perform a
power-interruption test. Then runs a subsequent invocation with a
read/verify-only workload, using verify_state_load=1.

It could be argued the onus is on the user to wipe data before every 
invocation but I'm not sure that's reasonable.

I'd like to implement an invocation-variant check that will catch any
data at rest that is stale relative to the current invocation, i.e.
left over from a previous one. There would be an invocation-unique
identifier, either passed via a command-line option or generated
randomly. It would be added to verify_header and checked during all
verify reads. To support subsequent read-only invocations, it would
also be stored in the verify_state file and used whenever
verify_state_load=1, or whenever the identifier is specified on the
command line.
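
To make the idea concrete, here is a minimal sketch of the check. It
assumes a new invocation_id field; the struct and function names
(sketch_verify_header, sketch_fill_header, sketch_invocation_matches)
are hypothetical and not part of fio, and the real verify_header has
more fields than shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified header. Only the fields discussed in this
 * mail are mirrored; invocation_id is the proposed addition and does
 * not exist in fio today.
 */
struct sketch_verify_header {
	uint64_t rand_seed;
	uint16_t numberio;
	uint64_t invocation_id;	/* proposed new field (assumption) */
};

/* Stamp a header at write time with the current invocation's identifier. */
static void sketch_fill_header(struct sketch_verify_header *hdr,
			       uint64_t rand_seed, uint16_t numberio,
			       uint64_t invocation_id)
{
	assert(hdr != NULL);
	hdr->rand_seed = rand_seed;
	hdr->numberio = numberio;
	hdr->invocation_id = invocation_id;
}

/*
 * Verify-read check: returns true when the block was written by the
 * expected invocation, false when it is stale data from another one.
 */
static bool sketch_invocation_matches(const struct sketch_verify_header *hdr,
				      uint64_t expected_id)
{
	assert(hdr != NULL);
	return hdr->invocation_id == expected_id;
}
```

In a read/verify-only run, expected_id would come from the verify_state
file or from the command line, as described above.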

An alternative would be to use the existing verify_header.time_sec field
and flag any block older than the start time of the most recent
invocation, which we'd encode in the state file. The drawback is that a
command-line option for specifying a time is a little more cumbersome
than one taking an opaque identifier.
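
A sketch of the time-based alternative, under the assumption that both
the header timestamp and the recorded invocation start are 32-bit
seconds values. The function name sketch_block_is_stale is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Returns true when the block's header timestamp predates the start of
 * the invocation being verified, i.e. the block is stale.
 *
 * hdr_time_sec:         value read from the block's verify header
 * invocation_start_sec: start time saved for the writing invocation
 *                       (assumed to come from the state file)
 */
static bool sketch_block_is_stale(uint32_t hdr_time_sec,
				  uint32_t invocation_start_sec)
{
	return hdr_time_sec < invocation_start_sec;
}
```

With whole-second resolution, a block written by a previous invocation
in the same second the new one started would not be flagged by this
comparison, and the check also relies on the clock not moving backwards
between invocations.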

Note this won't catch missed multiple writes within a given invocation,
as that would require a block-specific sidecar map that tracks write
counts per block (or stores a subset of the hash for the most recent
write to each block). I've implemented such a feature in a proprietary
tool and would consider it for fio if there's interest. The downside is
the creation of, and dependency on, a large sidecar file. The upside is
that it would add verification support for sparsely random workloads.
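
For illustration only, a sidecar map of per-block write counts could
look roughly like the sketch below. This is not the proprietary
implementation mentioned above; it keeps the map in memory rather than
in a file, and every name in it is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* In-memory stand-in for the sidecar file: one write counter per block. */
struct sketch_sidecar {
	uint16_t *write_counts;
	size_t nr_blocks;
};

/* Allocate zeroed counters; returns false on allocation failure. */
static bool sketch_sidecar_init(struct sketch_sidecar *sc, size_t nr_blocks)
{
	assert(sc != NULL);
	sc->write_counts = calloc(nr_blocks, sizeof(*sc->write_counts));
	sc->nr_blocks = sc->write_counts ? nr_blocks : 0;
	return sc->write_counts != NULL;
}

static void sketch_sidecar_free(struct sketch_sidecar *sc)
{
	assert(sc != NULL);
	free(sc->write_counts);
	sc->write_counts = NULL;
	sc->nr_blocks = 0;
}

/*
 * Record a write to a block and return the new count, which the writer
 * would stamp into that block's header. The 16-bit counter wraps.
 */
static uint16_t sketch_sidecar_record_write(struct sketch_sidecar *sc,
					    size_t block)
{
	assert(sc != NULL && block < sc->nr_blocks);
	return ++sc->write_counts[block];
}

/* Verify-read check: does the count found on media match the map? */
static bool sketch_sidecar_check(const struct sketch_sidecar *sc,
				 size_t block, uint16_t count_on_media)
{
	assert(sc != NULL && block < sc->nr_blocks);
	return sc->write_counts[block] == count_on_media;
}
```

If the second of two writes to a block is dropped, the media still
carries count 1 while the map says 2, so the check fails even though
the older data is internally consistent.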

Code references showing that the temporally variant fields are not
checked for read-only workloads:

verify_io_u() forces the seeds to match the header's seed when !td_rw():

/*
 * Make rand_seed check pass when have verify_backlog or
 * zone reset frequency for zonemode=zbd.
 */
if (!td_rw(td) || (td->flags & TD_F_VER_BACKLOG) ||
    td->o.zrf.u.f)
    io_u->rand_seed = hdr->rand_seed;

verify_header() bypasses numberio check for read-only invocations:

/*
 * For read-only workloads, the program cannot be certain of the
 * last numberio written to a block. Checking of numberio will be
 * done only for workloads that write data.  For verify_only,
 * numberio check is skipped.
 */
if (td_write(td) && (td_min_bs(td) == td_max_bs(td)) &&
    !td->o.time_based)
    if (!td->o.verify_only)
        if (hdr->numberio != io_u->numberio) {
            log_err("verify: bad header numberio %"PRIu16
                ", wanted %"PRIu16,
                hdr->numberio, io_u->numberio);
            goto err;
        }

Adam (horshack@live.com)
