On 04/18/2018 09:25 AM, Vladimir Sementsov-Ogievskiy wrote:
>>> Does your code with
>>>
>>>      /* Found an extent, and we're inside it.  */
>>>      *next = f.fe.fe_logical + f.fe.fe_length;
>>>      if (f.fe.fe_flags & FIEMAP_EXTENT_UNWRITTEN) {
>>>          return BDRV_BLOCK_DATA|BDRV_BLOCK_ZERO;
>>>      } else {
>>>          return BDRV_BLOCK_DATA;
>>>      }
>>>
>>> provide safe block_status based on FIEMAP without FLAG_SYNC?
>> No, in fact we found data corruption with FIEMAP.
> 
> How to reproduce it? I've tried your code, looks like it shows all
> "data" regions even if I didn't call "sync".
> 

There's no easy way to reproduce a data race reliably, and FIEMAP
without sync is exactly that kind of race.  Most of the time you will
get the answer you expect, but under the right conditions FIEMAP may
report an area as unallocated even though you have already called
write() on it.  If you then treat that unallocated region as
BDRV_BLOCK_ZERO rather than read()ing it, you have corrupted data.
That's because FIEMAP only reports what has been allocated on disk, and
file systems can use delayed allocation, where contents sitting in the
kernel cache are NOT yet flushed to disk.  Using sync closes that
window, but it kills performance.

If you want examples of FIEMAP corrupting data, look at the coreutils
archive from several years ago, where FIEMAP without sync caused
corruptions during cp. A quick search found at least this example:
https://lists.gnu.org/archive/html/bug-coreutils/2011-04/msg00023.html

For more details, see qemu commits c4875e5b and 38c4d0a, and discussion
at https://lists.gnu.org/archive/html/qemu-devel/2014-09/msg04921.html

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org