All of lore.kernel.org
 help / color / mirror / Atom feed
From: Dave Chinner <dchinner@redhat.com>
To: Niels de Vos <ndevos@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>, Fam Zheng <famz@redhat.com>,
	Eric Blake <eblake@redhat.com>, Kevin Wolf <kwolf@redhat.com>,
	qemu-block@nongnu.org, qemu-devel@nongnu.org,
	Max Reitz <mreitz@redhat.com>,
	Lukas Czerner <lczerner@redhat.com>,
	P@draigBrady.com
Subject: Re: [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server)
Date: Thu, 21 Jul 2016 21:43:42 +1000	[thread overview]
Message-ID: <20160721114342.GQ2031@devil.localdomain> (raw)
In-Reply-To: <20160720133517.GE23632@ndevos-x240.usersys.redhat.com>

On Wed, Jul 20, 2016 at 03:35:17PM +0200, Niels de Vos wrote:
> On Wed, Jul 20, 2016 at 10:30:25PM +1000, Dave Chinner wrote:
> > On Wed, Jul 20, 2016 at 05:19:37AM -0400, Paolo Bonzini wrote:
> > > Adding ext4 and XFS guys (Lukas and Dave respectively).  As a quick recap, the
> > > issue here is the semantics of FIEMAP and SEEK_HOLE/SEEK_DATA, which we use in
> > > "qemu-img map".  This command prints metadata about a virtual disk image---which
> > > in the case of a raw image amounts to detecting holes and unwritten extents.
> > > 
> > > First, it seems like SEEK_HOLE and SEEK_DATA really should be "SEEK_NONZERO" and
> > > "SEEK_ZERO", on both ext4 and XFS.    You can see that unwritten extents are
> > > reported by "qemu-img map" as holes:
> > 
> > Correctly so. seek hole/data knows nothing about the underlying
> > filesystem storage implementation. A "hole" is defined as a region
> > of zeros, and the filesystem is free call anything it kows for
> > certain contains zeros as a hole.
> > 
> > FYI, SEEK_HOLE/DATA were named after the the pre-existing Solaris
> > seek API that uses these definitions for hole and data....
> > 
> > >     $ dd if=/dev/urandom of=test.img bs=1M count=100
> > >     $ fallocate -z -o 10M -l 10M test.img
> > >     $ du -h test.img
> > >     $ qemu-img map --output=json test.img
> > >     [{ "start": 0, "length": 10485760, "depth": 0, "zero": false, "data": true, "offset": 0},
> > >     { "start": 10485760, "length": 10485760, "depth": 0, "zero": true, "data": false, "offset": 10485760},
> > >     { "start": 20971520, "length": 83886080, "depth": 0, "zero": false, "data": true, "offset": 20971520}]
> > 
> > > 
> > > On the second line, zero=true data=false identifies a hole.  The right output
> > > would either have zero=true data=true (unwritten extent) or just
> > > 
> > >     [{ "start": 0, "length": 104857600, "depth": 0, "zero": false, "data": true, "offset": 0},
> > >
> > > since the zero flag is advisory (it doesn't try to detect zeroes beyond what the
> > > filesystem says).
> > 
> > Not from SEEK_HOLE/SEEK_DATA. All it conveys is "this is the start
> > of a range of zeros" and "this is the start of a range of data". And
> > for filesystems that don't specifically implement these seek
> > operations, zeros are considered data. i.e. SEEK_HOLE will take you
> > to the end of file, SEEK_DATA returns the current position....
> > 
> > i.e. unwritten extents contain no data, so they are semantically
> > identical to holes for the purposes of seeking and hence SEEK_DATA
> > can skip over them.
> > 
> > > The reason why we disabled FIEMAP was a combination of a corruption and performance
> > > issue.  The data corruption bug was at https://bugs.launchpad.net/qemu/+bug/1368815
> > > and it was reported on Ubuntu Trusty (kernel 3.13 based on the release notes at
> > > https://wiki.ubuntu.com/TrustyTahr/ReleaseNotes).  We corrected that by using
> > > FIEMAP_FLAG_SYNC, based on a similar patch to coreutils.  This turned out to be too
> > > slow, so we dropped FIEMAP altogether.
> > 
> > Yes, because FIEMAP output is only useful for diagnostic purposes as
> > it can be stale even before the syscall returns to userspace. i.e.
> > it can't be used in userspace for optimising file copies....
> 
> Oh... And I was surprised to learn that "cp" does use FIEMAP and not
> SEEK_HOLE/SEEK_DATA. You give a strong suggestion to fix that.
>   http://git.savannah.gnu.org/cgit/coreutils.git/tree/src/extent-scan.c#n87

My understanding was that FIEMAP was disabled almost immediately
after the data corruption problems were understood, and it hasn't
been turned back on since. I don't see FIEMAP calls in strace when I
copy sparse files, even when I use --sparse=auto which is supposed
to trigger the sparse source file checks using FIEMAP.

So, yeah, while there's FIEMAP code present, that doesn't mean it's
actually used. And, yes, there are comments in coreutils commits in
2011 saying that the FIEMAP workarounds are temporary until
SEK_HOLE/DATA are supported, which were added to the kernel in 2011
soon after the 'cp-corrupts-data-with-fiemap' debacle came to light.
Five years later, and the fiemap code is stil there?

Cheers,

Dave.
-- 
Dave Chinner
dchinner@redhat.com

  reply	other threads:[~2016-07-21 11:43 UTC|newest]

Thread overview: 55+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-07-19  4:07 [Qemu-devel] [PATCH for-2.7 v5 00/14] nbd: efficient write zeroes Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 01/14] nbd: Fix bad flag detection on server Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 02/14] nbd: Add qemu-nbd -D for human-readable description Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 03/14] nbd: Limit nbdflags to 16 bits Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 04/14] nbd: Treat flags vs. command type as separate fields Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 05/14] nbd: Share common reply-sending code in server Eric Blake
2016-07-19  5:10   ` Fam Zheng
2016-07-19 14:52     ` Eric Blake
2016-07-20  4:39       ` Fam Zheng
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 06/14] nbd: Send message along with server NBD_REP_ERR errors Eric Blake
2016-07-19  5:15   ` Fam Zheng
2016-10-11 15:12     ` Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 07/14] nbd: Share common option-sending code in client Eric Blake
2016-07-19  5:31   ` Fam Zheng
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 08/14] nbd: Let server know when client gives up negotiation Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 09/14] nbd: Let client skip portions of server reply Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 10/14] nbd: Less allocation during NBD_OPT_LIST Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 11/14] nbd: Support shorter handshake Eric Blake
2016-07-19  4:07 ` [Qemu-devel] [PATCH v5 12/14] nbd: Improve server handling of shutdown requests Eric Blake
2016-07-19  4:08 ` [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server Eric Blake
2016-07-19  6:21   ` Fam Zheng
2016-07-19 15:28     ` Eric Blake
2016-07-19 15:45       ` Paolo Bonzini
2016-07-20  3:34         ` Fam Zheng
2016-07-20  3:47           ` Eric Blake
2016-07-20  4:37             ` Fam Zheng
2016-07-20  7:09               ` Paolo Bonzini
2016-07-20  7:38                 ` Fam Zheng
2016-07-20  8:16                   ` Paolo Bonzini
2016-07-20  9:04                     ` Fam Zheng
2016-07-20  9:19                   ` [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC (was Re: [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server) Paolo Bonzini
2016-07-20 12:30                     ` Dave Chinner
2016-07-20 13:35                       ` Niels de Vos
2016-07-21 11:43                         ` Dave Chinner [this message]
2016-07-21 12:31                           ` Pádraig Brady
2016-07-21 13:15                             ` Dave Chinner
2016-07-20 13:40                       ` Paolo Bonzini
2016-07-21 12:41                         ` Dave Chinner
2016-07-21 13:01                           ` Pádraig Brady
2016-07-21 14:23                           ` Paolo Bonzini
2016-07-22  8:58                             ` Dave Chinner
2016-07-22 10:41                               ` Paolo Bonzini
2018-02-15 16:40                                 ` Vladimir Sementsov-Ogievskiy
2018-02-15 16:42                                   ` Paolo Bonzini
2018-04-18 14:25                                     ` Vladimir Sementsov-Ogievskiy
2018-04-18 14:41                                       ` [Qemu-devel] semantics of FIEMAP without FIEMAP_FLAG_SYNC Eric Blake
2016-08-18 13:50   ` [Qemu-devel] [PATCH v5 13/14] nbd: Implement NBD_CMD_WRITE_ZEROES on server Vladimir Sementsov-Ogievskiy
2016-08-18 13:52     ` Paolo Bonzini
2016-07-19  4:08 ` [Qemu-devel] [PATCH v5 14/14] nbd: Implement NBD_CMD_WRITE_ZEROES on client Eric Blake
2016-07-19  6:24   ` Fam Zheng
2016-07-19 15:31     ` Eric Blake
2016-07-19  6:33 ` [Qemu-devel] [PATCH for-2.7 v5 00/14] nbd: efficient write zeroes Fam Zheng
2016-07-19  8:53 ` Paolo Bonzini
2016-07-19 15:33   ` Eric Blake
2016-07-19 15:41     ` Paolo Bonzini

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20160721114342.GQ2031@devil.localdomain \
    --to=dchinner@redhat.com \
    --cc=P@draigBrady.com \
    --cc=eblake@redhat.com \
    --cc=famz@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=lczerner@redhat.com \
    --cc=mreitz@redhat.com \
    --cc=ndevos@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.