Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
From: Trond Myklebust <trondmy@hammerspace.com>
To: "darrick.wong@oracle.com" <darrick.wong@oracle.com>,
	"david@fromorbit.com" <david@fromorbit.com>
Cc: "linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>
Subject: Re: Question about clone_range() metadata stability
Date: Tue, 3 Dec 2019 07:36:29 +0000
Message-ID: <52f1afb6e0a2026840da6f4b98a5e01a247447e5.camel@hammerspace.com> (raw)
In-Reply-To: <20191201210519.GB2418@dread.disaster.area>

On Mon, 2019-12-02 at 08:05 +1100, Dave Chinner wrote:
> On Wed, Nov 27, 2019 at 12:21:36PM -0800, Darrick J. Wong wrote:
> > On Wed, Nov 27, 2019 at 06:38:46PM +0000, Trond Myklebust wrote:
> > > Hi all
> > > 
> > > A quick question about clone_range() and guarantees around
> > > metadata
> > > stability.
> > > 
> > > Are users required to call fsync/fsync_range() after calling
> > > clone_range() in order to guarantee that the cloned range
> > > metadata is
> > > persisted?
> > 
> > Yes.
> > 
> > > I'm assuming that it is required in order to guarantee that
> > > data is persisted.
> > 
> > Data and metadata.  XFS and ocfs2's reflink implementations will
> > flush
> > the page cache before starting the remap, but they both require
> > fsync to
> > force the log/journal to disk.
> 
> So we need to call xfs_fs_nfs_commit_metadata() to get that done
> post vfs_clone_file_range() completion on the server side, yes?
> 

I chose to implement this using a full call to vfs_fsync_range(), since
we really do want to ensure data stability as well. Consider, for
instance, the case where client A is running an application, and client
B runs vfs_clone_file_range() in order to create a point in time
snapshot of the file for disaster recovery purposes...

> > (AFAICT the same reasoning applies to btrfs, but don't trust my
> > word for
> > it.)
> > 
> > > I'm asking because knfsd currently just does a call to
> > > vfs_clone_file_range() when parsing a NFSv4.2 CLONE operation. It
> > > does
> > > not call fsync()/fsync_range() on the destination file, and since
> > > the
> > > NFSv4.2 protocol does not require you to perform any other
> > > operation in
> > > order to persist data/metadata, I'm worried that we may be
> > > corrupting
> > > the cloned file if the NFS server crashes at the wrong moment
> > > after the
> > > client has been told the clone completed.
> 
> Yup, that's exactly what server side calls to commit_metadata() are
> supposed to address.
> 
> I suspect to be correct, this might require commit_metadata() to be
> called on both the source and destination inodes, as both of them
> may have modified metadata as a result of the clone operation. For
> XFS one of them will be a no-op, but for other filesystems that
> don't implement ->commit_metadata, we'll need to call
> sync_inode_metadata() on both inodes...
> 

That's interesting. I hadn't considered that a clone might cause the
source metadata to change as well. What kind of change specifically are
we talking about? Is it just delayed block allocation, or is there
more?

Thanks
  Trond

-- 
Trond Myklebust
Linux NFS client maintainer, Hammerspace
trond.myklebust@hammerspace.com



  parent reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-27 18:38 Trond Myklebust
2019-11-27 20:21 ` Darrick J. Wong
2019-11-29 12:43   ` Filipe Manana
2019-12-01 21:05   ` Dave Chinner
2019-12-02 17:09     ` Darrick J. Wong
2019-12-03  7:36     ` Trond Myklebust [this message]
2019-12-03 16:35       ` Darrick J. Wong
2019-12-03 23:00         ` Trond Myklebust
2019-12-06  1:31       ` Dave Chinner

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=52f1afb6e0a2026840da6f4b98a5e01a247447e5.camel@hammerspace.com \
    --to=trondmy@hammerspace.com \
    --cc=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org
	public-inbox-index linux-nfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git