Linux-NFS Archive on lore.kernel.org
 help / color / Atom feed
From: "Darrick J. Wong" <darrick.wong@oracle.com>
To: Dave Chinner <david@fromorbit.com>
Cc: Trond Myklebust <trondmy@hammerspace.com>,
	"linux-fsdevel@vger.kernel.org" <linux-fsdevel@vger.kernel.org>,
	"linux-nfs@vger.kernel.org" <linux-nfs@vger.kernel.org>
Subject: Re: Question about clone_range() metadata stability
Date: Mon, 2 Dec 2019 09:09:26 -0800
Message-ID: <20191202170926.GA7323@magnolia> (raw)
In-Reply-To: <20191201210519.GB2418@dread.disaster.area>

On Mon, Dec 02, 2019 at 08:05:19AM +1100, Dave Chinner wrote:
> On Wed, Nov 27, 2019 at 12:21:36PM -0800, Darrick J. Wong wrote:
> > On Wed, Nov 27, 2019 at 06:38:46PM +0000, Trond Myklebust wrote:
> > > Hi all
> > > 
> > > A quick question about clone_range() and guarantees around metadata
> > > stability.
> > > 
> > > Are users required to call fsync/fsync_range() after calling
> > > clone_range() in order to guarantee that the cloned range metadata is
> > > persisted?
> > 
> > Yes.
> > 
> > > I'm assuming that it is required in order to guarantee that
> > > data is persisted.
> > 
> > Data and metadata.  XFS and ocfs2's reflink implementations will flush
> > the page cache before starting the remap, but they both require fsync to
> > force the log/journal to disk.
> 
> So we need to call xfs_fs_nfs_commit_metadata() to get that done
> post vfs_clone_file_range() completion on the server side, yes?

That sounds like a much better/less hastily researched answer! :)

> 
> > 
> > (AFAICT the same reasoning applies to btrfs, but don't trust my word for
> > it.)
> > 
> > > I'm asking because knfsd currently just does a call to
> > > vfs_clone_file_range() when parsing a NFSv4.2 CLONE operation. It does
> > > not call fsync()/fsync_range() on the destination file, and since the
> > > NFSv4.2 protocol does not require you to perform any other operation in
> > > order to persist data/metadata, I'm worried that we may be corrupting
> > > the cloned file if the NFS server crashes at the wrong moment after the
> > > client has been told the clone completed.
> 
> Yup, that's exactly what server side calls to commit_metadata() are
> supposed to address.
> 
> I suspect to be correct, this might require commit_metadata() to be
> called on both the source and destination inodes, as both of them
> may have modified metadata as a result of the clone operation. For
> XFS one of them will be a no-op,

Hmm.  If xfs had to set its reflink flag on the source inode then we
want to ->commit_metadata the source inode to push the log forward far
enough to record the metadata change.  That said, we set the reflink
flag on both inodes before we remap anything, so chances are that
->commit_metadata on the dest inode will be enough to push the log
forward.

I suspect that from NFS' point of view it probably ought to
->commit_metadata both inodes to insulate itself from fs-specific
behaviors and avoid weird crash dataloss bugs.  Someday, someone will
design a filesystem with per-inode logs /and/ hook it up to NFS.

> but for other filesystems that
> don't implement ->commit_metadata, we'll need to call
> sync_inode_metadata() on both inodes...

<nod>

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> david@fromorbit.com

  reply index

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-11-27 18:38 Trond Myklebust
2019-11-27 20:21 ` Darrick J. Wong
2019-11-29 12:43   ` Filipe Manana
2019-12-01 21:05   ` Dave Chinner
2019-12-02 17:09     ` Darrick J. Wong [this message]
2019-12-03  7:36     ` Trond Myklebust
2019-12-03 16:35       ` Darrick J. Wong
2019-12-03 23:00         ` Trond Myklebust
2019-12-06  1:31       ` Dave Chinner

Reply instructions:

You may reply publically to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20191202170926.GA7323@magnolia \
    --to=darrick.wong@oracle.com \
    --cc=david@fromorbit.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-nfs@vger.kernel.org \
    --cc=trondmy@hammerspace.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-NFS Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-nfs/0 linux-nfs/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-nfs linux-nfs/ https://lore.kernel.org/linux-nfs \
		linux-nfs@vger.kernel.org
	public-inbox-index linux-nfs

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kernel.vger.linux-nfs


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git