From: Dave Chinner <david@fromorbit.com>
To: nate <linux-xfs@linuxpowered.net>
Cc: linux-xfs@vger.kernel.org
Subject: Re: XFS reflink copy to different filesystem performance question
Date: Thu, 17 Mar 2022 09:23:04 +1100
Message-ID: <20220316222304.GR3927073@dread.disaster.area>
In-Reply-To: <e99689e6c1232ffb564b0c2aecd8b0dd@linuxpowered.net>

On Wed, Mar 16, 2022 at 10:08:30AM -0700, nate wrote:
> On 2022-03-16 1:33, Dave Chinner wrote:
> 
> > Yeah, Veeam appears to use the shared data extent functionality in
> > XFS for deduplication and cloning. reflink is the user facing name
> > for space efficient file cloning (via cp --reflink).
> 
> I read bits and pieces about cp --reflink, I guess using that would be
> a more "standard" *nix way of using dedupe?

reflink is not dedupe. A file clone simply makes a copy by reference,
so it doesn't duplicate the data in the first place. IOWs, it ends
up with a single physical copy that has multiple references to it.

dedupe is done by a different operation, which requires comparing
the data in two different locations and, if they are identical,
reducing them to a single physical copy with multiple references.

In the end they look the same on disk (shared physical extent with
multiple references) but the operations are distinctly different.
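
To make the distinction concrete, here's a rough sketch of the two
operations from userspace (file names are just examples; the dedupe
side assumes xfs_io from xfsprogs, whose dedupe command drives the
FIDEDUPERANGE ioctl):

	# Clone: copy by reference, nothing is duplicated to begin with.
	cp --reflink=always original.img clone.img

	# Dedupe: compare data that already exists in two places and,
	# if identical, reduce it to one shared physical copy.
	# usage: dedupe <src_file> <src_offset> <dst_offset> <length>
	xfs_io -c "dedupe original.img 0 0 1048576" copy.img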

> For example cp --reflink then
> using rsync to do a delta sync against the new copy (to get the updates)?
> Not that I have a need to do this, just curious about the workflow.

IIUC, you are asking whether you can run a reflink copy on
the destination before you run rsync, then do a delta sync using
rsync to move only the changed blocks, so that only the changed
blocks are stored in the backup image?

If so, then yes. This is how a reflink-based file-level backup farm
would work. It is very similar to a hardlink-based farm, but instead
of keeping a repository of every version of every file that is
backed up in an object store and then creating the directory
structure via hardlinks to the object store, it creates the new
directory structure with reflink copies of the previous version and
then does delta updates to the files directly.
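
A minimal sketch of one backup cycle (directory names are made up).
The critical detail is rsync's --inplace flag; without it rsync
writes each changed file to a new temporary file and renames it over
the old one, which throws away all the sharing with the previous
version:

	# Seed today's tree from yesterday's by reference only.
	cp -a --reflink=always backups/2022-03-15 backups/2022-03-16

	# Delta-update in place so unchanged blocks stay shared.
	rsync -a --inplace --no-whole-file /data/ backups/2022-03-16/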

> > I'm guessing that you're trying to copy a deduplicated file,
> > resulting in the same physical blocks being read over and over
> > again at different file offsets and causing the disks to seek
> > because it's not physically sequential data.
> 
> Thanks for confirming that, it's what I suspected.

I haven't confirmed anything, just made a guess same as you have.

> [..]
> 
> > Maybe they are doing that with FIEMAP to resolve deduplicated
> > regions and caching them, or they have some other information in
> > their backup/deduplication data store that allows them to optimise
> > the IO. You'll need to actually run things like strace on the copies
> > to find out exactly what it is doing....
> 
> ok thanks for the info. I do see a couple of times there are periods
> of lots of disk reads on the source and no writes happening on the
> destination. I guess it is sorting through what it needs to get; one
> of those periods lasted about 20 minutes.

That sounds more like the dedupe process searching for duplicate
blocks to dedupe....
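
If you do end up tracing it, attaching strace to the running copy
while one of those read-only periods is happening would look
something like this (a sketch; the PID is a placeholder):

	# Follow children (-f), timestamp each call (-tt), and watch
	# the read/seek pattern to see where the time is going.
	strace -f -tt -e trace=read,pread64,lseek,copy_file_range -p <PID>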

> > No, they don't exist because largely reading a reflinked file
> > performs no differently to reading a non-shared file.
> 
> Good to know, certainly would be nice if there were at least a way to
> identify a file as having X number of links.

You can use FIEMAP (filefrag(1) or xfs_bmap(8)) to tell you if a
specific extent is shared or not. But it cannot tell you how many
references there are to it, nor which files those references belong
to. For that, you need root permissions, ioctl_getfsmap(2) and
rmapbt=1 support in your filesystem.
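
For example, filefrag's verbose output flags shared extents (output
trimmed and illustrative, not from a real run):

	$ filefrag -v backup.img
	Filesystem type is: 58465342
	 ext: logical_offset:  physical_offset:  length: flags:
	   0:     0..    8191: 1056768.. 1064959:  8192:  shared
	...

Anything without the "shared" flag is an ordinary unshared extent.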

> > To do that efficiently (i.e. without a full filesystem scan) you
> > need to look up the filesystem reverse mapping table to find all the
> > owners of pointers to a given block.  I bet you didn't make the
> > filesystem with "-m rmapbt=1" to enable that functionality - nobody
> > does that unless they have a reason to because it's not enabled by
> > default (yet).
> 
> I'm sure I did not do that either, but I can do that if you think it
> would be advantageous. I do plan to ship this DL380Gen10 XFS system to
> another location and am happy to reformat the XFS volume with that extra
> option if it would be useful.

Unless you have an immediate use for filesystem metadata level
introspection (generally unlikely), there's no need to enable it.
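
For completeness, if you did want it, it's a mkfs-time option and the
result can then be queried with xfs_io's fsmap command (the device
and mount point below are placeholders, and mkfs destroys everything
on the device):

	# Enable the reverse mapping btree when making the filesystem.
	mkfs.xfs -m rmapbt=1 /dev/sdX

	# Then, as root, list the owner(s) of every mapped block range.
	xfs_io -c "fsmap -v" /mnt/backup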

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com
