From: nate <linux-xfs@linuxpowered.net>
To: linux-xfs@vger.kernel.org
Subject: Re: XFS reflink copy to different filesystem performance question
Date: Wed, 16 Mar 2022 10:08:30 -0700
Message-ID: <e99689e6c1232ffb564b0c2aecd8b0dd@linuxpowered.net>
In-Reply-To: <20220316083333.GQ3927073@dread.disaster.area>

On 2022-03-16 1:33, Dave Chinner wrote:

> Yeah, Veeam appears to use the shared data extent functionality in
> XFS for deduplication and cloning. reflink is the user-facing name
> for space-efficient file cloning (via cp --reflink).

I have read bits and pieces about cp --reflink. I guess using that would
be a more "standard" *nix way of using dedupe? For example, cp --reflink,
then rsync to do a delta sync against the new copy (to get the updates)?
Not that I have a need to do this, I'm just curious about the workflow.
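
A minimal sketch of the workflow I have in mind, in Python. The helper
names and paths are hypothetical. It assumes GNU cp and rsync are
installed and that the clone stays on the same reflink-enabled filesystem
as the previous copy. The --inplace and --no-whole-file options are my
guess at what keeps unchanged blocks shared during a local delta sync;
I have not tested this.

```python
import subprocess


def clone_then_sync_cmds(previous_copy, updated_source, new_copy):
    """Build the two commands: reflink-clone the old copy, then delta-sync.

    All three arguments are hypothetical paths supplied by the caller.
    """
    # Clone shares data extents with previous_copy; fails if reflink is unsupported.
    clone = ["cp", "--reflink=always", previous_copy, new_copy]
    # --inplace rewrites only changed regions of new_copy instead of building
    # a temp file; --no-whole-file asks for the delta algorithm on local copies.
    sync = ["rsync", "--inplace", "--no-whole-file", updated_source, new_copy]
    return [clone, sync]


def run_all(cmds):
    """Run each command in order, stopping at the first failure."""
    for cmd in cmds:
        subprocess.run(cmd, check=True)
```

Calling run_all(clone_then_sync_cmds(...)) would execute both steps;
building the commands separately makes them easy to inspect first.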

> I'm guessing that you're trying to copy a deduplicated file,
> resulting in the same physical blocks being read over and over again
> at different file offsets and causing the disks to seek because it's
> not physically sequential data.

Thanks for confirming that, it's what I suspected.

[..]

> Maybe they are doing that with FIEMAP to resolve deduplicated
> regions and caching them, or they have some other information in
> their backup/deduplication data store that allows them to optimise
> the IO. You'll need to actually run things like strace on the copies
> to find out exactly what it is doing....

OK, thanks for the info. A couple of times I have seen periods with lots
of disk reads on the source and no writes happening on the destination.
I guess it is sorting through what it needs to get; one of those lasted
about 20 minutes.
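
To check my understanding of "resolve deduplicated regions and cache
them", here is a small Python sketch. It is not what Veeam does (I don't
know that); it only illustrates the idea. It assumes the extent list has
already been obtained somehow (for example from FIEMAP) as hypothetical
(logical_offset, physical_offset, length) tuples, and it only merges
extents whose physical range is exactly identical.

```python
from collections import defaultdict


def plan_reads(extents):
    """Group file extents by physical range and order them by disk position.

    extents: iterable of (logical_offset, physical_offset, length) tuples.
    Returns a list of (physical_offset, length, [logical_offsets]) sorted by
    physical_offset, so each physical range is read once, in ascending order,
    and can then be written to every logical offset that references it.
    """
    by_physical = defaultdict(list)
    for logical, physical, length in extents:
        by_physical[(physical, length)].append(logical)
    return [
        (physical, length, sorted(logicals))
        for (physical, length), logicals in sorted(by_physical.items())
    ]
```

For a file where offsets 0 and 20 both point at physical block 500,
plan_reads([(0, 500, 10), (10, 100, 10), (20, 500, 10)]) yields one read
at 100 and one read at 500 that serves both logical offsets.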

> No, they don't exist because largely reading a reflinked file
> performs no differently to reading a non-shared file.

Good to know. It would certainly be nice if there were at least a way to
identify a file as having X number of links.
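
The closest thing I can think of is counting how many of a file's
extents are flagged as shared. A rough Python sketch, assuming the text
comes from "filefrag -v" and that the last colon-separated column holds
the flags (the layout may differ between e2fsprogs versions). It only
says whether an extent is shared, not how many files share it.

```python
import re

# An extent row starts with an index, then "logical_start..": e.g. "  0:   0..  255:"
EXTENT_ROW = re.compile(r"^\s*\d+:\s*\d+\.\.")


def count_shared(filefrag_output):
    """Return (shared_extents, total_extents) parsed from `filefrag -v` text."""
    shared = 0
    total = 0
    for line in filefrag_output.splitlines():
        if not EXTENT_ROW.match(line):
            continue  # skip the header and summary lines
        total += 1
        # Flags are whatever follows the last colon, comma-separated.
        flags = [f.strip() for f in line.rsplit(":", 1)[1].split(",")]
        if "shared" in flags:
            shared += 1
    return shared, total
```

Feeding it the captured output of "filefrag -v FILE" gives a quick
shared/total ratio for that one file.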

> To do that efficiently (i.e. without a full filesystem scan) you
> need to look up the filesystem reverse mapping table to find all the
> owners of pointers to a given block.  I bet you didn't make the
> filesystem with "-m rmapbt=1" to enable that functionality - nobody
> does that unless they have a reason to because it's not enabled by
> default (yet).

I'm sure I did not do that either, but I can if you think it would be
advantageous. I do plan to ship this DL380 Gen10 XFS system to another
location and am happy to reformat the XFS volume with that extra option
if it would be useful.
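
Before reformatting I would want to confirm what the volume has enabled
today. A small Python sketch, assuming the text is the output of
xfs_info for the mount point and that features appear as name=0 or
name=1 pairs; the function name is my own.

```python
import re


def xfs_features(xfs_info_output, names=("reflink", "rmapbt")):
    """Map each feature name to True/False, or None if it is not listed."""
    found = {}
    for name in names:
        match = re.search(r"\b%s=(\d+)" % re.escape(name), xfs_info_output)
        found[name] = (match.group(1) == "1") if match else None
    return found
```

A result of {"reflink": True, "rmapbt": False} would confirm that the
filesystem was made without "-m rmapbt=1".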

I don't anticipate needing to deal directly with this reflinked data;
I'll just let Veeam do its thing. Thanks for clearing things up for
me so quickly!

nate


Thread overview: 5+ messages
2022-03-16  0:45 XFS reflink copy to different filesystem performance question nate
2022-03-16  8:33 ` Dave Chinner
2022-03-16 17:08   ` nate [this message]
2022-03-16 22:23     ` Dave Chinner
2022-03-17 16:43       ` nate
