* XFS reflink copy to different filesystem performance question
@ 2022-03-16  0:45 nate
  2022-03-16  8:33 ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: nate @ 2022-03-16  0:45 UTC (permalink / raw)
  To: linux-xfs

Hello -

Blast from the past: this is the first Majordomo mailing list I can
recall joining in probably 15-18 years.

Anyway, I ran into a situation today and was wondering if someone
could clarify it or point me to some docs, so I have a better
understanding of what is going on with this XFS setup.

Hardware: HP DL380 Gen10 6x8TB disks in hardware RAID 10
Software: Ubuntu 20.04 kernel 5.4.0 using XFS with reflinks enabled
Purpose: this system is used to store Veeam Backups (VMware VM backup)

I've used XFS off and on for many years, but it wasn't until I set
this up last year that I had even heard of reflinks. The Veeam docs
specifically suggested enabling it if possible, so I did. Things have
been working fine since.

Recently a situation came up where we want to copy some of this data
to a local USB drive to ship to another location to restore the data.
A simple enough process, I thought: just a basic file copy.

Total of 8.6TB, most of that in a single 8.3TB file. We got an 18TB
USB drive, which I formatted ext4 (I feel more comfortable with ext4
on a USB drive).

I started an rsync to copy this data over, as I assumed that would be
the simplest method. I was pretty surprised to see rsync averaging
between 25-30MB/sec; I expected more, of course. I checked iostat and
was even more surprised to see the 6-disk RAID 10 array showing 100%
I/O utilization - the reads were maxed out while the USB drive was
barely being touched. Consider me super confused at this point; there
was no other activity on the system.
So I tried a basic cp -a command instead, in case data access for
rsync is somehow different. I didn't think so, but couldn't help
trying; results were similar. iostat showed periodic bursts to
50-60MB/s but was most often below 30MB/s. I like rsync with the
--progress option, so I went back to rsync again.

So then I looked for other data on the same filesystem that I knew
was not Veeam data, so it would not be using reflinks. I found a
stash of ~5GB of data and copied that, easily over 100MB/sec (the
files were smaller and going so fast it was hard to tell for sure).

So the conclusion here is that something special about the reflink
data causes regular Linux copy operations to suffer. I did a bunch of
web searches, but the only results seemed to be people talking about
how great reflinks are for making clones of data, not about copying
reflinked data to another filesystem.

So I was wondering: maybe Veeam does something funky with how it
accesses data. Obviously this is going from XFS to ext4, so there
can't be any special sauce, since the filesystems are totally
different.

So I kicked off a copy using Veeam; I don't know what it does on the
backend, but iostat showed sustained reads at over 200MB/sec, so call
it 8X faster than rsync or cp. At this point the USB drive seemed
more like the bottleneck (which is fine).

I can only guess that Veeam is more intelligent, in that it is using
some API call to XFS to pull the sequential data for the most recent
backup, whereas a Linux CLI tool pulls the entire file, which
probably has a ton of different pointers in it, causing a lot more
random I/O.

So again, I'm not really having a problem, just looking to get a
better understanding of why an rsync or cp of reflinked data to
another filesystem is so much slower than Veeam doing it itself. I
could try to ask Veeam support, but I'm quite confident they'd have
no idea what I was talking about.

And with that said, are there tools that can copy reflinked data more
intelligently from the command line (specifically to another
filesystem)? I checked the XFS FAQ and there is no mention of
reflink. I couldn't find info on how to find how many "links" there
were, how big each one was, or how to reference them directly.

thanks

nate


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: XFS reflink copy to different filesystem performance question
  2022-03-16  0:45 XFS reflink copy to different filesystem performance question nate
@ 2022-03-16  8:33 ` Dave Chinner
  2022-03-16 17:08   ` nate
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2022-03-16  8:33 UTC (permalink / raw)
  To: nate; +Cc: linux-xfs

On Tue, Mar 15, 2022 at 05:45:37PM -0700, nate wrote:
> Hello -
> 
> Blast from the past: this is the first Majordomo mailing list I can
> recall joining in probably 15-18 years.
> 
> Anyway, I ran into a situation today and was wondering if someone
> could clarify it or point me to some docs, so I have a better
> understanding of what is going on with this XFS setup.
> 
> Hardware: HP DL380 Gen10 6x8TB disks in hardware RAID 10
> Software: Ubuntu 20.04 kernel 5.4.0 using XFS with reflinks enabled
> Purpose: this system is used to store Veeam Backups (VMware VM backup)
> 
> I've used XFS off and on for many years, but it wasn't until I set
> this up last year that I had even heard of reflinks. The Veeam docs
> specifically suggested enabling it if possible, so I did. Things
> have been working fine since.

Yeah, Veeam appears to use the shared data extent functionality in
XFS for deduplication and cloning. reflink is the user-facing name
for space-efficient file cloning (via cp --reflink).

> Recently a situation came up where we want to copy some of this
> data to a local USB drive to ship to another location to restore
> the data. A simple enough process, I thought: just a basic file
> copy.
> 
> Total of 8.6TB, most of that in a single 8.3TB file. We got an 18TB
> USB drive, which I formatted ext4 (I feel more comfortable with
> ext4 on a USB drive).
> 
> I started an rsync to copy this data over, as I assumed that would
> be the simplest method. I was pretty surprised to see rsync
> averaging between 25-30MB/sec; I expected more, of course. I
> checked iostat and was even more surprised to see the 6-disk RAID
> 10 array showing 100% I/O utilization - the reads were maxed out
> while the USB drive was barely being touched. Consider me super
> confused at this point; there was no other activity on the system.
> So I tried a basic cp -a command instead, in case data access for
> rsync is somehow different. I didn't think so, but couldn't help
> trying; results were similar. iostat showed periodic bursts to
> 50-60MB/s but was most often below 30MB/s. I like rsync with the
> --progress option, so I went back to rsync again.

I'm guessing that you're trying to copy a deduplicated file,
resulting in the same physical blocks being read over and over again
at different file offsets and causing the disks to seek because it's
not physically sequential data.

> So then I looked for other data on the same filesystem that I knew
> was not Veeam data, so it would not be using reflinks. I found a
> stash of ~5GB of data and copied that, easily over 100MB/sec (the
> files were smaller and going so fast it was hard to tell for sure).

It could have been using reflinks - upstream coreutils defaults to
reflink copies with cp these days (i.e. the default is
cp --reflink=auto, which means it tries a file clone first, then
falls back to a data copy if cloning fails). reflink copies are
identical to the original file - they *are* the original file - until
they are overwritten. Hence cp doesn't perform any differently with
reflinked files vs normal files.

[...]

> So I kicked off a copy using Veeam; I don't know what it does on
> the backend, but iostat showed sustained reads at over 200MB/sec,
> so call it 8X faster than rsync or cp. At this point the USB drive
> seemed more like the bottleneck (which is fine).

Because Veeam knows about the deduplicated data, it is quite likely
that it does something smarter to optimise reading from files it has
deduplicated...

> I can only guess that Veeam is more intelligent, in that it is
> using some API call to XFS to pull the sequential data for the most
> recent backup, whereas a Linux CLI tool pulls the entire file,
> which probably has a ton of different pointers in it, causing a lot
> more random I/O.

Maybe they are doing that with FIEMAP to resolve deduplicated
regions and caching them, or they have some other information in
their backup/deduplication data store that allows them to optimise
the IO. You'll need to actually run things like strace on the copies
to find out exactly what it is doing....
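A sketch of that kind of investigation - the path is a placeholder,
and nothing here is confirmed about what Veeam actually does:

```shell
# Placeholder path - substitute the real backup file.
F=/backup/veeam/backup.vbk

# filefrag uses FIEMAP under the hood; extents that are shared via
# reflink/dedupe are flagged "shared" in the verbose output.
filefrag -v "$F" | head -n 20

# Summarise the syscalls a plain copy makes (counts, time per call)
# so it can be compared against whatever the Veeam copy process does.
strace -c -f cp "$F" /mnt/usb/
```

Comparing the read pattern (offsets and sizes in a full strace log)
between cp and the Veeam copy would show whether one of them is
reordering IO to be physically sequential.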

> So again, I'm not really having a problem, just looking to get a
> better understanding of why an rsync or cp of reflinked data to another

rsync and cp are dumb, lowest common denominator copying programs.
They don't do anything smart like use threads, direct I/O, AIO,
io_uring, etc that applications that optimise for IO performance
typically use...

> And with that said, are there tools that can copy reflinked data more
> intelligently from the command line (specifically to another filesystem)?

No, they don't exist, largely because reading a reflinked file
performs no differently to reading a non-shared file.

> I checked the XFS FAQ and there is no mention of reflink. I couldn't
> find info on how to find how many "links" there were, how big each
> one was, or how to reference them directly.

To do that efficiently (i.e. without a full filesystem scan) you
need to look up the filesystem reverse mapping table to find all the
owners of pointers to a given block.  I bet you didn't make the
filesystem with "-m rmapbt=1" to enable that functionality - nobody
does that unless they have a reason to because it's not enabled by
default (yet).
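For reference, enabling it at mkfs time would look something like
this (device and mount point are placeholders, and mkfs is
destructive):

```shell
# DESTRUCTIVE: reformats the device. /dev/sdX is a placeholder.
# rmapbt=1 adds the reverse-mapping btree so block ownership can be
# looked up; reflink=1 is already the default on current mkfs.xfs.
mkfs.xfs -f -m rmapbt=1,reflink=1 /dev/sdX

# Check whether an existing, mounted filesystem has it enabled:
xfs_info /mountpoint | grep -o 'rmapbt=[01]'
```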

Cheers,

Dave.
> 
> thanks
> 
> nate
> 
> 

-- 
Dave Chinner
david@fromorbit.com


* Re: XFS reflink copy to different filesystem performance question
  2022-03-16  8:33 ` Dave Chinner
@ 2022-03-16 17:08   ` nate
  2022-03-16 22:23     ` Dave Chinner
  0 siblings, 1 reply; 5+ messages in thread
From: nate @ 2022-03-16 17:08 UTC (permalink / raw)
  To: linux-xfs

On 2022-03-16 1:33, Dave Chinner wrote:

> Yeah, Veeam appears to use the shared data extent functionality in
> XFS for deduplication and cloning. reflink is the user-facing name
> for space-efficient file cloning (via cp --reflink).

I read bits and pieces about cp --reflink; I guess using that would
be a more "standard" *nix way of using dedupe? For example,
cp --reflink then using rsync to do a delta sync against the new copy
(to get the updates)? Not that I have a need to do this, just curious
about the workflow.

> I'm guessing that you're trying to copy a deduplicated file,
> resulting in the same physical blocks being read over and over again
> at different file offsets and causing the disks to seek because it's
> not physically sequential data.

Thanks for confirming that, it's what I suspected.

[..]

> Maybe they are doing that with FIEMAP to resolve deduplicated
> regions and caching them, or they have some other information in
> their backup/deduplication data store that allows them to optimise
> the IO. You'll need to actually run things like strace on the copies
> to find out exactly what it is doing....

ok, thanks for the info. I do see a couple of times where there are
periods of lots of disk reads on the source and no writes happening
on the destination; I guess it is sorting through what it needs to
get. One of those lasted about 20mins.

> No, they don't exist, largely because reading a reflinked file
> performs no differently to reading a non-shared file.

Good to know, certainly would be nice if there was at least a way to
identify a file as having X number of links.

> To do that efficiently (i.e. without a full filesystem scan) you
> need to look up the filesystem reverse mapping table to find all the
> owners of pointers to a given block.  I bet you didn't make the
> filesystem with "-m rmapbt=1" to enable that functionality - nobody
> does that unless they have a reason to because it's not enabled by
> default (yet).

I'm sure I did not do that either, but I can do that if you think it
would be advantageous. I do plan to ship this DL380Gen10 XFS system to
another location and am happy to reformat the XFS volume with that extra
option if it would be useful.

I don't anticipate needing to deal directly with this reflinked data,
just let Veeam do its thing. Thanks for clearing things up for
me so quickly!

nate



* Re: XFS reflink copy to different filesystem performance question
  2022-03-16 17:08   ` nate
@ 2022-03-16 22:23     ` Dave Chinner
  2022-03-17 16:43       ` nate
  0 siblings, 1 reply; 5+ messages in thread
From: Dave Chinner @ 2022-03-16 22:23 UTC (permalink / raw)
  To: nate; +Cc: linux-xfs

On Wed, Mar 16, 2022 at 10:08:30AM -0700, nate wrote:
> On 2022-03-16 1:33, Dave Chinner wrote:
> 
> > Yeah, Veeam appears to use the shared data extent functionality in
> > XFS for deduplication and cloning. reflink is the user-facing name
> > for space-efficient file cloning (via cp --reflink).
> 
> I read bits and pieces about cp --reflink; I guess using that would
> be a more "standard" *nix way of using dedupe?

reflink is not dedupe. file clones simply make a copy by reference,
so it doesn't duplicate the data in the first place. IOWs, it ends
up with a single physical copy that has multiple references to it.

dedupe is done by a different operation, which requires comparing
the data in two different locations and if they are the same
reducing it to a single physical copy with multiple references.

In the end they look the same on disk (shared physical extent with
multiple references) but the operations are distinctly different.
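The two operations from the command line, roughly (duperemove is just
one example of a dedupe driver; availability and flags depend on the
distro):

```shell
# Clone: never duplicates the data in the first place - the new name
# simply references the same extents. Requires a reflink-capable
# filesystem (XFS with reflink=1, btrfs).
cp --reflink=always source.img clone.img

# Dedupe: the data already exists twice on disk. A userspace tool
# finds matching ranges and asks the kernel (the FIDEDUPERANGE
# ioctl) to share them after verifying the bytes really match.
duperemove -dr /backups/
```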

> For example, cp --reflink then using rsync to do a delta sync
> against the new copy (to get the updates)? Not that I have a need
> to do this, just curious about the workflow.

IIUC, you are asking about whether you can run a reflink copy on
the destination before you run rsync, then do a delta sync using
rsync to only move the changed blocks, so only store the changed
blocks in the backup image?

If so, then yes. This is how a reflink-based file-level backup farm
would work. It is very similar to a hardlink-based farm, but instead
of keeping a repository of every version of every file that is
backed up in an object store and then creating the directory
structure via hardlinks to the object store, it creates the new
directory structure with reflink copies of the previous version and
then does delta updates to the files directly.
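A minimal sketch of one cycle in such a farm - directory names are
hypothetical, and this assumes a reflink-capable filesystem:

```shell
# Yesterday's backup is in /backups/2022-03-15. Start today's as a
# reflink copy: instant, and consumes no extra space yet.
cp -a --reflink=always /backups/2022-03-15 /backups/2022-03-16

# --inplace makes rsync overwrite changed blocks inside the existing
# files instead of writing whole new files, so every unchanged block
# stays physically shared with yesterday's copy.
rsync -a --inplace /live/data/ /backups/2022-03-16/
```

Without --inplace, rsync would write each changed file out in full
and rename it over the clone, breaking the sharing for that file.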

> > I'm guessing that you're trying to copy a deduplicated file,
> > resulting in the same physical blocks being read over and over
> > again at different file offsets and causing the disks to seek
> > because it's not physically sequential data.
> 
> Thanks for confirming that, it's what I suspected.

I haven't confirmed anything, just made a guess same as you have.

> [..]
> 
> > Maybe they are doing that with FIEMAP to resolve deduplicated
> > regions and caching them, or they have some other information in
> > their backup/deduplication data store that allows them to optimise
> > the IO. You'll need to actually run things like strace on the copies
> > to find out exactly what it is doing....
> 
> ok, thanks for the info. I do see a couple of times where there are
> periods of lots of disk reads on the source and no writes happening
> on the destination; I guess it is sorting through what it needs to
> get. One of those lasted about 20mins.

That sounds more like the dedupe process searching for duplicate
blocks to dedupe....

> > No, they don't exist, largely because reading a reflinked file
> > performs no differently to reading a non-shared file.
> 
> Good to know, certainly would be nice if there was at least a way to
> identify a file as having X number of links.

You can use FIEMAP (filefrag(1) or xfs_bmap(8)) to tell you if a
specific extent is shared or not. But it cannot tell you how many
references there are to it, nor what file those references belong
to. For that, you need root permissions, ioctl_getfsmap(2) and
rmapbt=1 support in your filesystem.
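For example, a per-file check might look like this (path is a
placeholder; treat the exact output format as approximate):

```shell
# filefrag prints one line per extent; shared extents carry a
# "shared" flag, so this counts how many extents of the file are
# referenced by at least one other owner.
filefrag -v /backup/backup.vbk | grep -c shared

# xfs_bmap gives the XFS-native view; shared extents are marked by a
# flag bit in the FLAGS column of the verbose output.
xfs_bmap -v /backup/backup.vbk
```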

> > To do that efficiently (i.e. without a full filesystem scan) you
> > need to look up the filesystem reverse mapping table to find all the
> > owners of pointers to a given block.  I bet you didn't make the
> > filesystem with "-m rmapbt=1" to enable that functionality - nobody
> > does that unless they have a reason to because it's not enabled by
> > default (yet).
> 
> I'm sure I did not do that either, but I can do that if you think it
> would be advantageous. I do plan to ship this DL380Gen10 XFS system to
> another location and am happy to reformat the XFS volume with that extra
> option if it would be useful.

Unless you have an immediate use for filesystem metadata level
introspection (generally unlikely), there's no need to enable it.

Cheers,

Dave.
-- 
Dave Chinner
david@fromorbit.com


* Re: XFS reflink copy to different filesystem performance question
  2022-03-16 22:23     ` Dave Chinner
@ 2022-03-17 16:43       ` nate
  0 siblings, 0 replies; 5+ messages in thread
From: nate @ 2022-03-17 16:43 UTC (permalink / raw)
  To: linux-xfs

On 2022-03-16 15:23, Dave Chinner wrote:
> reflink is not dedupe. file clones simply make a copy by reference,
> so it doesn't duplicate the data in the first place. IOWs, it ends
> up with a single physical copy that has multiple references to it.
> 
> dedupe is done by a different operation, which requires comparing
> the data in two different locations and if they are the same
> reducing it to a single physical copy with multiple references.

Yeah, sorry, I didn't phrase that statement right, but I understand
the situation.

> IIUC, you are asking about whether you can run a reflink copy on
> the destination before you run rsync, then do a delta sync using
> rsync to only move the changed blocks, so only store the changed
> blocks in the backup image?
> 
> If so, then yes. This is how a reflink-based file-level backup farm
> would work. It is very similar to a hardlink-based farm, but instead
> of keeping a repository of every version of every file that is
> backed up in an object store and then creating the directory
> structure via hardlinks to the object store, it creates the new
> directory structure with reflink copies of the previous version and
> then does delta updates to the files directly.

ok thanks


> I haven't confirmed anything, just made a guess same as you have.

Well good enough for me thanks anyway!


> That sounds more like the dedupe process searching for duplicate
> blocks to dedupe....

I think so too.

> You can use FIEMAP (filefrag(1) or xfs_bmap(8)) to tell you if a
> specific extent is shared or not. But it cannot tell you how many
> references there are to it, nor what file those references belong
> to. For that, you need root permissions, ioctl_getfsmap(2) and
> rmapbt=1 support in your filesystem.

Sounds more complex than I would like to deal with.

> Unless you have an immediate use for filesystem metadata level
> introspection (generally unlikely), there's no need to enable it.

ok thanks for the info.

I am leaving the list now, thanks a bunch for the replies.

nate

