linux-fsdevel.vger.kernel.org archive mirror
* [LSF/MM/BPF TOPIC] Enhancing Linux Copy Performance and Function and improving backup scenarios
@ 2020-01-22 23:13 Steve French
  2020-01-30  1:52 ` Darrick J. Wong
  0 siblings, 1 reply; 4+ messages in thread
From: Steve French @ 2020-01-22 23:13 UTC
  To: linux-fsdevel; +Cc: CIFS, samba-technical, lsf-pc

As discussed last year:

Current Linux copy tools have various problems compared to other
platforms - small I/O sizes (and most don't allow it to be
configured), lack of parallel I/O for multi-file copies, inability to
reduce metadata updates by setting file size first, lack of cross
mount (to the same file system) copy optimizations, limited ability to
handle the wide variety of server side copy (and copy offload)
mechanisms and error handling problems.   And copy tools rely less on
the kernel file system (vs. code in the user space tool) in Linux than
would be expected, in order to determine which optimizations to use.

But some progress has been made since last year's summit, with new
copy tools being released and improvements to some of the kernel file
systems, and also some additional feedback on lwn and on the mailing
lists.  In addition these discussions have prompted additional
feedback on how to improve file backup/restore scenarios (e.g. to
mounts to the cloud from local Linux systems) which require preserving
more timestamps, ACLs and metadata, and preserving them efficiently.

Let's continue our discussions from last year, and see how we can move
forward on improving the performance and function of Linux fs
(including the VFS and user space tools) for various backup, restore
and copy scenarios.

* Re: [LSF/MM/BPF TOPIC] Enhancing Linux Copy Performance and Function and improving backup scenarios
  2020-01-22 23:13 [LSF/MM/BPF TOPIC] Enhancing Linux Copy Performance and Function and improving backup scenarios Steve French
@ 2020-01-30  1:52 ` Darrick J. Wong
  2020-02-01 19:54   ` Steve French
  0 siblings, 1 reply; 4+ messages in thread
From: Darrick J. Wong @ 2020-01-30  1:52 UTC
  To: Steve French; +Cc: linux-fsdevel, CIFS, samba-technical, lsf-pc

On Wed, Jan 22, 2020 at 05:13:53PM -0600, Steve French wrote:
> As discussed last year:
> 
> Current Linux copy tools have various problems compared to other
> platforms - small I/O sizes (and most don't allow it to be
> configured), lack of parallel I/O for multi-file copies, inability to
> reduce metadata updates by setting file size first, lack of cross

...and yet weirdly we tell everyone on xfs not to do that or to use
fallocate, so that delayed speculative allocation can do its thing.
We also tell them not to create deep directory trees because xfs isn't
ext4.

> mount (to the same file system) copy optimizations, limited ability to
> handle the wide variety of server side copy (and copy offload)
> mechanisms and error handling problems.   And copy tools rely less on
> the kernel file system (vs. code in the user space tool) in Linux than
> would be expected, in order to determine which optimizations to use.

What kernel interfaces would we expect userspace to use to figure out
the confusing mess of optimizations? :)

There's a whole bunch of xfs ioctls like dioinfo and the like that we
ought to push to statx too.  Is that an example of what you mean?

(I wasn't at last year's LSF.)

> But some progress has been made since last year's summit, with new
> copy tools being released and improvements to some of the kernel file
> systems, and also some additional feedback on lwn and on the mailing
> lists.  In addition these discussions have prompted additional
> feedback on how to improve file backup/restore scenarios (e.g. to
> mounts to the cloud from local Linux systems) which require preserving
> more timestamps, ACLs and metadata, and preserving them efficiently.

I suppose it would be useful to think a little more about cross-device
fs copies considering that the "devices" can be VM block devs backed by
files on a filesystem that supports reflink.  I have no idea how you
manage that sanely though.

--D

> Let's continue our discussions from last year, and see how we can move
> forward on improving the performance and function of Linux fs
> (including the VFS and user space tools) for various backup, restore
> and copy scenarios.

* Re: [LSF/MM/BPF TOPIC] Enhancing Linux Copy Performance and Function and improving backup scenarios
  2020-01-30  1:52 ` Darrick J. Wong
@ 2020-02-01 19:54   ` Steve French
  2020-02-01 23:16     ` Andreas Dilger
  0 siblings, 1 reply; 4+ messages in thread
From: Steve French @ 2020-02-01 19:54 UTC
  To: Darrick J. Wong; +Cc: linux-fsdevel, CIFS, samba-technical, lsf-pc

On Wed, Jan 29, 2020 at 7:54 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>
> On Wed, Jan 22, 2020 at 05:13:53PM -0600, Steve French wrote:
> > As discussed last year:
> >
> > Current Linux copy tools have various problems compared to other
> > platforms - small I/O sizes (and most don't allow it to be
> > configured), lack of parallel I/O for multi-file copies, inability to
> > reduce metadata updates by setting file size first, lack of cross
>
> ...and yet weirdly we tell everyone on xfs not to do that or to use
> fallocate, so that delayed speculative allocation can do its thing.
> We also tell them not to create deep directory trees because xfs isn't
> ext4.

Delayed speculative allocation may help xfs but changing file size
thousands of times for network and cluster fs for a single file copy
can be a disaster for other file systems (due to the excessive cost
it adds to metadata sync time) - so there are file systems where
setting the file size first can help.
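
A minimal sketch of one reading of "setting the file size first": a single
ftruncate() on the destination before any data is written, so the writes
never extend the file. The helper name copy_presized and the 1 MiB buffer
are assumptions for illustration, not something from an existing tool, and
per the XFS comments above this is not a win on every filesystem.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

#define PRESIZE_BUF_LEN (1u << 20)   /* assumed 1 MiB copy buffer */

/* Hypothetical helper: returns 0 on success, -1 on error (errno set). */
int copy_presized(int src_fd, int dst_fd)
{
    struct stat st;
    if (fstat(src_fd, &st) < 0)
        return -1;

    /* One size change up front; the writes below stay inside this size. */
    if (ftruncate(dst_fd, st.st_size) < 0)
        return -1;

    char *buf = malloc(PRESIZE_BUF_LEN);
    if (!buf)
        return -1;

    int rc = 0;
    off_t off = 0;
    while (rc == 0 && off < st.st_size) {
        ssize_t n = pread(src_fd, buf, PRESIZE_BUF_LEN, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            rc = -1;
            break;
        }
        if (n == 0)
            break;   /* source shrank; the destination tail stays zero-filled */

        ssize_t done = 0;
        while (done < n) {
            ssize_t w = pwrite(dst_fd, buf + done, (size_t)(n - done), off + done);
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                rc = -1;
                break;
            }
            done += w;
        }
        off += n;
    }
    free(buf);
    return rc;
}
```

Whether to presize with ftruncate(), use fallocate(), or do neither would
have to be a per-filesystem decision, which is part of why the tools need
better hints from the kernel.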

> >  And copy tools rely less on
> > the kernel file system (vs. code in the user space tool) in Linux than
> > would be expected, in order to determine which optimizations to use.
>
> What kernel interfaces would we expect userspace to use to figure out
> the confusing mess of optimizations? :)

copy_file_range and clone_file_range are a good start ... few tools
use them ...
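
As a rough sketch of what a tool could do with these (assuming the userspace
entry points are the FICLONE ioctl for cloning and the copy_file_range()
syscall via its glibc wrapper): try a whole-file clone, then an in-kernel
copy, and only then fall back to read()/write(). The helper names
copy_fd_offloaded and copy_rw and the 1 GiB chunk size are made up for
illustration.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <linux/fs.h>      /* FICLONE */
#include <sys/ioctl.h>
#include <sys/types.h>
#include <unistd.h>

/* Last resort: plain read()/write() starting at the current file offsets. */
static int copy_rw(int src_fd, int dst_fd)
{
    char buf[65536];
    for (;;) {
        ssize_t n = read(src_fd, buf, sizeof buf);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (n == 0)
            return 0;   /* end of file */
        for (ssize_t done = 0; done < n; ) {
            ssize_t w = write(dst_fd, buf + done, (size_t)(n - done));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += w;
        }
    }
}

/* Hypothetical helper: returns 0 on success, -1 on error (errno set).
 * Both descriptors are expected to be positioned at offset 0. */
int copy_fd_offloaded(int src_fd, int dst_fd)
{
    /* 1. Whole-file clone (reflink); fails on filesystems without support. */
    if (ioctl(dst_fd, FICLONE, src_fd) == 0)
        return 0;

    /* 2. In-kernel copy; NULL offsets mean "use and advance the fd offsets". */
    for (;;) {
        ssize_t n = copy_file_range(src_fd, NULL, dst_fd, NULL,
                                    (size_t)1 << 30, 0);
        if (n == 0)
            return 0;   /* end of file */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            /* 3. Not supported here: continue with a userspace copy. */
            return copy_rw(src_fd, dst_fd);
        }
    }
}
```

Because copy_file_range() advances both file offsets when given NULL offset
pointers, the read()/write() fallback simply picks up wherever the in-kernel
copy stopped.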

> There's a whole bunch of xfs ioctls like dioinfo and the like that we
> ought to push to statx too.  Is that an example of what you mean?

That is a good example.   And then getting tools to use these,
even if there are some file system dependent cases.

>
> > But some progress has been made since last year's summit, with new
> > copy tools being released and improvements to some of the kernel file
> > systems, and also some additional feedback on lwn and on the mailing
> > lists.  In addition these discussions have prompted additional
> > feedback on how to improve file backup/restore scenarios (e.g. to
> > mounts to the cloud from local Linux systems) which require preserving
> > more timestamps, ACLs and metadata, and preserving them efficiently.
>
> I suppose it would be useful to think a little more about cross-device
> fs copies considering that the "devices" can be VM block devs backed by
> files on a filesystem that supports reflink.  I have no idea how you
> manage that sanely though.

I trust XFS and BTRFS and SMB3 and cluster fs etc. to solve this better
than the block level (better locking, leases/delegation, state management, etc.)
though.

-- 
Thanks,

Steve

* Re: [LSF/MM/BPF TOPIC] Enhancing Linux Copy Performance and Function and improving backup scenarios
  2020-02-01 19:54   ` Steve French
@ 2020-02-01 23:16     ` Andreas Dilger
  0 siblings, 0 replies; 4+ messages in thread
From: Andreas Dilger @ 2020-02-01 23:16 UTC
  To: Steve French
  Cc: Darrick J. Wong, linux-fsdevel, CIFS, samba-technical, lsf-pc

On Feb 1, 2020, at 12:54 PM, Steve French <smfrench@gmail.com> wrote:
> 
> On Wed, Jan 29, 2020 at 7:54 PM Darrick J. Wong <darrick.wong@oracle.com> wrote:
>> 
>> On Wed, Jan 22, 2020 at 05:13:53PM -0600, Steve French wrote:
>>> As discussed last year:
>>> 
>>> Current Linux copy tools have various problems compared to other
>>> platforms - small I/O sizes (and most don't allow it to be
>>> configured), lack of parallel I/O for multi-file copies, inability to
>>> reduce metadata updates by setting file size first, lack of cross
>> 
>> ...and yet weirdly we tell everyone on xfs not to do that or to use
>> fallocate, so that delayed speculative allocation can do its thing.
>> We also tell them not to create deep directory trees because xfs isn't
>> ext4.
> 
> Delayed speculative allocation may help xfs but changing file size
> thousands of times for network and cluster fs for a single file copy
> can be a disaster for other file systems (due to the excessive cost
> it adds to metadata sync time) - so there are file systems where
> setting the file size first can help

Sometimes I think it is worthwhile to bite the bullet and just submit
patches to the important upstream tools to make them work well.  I've
done that in the past for cp, tar, rsync, ls, etc. so that they work
better.  If you've ever straced those tools, you will see they do a
lot of needless filesystem operations (repeated stat() in particular)
that could be optimized - no syscall is better than a fast syscall.

For cp it was changed to not allocate the st_blksize buffer on the
stack, which choked when Lustre reported st_blksize=8MB.  I'm starting
to think that it makes sense for all filesystems to use multi-MB buffers
when reading/copying file data, rather than 4KB or 32KB as it does today.
It might also be good for cp to use O_DIRECT for large file copies rather
than buffered IO to avoid polluting the cache?  Having it use AIO/DIO
would likely be a huge improvement as well.

That probably holds true for many other tools that still use st_blksize.
Maybe filesystems like ext4/xfs/btrfs should start reporting a larger
st_blksize as well?
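
To make the buffer point concrete, here is a small sketch of how a tool
might size its copy buffer: take st_blksize as a hint, clamp it to a
multi-MB range, and allocate on the heap rather than the stack. The bounds
(1 MiB and 64 MiB), the 4096-byte alignment (chosen with O_DIRECT in mind)
and the helper names are assumptions for illustration, not what cp does.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>

#define COPY_BUF_MIN ((size_t)1 << 20)    /* assumed floor: 1 MiB */
#define COPY_BUF_MAX ((size_t)64 << 20)   /* assumed ceiling: 64 MiB */

/* Clamp the filesystem's st_blksize hint into [COPY_BUF_MIN, COPY_BUF_MAX]. */
size_t pick_copy_bufsize(long blksize)
{
    size_t sz = blksize > 0 ? (size_t)blksize : COPY_BUF_MIN;
    if (sz < COPY_BUF_MIN)
        sz = COPY_BUF_MIN;
    if (sz > COPY_BUF_MAX)
        sz = COPY_BUF_MAX;
    return sz;
}

/* Heap-allocate an aligned buffer sized from fd's st_blksize.
 * Returns NULL on failure; the caller releases it with free(). */
void *alloc_copy_buffer(int fd, size_t *len_out)
{
    struct stat st;
    if (fstat(fd, &st) < 0)
        return NULL;

    size_t len = pick_copy_bufsize((long)st.st_blksize);
    void *buf = NULL;
    if (posix_memalign(&buf, 4096, len) != 0)
        return NULL;

    *len_out = len;
    return buf;
}
```

With these bounds a 4KB st_blksize is raised to 1 MiB, while an 8MB value
like the Lustre one is used as reported and never lands on the stack.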

As for parallel file copying, we've been working on MPIFileUtils, which
has parallel tree/file operations (also multi-node), but has the drawback
that it depends on MPI for remote thread startup, and isn't for everyone.
It should be possible to change it to run in parallel on a single node if
MPI wasn't installed, which would make the tools more generally usable.

>>> And copy tools rely less on
>>> the kernel file system (vs. code in the user space tool) in Linux than
>>> would be expected, in order to determine which optimizations to use.
>> 
>> What kernel interfaces would we expect userspace to use to figure out
>> the confusing mess of optimizations? :)
> 
> copy_file_range and clone_file_range are a good start ... few tools
> use them ...

One area that is really lacking a parallel interface is for directory
and namespace operations.  We still need to do serialized readdir()
and stat for operations in a directory.  There are now parallel VFS
lookups, but it would be useful to allow parallel create and unlink
for regular files, and possibly renames of files within a directory.

For ext4 at least, it would be possible to have parallel readdir()
by generating synthetic telldir() cookies to divide up the directory
into several chunks that can be read in parallel.  Something like:

     /* one open fd per thread; directory offsets are the telldir() cookies */
     pos_max = lseek(dir_fd[0], 0, SEEK_END);
     pos_inc = pos_max / num_threads;
     for (i = 0; i < num_threads; i++)
         lseek(dir_fd[i], i * pos_inc, SEEK_SET);   /* thread i starts here */

but I don't know if that would be portable to other filesystems.

XFS has a "bulkstat" interface which would likely be useful for
directory traversal tools.

>> There's a whole bunch of xfs ioctls like dioinfo and the like that we
>> ought to push to statx too.  Is that an example of what you mean?
> 
> That is a good example.   And then getting tools to use these,
> even if there are some file system dependent cases.

I've seen that copy to/from userspace is a bottleneck if the storage is
fast.  Since the cross-filesystem copy_file_range() patches have landed,
getting those into userspace tools would be a big performance win.

Dave talked a few times about adding better info than st_blksize for
different IO-related parameters (alignment, etc).  It was not included
in the initial statx() landing because of excessive bikeshedding, but it
makes sense to re-examine what could be used there.  Since statx() is
flexible, applications could be patched immediately to check for the
new fields, without having to wait for a new syscall to propagate out.
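
A minimal sketch of that check-the-mask pattern, using a field that already
exists (STATX_BTIME, the birth time, which also matters for the backup
case): request the field, then test stx_mask to see whether the filesystem
actually supplied it. A new IO-hint field would presumably be probed the
same way. The helper name get_birth_time is hypothetical, and this assumes
a glibc new enough to provide the statx() wrapper.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <fcntl.h>      /* AT_FDCWD, AT_SYMLINK_NOFOLLOW */
#include <sys/stat.h>   /* statx(), struct statx, STATX_BTIME */

/* Hypothetical helper.
 * Returns  1 if the filesystem reported a birth time (stored in *out),
 *          0 if statx() worked but the field was not supplied,
 *         -1 if statx() itself failed (errno set). */
int get_birth_time(const char *path, struct statx_timestamp *out)
{
    struct statx stx;

    if (statx(AT_FDCWD, path, AT_SYMLINK_NOFOLLOW, STATX_BTIME, &stx) < 0)
        return -1;

    /* The kernel clears mask bits for fields it could not fill in. */
    if (!(stx.stx_mask & STATX_BTIME))
        return 0;

    *out = stx.stx_btime;
    return 1;
}
```

A tool built this way degrades gracefully: on a kernel or filesystem that
does not know the field it gets 0 and falls back to its old heuristics.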

That said, if data copies are done in the kernel, this may be moot for
some tools, but would still be useful for others.

>>> But some progress has been made since last year's summit, with new
>>> copy tools being released and improvements to some of the kernel file
>>> systems, and also some additional feedback on lwn and on the mailing
>>> lists.

I think if the tools are named anything other than cp, dd, tar, find
it is much less likely that anyone will use them, so focussing developer
efforts on the common GNU tools is more likely to be a win than making
another new copy tool that nobody will use, IMHO.

>>> In addition these discussions have prompted additional
>>> feedback on how to improve file backup/restore scenarios (e.g. to
>>> mounts to the cloud from local Linux systems) which require preserving
>>> more timestamps, ACLs and metadata, and preserving them efficiently.
>> 
>> I suppose it would be useful to think a little more about cross-device
>> fs copies considering that the "devices" can be VM block devs backed by
>> files on a filesystem that supports reflink.  I have no idea how you
>> manage that sanely though.
> 
> I trust XFS and BTRFS and SMB3 and cluster fs etc. to solve this better
> than the block level (better locking, leases/delegation, state management,
> etc.) though.

Getting RichACLs into the kernel would definitely help here.  Non-Linux
filesystems have some variant of NFSv4 ACLs, and having only POSIX ACLs
on Linux is a real hassle here.  Either the outside ACLs are stored as an
xattr blob, which leads to different semantics depending on the access
method (CIFS, NFS, etc) or they are shoe-horned into the POSIX ACL and
lose information.
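
For the xattr-blob case, a backup tool can at least carry the blobs across
verbatim. A rough sketch using the f*xattr() calls follows; copy_xattrs is
a hypothetical helper name, and attributes that cannot be read or set (for
example for lack of privilege) are simply skipped here, which a real tool
would want to report. This preserves the bytes, not the semantics, so it
does not solve the problem described above.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>
#include <sys/xattr.h>

/* Hypothetical helper: copy every xattr readable on src_fd to dst_fd.
 * Returns the number of attributes copied, or -1 if the attribute list
 * itself could not be read. */
int copy_xattrs(int src_fd, int dst_fd)
{
    /* A zero-sized call returns the space needed for the name list. */
    ssize_t list_len = flistxattr(src_fd, NULL, 0);
    if (list_len < 0)
        return -1;
    if (list_len == 0)
        return 0;

    char *names = malloc((size_t)list_len);
    if (!names)
        return -1;
    list_len = flistxattr(src_fd, names, (size_t)list_len);
    if (list_len < 0) {
        free(names);
        return -1;
    }

    int copied = 0;
    /* The list is a sequence of NUL-terminated names packed back to back. */
    for (char *name = names; name < names + list_len;
         name += strlen(name) + 1) {
        ssize_t val_len = fgetxattr(src_fd, name, NULL, 0);
        if (val_len < 0)
            continue;   /* unreadable: skip */

        void *val = malloc(val_len > 0 ? (size_t)val_len : 1);
        if (!val)
            continue;

        val_len = fgetxattr(src_fd, name, val, (size_t)val_len);
        if (val_len >= 0 &&
            fsetxattr(dst_fd, name, val, (size_t)val_len, 0) == 0)
            copied++;
        free(val);
    }
    free(names);
    return copied;
}
```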

Cheers, Andreas





