* Best A->B large file copy performance
@ 2009-03-12 21:00 Jim Callahan
  2009-03-13  2:43 ` Greg Banks
  2009-03-13 19:16 ` Trond Myklebust
  0 siblings, 2 replies; 4+ messages in thread
From: Jim Callahan @ 2009-03-12 21:00 UTC (permalink / raw)
  To: linux-nfs

I'm trying to determine the best way to have a single NFS client
copy large numbers (100-1000) of fairly large (1-50M) files from one
location on a file server to another location on the same file server.
There seem to be several API layers which influence this:

1. Number of OS level processes performing the copy in parallel.
2. Record size used by the C-library read()/write() calls from these
processes.
3. NFS client rsize/wsize settings.
4. Ethernet MTU size.
5. Bandwidth of the ethernet network and switches.

So far we've played around with larger MTU and rsize/wsize settings 
without seeing a huge difference.  Since we have been using "cp" to 
perform (1), we've not tweaked the record size at all at this point.   
My suspicion is that we should be carefully coordinating the sizes
specified for layers 2, 3 and 4.  Perhaps we should be using "dd"
instead of "cp" so we can control the record size being used.  Since
the number of permutations of these three settings is large, I was
hoping that I might get some advice from this list about a range of
values we should be investigating and any unpleasant interactions 
between these levels of settings we should be aware of to narrow our 
search.  Also, if there are other major factors outside those listed I'd 
appreciate being pointed in the right direction.
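
For anyone who wants to measure layers 1 and 2 in isolation, here is a
minimal sketch in Python.  The names copy_file and copy_many are made up
for illustration, and threads stand in for the OS-level processes of
item 1 (the blocking read()/write() calls release the interpreter lock,
so the copies still overlap).  It only lets you vary the record size and
the number of copies in flight; it makes no claim about which values are
best.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def copy_file(src, dst, record_size=1 << 20):
    """Copy src to dst using read()/write() calls of record_size bytes."""
    fd_in = os.open(src, os.O_RDONLY)
    try:
        fd_out = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                chunk = os.read(fd_in, record_size)
                if not chunk:
                    break  # end of file
                view = memoryview(chunk)
                while len(view) > 0:
                    # write() may accept fewer bytes than offered
                    view = view[os.write(fd_out, view):]
        finally:
            os.close(fd_out)
    finally:
        os.close(fd_in)
    return dst


def copy_many(pairs, workers=4, record_size=1 << 20):
    """Copy (src, dst) pairs with up to `workers` copies in flight."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(copy_file, s, d, record_size) for s, d in pairs]
        return [f.result() for f in futures]
```

Timing copy_many() over the same file set for a few values of workers
and record_size would give one axis of the search at a time.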

---

While I'm on the subject, has there been any discussion about adding an 
NFS request that would allow copying files from one location to another 
on the same NFS server without requiring a round trip to a client?  It's
not at all uncommon to need to move data around in this manner and it 
seems a huge waste of bandwidth to have to send all this data from the 
server to the client just to have the client send the data back 
unaltered to a different location.  Such a COPY request would be high 
level along the lines of RENAME and each server vendor could optimize 
this for their particular hardware architecture.  For our particular 
application, having such a request would make a huge difference in 
performance.

-- 
Jim Callahan - President - Temerity Software <www.temerity.us>


* Re: Best A->B large file copy performance
  2009-03-12 21:00 Best A->B large file copy performance Jim Callahan
@ 2009-03-13  2:43 ` Greg Banks
  2009-03-13 19:16 ` Trond Myklebust
  1 sibling, 0 replies; 4+ messages in thread
From: Greg Banks @ 2009-03-13  2:43 UTC (permalink / raw)
  To: Jim Callahan; +Cc: linux-nfs

Jim Callahan wrote:
> I'm trying to determine the best way to have a single NFS
> client copy large numbers (100-1000) of fairly large (1-50M) files [...]
I'd like to propose a new rule of thumb:  to be considered "fairly
large", a file should be larger than the capacity of a USB key which
could be comfortably swallowed.

> [...] Since the number of permutations of these three settings is
> large, I was hoping that I might get some advice from this list about a
> range of values we should be investigating and any unpleasant
> interactions between these levels of settings we should be aware of to
> narrow our search.  Also, if there are other major factors outside
> those listed I'd appreciate being pointed in the right direction.
Try

http://mirror.linux.org.au/pub/linux.conf.au/2008/slides/130-lca2008-nfs-tuning-secrets-d7.odp

-- 
Greg Banks, P.Engineer, SGI Australian Software Group.
the brightly coloured sporks of revolution.
I don't speak for SGI.



* Re: Best A->B large file copy performance
  2009-03-12 21:00 Best A->B large file copy performance Jim Callahan
  2009-03-13  2:43 ` Greg Banks
@ 2009-03-13 19:16 ` Trond Myklebust
  2009-03-13 21:40   ` Jim Callahan
  1 sibling, 1 reply; 4+ messages in thread
From: Trond Myklebust @ 2009-03-13 19:16 UTC (permalink / raw)
  To: Jim Callahan; +Cc: linux-nfs

On Thu, 2009-03-12 at 17:00 -0400, Jim Callahan wrote:
> I'm trying to determine the best way to have a single NFS client
> copy large numbers (100-1000) of fairly large (1-50M) files from one
> location on a file server to another location on the same file server.
> There seem to be several API layers which influence this:
> 
> 1. Number of OS level processes performing the copy in parallel.
> 2. Record size used by the C-library read()/write() calls from these
> processes.
> 3. NFS client rsize/wsize settings.
> 4. Ethernet MTU size.
> 5. Bandwidth of the ethernet network and switches.
> 
> So far we've played around with larger MTU and rsize/wsize settings 
> without seeing a huge difference.  Since we have been using "cp" to 
> perform (1), we've not tweaked the record size at all at this point.   
> My suspicion is that we should be carefully coordinating the sizes
> specified for layers 2, 3 and 4.  Perhaps we should be using "dd"
> instead of "cp" so we can control the record size being used.  Since
> the number of permutations of these three settings is large, I was
> hoping that I might get some advice from this list about a range of
> values we should be investigating and any unpleasant interactions 
> between these levels of settings we should be aware of to narrow our 
> search.  Also, if there are other major factors outside those listed I'd 
> appreciate being pointed in the right direction.

MTU and rsize/wsize settings shouldn't matter much unless you're using
a UDP connection. I'd recommend just using the default r/wsize
negotiated by the client and server, and then whatever MTU is most
convenient for the other applications you may have.

Bandwidth and switch quality do matter (a lot). Particularly so if you
have many clients...

If you're just copying and not interested in using the file or its
contents afterwards, then you might consider using direct i/o instead of
ordinary cached i/o.
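
As a rough sketch of the direct i/o suggestion (Python on Linux only,
since it relies on os.O_DIRECT): the helper names, the 4096-byte
alignment and the fallback to cached i/o on filesystems that reject
O_DIRECT are all assumptions of this sketch, not requirements of the NFS
client.  It also assumes a regular file, where a short read only happens
at end of file.

```python
import errno
import mmap
import os

ALIGN = 4096  # assumed alignment for buffer address and transfer length


def _open(path, flags, mode=0o644):
    """Open with O_DIRECT, falling back to cached i/o if it is rejected."""
    try:
        return os.open(path, flags | os.O_DIRECT, mode)
    except OSError as exc:
        if exc.errno != errno.EINVAL:
            raise
        return os.open(path, flags, mode)


def direct_copy(src, dst, block=1 << 20):
    """Copy src to dst in block-sized transfers; return bytes copied."""
    if block % ALIGN:
        raise ValueError("block must be a multiple of ALIGN")
    fd_in = _open(src, os.O_RDONLY)
    try:
        fd_out = _open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC)
        try:
            total = 0
            # An anonymous mmap gives a page-aligned buffer.
            with mmap.mmap(-1, block) as buf, memoryview(buf) as view:
                while True:
                    n = os.readv(fd_in, [view])
                    if n == 0:
                        break
                    # Round the tail up to ALIGN; the excess is cut below.
                    padded = -(-n // ALIGN) * ALIGN
                    if os.write(fd_out, view[:padded]) != padded:
                        raise OSError(errno.EIO, "short write")
                    total += n
                    if n < block:
                        break  # short read: assumed to be end of file
            os.ftruncate(fd_out, total)  # drop the alignment padding
            if total != os.fstat(fd_in).st_size:
                raise OSError(errno.EIO, "source size changed or short read")
            return total
        finally:
            os.close(fd_out)
    finally:
        os.close(fd_in)
```

The point of the exercise is only that the copied data does not linger
in the client's page cache; whether that helps throughput would need to
be measured.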

> While I'm on the subject, has there been any discussion about adding an 
> NFS request that would allow copying files from one location to another 
> on the same NFS server without requiring a round trip to a client?  It's
> not at all uncommon to need to move data around in this manner and it 
> seems a huge waste of bandwidth to have to send all this data from the 
> server to the client just to have the client send the data back 
> unaltered to a different location.  Such a COPY request would be high 
> level along the lines of RENAME and each server vendor could optimize 
> this for their particular hardware architecture.  For our particular 
> application, having such a request would make a huge difference in 
> performance.

I don't think anyone has talked about a server-to-server protocol, but I
believe there will be a proposal for file copy at the coming IETF
meeting. If you want server-to-server, then now is the time to speak up
and make the case. You'd probably want to start a thread on
nfsv4@ietf.org...

Cheers
  Trond



* Re: Best A->B large file copy performance
  2009-03-13 19:16 ` Trond Myklebust
@ 2009-03-13 21:40   ` Jim Callahan
  0 siblings, 0 replies; 4+ messages in thread
From: Jim Callahan @ 2009-03-13 21:40 UTC (permalink / raw)
  To: Trond Myklebust; +Cc: linux-nfs

Trond Myklebust wrote:
> On Thu, 2009-03-12 at 17:00 -0400, Jim Callahan wrote:
>> While I'm on the subject, has there been any discussion about adding an 
>> NFS request that would allow copying files from one location to another 
>> on the same NFS server without requiring a round trip to a client?  It's
>> not at all uncommon to need to move data around in this manner and it 
>> seems a huge waste of bandwidth to have to send all this data from the 
>> server to the client just to have the client send the data back 
>> unaltered to a different location.  Such a COPY request would be high 
>> level along the lines of RENAME and each server vendor could optimize 
>> this for their particular hardware architecture.  For our particular 
>> application, having such a request would make a huge difference in 
>> performance.
>>     
>
> I don't think anyone has talked about a server-to-server protocol, but I
> believe there will be a proposal for file copy at the coming IETF
> meeting. If you want server-to-server, then now is the time to speak up
> and make the case. You'd probably want to start a thread on
> nfsv4@ietf.org...
>   
Thanks for the responses, Trond.  I wasn't actually suggesting a
server-to-server protocol, but rather an additional client-server
protocol request to tell the server to copy files internally.  The idea
is that the typical usage of "cp" via NFS wastes bandwidth
transmitting the contents of the source file from the server to the
client only to have the client send it back unaltered.  If this were
instead performed internally on the server itself, it seems to me that
it might be dramatically faster and not waste valuable network
bandwidth.  The calling convention would be identical to the current
RENAME request.  The implementation would of course be different, in
that this new COPY request would create a new inode for the target and
then copy all data from the source to the target file.  A vendor could
choose the most efficient manner for performing this based on their
hardware/software architecture.
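
To make the intended semantics concrete, here is a toy model in Python.
Everything in it is hypothetical: CopyArgs and handle_copy are invented
names, plain directory paths stand in for NFS file handles, and
shutil.copyfile stands in for whatever mechanism a vendor would actually
use.  It only shows the RENAME-like (directory, name) argument pairs and
that the data is copied on the server without passing through a client.

```python
import os
import shutil
from dataclasses import dataclass


@dataclass(frozen=True)
class CopyArgs:
    """Hypothetical COPY arguments, shaped like RENAME's from/to pairs."""
    from_dir: str   # stand-in for the source directory handle
    from_name: str
    to_dir: str     # stand-in for the target directory handle
    to_name: str


def handle_copy(args: CopyArgs) -> int:
    """Toy server-side handler: copy locally, return the target's inode."""
    src = os.path.join(args.from_dir, args.from_name)
    dst = os.path.join(args.to_dir, args.to_name)
    # The data moves between two files on the server; nothing is sent
    # to or from the client apart from the request and its reply.
    shutil.copyfile(src, dst)
    return os.stat(dst).st_ino
```

A check-out in our application would then be a single request such as
handle_copy(CopyArgs(repo_dir, "file,v1", scratch_dir, "file")), with
made-up names, rather than a full read followed by a full write.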

Thanks for the pointer to nfsv4@ietf.org.  I'll bring this up there as 
well...

In case you are wondering, we make an application which includes version 
control features somewhat along the lines of CVS or SVN.  In other 
words, there is a central repository for checked-in versions and 
independent scratch areas for users who can have their own copies of 
files.  So both check-in and check-out operations frequently involve 
performing a "cp" from file A to B located on the same NFS server.

-- 
Jim Callahan - President - Temerity Software <www.temerity.us>
