linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: OT: why no file copy() libc/syscall ??
@ 2003-11-10 12:09 Bradley Chapman
  2003-11-10 18:47 ` Tomas Konir
  2003-11-10 22:44 ` Derek Foreman
  0 siblings, 2 replies; 77+ messages in thread
From: Bradley Chapman @ 2003-11-10 12:09 UTC (permalink / raw)
  To: davide.rossetti; +Cc: linux-kernel

Mr. Rossetti,

It is horribly RTFM.

man 2 sendfile is what you're after.

Brad

=====
Brad Chapman

Permanent e-mail: kakadu_croc@yahoo.com

__________________________________
Do you Yahoo!?
Protect your identity with Yahoo! Mail AddressGuard
http://antispam.yahoo.com/whatsnewfree

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-10 12:09 OT: why no file copy() libc/syscall ?? Bradley Chapman
@ 2003-11-10 18:47 ` Tomas Konir
  2003-11-10 22:44 ` Derek Foreman
  1 sibling, 0 replies; 77+ messages in thread
From: Tomas Konir @ 2003-11-10 18:47 UTC (permalink / raw)
  Cc: linux-kernel

On Mon, 10 Nov 2003, Bradley Chapman wrote:

> Mr. Rossetti,
> 
> It is horribly RTFM.
> 
> man 2 sendfile is what you're after.

mhm
sendfile() can copy extended attributes and ACL ?

(i'm not think, that copy is the right candidate to syscall)

	MOJE

-- 
Konir Tomas
Czech Republic
Brno
ICQ 25849167


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-10 12:09 OT: why no file copy() libc/syscall ?? Bradley Chapman
  2003-11-10 18:47 ` Tomas Konir
@ 2003-11-10 22:44 ` Derek Foreman
  1 sibling, 0 replies; 77+ messages in thread
From: Derek Foreman @ 2003-11-10 22:44 UTC (permalink / raw)
  To: Bradley Chapman; +Cc: davide.rossetti, linux-kernel

On Mon, 10 Nov 2003, Bradley Chapman wrote:

> Mr. Rossetti,
>
> It is horribly RTFM.
>
> man 2 sendfile is what you're after.

I'm afraid it's not horribly RTFM at all.

sendfile won't do what he needs in 2.6.x.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-27 10:58                             ` David Lang
@ 2003-12-01 16:20                               ` Jesse Pollard
  0 siblings, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-12-01 16:20 UTC (permalink / raw)
  To: David Lang, =?CP 1252?q?J=F6rn=20Engel?=
  Cc: Nick Piggin,
	Robert White <rwhite@casabyte.com> "'Florian
	Weimer'", Valdis.Kletnieks, 'Daniel Gryniewicz',
	'linux-kernel mailing list'

On Thursday 27 November 2003 04:58, David Lang wrote:
[snip]
> actually thinking about it a bit more, did I make a stupid mistake and
> think that the FD points at the beginning of the file when it really
> points at the inode? if it points at the inode then the problems I was
> refering to don't exist.

Actually, it points to inode and offset in the file. The advantage this has
is in the case of appending to a file... open the destination file, seek to
the end, then copy. It also allows seeking some offset in the input file,
then copying the rest of the file.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-27 10:02                           ` Jörn Engel
@ 2003-11-27 10:58                             ` David Lang
  2003-12-01 16:20                               ` Jesse Pollard
  0 siblings, 1 reply; 77+ messages in thread
From: David Lang @ 2003-11-27 10:58 UTC (permalink / raw)
  To: Jörn Engel
  Cc: Nick Piggin, Robert White, 'Jesse Pollard',
	'Florian Weimer',
	Valdis.Kletnieks, 'Daniel Gryniewicz',
	'linux-kernel mailing list'

On Thu, 27 Nov 2003, Jörn Engel wrote:

> On Thu, 27 November 2003 01:50:46 -0800, David Lang wrote:
> > >
> > > I don't think it should do any linking / unlinking it should just work
> > > with file descriptors. Concurrent writes to a file don't have many
> > > guarantees. sys_copy shouldn't have to be any stronger (read weaker).
> >
> > I'm thinking that it may actually be easier to do this via file paths
> > instead of file descripters. with file paths something like COW or
> > zero-copy copy can be done trivially (and the kernel knows the user
> > credentials of the program issuing the command and can pass them on to the
> > filesystem to see if it's allowed). I don't see how this can be done with
> > file descripters (if all you have is a file descripter you can truncate
> > and write a file, but you don't know all the links to that file so you
> > can't reposition that first inode for example).
>
> And how is userspace supposed to protect itself from race conditions?
> Just compare:
>
> fd1 = open(path1);
> if (stat(fd1) looks fishy)
> 	abort();
> fd2 = open(path2);
> if (stat(fd2) looks fishy)
> 	abort();
> copy(fd1, fd2);
>
> and:
>
> fd1 = open(path1);
> if (stat(fd1) looks fishy)
> 	abort();
> fd2 = open(path2);
> if (stat(fd2) looks fishy)
> 	abort();
> copy(path1, path2);
>
> Jörn
>

Ok, good point. my first reaction is to make copy refuse to function
unless the target doesn't exist (protect the output), but that doesn't
solve the problem of protecting the input or preventing someone else from
tampering with the output (unless you have copy return the FD to use to
access the output)

actually thinking about it a bit more, did I make a stupid mistake and
think that the FD points at the beginning of the file when it really
points at the inode? if it points at the inode then the problems I was
refering to don't exist.

David Lang

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-27  9:50                         ` David Lang
@ 2003-11-27 10:02                           ` Jörn Engel
  2003-11-27 10:58                             ` David Lang
  0 siblings, 1 reply; 77+ messages in thread
From: Jörn Engel @ 2003-11-27 10:02 UTC (permalink / raw)
  To: David Lang
  Cc: Nick Piggin, Robert White, 'Jesse Pollard',
	'Florian Weimer',
	Valdis.Kletnieks, 'Daniel Gryniewicz',
	'linux-kernel mailing list'

On Thu, 27 November 2003 01:50:46 -0800, David Lang wrote:
> >
> > I don't think it should do any linking / unlinking it should just work
> > with file descriptors. Concurrent writes to a file don't have many
> > guarantees. sys_copy shouldn't have to be any stronger (read weaker).
> 
> I'm thinking that it may actually be easier to do this via file paths
> instead of file descripters. with file paths something like COW or
> zero-copy copy can be done trivially (and the kernel knows the user
> credentials of the program issuing the command and can pass them on to the
> filesystem to see if it's allowed). I don't see how this can be done with
> file descripters (if all you have is a file descripter you can truncate
> and write a file, but you don't know all the links to that file so you
> can't reposition that first inode for example).

And how is userspace supposed to protect itself from race conditions?
Just compare:

fd1 = open(path1);
if (stat(fd1) looks fishy)
	abort();
fd2 = open(path2);
if (stat(fd2) looks fishy)
	abort();
copy(fd1, fd2);

and:

fd1 = open(path1);
if (stat(fd1) looks fishy)
	abort();
fd2 = open(path2);
if (stat(fd2) looks fishy)
	abort();
copy(path1, path2);

Jörn

-- 
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
   Raph Levien, 1979

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-27  8:56                       ` Nick Piggin
@ 2003-11-27  9:50                         ` David Lang
  2003-11-27 10:02                           ` Jörn Engel
  0 siblings, 1 reply; 77+ messages in thread
From: David Lang @ 2003-11-27  9:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Robert White, 'Jesse Pollard', 'Florian Weimer',
	Valdis.Kletnieks, 'Daniel Gryniewicz',
	'linux-kernel mailing list'

On Thu, 27 Nov 2003, Nick Piggin wrote:
> >
> >if the destination exists it would need to be unlinked (overwrite doesn't
> >make sense in the COW context)
> >
>
> Well it would be implementation specific. Presumably it should keep
> the semantics of an overwrite.
>
> >
> >I don't understand the in-kernel page locking issues refered to above
> >
> >the concurrancy issues are a good question, but I would suggest that the
> >syscall fully setup the copy and then create the link to it. this would
> >make the final creation an atomic operation (or as close to it as a
> >particular filesystem allows) and if you have multiple writers doing a
> >copy to the same destination then the last one wins, the earlier copies
> >get unlinked and deleted
> >
>
> I don't think it should do any linking / unlinking it should just work
> with file descriptors. Concurrent writes to a file don't have many
> guarantees. sys_copy shouldn't have to be any stronger (read weaker).

I'm thinking that it may actually be easier to do this via file paths
instead of file descripters. with file paths something like COW or
zero-copy copy can be done trivially (and the kernel knows the user
credentials of the program issuing the command and can pass them on to the
filesystem to see if it's allowed). I don't see how this can be done with
file descripters (if all you have is a file descripter you can truncate
and write a file, but you don't know all the links to that file so you
can't reposition that first inode for example).

> >
> >I definantly don't see it being worth it to make a syscall to just
> >implement the read/write loop, but a copy syscall designed from the outset
> >to do a COW copy that falls back to a read/write loop for filesystems that
> >don't do COW has some real benifits
> >
>
> No I just mean the semantics.
>
>
>

-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-27  7:29                   ` Nick Piggin
@ 2003-11-27  9:15                     ` David Lang
  2003-11-27  8:56                       ` Nick Piggin
  0 siblings, 1 reply; 77+ messages in thread
From: David Lang @ 2003-11-27  9:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Robert White, 'Jesse Pollard', 'Florian Weimer',
	Valdis.Kletnieks, 'Daniel Gryniewicz',
	'linux-kernel mailing list'

On Thu, 27 Nov 2003, Nick Piggin wrote:

> Robert White wrote:
>
> >(Among the other N objections, add things like the lack of any sort of
> >control or option parameters)
> >...
> >N += 1: Sparse Copying (e.g. seeking past blocks of zeros)
> >N += 1: Unlink or overwrite or what?
> >N += 1: In-Kernel locking and resolution for pages that are mandatory
> >lock(ed)
> >N += 1: No fine-grained control for concurrency issues (multiple writers)
> >
> >Start with doing a cp --help and move on from there for an unbounded list of
> >issues that sys_copy(int fd1, int fd2) does not even come close to
> >addressing.
> >
> >
>
> To be fair, sys_copy is never intended to replace cp or try to be
> very smart. I don't think it is semantically supposed to do much more
> than replace a read, write loop (of course, the syscall also has an
> offset and count).
>
> sparse copying would be implementation dependant. If cp wanted to do
> something special it would not use one big copy call. I think unlink
> / overwrite is irrelevant if its semantically a read write loop.
>

actually if this syscall is allowed to do a COW at the filesystem level
(which I think is one of the better reasons for implementing this) then
sparse files would produce sparse copies.

if the destination exists it would need to be unlinked (overwrite doesn't
make sense in the COW context)

I don't understand the in-kernel page locking issues refered to above

the concurrancy issues are a good question, but I would suggest that the
syscall fully setup the copy and then create the link to it. this would
make the final creation an atomic operation (or as close to it as a
particular filesystem allows) and if you have multiple writers doing a
copy to the same destination then the last one wins, the earlier copies
get unlinked and deleted

I definantly don't see it being worth it to make a syscall to just
implement the read/write loop, but a copy syscall designed from the outset
to do a COW copy that falls back to a read/write loop for filesystems that
don't do COW has some real benifits

David Lang



-- 
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-27  9:15                     ` David Lang
@ 2003-11-27  8:56                       ` Nick Piggin
  2003-11-27  9:50                         ` David Lang
  0 siblings, 1 reply; 77+ messages in thread
From: Nick Piggin @ 2003-11-27  8:56 UTC (permalink / raw)
  To: David Lang
  Cc: Robert White, 'Jesse Pollard', 'Florian Weimer',
	Valdis.Kletnieks, 'Daniel Gryniewicz',
	'linux-kernel mailing list'



David Lang wrote:

>On Thu, 27 Nov 2003, Nick Piggin wrote:
>
>
>>Robert White wrote:
>>
>>
>>>(Among the other N objections, add things like the lack of any sort of
>>>control or option parameters)
>>>...
>>>N += 1: Sparse Copying (e.g. seeking past blocks of zeros)
>>>N += 1: Unlink or overwrite or what?
>>>N += 1: In-Kernel locking and resolution for pages that are mandatory
>>>lock(ed)
>>>N += 1: No fine-grained control for concurrency issues (multiple writers)
>>>
>>>Start with doing a cp --help and move on from there for an unbounded list of
>>>issues that sys_copy(int fd1, int fd2) does not even come close to
>>>addressing.
>>>
>>>
>>>
>>To be fair, sys_copy is never intended to replace cp or try to be
>>very smart. I don't think it is semantically supposed to do much more
>>than replace a read, write loop (of course, the syscall also has an
>>offset and count).
>>
>>sparse copying would be implementation dependant. If cp wanted to do
>>something special it would not use one big copy call. I think unlink
>>/ overwrite is irrelevant if its semantically a read write loop.
>>
>>
>
>actually if this syscall is allowed to do a COW at the filesystem level
>(which I think is one of the better reasons for implementing this) then
>sparse files would produce sparse copies.
>

Sure, I just mean the semantics should be equivalent to a read write
loop. Another example is zero copy copy for a remote fs that supports
it.

>
>if the destination exists it would need to be unlinked (overwrite doesn't
>make sense in the COW context)
>

Well it would be implementation specific. Presumably it should keep
the semantics of an overwrite.

>
>I don't understand the in-kernel page locking issues refered to above
>
>the concurrancy issues are a good question, but I would suggest that the
>syscall fully setup the copy and then create the link to it. this would
>make the final creation an atomic operation (or as close to it as a
>particular filesystem allows) and if you have multiple writers doing a
>copy to the same destination then the last one wins, the earlier copies
>get unlinked and deleted
>

I don't think it should do any linking / unlinking it should just work
with file descriptors. Concurrent writes to a file don't have many
guarantees. sys_copy shouldn't have to be any stronger (read weaker).

>
>I definantly don't see it being worth it to make a syscall to just
>implement the read/write loop, but a copy syscall designed from the outset
>to do a COW copy that falls back to a read/write loop for filesystems that
>don't do COW has some real benifits
>

No I just mean the semantics.




^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-27  2:40                 ` Robert White
@ 2003-11-27  7:29                   ` Nick Piggin
  2003-11-27  9:15                     ` David Lang
  0 siblings, 1 reply; 77+ messages in thread
From: Nick Piggin @ 2003-11-27  7:29 UTC (permalink / raw)
  To: Robert White
  Cc: 'Jesse Pollard', 'Florian Weimer',
	Valdis.Kletnieks, 'Daniel Gryniewicz',
	'linux-kernel mailing list'



Robert White wrote:

>(Among the other N objections, add things like the lack of any sort of
>control or option parameters)
>...
>N += 1: Sparse Copying (e.g. seeking past blocks of zeros)
>N += 1: Unlink or overwrite or what?
>N += 1: In-Kernel locking and resolution for pages that are mandatory
>lock(ed)
>N += 1: No fine-grained control for concurrency issues (multiple writers)
>
>Start with doing a cp --help and move on from there for an unbounded list of
>issues that sys_copy(int fd1, int fd2) does not even come close to
>addressing.
>
>

To be fair, sys_copy is never intended to replace cp or try to be
very smart. I don't think it is semantically supposed to do much more
than replace a read, write loop (of course, the syscall also has an
offset and count).

sparse copying would be implementation dependant. If cp wanted to do
something special it would not use one big copy call. I think unlink
/ overwrite is irrelevant if its semantically a read write loop.



^ permalink raw reply	[flat|nested] 77+ messages in thread

* RE: OT: why no file copy() libc/syscall ??
  2003-11-20 19:08               ` Jesse Pollard
                                   ` (3 preceding siblings ...)
  2003-11-20 22:31                 ` Xavier Bestel
@ 2003-11-27  2:40                 ` Robert White
  2003-11-27  7:29                   ` Nick Piggin
  4 siblings, 1 reply; 77+ messages in thread
From: Robert White @ 2003-11-27  2:40 UTC (permalink / raw)
  To: 'Jesse Pollard', 'Florian Weimer'
  Cc: Valdis.Kletnieks, 'Daniel Gryniewicz',
	'linux-kernel mailing list'

(Among the other N objections, add things like the lack of any sort of
control or option parameters)
...
N += 1: Sparse Copying (e.g. seeking past blocks of zeros)
N += 1: Unlink or overwrite or what?
N += 1: In-Kernel locking and resolution for pages that are mandatory
lock(ed)
N += 1: No fine-grained control for concurrency issues (multiple writers)

Start with doing a cp --help and move on from there for an unbounded list of
issues that sys_copy(int fd1, int fd2) does not even come close to
addressing.




^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-22 19:50                           ` Jamie Lokier
@ 2003-11-22 23:07                             ` Andreas Schwab
  0 siblings, 0 replies; 77+ messages in thread
From: Andreas Schwab @ 2003-11-22 23:07 UTC (permalink / raw)
  To: Jamie Lokier
  Cc: Pavel Machek, Timothy Miller, Andreas Dilger, Justin Cormack,
	Jesse Pollard, linux-kernel mailing list

Jamie Lokier <jamie@shareable.org> writes:

> Pavel Machek wrote:
>> > It is, though.  If you run out of space copying a file, you know it when 
>> > you're copying.  Applications don't usually expect to get out-of-space 
>> > errors while overwriting something in the middle of a file.
>> 
>> Same can happen on compressed filesystem...
>
> Or a filesystem with snapshots, e.g. using LVM.

Or writing to a sparse file.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-22 14:50                         ` Pavel Machek
@ 2003-11-22 19:50                           ` Jamie Lokier
  2003-11-22 23:07                             ` Andreas Schwab
  0 siblings, 1 reply; 77+ messages in thread
From: Jamie Lokier @ 2003-11-22 19:50 UTC (permalink / raw)
  To: Pavel Machek
  Cc: Timothy Miller, Andreas Dilger, Justin Cormack, Jesse Pollard,
	linux-kernel mailing list

Pavel Machek wrote:
> > It is, though.  If you run out of space copying a file, you know it when 
> > you're copying.  Applications don't usually expect to get out-of-space 
> > errors while overwriting something in the middle of a file.
> 
> Same can happen on compressed filesystem...

Or a filesystem with snapshots, e.g. using LVM.

-- Jamie

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 21:30                       ` Timothy Miller
  2003-11-20 21:49                         ` Maciej Zenczykowski
  2003-11-20 21:58                         ` Hua Zhong
@ 2003-11-22 14:50                         ` Pavel Machek
  2003-11-22 19:50                           ` Jamie Lokier
  2 siblings, 1 reply; 77+ messages in thread
From: Pavel Machek @ 2003-11-22 14:50 UTC (permalink / raw)
  To: Timothy Miller
  Cc: Andreas Dilger, Justin Cormack, Jesse Pollard, linux-kernel mailing list

Hi!

> >>This could be a problem if COW causes you to run out of space when 
> >>writing to the file.
> >
> >
> >Not much different than running out of space copying a file.
> 
> It is, though.  If you run out of space copying a file, you know it when 
> you're copying.  Applications don't usually expect to get out-of-space 
> errors while overwriting something in the middle of a file.

Same can happen on compressed filesystem...
								Pavel
-- 
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 21:48                 ` Maciej Zenczykowski
@ 2003-11-21 16:34                   ` Jesse Pollard
  0 siblings, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-21 16:34 UTC (permalink / raw)
  To: Maciej Zenczykowski
  Cc: Florian Weimer, Valdis.Kletnieks, Daniel Gryniewicz,
	linux-kernel mailing list

On Thursday 20 November 2003 15:48, Maciej Zenczykowski wrote:
> Assume 'fast'copy(int fd_in, int fd_out) where fd_in and fd_out reference
> files.  fd_in is opened for read and fd_out is opened for write.  Ignore
> filepos locations in both fd's.  fd_out must reference an empty/truncated
> file (if not then fail).  Usually you'd call copy on fd_out straight out
> of a creat call (and thus this would be a non-issue).
>
> > 1. what happens if the copy is aborted?
>
> I'd say the copy operation should be 'atomic', either it succeeds (full
> copy) or fails (no changes to filesystems except for the truncate).  An
> abort would obviously usually result in a failure (thus a possible revert,
> which is rather easy since it's likely just an truncate of whatever has
> already been copied) or if we've just finished and than a successful
> result.

Really? what happens if the abort is local to the system making the request?
what happens if the abort is on the remote server?

> > 2. what happens if the network drops while the remote server continues?
>
> If the remote server has enough data to perform the operation then it does
> complete it otherwise there ain't enough info anyway (afterall the
> entire idea of this is to fit the entire copy into a single copy
> instruction thus a single packet/command whatever, no extra data is
> passed)...

And back to aborts?

> > 3. what about buffer synchronization?
>
> If this is happening remotely then I don't see what requires sync???

Multiple hosts remote to the server that have afile open. Though this
already happens with NFS.

> > 4. what errors should be reported ?
>
> This is tougher:
>
> Tests first performed locally (if they can be) than request forwarded to
> remote end and tests performed remotely - return either error or
> ACCEPTED, at which point local end tells it to go ahead, (at this
> point the operation is effectively performed (unless an abort is
> signalled) regardless of network connectivity).  On completion remote end
> will return info on completion or error code.
>
> a) operation not supported by kernel :) - ENOSYS
> b) fd_in/fd_out invalid file descriptor - EBADF
> c) fd_in/fd_out is directory - EISDIR
> d) can't read/write from/to fd_in/fd_out - EINVAL
> e) an error if fd_out ain't empty - ENOTEMPTY
> f) operation not supported by this combination of devices - EOPNOTSUPP
>    [so you need to do it via usual loop]
> g) input file bigger then output file can be - EFBIG
>    [ie copy of 5GB file from remote filesystem which supports it to
>    another filesystem on the same server with 2GB max file size]
> h) low-level IO error - EIO - serious problems (i.e. HDD read/write error)
> i) out of disk space during copy - ENOSPC
> j) out of memory during copy - ENOMEM (unlikely, needed?)
> k) lost network connection - ENETRESET (unknown whether succeeded)
>    or ENOLINK ?
> l) operation was aborted - EINTR [probably should be some other error
>    code, not sure]
> m) success - either return 0 or the number of bytes copied
>   [probably best to return the # of bytes copied, even if (for now?) we
>    only accept full copies]
>
> Did I miss anything?  What about non-blocking call? Basically as above but
> return INPROGRESS as soon as we tell remote end to go ahead... or perhaps
> don't support non-blocking call?
>
> > 5. what happens when the syscall is interupted? Especially if the remote
> >    copy may take a while (I've seen some require an hour or more - worst
> >    case: days due to a media error (completed after the disk was
> > replaced)).
>
> Well, if it's interrupted by a SIGINT or the like then return EINTR and
> the copy was not performed (ie we backed the copy out, unless net failure
> detected during abort then ENOLINK/ENETRESET).

Ooop - the copy is being done on the remote server.

> If it's a more normal signal than it should behave like any normal kernel
> restartable syscall (i.e. via ERESTARTNOHAND or something like that).

Again, the copy may be being made on the remote server.

> > 6. what about a client opening the copy before it is finished copying?
>
> The file copy is atomic and thus the file doesn't per se exist until the
> copy operation completes (or the file exists with zero size and is locked
> and can't be opened).

It does under all other methods of copying.

> Perhaps in the future we could support partial copies and restarting an
> interrupted copy, but let's first agree (or not) on the above.
>
> I think a copy syscall would be very useful.  What I'd really like to see
> is some sort of block-hashed-space-compression with copy-on-write
> semantics file system for linux (for my 500 CD collection which probably
> has a 10-12 data duplicity factor).

It could be usefull. What you describe now is a migrating filesystem on a 
server. And note that your COW is going from two different filesystems (hmm
or maybe a custom union mount?)...

Which is where the migrating filesystem. The served filesystem should already
know how to transfer a file from the archive.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 19:44                 ` Justin Cormack
  2003-11-20 20:44                   ` Timothy Miller
@ 2003-11-21 16:24                   ` Jesse Pollard
  1 sibling, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-21 16:24 UTC (permalink / raw)
  To: Justin Cormack; +Cc: linux-kernel mailing list

On Thursday 20 November 2003 13:44, Justin Cormack wrote:
> On Thu, 2003-11-20 at 19:08, Jesse Pollard wrote:
[snip]
>
> If you really want a filesystem that supports efficient copying you
> probably want it to have the equivalent of COW blocks, so that a copy
> just sets up a few pointers, and the copy only happens when the original
> or copied files are changed.

Ummmm... I REALLY don't like COW on a disk. Much too big a chance that the
filesystem will deadlock, and with no recovery method. (oversubscribed, then
crash, corrupting the homeblock, repair (committing journal?) requires
space... no space... therefore mostly dead. You'd have to be able to mount
without the journal or the homeblock, then delete something, then commit the
journal, dismount, recover the rest-- though this might be overboard, the
homebock might not even be damaged).

> But basically you wont get a syscall until you have a filesystem with
> semantics that only maps onto this sort of operation.

I belive NFSv3/4 has a file copy request included. And I understand that
the SAMBA server does too.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 22:31                 ` Xavier Bestel
@ 2003-11-20 22:44                   ` Andreas Dilger
  0 siblings, 0 replies; 77+ messages in thread
From: Andreas Dilger @ 2003-11-20 22:44 UTC (permalink / raw)
  To: Xavier Bestel
  Cc: Jesse Pollard, Florian Weimer, Valdis.Kletnieks,
	Daniel Gryniewicz, Linux Kernel Mailing List

On Nov 20, 2003  23:31 +0100, Xavier Bestel wrote:
> Le jeu 20/11/2003 à 20:08, Jesse Pollard a écrit :
> > 1. what happens if the copy is aborted?

Same as now with "cp" - partial copy.

> > 2. what happens if the network drops while the remote server continues?

Irrelevant, since you can't access the file at that point (i.e. if server
continues then great, but if it doesn't it's no different than the server
disconnecting/crashing in the middle of a regular copy.

> > 3. what about buffer synchronization?

Sync file locally before starting, and no buffers on client are created.
If you write to file while it is being copied, how is that different
than two writers for same file now (i.e. usually broken).  If the network
filesystem doesn't support locking, that's the filesystem's problem and
this API doesn't change it.

> > 4. what errors should be reported ?

Covered pretty well elsewhere.  Of course EINTR should be reserved for
"interrupted, please continue if you want" as opposed to a hard error.

> > 5. what happens when the syscall is interupted? Especially if the remote
> >    copy may take a while (I've seen some require an hour or more - worst
> >    case: days due to a media error (completed after the disk was replaced)).

Partial copy, no different than now.

> > 6. what about a client opening the copy before it is finished copying?

Reads partial file, no different than now.

> 7. How to report progress with your average file manager ?

Support signals and restart the copy where it left off.  Interrupting
once a second or whatever isn't onerous if needed and you can restart.
You could even support some sort of "SIGUSR1" like dd does to get status
back without actually killing things.  Alternately, just stat the target
file as it is being copied to watch progress.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 19:08               ` Jesse Pollard
                                   ` (2 preceding siblings ...)
  2003-11-20 21:48                 ` Maciej Zenczykowski
@ 2003-11-20 22:31                 ` Xavier Bestel
  2003-11-20 22:44                   ` Andreas Dilger
  2003-11-27  2:40                 ` Robert White
  4 siblings, 1 reply; 77+ messages in thread
From: Xavier Bestel @ 2003-11-20 22:31 UTC (permalink / raw)
  To: Jesse Pollard
  Cc: Florian Weimer, Valdis.Kletnieks, Daniel Gryniewicz,
	Linux Kernel Mailing List

Le jeu 20/11/2003 à 20:08, Jesse Pollard a écrit :

> 1. what happens if the copy is aborted?
> 2. what happens if the network drops while the remote server continues?
> 3. what about buffer synchronization?
> 4. what errors should be reported ?
> 5. what happens when the syscall is interupted? Especially if the remote
>    copy may take a while (I've seen some require an hour or more - worst
>    case: days due to a media error (completed after the disk was replaced)).
> 6. what about a client opening the copy before it is finished copying?

7. How to report progress with your average file manager ?


^ permalink raw reply	[flat|nested] 77+ messages in thread

* RE: OT: why no file copy() libc/syscall ??
  2003-11-20 21:30                       ` Timothy Miller
  2003-11-20 21:49                         ` Maciej Zenczykowski
@ 2003-11-20 21:58                         ` Hua Zhong
  2003-11-22 14:50                         ` Pavel Machek
  2 siblings, 0 replies; 77+ messages in thread
From: Hua Zhong @ 2003-11-20 21:58 UTC (permalink / raw)
  To: 'Timothy Miller', 'Andreas Dilger'
  Cc: 'Justin Cormack', 'Jesse Pollard',
	'linux-kernel mailing list'

> Andreas Dilger wrote:
> > On Nov 20, 2003  15:44 -0500, Timothy Miller wrote:
> > 
> >>This could be a problem if COW causes you to run out of space when 
> >>writing to the file.
> > 
> > 
> > Not much different than running out of space copying a file.
> 
> It is, though.  If you run out of space copying a file, you 
> know it when you're copying.  Applications don't usually expect to get

> out-of-space errors while overwriting something in the middle of a
file.

It could for journaling filesystem already.

It's not in any spec that writing to the middle of a file would not
cause ENOSPC, is it?

> In effect, your free space and your used space add up to greater than 
> the capacity of the disk.  An application that checks for free space 
> before doing something would be fooled into thinking there is 
> more free space than there really is.  How can an application find out

> in advance that a file that it's about to modify (without appending 
> anything to the end) is going to need more disk space?


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 21:49                         ` Maciej Zenczykowski
@ 2003-11-20 21:52                           ` Timothy Miller
  0 siblings, 0 replies; 77+ messages in thread
From: Timothy Miller @ 2003-11-20 21:52 UTC (permalink / raw)
  To: Maciej Zenczykowski
  Cc: Andreas Dilger, Justin Cormack, Jesse Pollard, linux-kernel mailing list



Maciej Zenczykowski wrote:
>>It is, though.  If you run out of space copying a file, you know it when 
>>you're copying.  Applications don't usually expect to get out-of-space 
>>errors while overwriting something in the middle of a file.
> 
> 
> What about sparse files?


Ah, good point.  Never mind.  :)



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 21:30                       ` Timothy Miller
@ 2003-11-20 21:49                         ` Maciej Zenczykowski
  2003-11-20 21:52                           ` Timothy Miller
  2003-11-20 21:58                         ` Hua Zhong
  2003-11-22 14:50                         ` Pavel Machek
  2 siblings, 1 reply; 77+ messages in thread
From: Maciej Zenczykowski @ 2003-11-20 21:49 UTC (permalink / raw)
  To: Timothy Miller
  Cc: Andreas Dilger, Justin Cormack, Jesse Pollard, linux-kernel mailing list

> It is, though.  If you run out of space copying a file, you know it when 
> you're copying.  Applications don't usually expect to get out-of-space 
> errors while overwriting something in the middle of a file.

What about sparse files?

> In effect, your free space and your used space add up to greater than 
> the capacity of the disk.  An application that checks for free space 
> before doing something would be fooled into thinking there is more free 
> space than there really is.  How can an application find out in advance 
> that a file that it's about to modify (without appending anything to the 
> end) is going to need more disk space?

I don't think it can do that already now with sparse files, can it?

Cheers,
MaZe.



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 19:08               ` Jesse Pollard
  2003-11-20 19:12                 ` Florian Weimer
  2003-11-20 19:44                 ` Justin Cormack
@ 2003-11-20 21:48                 ` Maciej Zenczykowski
  2003-11-21 16:34                   ` Jesse Pollard
  2003-11-20 22:31                 ` Xavier Bestel
  2003-11-27  2:40                 ` Robert White
  4 siblings, 1 reply; 77+ messages in thread
From: Maciej Zenczykowski @ 2003-11-20 21:48 UTC (permalink / raw)
  To: Jesse Pollard
  Cc: Florian Weimer, Valdis.Kletnieks, Daniel Gryniewicz,
	linux-kernel mailing list

Assume 'fast'copy(int fd_in, int fd_out) where fd_in and fd_out reference
files.  fd_in is opened for read and fd_out is opened for write.  Ignore
filepos locations in both fd's.  fd_out must reference an empty/truncated
file (if not then fail).  Usually you'd call copy on fd_out straight out
of a creat call (and thus this would be a non-issue).

> 1. what happens if the copy is aborted?

I'd say the copy operation should be 'atomic', either it succeeds (full
copy) or fails (no changes to filesystems except for the truncate).  An
abort would obviously usually result in a failure (thus a possible revert,
which is rather easy since it's likely just an truncate of whatever has
already been copied) or if we've just finished and than a successful 
result.

> 2. what happens if the network drops while the remote server continues?

If the remote server has enough data to perform the operation then it does 
complete it otherwise there ain't enough info anyway (afterall the 
entire idea of this is to fit the entire copy into a single copy 
instruction thus a single packet/command whatever, no extra data is 
passed)...

> 3. what about buffer synchronization?

If this is happening remotely then I don't see what requires sync???

> 4. what errors should be reported ?

This is tougher:

Tests first performed locally (if they can be) than request forwarded to 
remote end and tests performed remotely - return either error or 
ACCEPTED, at which point local end tells it to go ahead, (at this 
point the operation is effectively performed (unless an abort is 
signalled) regardless of network connectivity).  On completion remote end 
will return info on completion or error code.

a) operation not supported by kernel :) - ENOSYS
b) fd_in/fd_out invalid file descriptor - EBADF
c) fd_in/fd_out is directory - EISDIR
d) can't read/write from/to fd_in/fd_out - EINVAL
e) an error if fd_out ain't empty - ENOTEMPTY
f) operation not supported by this combination of devices - EOPNOTSUPP
   [so you need to do it via usual loop]
g) input file bigger then output file can be - EFBIG
   [ie copy of 5GB file from remote filesystem which supports it to
   another filesystem on the same server with 2GB max file size]
h) low-level IO error - EIO - serious problems (i.e. HDD read/write error)
i) out of disk space during copy - ENOSPC
j) out of memory during copy - ENOMEM (unlikely, needed?)
k) lost network connection - ENETRESET (unknown whether succeeded)
   or ENOLINK ?
l) operation was aborted - EINTR [probably should be some other error 
   code, not sure]
m) success - either return 0 or the number of bytes copied
  [probably best to return the # of bytes copied, even if (for now?) we
   only accept full copies]

Did I miss anything?  What about non-blocking call? Basically as above but 
return INPROGRESS as soon as we tell remote end to go ahead... or perhaps 
don't support non-blocking call?

> 5. what happens when the syscall is interupted? Especially if the remote
>    copy may take a while (I've seen some require an hour or more - worst
>    case: days due to a media error (completed after the disk was replaced)).

Well, if it's interrupted by a SIGINT or the like then return EINTR and 
the copy was not performed (ie we backed the copy out, unless net failure 
detected during abort then ENOLINK/ENETRESET).

If it's a more normal signal than it should behave like any normal kernel 
restartable syscall (i.e. via ERESTARTNOHAND or something like that).

> 6. what about a client opening the copy before it is finished copying?

The file copy is atomic and thus the file doesn't per se exist until the 
copy operation completes (or the file exists with zero size and is locked 
and can't be opened).

Perhaps in the future we could support partial copies and restarting an 
interrupted copy, but let's first agree (or not) on the above.

I think a copy syscall would be very useful.  What I'd really like to see 
is some sort of block-hashed-space-compression with copy-on-write 
semantics file system for linux (for my 500 CD collection which probably 
has a 10-12 data duplicity factor).

Comments?

Cheers,
MaZe.



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 21:07                     ` Andreas Dilger
@ 2003-11-20 21:30                       ` Timothy Miller
  2003-11-20 21:49                         ` Maciej Zenczykowski
                                           ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Timothy Miller @ 2003-11-20 21:30 UTC (permalink / raw)
  To: Andreas Dilger; +Cc: Justin Cormack, Jesse Pollard, linux-kernel mailing list



Andreas Dilger wrote:
> On Nov 20, 2003  15:44 -0500, Timothy Miller wrote:
> 
>>This could be a problem if COW causes you to run out of space when 
>>writing to the file.
> 
> 
> Not much different than running out of space copying a file.

It is, though.  If you run out of space copying a file, you know it when 
you're copying.  Applications don't usually expect to get out-of-space 
errors while overwriting something in the middle of a file.

In effect, your free space and your used space add up to greater than 
the capacity of the disk.  An application that checks for free space 
before doing something would be fooled into thinking there is more free 
space than there really is.  How can an application find out in advance 
that a file that it's about to modify (without appending anything to the 
end) is going to need more disk space?



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 20:44                   ` Timothy Miller
@ 2003-11-20 21:07                     ` Andreas Dilger
  2003-11-20 21:30                       ` Timothy Miller
  0 siblings, 1 reply; 77+ messages in thread
From: Andreas Dilger @ 2003-11-20 21:07 UTC (permalink / raw)
  To: Timothy Miller; +Cc: Justin Cormack, Jesse Pollard, linux-kernel mailing list

On Nov 20, 2003  15:44 -0500, Timothy Miller wrote:
> This could be a problem if COW causes you to run out of space when 
> writing to the file.

Not much different than running out of space copying a file.

> This could also be a benefit if, for whatever reason, you have lots of 
> copies of the same file that you never change.  But that sounds somewhat 
> pointless to me.

Umm, snapshots-in-time of your /home, /usr/src, etc?  Copies of the kernel?
Lots of reasons to have mostly-identical versions of files.  Almost like
hard links, except you aren't at the mercy of your editor/patch to do the
right thing when modifying one of those copies.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 19:44                 ` Justin Cormack
@ 2003-11-20 20:44                   ` Timothy Miller
  2003-11-20 21:07                     ` Andreas Dilger
  2003-11-21 16:24                   ` Jesse Pollard
  1 sibling, 1 reply; 77+ messages in thread
From: Timothy Miller @ 2003-11-20 20:44 UTC (permalink / raw)
  To: Justin Cormack; +Cc: Jesse Pollard, linux-kernel mailing list



Justin Cormack wrote:
> On Thu, 2003-11-20 at 19:08, Jesse Pollard wrote:

> If you really want a filesystem that supports efficient copying you
> probably want it to have the equivalent of COW blocks, so that a copy
> just sets up a few pointers, and the copy only happens when the original
> or copied files are changed.
> 
> But basically you wont get a syscall until you have a filesystem with
> semantics that only maps onto this sort of operation.


This could be a problem if COW causes you to run out of space when 
writing to the file.

This could also be a benefit if, for whatever reason, you have lots of 
copies of the same file that you never change.  But that sounds somewhat 
pointless to me.


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 19:08               ` Jesse Pollard
  2003-11-20 19:12                 ` Florian Weimer
@ 2003-11-20 19:44                 ` Justin Cormack
  2003-11-20 20:44                   ` Timothy Miller
  2003-11-21 16:24                   ` Jesse Pollard
  2003-11-20 21:48                 ` Maciej Zenczykowski
                                   ` (2 subsequent siblings)
  4 siblings, 2 replies; 77+ messages in thread
From: Justin Cormack @ 2003-11-20 19:44 UTC (permalink / raw)
  To: Jesse Pollard; +Cc: linux-kernel mailing list

On Thu, 2003-11-20 at 19:08, Jesse Pollard wrote:
> Now if you wanted the remote server to deny the network copy... could
> be done - after all the credentials for both input and output files
> are present on the server. If the server decides NOT to copy, then fine.
> It would just cause the user to make the copy with a read/write loop.
> 
> I was only thinking of it as a way to gain access to any filesystem
> support that may be available for copying files. If none is available,
> then do it in user mode.
> 
> Personally, I'm not sure it is a good idea, partly because the semantics
> of a file copy operation are not well defined (some of the following IS 
> known).
> 
> 1. what happens if the copy is aborted?
> 2. what happens if the network drops while the remote server continues?
> 3. what about buffer synchronization?
> 4. what errors should be reported ?
> 5. what happens when the syscall is interupted? Especially if the remote
>    copy may take a while (I've seen some require an hour or more - worst
>    case: days due to a media error (completed after the disk was replaced)).
> 6. what about a client opening the copy before it is finished copying?

If you really want a filesystem that supports efficient copying you
probably want it to have the equivalent of COW blocks, so that a copy
just sets up a few pointers, and the copy only happens when the original
or copied files are changed.

But basically you wont get a syscall until you have a filesystem with
semantics that only maps onto this sort of operation.

Justin



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 19:08               ` Jesse Pollard
@ 2003-11-20 19:12                 ` Florian Weimer
  2003-11-20 19:44                 ` Justin Cormack
                                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 77+ messages in thread
From: Florian Weimer @ 2003-11-20 19:12 UTC (permalink / raw)
  To: Jesse Pollard
  Cc: Valdis.Kletnieks, Daniel Gryniewicz, linux-kernel mailing list

Jesse Pollard wrote:

> > > > > 	int sys_copy(int fd_src, int fd_dst)

> > The default attributes in the new location might be less strict than the
> > attributes of the source file.
> 
> So what. the user was authorized to open the input file. The user was
> authorized to open the output file. A file copy should be possible remotely
> since the equivalent implementation of a local read/write loop would
> accomplish the same thing.

The potential for race conditions worries me.  However, the questions
you gave are more fundamental and may be enough to kill this idea (if it
wasn't already dead)...

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-20 17:21             ` Florian Weimer
@ 2003-11-20 19:08               ` Jesse Pollard
  2003-11-20 19:12                 ` Florian Weimer
                                   ` (4 more replies)
  0 siblings, 5 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-20 19:08 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Valdis.Kletnieks, Daniel Gryniewicz, linux-kernel mailing list

On Thursday 20 November 2003 11:21, Florian Weimer wrote:
> Jesse Pollard wrote:
> > > > 	int sys_copy(int fd_src, int fd_dst)
> > >
> > > Doesn't work.  You have to set the security attributes while you open
> > > fd_dst.
> >
> > Why? the open for fd_src should have the security attributes (both
> > locally and in the file server if networked). Opening fd_dst should SET
> > the security attributes desired (again, locally and in the target
> > fileserver).
>
> The default attributes in the new location might be less strict than the
> attributes of the source file.

So what. the user was authorized to open the input file. The user was
authorized to open the output file. A file copy should be possible remotely
since the equivalent implementation of a local read/write loop would
accomplish the same thing.

> If sys_copy() is just an API to introduce a new copy-on-write hard link,
> these problems disappear.  They are only relevant if sys_copy() is
> intended to be a generic "copy that file" interface.

Now if you wanted the remote server to deny the network copy... could
be done - after all the credentials for both input and output files
are present on the server. If the server decides NOT to copy, then fine.
It would just cause the user to make the copy with a read/write loop.

I was only thinking of it as a way to gain access to any filesystem
support that may be available for copying files. If none is available,
then do it in user mode.

Personally, I'm not sure it is a good idea, partly because the semantics
of a file copy operation are not well defined (some of the following IS 
known).

1. what happens if the copy is aborted?
2. what happens if the network drops while the remote server continues?
3. what about buffer synchronization?
4. what errors should be reported ?
5. what happens when the syscall is interupted? Especially if the remote
   copy may take a while (I've seen some require an hour or more - worst
   case: days due to a media error (completed after the disk was replaced)).
6. what about a client opening the copy before it is finished copying?

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-12 15:36           ` Jesse Pollard
@ 2003-11-20 17:21             ` Florian Weimer
  2003-11-20 19:08               ` Jesse Pollard
  0 siblings, 1 reply; 77+ messages in thread
From: Florian Weimer @ 2003-11-20 17:21 UTC (permalink / raw)
  To: Jesse Pollard
  Cc: Valdis.Kletnieks, Daniel Gryniewicz, linux-kernel mailing list

Jesse Pollard wrote:

> > > 	int sys_copy(int fd_src, int fd_dst)
> >
> > Doesn't work.  You have to set the security attributes while you open
> > fd_dst.
> 
> Why? the open for fd_src should have the security attributes (both locally
> and in the file server if networked). Opening fd_dst should SET the security
> attributes desired (again, locally and in the target fileserver).

The default attributes in the new location might be less strict than the
attributes of the source file.

If sys_copy() is just an API to introduce a new copy-on-write hard link,
these problems disappear.  They are only relevant if sys_copy() is
intended to be a generic "copy that file" interface.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-18 15:49                 ` Jamie Lokier
  2003-11-18 16:05                   ` Andi Kleen
@ 2003-11-19 13:30                   ` Jesse Pollard
  1 sibling, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-19 13:30 UTC (permalink / raw)
  To: Jamie Lokier, Andi Kleen; +Cc: H. Peter Anvin, linux-kernel

On Tuesday 18 November 2003 09:49, Jamie Lokier wrote:
> Andi Kleen wrote:
> > > s/EINTR/short count/, of course :)
> >
> > That would be buggy because existing users of sendfile don't know
> > about this and would silently only copy part of the file when a signal
> > happens.
>
> That doesn't make sense.  There aren't any existing users of sendfile
> to copy files.

True. It also doesn't address the issue of what to do when the file copy is
being done on a remote server and not by something local. Synchronizing
a remote interrupt could really be nasty.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-14 15:26               ` Andi Kleen
                                   ` (2 preceding siblings ...)
  2003-11-19  2:12                 ` Linus Torvalds
@ 2003-11-19  4:04                 ` Chris Adams
  3 siblings, 0 replies; 77+ messages in thread
From: Chris Adams @ 2003-11-19  4:04 UTC (permalink / raw)
  To: linux-kernel

Once upon a time, Andi Kleen <ak@suse.de> wrote:
>"H. Peter Anvin" <hpa@zytor.com> writes:
>> s/EINTR/short count/, of course :)
>That would be buggy because existing users of sendfile don't know
>about this and would silently only copy part of the file when a signal
>happens.

Tru64 5.1B sendfile(2) page includes:

       [EINTR]
           A signal interrupted  sendfile  before  any  data  was
           transmitted.   If some data was transmitted, the func-
           tion returns the  number  of  bytes  sent  before  the
           interrupt and does not set errno to [EINTR].

There are quite a few more documented return values under Tru64,
although TCP sockets are the only supported destination.  See

http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/MAN/MAN2/0024____.HTM

-- 
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-14 15:26               ` Andi Kleen
  2003-11-18 15:49                 ` Jamie Lokier
  2003-11-18 16:58                 ` H. Peter Anvin
@ 2003-11-19  2:12                 ` Linus Torvalds
  2003-11-19  4:04                 ` Chris Adams
  3 siblings, 0 replies; 77+ messages in thread
From: Linus Torvalds @ 2003-11-19  2:12 UTC (permalink / raw)
  To: Andi Kleen; +Cc: H. Peter Anvin, linux-kernel


On 14 Nov 2003, Andi Kleen wrote:
> 
> That would be buggy because existing users of sendfile don't know
> about this and would silently only copy part of the file when a signal
> happens.

Don't be silly.

Existing sendfile users had _better_ accept short writes.

They happen all the time. If the destination is the network, it _will_ be 
interruptible.

		Linus


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-14 15:26               ` Andi Kleen
  2003-11-18 15:49                 ` Jamie Lokier
@ 2003-11-18 16:58                 ` H. Peter Anvin
  2003-11-19  2:12                 ` Linus Torvalds
  2003-11-19  4:04                 ` Chris Adams
  3 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2003-11-18 16:58 UTC (permalink / raw)
  To: Andi Kleen; +Cc: linux-kernel

Andi Kleen wrote:
> 
> That would be buggy because existing users of sendfile don't know
> about this and would silently only copy part of the file when a signal
> happens.
> 

It would be consistent with the documented semantics for other file 
operations.  Obviously, return zero only on EOF.

	-hpa


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-18 16:05                   ` Andi Kleen
@ 2003-11-18 16:25                     ` Trond Myklebust
  0 siblings, 0 replies; 77+ messages in thread
From: Trond Myklebust @ 2003-11-18 16:25 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Jamie Lokier, hpa, linux-kernel

>>>>> " " == Andi Kleen <ak@suse.de> writes:

    >> > That would be buggy because existing users of sendfile don't
    >> > know about this and would silently only copy part of the file
    >> > when a signal happens.
    >>
    >> That doesn't make sense.  There aren't any existing users of
    >> sendfile to copy files.

     > [ignore the mail, it was an stuck mail queue]

     > But note that arbitary changes in the signal handling would
     > affect all users of sendfile, not just those that attempt to
     > copy files or do other things that should be done in user
     > space.

That 'change' is already in effect for people who mount their NFS
partitions with the "intr" or "soft" flags.

See the return value of generic_file_sendfile(): it already has the
read()/write-like semantics of returning number of bytes written if
non-zero, or the value of desc.error if not.

Cheers,
  Trond

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-18 15:49                 ` Jamie Lokier
@ 2003-11-18 16:05                   ` Andi Kleen
  2003-11-18 16:25                     ` Trond Myklebust
  2003-11-19 13:30                   ` Jesse Pollard
  1 sibling, 1 reply; 77+ messages in thread
From: Andi Kleen @ 2003-11-18 16:05 UTC (permalink / raw)
  To: Jamie Lokier; +Cc: hpa, linux-kernel

On Tue, 18 Nov 2003 15:49:21 +0000
Jamie Lokier <jamie@shareable.org> wrote:

> Andi Kleen wrote:
> > > s/EINTR/short count/, of course :)
> > 
> > That would be buggy because existing users of sendfile don't know
> > about this and would silently only copy part of the file when a signal
> > happens.
> 
> That doesn't make sense.  There aren't any existing users of sendfile
> to copy files.

[ignore the mail, it was an stuck mail queue]

But note that arbitary changes in the signal handling would affect all users of sendfile, not just 
those that attempt to copy files or do other things that should be done in user space.

-Andi

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-14 15:26               ` Andi Kleen
@ 2003-11-18 15:49                 ` Jamie Lokier
  2003-11-18 16:05                   ` Andi Kleen
  2003-11-19 13:30                   ` Jesse Pollard
  2003-11-18 16:58                 ` H. Peter Anvin
                                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 77+ messages in thread
From: Jamie Lokier @ 2003-11-18 15:49 UTC (permalink / raw)
  To: Andi Kleen; +Cc: H. Peter Anvin, linux-kernel

Andi Kleen wrote:
> > s/EINTR/short count/, of course :)
> 
> That would be buggy because existing users of sendfile don't know
> about this and would silently only copy part of the file when a signal
> happens.

That doesn't make sense.  There aren't any existing users of sendfile
to copy files.

-- Jamie

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
       [not found]             ` <3FB42CC4.9030009@zytor.com.suse.lists.linux.kernel>
@ 2003-11-14 15:26               ` Andi Kleen
  2003-11-18 15:49                 ` Jamie Lokier
                                   ` (3 more replies)
  0 siblings, 4 replies; 77+ messages in thread
From: Andi Kleen @ 2003-11-14 15:26 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

"H. Peter Anvin" <hpa@zytor.com> writes:

> Andrea Arcangeli wrote:
> > On Thu, Nov 13, 2003 at 04:36:26PM -0800, H. Peter Anvin wrote:
> > 
> >>... or we could put in checks into the kernel for signal pending, and
> >>return EINTR.
> > 
> > that would be even better indeed.
> >
> 
> s/EINTR/short count/, of course :)

That would be buggy because existing users of sendfile don't know
about this and would silently only copy part of the file when a signal
happens.

-Andi

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-12 15:19 ` Jesse Pollard
@ 2003-11-14  3:42   ` Albert Cahalan
  0 siblings, 0 replies; 77+ messages in thread
From: Albert Cahalan @ 2003-11-14  3:42 UTC (permalink / raw)
  To: Jesse Pollard
  Cc: Albert Cahalan, linux-kernel mailing list, davide.rossetti,
	filia, dwmw2, moje, kakadu_croc

On Wed, 2003-11-12 at 10:19, Jesse Pollard wrote:
> On Monday 10 November 2003 19:05, Albert Cahalan wrote:

> > > The security context of the output depends
> > > on the user process. If it is a privileged
> > > process (ie, may change the context of the
> > > result) then the user process has to setup
> > > that context before the file is copied.
> >
> > So open the file, change context, and then:
> >
> > long copy_fd_to_file(int fd, const char *name, ...)
> 
> Easy to do in user mode.

It isn't, because the user-mode code would
need to have a full understanding of whatever
fancy (seLinux, RSBAC, lomac...) security
mechanism the kernel is using. It's not enough
to just know about switching to some named
context via a common API.

> > >> Is it? Please explain the simple steps which
> > >> cp(1) should take in order to observe that it
> > >> is being asked to duplicate a file on a file
> > >> system such as CIFS (or NFSv4?) which allows
> > >> the client to issue a 'copy file' command
> > >> over the network without actually transferring
> > >> the data twice, and to invoke such a command.
> > >
> > > Ah. That is an optimization question, not a
> > > question of kernel/user mode.
> >
> > Note that /bin/cp isn't always going to have
> > the necessary passwords and such. You're headed
> > down a path toward setuid /bin/cp.
> 
> If cp doesn't have access to the proper security credentials,
> then the file should not be copied.

You have proper credentials for access through
the mounted filesystem. That filesystem was
mounted by root, using some secret key that is
specific to the local machine. You could try
to directly contact the server over the network,
but you won't have the keys.

You're allowed to indirectly use the keys by
going through the mounted filesystem. For example,
you can call rmdir() to remove a directory but
you can not cause the same effect by sending a
message over the network directly to the server.
You have no ability to bypass the local kernel.

So you can copy that file, but you have to use
the file-oriented system calls to do it. You'll
need kernel support to invoke a remote-copy
operation. (or a setuid-root /bin/cp that looks
up the keys, determines the correct server, makes
a network connection, etc.)


> > > And since both source and destination may
> > > be remote you do get to decide based on
> > > source and destination devices: if they
> > > are the same, and one on a remote node,
> > > then BOTH will be on the remote, then you
> > > get to use the CIFS/NFS file copy. (check
> > > the doc on "stat/statfs" for additional info).
> > >
> > > I don't believe it works when source and
> > > destination are on DIFFERENT remote nodes,
> > > though.
> > >
> > > Strictly up to the implementation of cp/mv.
> > >
> > > Though you will loose portability of cp/mv.
> > > (Of course, you also loose it with a syscall
> > > for file copy too; as well as the MUCH more
> > > complicated implementation/security checks).
> >
> > Doing that in cp/mv is just insane. For one,
> > it bypasses any local security control over
> > access to the filesystem. There's not even a
> > way to be sure you're dealing with the server
> > you think you're dealing with.
> 
> It shouldn't matter - first the source file must be opened
> for read AND the destination file opened for write.
> This should give the proper local security evaluation and
> context for the copy. Once this has been approved,
> the remote copy request can be made (provided they are
> on the same "networked" device). Just making
> the request still doesn't mean that it will succeed -
> after all, the final security decisions are made by
> the remote server implementing the file copy.
> 
> Though if the copy is valid locally, then the use of
> the filesystem supported copy should work. It is an
> equivalent operation, it just all takes place on the server.
> 
> Identity of the server is irrelevent, as long as it is
> the same server (or farm) for both source and destination.
> If the remote file copy is defined, then it should work
> even when the actual source and destination are different
> physical machines - the remote filesystem CLAIMS it will
> work (identical is determined from the "device" mounted,
> one mount, one device as far as network filesystems go).
> And if they are not identical then you fall back to using
> a local copy.
> 
> All bets are off if the local pathnames are required by
> the remote server. That is silly. How would a networked
> client even know what the pathname would be? The parameters
> should be the two file handles passed to the remote filesystem.

You may need a filename relative to the root
of the exported part of the tree.

Remote side:
J:\groups\rteng\John Smith\tests\a.out
(with rteng exported as \\RTENG)

Local side:
/home/john/tests/a.out
(the mount point is "/home/john")

Path needed:
\\RTENG\John Smith\tests\a.out

You have that, since the kernel knows that a
"\\\\RTENG\\John Smith" directory was mounted
on /home/john and you're trying to deal with
a tests/a.out file.

> Personally, I don't think any changes should be made.
> It's just that this level of transfer is what the original
> poster was talking about. It just shouldn't be done in
> kernel mode.

Anywhere else would be buggy and most likely setuid.



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-14  1:10           ` Andrea Arcangeli
@ 2003-11-14  1:15             ` H. Peter Anvin
  0 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2003-11-14  1:15 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Andrea Arcangeli wrote:
> On Thu, Nov 13, 2003 at 04:36:26PM -0800, H. Peter Anvin wrote:
> 
>>... or we could put in checks into the kernel for signal pending, and
>>return EINTR.
> 
> that would be even better indeed.
>

s/EINTR/short count/, of course :)

	-hpa


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-14  0:36         ` H. Peter Anvin
@ 2003-11-14  1:10           ` Andrea Arcangeli
  2003-11-14  1:15             ` H. Peter Anvin
  0 siblings, 1 reply; 77+ messages in thread
From: Andrea Arcangeli @ 2003-11-14  1:10 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On Thu, Nov 13, 2003 at 04:36:26PM -0800, H. Peter Anvin wrote:
> ... or we could put in checks into the kernel for signal pending, and
> return EINTR.

that would be even better indeed.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-13 23:39       ` Andrea Arcangeli
  2003-11-14  0:04         ` jw schultz
@ 2003-11-14  0:36         ` H. Peter Anvin
  2003-11-14  1:10           ` Andrea Arcangeli
  1 sibling, 1 reply; 77+ messages in thread
From: H. Peter Anvin @ 2003-11-14  0:36 UTC (permalink / raw)
  To: Andrea Arcangeli; +Cc: linux-kernel

Andrea Arcangeli wrote:
> 
> I actually hacked cp for a while and it improved cp some point percent
> on normal machines.
> 
> See ftp://ftp.suse.com/pub/people/andrea/cp-sendfile/
> 
> the main downside and the reason it wasn't applied IIRC is the lack of
> interruption of sendfile, basically for an huge file it would take a
> while before C^c has any effect. The kernel isn't interrupting the
> syscall.  This is no different from a huge read or write syscall (but
> read/write are never huge or the buffer would need to be huge too, not
> the case for sendfile that works zerocopy), so in theory we could
> workaround it by entering/exiting kernel multiple times just to allow
> the signal to be handled like in the read/write case.

... or we could put in checks into the kernel for signal pending, and
return EINTR.

	-hpa


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-13 23:39       ` Andrea Arcangeli
@ 2003-11-14  0:04         ` jw schultz
  2003-11-14  0:36         ` H. Peter Anvin
  1 sibling, 0 replies; 77+ messages in thread
From: jw schultz @ 2003-11-14  0:04 UTC (permalink / raw)
  To: linux-kernel

On Fri, Nov 14, 2003 at 12:39:15AM +0100, Andrea Arcangeli wrote:
> On Thu, Nov 13, 2003 at 12:22:14PM -0800, H. Peter Anvin wrote:
> > Followup to:  <20031111085323.M8854@devserv.devel.redhat.com>
> > By author:    Jakub Jelinek <jakub@redhat.com>
> > In newsgroup: linux.dev.kernel
> > > > 
> > > > Actually, I think we should have a: 
> > > > 
> > > > 	long copy_fd_to_fd (int src, int dst, int len)
> > > > 
> > > > type of systemcall. 
> > > 
> > > We have one, sendfile(2).
> > > 
> > 
> > It would be very nice if we could (a) expand the uses of sendfile(2),
> > and (b) have the libc do the fallback to read/write/mmap as needed.
> 
> I actually hacked cp for a while and it improved cp some point percent
> on normal machines.
> 
> See ftp://ftp.suse.com/pub/people/andrea/cp-sendfile/
> 
> the main downside and the reason it wasn't applied IIRC is the lack of
> interruption of sendfile, basically for an huge file it would take a
> while before C^c has any effect. The kernel isn't interrupting the
> syscall.  This is no different from a huge read or write syscall (but
> read/write are never huge or the buffer would need to be huge too, not
> the case for sendfile that works zerocopy), so in theory we could
> workaround it by entering/exiting kernel multiple times just to allow
> the signal to be handled like in the read/write case.

Until interrupt and restart (as has been discussed
here for other syscalls) handling is improved there could be
a sanity check with an E2BIG or something if the size is
insane.  I dislike the thought of sendfile going sitting in D
state on a multi-gigabyte file.

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-13 20:22     ` H. Peter Anvin
@ 2003-11-13 23:39       ` Andrea Arcangeli
  2003-11-14  0:04         ` jw schultz
  2003-11-14  0:36         ` H. Peter Anvin
  0 siblings, 2 replies; 77+ messages in thread
From: Andrea Arcangeli @ 2003-11-13 23:39 UTC (permalink / raw)
  To: H. Peter Anvin; +Cc: linux-kernel

On Thu, Nov 13, 2003 at 12:22:14PM -0800, H. Peter Anvin wrote:
> Followup to:  <20031111085323.M8854@devserv.devel.redhat.com>
> By author:    Jakub Jelinek <jakub@redhat.com>
> In newsgroup: linux.dev.kernel
> > > 
> > > Actually, I think we should have a: 
> > > 
> > > 	long copy_fd_to_fd (int src, int dst, int len)
> > > 
> > > type of systemcall. 
> > 
> > We have one, sendfile(2).
> > 
> 
> It would be very nice if we could (a) expand the uses of sendfile(2),
> and (b) have the libc do the fallback to read/write/mmap as needed.

I actually hacked cp for a while and it improved cp some point percent
on normal machines.

See ftp://ftp.suse.com/pub/people/andrea/cp-sendfile/

the main downside and the reason it wasn't applied IIRC is the lack of
interruption of sendfile, basically for an huge file it would take a
while before C^c has any effect. The kernel isn't interrupting the
syscall.  This is no different from a huge read or write syscall (but
read/write are never huge or the buffer would need to be huge too, not
the case for sendfile that works zerocopy), so in theory we could
workaround it by entering/exiting kernel multiple times just to allow
the signal to be handled like in the read/write case.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 13:53   ` Jakub Jelinek
  2003-11-11 13:58     ` David Woodhouse
@ 2003-11-13 20:22     ` H. Peter Anvin
  2003-11-13 23:39       ` Andrea Arcangeli
  1 sibling, 1 reply; 77+ messages in thread
From: H. Peter Anvin @ 2003-11-13 20:22 UTC (permalink / raw)
  To: linux-kernel

Followup to:  <20031111085323.M8854@devserv.devel.redhat.com>
By author:    Jakub Jelinek <jakub@redhat.com>
In newsgroup: linux.dev.kernel
> > 
> > Actually, I think we should have a: 
> > 
> > 	long copy_fd_to_fd (int src, int dst, int len)
> > 
> > type of systemcall. 
> 
> We have one, sendfile(2).
> 

It would be very nice if we could (a) expand the uses of sendfile(2),
and (b) have the libc do the fallback to read/write/mmap as needed.

	-hpa

-- 
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  8:58         ` Florian Weimer
  2003-11-11 10:27           ` jw schultz
@ 2003-11-12 15:36           ` Jesse Pollard
  2003-11-20 17:21             ` Florian Weimer
  1 sibling, 1 reply; 77+ messages in thread
From: Jesse Pollard @ 2003-11-12 15:36 UTC (permalink / raw)
  To: Florian Weimer, Valdis.Kletnieks, Daniel Gryniewicz,
	linux-kernel mailing list

On Tuesday 11 November 2003 02:58, Florian Weimer wrote:
> Andreas Dilger wrote:
> > > This is fast turning into a creeping horror of aggregation.  I defy
> > > anybody to create an API to cover all the options mentioned so far and
> > > *not* have it look like the process_clone horror we so roundly derided
> > > a few weeks ago.
> >
> > 	int sys_copy(int fd_src, int fd_dst)
>
> Doesn't work.  You have to set the security attributes while you open
> fd_dst.

Why? the open for fd_src should have the security attributes (both locally
and in the file server if networked). Opening fd_dst should SET the security
attributes desired (again, locally and in the target fileserver).

Then the sys_copy(fd_src,fd_dst) can take place in the FS code. And of course
it is necessary that fd_src and fd_dst reside on the same device. If they 
don't, then the sys_copy should fail.

If the sys_copy is a remote filesystem then fd_src, and fd_dst must be 
replaced by the remote file handles and this passed to the remote server.
Any additional checks may then be made from the evaluation of the file handles
locally on the file server, using the security credentials belonging to the
file handles.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  1:05 Albert Cahalan
  2003-11-11  3:50 ` Andreas Dilger
  2003-11-11 13:38 ` Rogier Wolff
@ 2003-11-12 15:19 ` Jesse Pollard
  2003-11-14  3:42   ` Albert Cahalan
  2 siblings, 1 reply; 77+ messages in thread
From: Jesse Pollard @ 2003-11-12 15:19 UTC (permalink / raw)
  To: Albert Cahalan, linux-kernel mailing list
  Cc: davide.rossetti, filia, jesse, dwmw2, moje, kakadu_croc

On Monday 10 November 2003 19:05, Albert Cahalan wrote:
> > It is too simple to implement in user mode.
>
> That works for a plain byte-stream on a
> local UNIX-style filesystem. (though it
> likely isn't the fastest)

Yes - this was the local copy

> It doesn't work for Macintosh files.
> It's too slow for CIFS over a modem.
> It doesn't work for Windows security data.
> It doesn't allow copy-on-write files.
> It eats CPU time on compressed filesystems.
>
> > The security context of the output depends
> > on the user process. If it is a privileged
> > process (ie, may change the context of the
> > result) then the user process has to setup
> > that context before the file is copied.
>
> So open the file, change context, and then:
>
> long copy_fd_to_file(int fd, const char *name, ...)

Easy to do in user mode.

>
> (if you can no longer read from the OPEN fd,
> either we override that or we just don't care
> about such mostly-fictional cases)

correct - If you can't read, fail.

> > There are also some issues with mandatory
> > security controls. If it is copied in kernel
> > mode, then the previous labels could be
> > automatically carried over to the resulting
> > file... But that may not be what you want
> > (and frequently, it isn't).
>
> If it matters:
>
> // security as if a new file were created
> #define CF_REPLACE_SECURITY 0x00000001
> // if unable to replicate, up or down?
> #define CF_ROUND_SECURITY_UP 0x00000002
> #define CF_ROUND_SECURITY_DOWN 0x00000004
> // fail if security can't be replicated
> #define CF_SECURITY_EXACT 0x00000008
>
> > Now back to the copy.. You don't have to
> > use a read/write loop- mmap is faster.
>
> It's slower. (this is Linux, not SunOS)
> Use a 4 kB or 8 kB read/write loop.

yup local.

> > And this is the other reason for not doing
> > it in Kernel mode. Buffer management of
> > this type is much easier in user space
> > since the copy procedure doesn't have to
> > deal with memory limitations, cache flushes
> > page faulting of processes unrelated to the
> > copy, but is related to cache pressure.
>
> Buffer management is very much a kernel thing.

Yes it is, but do you want to push process dependant
buffer management into the page management? It's just
easier to do this in user mode, and allow the kernel
to handle global page managment.

> >> Is it? Please explain the simple steps which
> >> cp(1) should take in order to observe that it
> >> is being asked to duplicate a file on a file
> >> system such as CIFS (or NFSv4?) which allows
> >> the client to issue a 'copy file' command
> >> over the network without actually transferring
> >> the data twice, and to invoke such a command.
> >
> > Ah. That is an optimization question, not a
> > question of kernel/user mode.
>
> Note that /bin/cp isn't always going to have
> the necessary passwords and such. You're headed
> down a path toward setuid /bin/cp.

If cp doesn't have access to the proper security credentials,
then the file should not be copied.

> > Since the error checking for source and
> > destination both include doing a stat and
> > statfs, the device information (and FS info)
> > can both be retrieved.
> >
> > And mmap doesn't require data transfer "twice"
> > (local copy).
>
> Huh? Over the network from server to client
> counts as once. Then /bin/cp gets the data.
> Then it goes back over the network from the
> client to the server. That's "twice". That's
> horribly painful for a multi-gigabyte file
> and a DSL or cable-modem connection, never
> mind a dial-up connection.

True for all networked file systems. I had ment
to say (local filesystem copy).

> > Since that copy only pagefaults (though
> > read/write may be faster for some files
> > - I thought that was true for small files
> > that fit in cache, and large files faster
> > via mmap and depends on the page size;
> > and the tradeoff would be system dependant).
>
> Keep the read/write loop small for speed.

yes.

> > And since both source and destination may
> > be remote you do get to decide based on
> > source and destination devices: if they
> > are the same, and one on a remote node,
> > then BOTH will be on the remote, then you
> > get to use the CIFS/NFS file copy. (check
> > the doc on "stat/statfs" for additional info).
> >
> > I don't believe it works when source and
> > destination are on DIFFERENT remote nodes,
> > though.
> >
> > Strictly up to the implementation of cp/mv.
> >
> > Though you will loose portability of cp/mv.
> > (Of course, you also loose it with a syscall
> > for file copy too; as well as the MUCH more
> > complicated implementation/security checks).
>
> Doing that in cp/mv is just insane. For one,
> it bypasses any local security control over
> access to the filesystem. There's not even a
> way to be sure you're dealing with the server
> you think you're dealing with.

It shouldn't matter - first the source file must be opened
for read AND the destination file opened for write.
This should give the proper local security evaluation and
context for the copy. Once this has been approved,
the remote copy request can be made (provided they are
on the same "networked" device). Just making
the request still doesn't mean that it will succeed -
after all, the final security decisions are made by
the remote server implementing the file copy.

Though if the copy is valid locally, then the use of
the filesystem supported copy should work. It is an
equivalent operation, it just all takes place on the server.

Identity of the server is irrelevent, as long as it is
the same server (or farm) for both source and destination.
If the remote file copy is defined, then it should work
even when the actual source and destination are different
physical machines - the remote filesystem CLAIMS it will
work (identical is determined from the "device" mounted,
one mount, one device as far as network filesystems go).
And if they are not identical then you fall back to using
a local copy.

All bets are off if the local pathnames are required by
the remote server. That is silly. How would a networked
client even know what the pathname would be? The parameters
should be the two file handles passed to the remote filesystem.

Personally, I don't think any changes should be made.
It's just that this level of transfer is what the original
poster was talking about. It just shouldn't be done in
kernel mode.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-10 14:22     ` Daniel Jacobowitz
@ 2003-11-11 20:57       ` Jakob Oestergaard
  0 siblings, 0 replies; 77+ messages in thread
From: Jakob Oestergaard @ 2003-11-11 20:57 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Mon, Nov 10, 2003 at 09:22:22AM -0500, Daniel Jacobowitz wrote:
> On Mon, Nov 10, 2003 at 07:29:15AM -0600, Jesse Pollard wrote:
> > Now back to the copy.. You don't have to use a read/write loop- mmap
> > is faster. And this is the other reason for not doing it in Kernel mode.
> 
> Actually, last I checked, read/write was actually faster.  Linus
> explained why a month or two ago.

It would also not break on large files...

-- 
................................................................
:   jakob@unthought.net   : And I see the elder races,         :
:.........................: putrid forms of man                :
:   Jakob Østergaard      : See him rise and claim the earth,  :
:        OZ9ABN           : his downfall is at hand.           :
:.........................:............{Konkhra}...............:

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 20:22       ` Jan Harkes
@ 2003-11-11 20:31         ` Valdis.Kletnieks
  0 siblings, 0 replies; 77+ messages in thread
From: Valdis.Kletnieks @ 2003-11-11 20:31 UTC (permalink / raw)
  To: Jan Harkes; +Cc: Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 302 bytes --]

On Tue, 11 Nov 2003 15:22:09 EST, Jan Harkes <jaharkes@cs.cmu.edu>  said:

> Similarily, we might at some point be able to optimize sendfile between
> two sockets by pushing the connection off to a router somewhere in the
> network completely bypassing the local NIC.

Security can of worms there.. :)

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 15:02     ` Rogier Wolff
  2003-11-11 15:31       ` Ihar 'Philips' Filipau
@ 2003-11-11 20:22       ` Jan Harkes
  2003-11-11 20:31         ` Valdis.Kletnieks
  1 sibling, 1 reply; 77+ messages in thread
From: Jan Harkes @ 2003-11-11 20:22 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Tue, Nov 11, 2003 at 04:02:56PM +0100, Rogier Wolff wrote:
> Fine. For compatibilty we'll leave "sendfile" in place. But if somehow
> someone builds a filesystem which cannot use the pagecache, then
> "sendfile" will fail. Or if somehow we manage to get the socket hooked
...
> (*) Suppose I manage to stop and restart an application. The "restart"
> program might need to "sit between" the original application and its
> filedescriptors. So now, what used to be a socket suddenly becomes a
> pipe. It'd be nice if things would continue to work. Everything is a
> file remember?

man sendfile(2)

NOTES
    ...
    Applications may wish to fall back to read/write in the case
    where sendfile() fails with EINVAL or ENOSYS.

So we get something in a userspace library (libc?) that does
copyfile(whatever, whereever) and uses a few kernel primitives like
open/close/sendfile and the appropriate fallback code to a read/write
loop whenever the sendfile doesn't work.

It works now, and it will work better when sendfile becomes more
versatile, and the sky is the limit once the underlying filesystem can
provide it's own optimized implementation for instance when both fd's
refer to objects within the same (remote) filesystem.

Similarily, we might at some point be able to optimize sendfile between
two sockets by pushing the connection off to a router somewhere in the
network completely bypassing the local NIC.

Jan


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 10:27           ` jw schultz
@ 2003-11-11 20:08             ` Jan Harkes
  0 siblings, 0 replies; 77+ messages in thread
From: Jan Harkes @ 2003-11-11 20:08 UTC (permalink / raw)
  To: linux-kernel mailing list

On Tue, Nov 11, 2003 at 02:27:42AM -0800, jw schultz wrote:
> On Tue, Nov 11, 2003 at 09:58:06AM +0100, Florian Weimer wrote:
> > Andreas Dilger wrote:
> > 
> > > > This is fast turning into a creeping horror of aggregation.  I defy anybody
> > > > to create an API to cover all the options mentioned so far and *not* have it
> > > > look like the process_clone horror we so roundly derided a few weeks ago.
> > > 
> > > 	int sys_copy(int fd_src, int fd_dst)
> 
> That sounds a lot like a sendfile with a file as the
> destination.  Useful but still happening on the local system.
> My understanding was that this was to be sent to a remote
> system where the file descriptors might not be open.

It probably should be sendfile, where the destination fd is a local file
instead of a socket. We really do not want to pass pathnames down into
the filesystem layer. As far as I know, no existing VFS operation does
that and it probably isn't a good idea to start doing it now.

Somehow the filesystem that 'hosts' the src_fd object should get a
chance to see/intercept the sendfile syscall, and it can then decide
based on the dst_fd object what to do. If the destination happens to be
in the same filesystem it could possibly use a special internal copyfile
rpc call or CoW implementation.

The userspace/libc code could provide a copyfile(char* src, char* dst,
int flags, int mode) wrapper, which can also handle falling back to a
simple read/write loop when sendfile fails.

So we clearly don't need a new system call, sendfile would do fine and
interestingly the manual page I'm reading now mentions that the source
has to be a mmap-able object, but lists no such restrictions on the
destination fd. Maybe sendfile already works and we just need to give the
filesystems a chance to override it.

Jan


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 15:02     ` Rogier Wolff
@ 2003-11-11 15:31       ` Ihar 'Philips' Filipau
  2003-11-11 20:22       ` Jan Harkes
  1 sibling, 0 replies; 77+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-11-11 15:31 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Linux Kernel Mailing List

Rogier Wolff wrote:
> 
> Fine. For compatibilty we'll leave "sendfile" in place. But if somehow
> someone builds a filesystem which cannot use the pagecache, then
> "sendfile" will fail. Or if somehow we manage to get the socket hooked
> up to something else (*). Either CVS needs to handle that case
> internally, or it will fail. In the first case, that causes extra code
> in lots of applications that want to continue to work, in the latter
> case, it's bad.
> 

   I beleive - if you really want to have something like this - you need 
to go to e.g. nfs/coda/smbfs developers and talk with them: how it can 
be implemented in this situations.

   Implement it with ioctl() - to really see make it sense or it just 
complicates things enourmously. Actually given networked file systems 
could be just NOT capable of this kind of operation at all.

   Insisting on new syscall is silly: syscall is interface - it has 
nothing to do with functionality. Ocasionally syscalls are used to 
access functionality ;-)  So start from functionality first. Syscall (or 
whatever interface will fit better) can be implemented in 15 minutes any 
time after functionality is in place.

-- 
Ihar 'Philips' Filipau  / with best regards from Saarbruecken.
--                                                           _ _ _
  "... and for $64000 question, could you get yourself       |_|*|_|
    vaguely familiar with the notion of on-topic posting?"   |_|_|*|
                                 -- Al Viro @ LKML           |*|*|*|


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 14:11   ` Ihar 'Philips' Filipau
@ 2003-11-11 15:02     ` Rogier Wolff
  2003-11-11 15:31       ` Ihar 'Philips' Filipau
  2003-11-11 20:22       ` Jan Harkes
  0 siblings, 2 replies; 77+ messages in thread
From: Rogier Wolff @ 2003-11-11 15:02 UTC (permalink / raw)
  To: Ihar 'Philips' Filipau; +Cc: Rogier Wolff, Linux Kernel Mailing List

On Tue, Nov 11, 2003 at 03:11:26PM +0100, Ihar 'Philips' Filipau wrote:
> Rogier Wolff wrote:
> >But alas, last time Linus didn't agree with me and decided we should
> >do something like "sendfile", which is IMHO just a special case of
> >this one.
> >
> 
>   I will reply on behalf of Linus: "Send patch!"
> 
>   I beleive you are not developer - so you even cannot estimate what 
> you are proposing.

Wrong.

>   This kind of patch will never be accepted.

Yes. As I said: Linus doesn't agree with me. I don't sleep less from
knowing that. Feel free to disagree with me as well. 

>   Just try to imagine: 20 file systems, so 20*20 == 400 ifs?

Right! And: Wrong!

The idea is that the default will make sure that the kernel handles
the call. It's just as efficient as the userspace implementation.

But currently we have decided that the extra efficiency of 

	"local file -> socket" 

matters enough to us that we want to optimize that case. Fine. So now
we have "sendfile". This is currently implemented as a special
systemcall. I.e. one of those 400 cases you mentioned. 

But I expect that only a few cases will be important enough
that we care to optimize their implementation. 

If we end up with 400 ifs, because we CAN optimize each and every case
by itself, and we find that important enough to actually implement,
then of course the "string of ifs" is a nice candidate to optimize
again.

>   So I beleive you will get more more positive responses, If you will 
> start improveing vfs, e.g. adding generic routines for optimized move of 
> file from one file system to another, with API which allow it to 
> extrapolate nicely to networked file systems.

Once my proposed "copy_fd_to_fd" is in place, the road is open towards
just leaving the current special case that detects: "src uses pagecache
dst is a socket" and then calls the current sendfile implementation.


>    Silly. cp is least frequent application I use.

Yeah. So? You reject a general idea just because you don't use the
application that I used as an example in my proposal.

>    And cvs I beleive already uses sendfile().

Fine. For compatibilty we'll leave "sendfile" in place. But if somehow
someone builds a filesystem which cannot use the pagecache, then
"sendfile" will fail. Or if somehow we manage to get the socket hooked
up to something else (*). Either CVS needs to handle that case
internally, or it will fail. In the first case, that causes extra code
in lots of applications that want to continue to work, in the latter
case, it's bad.

			Roger. 

(*) Suppose I manage to stop and restart an application. The "restart"
program might need to "sit between" the original application and its
filedescriptors. So now, what used to be a socket suddenly becomes a
pipe. It'd be nice if things would continue to work. Everything is a
file remember?

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 13:38 ` Rogier Wolff
  2003-11-11 13:53   ` Jakub Jelinek
@ 2003-11-11 14:11   ` Albert Cahalan
  1 sibling, 0 replies; 77+ messages in thread
From: Albert Cahalan @ 2003-11-11 14:11 UTC (permalink / raw)
  To: Rogier Wolff
  Cc: linux-kernel mailing list, davide.rossetti, filia, jesse, dwmw2,
	moje, kakadu_croc

On Tue, 2003-11-11 at 08:38, Rogier Wolff wrote:
> On Mon, Nov 10, 2003 at 08:05:11PM -0500, Albert Cahalan wrote:
> > So open the file, change context, and then:
> > 
> > long copy_fd_to_file(int fd, const char *name, ...)
> > 
> > (if you can no longer read from the OPEN fd,
> > either we override that or we just don't care
> > about such mostly-fictional cases)
> 
> 
> Actually, I think we should have a: 
> 
> 	long copy_fd_to_fd (int src, int dst, int len)
> 
> type of systemcall. 

I don't think that works. To have a destination
file descriptor, you have to already have created
the destination file. Having done so, it may now
be impossible to transfer the security data. This
is especially the case with network filesystems.

I can well imagine providing a file descriptor for
the destination directory and making the filename
optional. This helps pin things down if there's
worry about an attacker moving directories, and it
neatly allows for fully anonymous temporary files
if a file descriptor is returned.



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
       [not found] ` <QH4e.eV.3@gated-at.bofh.it>
@ 2003-11-11 14:11   ` Ihar 'Philips' Filipau
  2003-11-11 15:02     ` Rogier Wolff
  0 siblings, 1 reply; 77+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-11-11 14:11 UTC (permalink / raw)
  To: Rogier Wolff; +Cc: Linux Kernel Mailing List

Rogier Wolff wrote:
> On Mon, Nov 10, 2003 at 08:05:11PM -0500, Albert Cahalan wrote:
> 
> 	long copy_fd_to_fd (int src, int dst, int len)
> 
> The kernel then becomes something
> 
> 	if (islocalfile (src) && issocket (dst)) 
> 		/* Call the old sendfile */ 
> 		return sendfile (....);
> 
> 	if (isCIFS (src), isCIFS(dst))
> 		/* Tell remote host to copy the file. */
> 		return CIFS_copy_file (....); 
> 

   B.S.

> 
> But alas, last time Linus didn't agree with me and decided we should
> do something like "sendfile", which is IMHO just a special case of
> this one.
> 

   I will reply on behalf of Linus: "Send patch!"

   I beleive you are not developer - so you even cannot estimate what 
you are proposing.

   This kind of patch will never be accepted.

   Just try to imagine: 20 file systems, so 20*20 == 400 ifs?

   So I beleive you will get more more positive responses, If you will 
start improveing vfs, e.g. adding generic routines for optimized move of 
file from one file system to another, with API which allow it to 
extrapolate nicely to networked file systems.
   Since right now there is no way to pass file from one fs to another - 
so basicly this thread is already, well, over ;-)

> 
> If we implement this in kernel (at first just the copy_fd_fd and the
> default implementation), then we can get "cp" to use this, and then
> suddenly whenever we upgrade the kernel, cp can use the newly
> optimized copying mechanism. (e.g. whenever we manage to specify a
> socket as the destination, cp would suddenly start to use
> "sendfile"!!)
> 

    Silly. cp is least frequent application I use.
    And cvs I beleive already uses sendfile().
    So all your /arguments/ go directly into /dev/null, since if file is 
not in cvs - you know - it just doesn't exist ;-)))

> 
> 		Roger. 
> 


-- 
Ihar 'Philips' Filipau  / with best regards from Saarbruecken.
--                                                           _ _ _
  "... and for $64000 question, could you get yourself       |_|*|_|
    vaguely familiar with the notion of on-topic posting?"   |_|_|*|
                                 -- Al Viro @ LKML           |*|*|*|


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 13:53   ` Jakub Jelinek
@ 2003-11-11 13:58     ` David Woodhouse
  2003-11-13 20:22     ` H. Peter Anvin
  1 sibling, 0 replies; 77+ messages in thread
From: David Woodhouse @ 2003-11-11 13:58 UTC (permalink / raw)
  To: Jakub Jelinek
  Cc: Rogier Wolff, Albert Cahalan, linux-kernel mailing list,
	davide.rossetti, filia, jesse, moje, kakadu_croc

On Tue, 2003-11-11 at 08:53 -0500, Jakub Jelinek wrote:
> But e.g. the CIFS copy can be done as sendfile hook.

Can it? I thought it took filenames.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 13:38 ` Rogier Wolff
@ 2003-11-11 13:53   ` Jakub Jelinek
  2003-11-11 13:58     ` David Woodhouse
  2003-11-13 20:22     ` H. Peter Anvin
  2003-11-11 14:11   ` Albert Cahalan
  1 sibling, 2 replies; 77+ messages in thread
From: Jakub Jelinek @ 2003-11-11 13:53 UTC (permalink / raw)
  To: Rogier Wolff
  Cc: Albert Cahalan, linux-kernel mailing list, davide.rossetti,
	filia, jesse, dwmw2, moje, kakadu_croc

On Tue, Nov 11, 2003 at 02:38:59PM +0100, Rogier Wolff wrote:
> On Mon, Nov 10, 2003 at 08:05:11PM -0500, Albert Cahalan wrote:
> > So open the file, change context, and then:
> > 
> > long copy_fd_to_file(int fd, const char *name, ...)
> > 
> > (if you can no longer read from the OPEN fd,
> > either we override that or we just don't care
> > about such mostly-fictional cases)
> 
> 
> Actually, I think we should have a: 
> 
> 	long copy_fd_to_fd (int src, int dst, int len)
> 
> type of systemcall. 

We have one, sendfile(2).

> It should do something like:
> 
> 	while ((nbytes = read (src, buf, BUFSIZE)) >= 0) {
> 		if (write (dst, buf, nbytes) < 0) 
> 			return totbytes; 
> 		totbytes += nbytes;
> 	}
> 
> but it allows kernel-space to optimize this whenever possible. Kernel
> then becomes responsible for detecting and handling the optimizable
> cases. 
> 
> The kernel then becomes something
> 
> 	if (islocalfile (src) && issocket (dst)) 
> 		/* Call the old sendfile */ 
> 		return sendfile (....);
> 
> 	if (isCIFS (src), isCIFS(dst))
> 		/* Tell remote host to copy the file. */
> 		return CIFS_copy_file (....); 
> 
> 	...

Can you explain why this cannot be in sys_sendfile?
It doesn't make much sense to provide any default in the kernel,
that's something the userland can handle equally well.
But e.g. the CIFS copy can be done as sendfile hook.

	Jakub

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  1:05 Albert Cahalan
  2003-11-11  3:50 ` Andreas Dilger
@ 2003-11-11 13:38 ` Rogier Wolff
  2003-11-11 13:53   ` Jakub Jelinek
  2003-11-11 14:11   ` Albert Cahalan
  2003-11-12 15:19 ` Jesse Pollard
  2 siblings, 2 replies; 77+ messages in thread
From: Rogier Wolff @ 2003-11-11 13:38 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: linux-kernel mailing list, davide.rossetti, filia, jesse, dwmw2,
	moje, kakadu_croc

On Mon, Nov 10, 2003 at 08:05:11PM -0500, Albert Cahalan wrote:
> So open the file, change context, and then:
> 
> long copy_fd_to_file(int fd, const char *name, ...)
> 
> (if you can no longer read from the OPEN fd,
> either we override that or we just don't care
> about such mostly-fictional cases)


Actually, I think we should have a: 

	long copy_fd_to_fd (int src, int dst, int len)

type of systemcall. 

It should do something like:

	while ((nbytes = read (src, buf, BUFSIZE)) >= 0) {
		if (write (dst, buf, nbytes) < 0) 
			return totbytes; 
		totbytes += nbytes;
	}

but it allows kernel-space to optimize this whenever possible. Kernel
then becomes responsible for detecting and handling the optimizable
cases. 

The kernel then becomes something

	if (islocalfile (src) && issocket (dst)) 
		/* Call the old sendfile */ 
		return sendfile (....);

	if (isCIFS (src), isCIFS(dst))
		/* Tell remote host to copy the file. */
		return CIFS_copy_file (....); 

	...

and then the default implementation. This is nice and expandible, and
provides a default for the case that cannot be optimized. 

And if you don't want the extra code, we could enclose the different
optimizations with ifdefs.

But alas, last time Linus didn't agree with me and decided we should
do something like "sendfile", which is IMHO just a special case of
this one.


If we implement this in kernel (at first just the copy_fd_fd and the
default implementation), then we can get "cp" to use this, and then
suddenly whenever we upgrade the kernel, cp can use the newly
optimized copying mechanism. (e.g. whenever we manage to specify a
socket as the destination, cp would suddenly start to use
"sendfile"!!)

(It might be better to include a "buffer" argument in the interface,
freeing the implementation of allocating a buffer when an optimization
is not possible).

		Roger. 

-- 
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam -  no windows, no gates, apache inside!" ****

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
       [not found]             ` <QEg2.3zi.9@gated-at.bofh.it>
@ 2003-11-11 12:43               ` Ihar 'Philips' Filipau
  0 siblings, 0 replies; 77+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-11-11 12:43 UTC (permalink / raw)
  To: jw schultz; +Cc: Linux Kernel Mailing List

jw schultz wrote:
> On Tue, Nov 11, 2003 at 10:51:10AM +0100, Ihar 'Philips' Filipau wrote:
> 
>>Florian Weimer wrote:
>>
>>>Andreas Dilger wrote:
>>>
>>>
>>>
>>>>>This is fast turning into a creeping horror of aggregation.  I defy 
>>>>>anybody
>>>>>to create an API to cover all the options mentioned so far and *not* 
>>>>>have it
>>>>>look like the process_clone horror we so roundly derided a few weeks ago.
>>>>
>>>>	int sys_copy(int fd_src, int fd_dst)
>>>
>>>
>>>Doesn't work.  You have to set the security attributes while you open
>>>fd_dst.
>>
>>  int new_fd = sys_copy( int src_fd );  /* cloned copy, out of any fs */
>>  fchmod( new_fd, XXX_WHAT_EVER );      /* do the job. */
>>  ...
>>  flink(new_fd, "/some/path/some/file/name"); /* commit to fs */
> 
> 
> The associate open file descriptor with a new path system
> call (flink here) has already been rejected for solid
> security reasons.
> 

   So it was my point - without flink() IMHO it makes no sense.

   Just try to imagine any application for sys_copy(char*,char*).
   None _I_ _can_ imagine.

   "int new_fd = sys_copy( old_fd );" make sense to me - but you need to 
have counter-part of it - "flink();" - to commit it to file system.

    You really do not need a copy of a file just for copy of a file.
    That's what hard link is for.

    My way vim/emacs can:

    fd = open("originalfile");
    new_fildes = copy(fd);
    close(fd);
     ... [do the editing] ...
    flink(new_fildes, "newfile"); /* if user decides to save this job */
    close(new_fildes);

    This make sense - and this is the way usually we do processing of 
information. Mimicing cp - is really bad example.

    I have re-read thread. I see flink() not as security hole - but they 
use should be managed in some way.

    Original thread about flink() - everthing doable.
http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=20030406190025%241ec6%40gated-at.bofh.it&rnum=50&prev=/&frame=on
    And there was no real security issue given whatsoever.
    Only design considerations ;-)

> 
> So if you can do it with open file descriptors why do you
> need a new system call?
> 

   The point, that different fs's can optimize this as they wish.
   This would be really nice thing to have in our networked age.

   Sshing just to copy huge file - is little bit annoying ;-)

P.S. actually my mind keeps spining idea of cut()/paste(). So file 
descriptor without assoc. file path can be useful.
Say:

    -----------
    fd_part_1 = open("some file");
    seek(fd_part_1, 100, 0);
    fd_part_2 = cut( fd_part_1 );  /* XXX */
    /* here eof(fd_part_1) == 1 && "some file" is truncated to 100b. */

    flink(fd_part_2, "second part"); /* create file
                    with rest of "some file" */
    -----------
    fd_part_1 = open("some file");
    fd_part_2 = open("second part");
    paste(fd_part_1, fd_part_2);   /* XXX */
            /* fd_part_2 is auto close()d
               and "second part" file unlinked */
    close(fd_part_1);
    /* here "some file" will be the same as in the begining */
    -----------

    This should help video/audio editing much.

P.P.S. not relevant but in any way SUSv3 docs for fattach()
http://www.opengroup.org/onlinepubs/007904975/functions/fattach.html


-- 
Ihar 'Philips' Filipau  / with best regards from Saarbruecken.
--                                                           _ _ _
  "... and for $64000 question, could you get yourself       |_|*|_|
    vaguely familiar with the notion of on-topic posting?"   |_|_|*|
                                 -- Al Viro @ LKML           |*|*|*|


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 12:08       ` Andreas Schwab
@ 2003-11-11 12:23         ` davide.rossetti
  0 siblings, 0 replies; 77+ messages in thread
From: davide.rossetti @ 2003-11-11 12:23 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Linux Kernel Mailing List

On Tue, 11 Nov 2003, Andreas Schwab wrote:

> "davide.rossetti" <rossetti@roma1.infn.it> writes:
> 
> > Maybe I was misunderstood... I'm asking why the libc/iso/ansi/posix 
> > engineer did not add the spec a user-mode API to do copy file to file ???
> 
> Because there was no prior art.

:) but late revisions of specs are really recent!!! 

folks are talking about implementing all sort of stuff (web servers,
parallel filesystems, ...)  (partly) in kernel mode and no one cares of
(maybe accelerated) fs copies ???

-- 
______/ Rossetti Davide   INFN - Roma I - APE group \______________
 pho +390649914507/412   web: http://apegate.roma1.infn.it/~rossetti
 fax +390649914423     email: davide.rossetti@roma1.infn.it        


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11 12:00     ` davide.rossetti
@ 2003-11-11 12:08       ` Andreas Schwab
  2003-11-11 12:23         ` davide.rossetti
  0 siblings, 1 reply; 77+ messages in thread
From: Andreas Schwab @ 2003-11-11 12:08 UTC (permalink / raw)
  To: davide.rossetti
  Cc: Jesse Pollard, Ihar 'Philips' Filipau, Linux Kernel Mailing List

"davide.rossetti" <rossetti@roma1.infn.it> writes:

> Maybe I was misunderstood... I'm asking why the libc/iso/ansi/posix 
> engineer did not add the spec a user-mode API to do copy file to file ???

Because there was no prior art.

Andreas.

-- 
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-10 13:29   ` Jesse Pollard
  2003-11-10 14:22     ` Daniel Jacobowitz
  2003-11-10 15:19     ` David Woodhouse
@ 2003-11-11 12:00     ` davide.rossetti
  2003-11-11 12:08       ` Andreas Schwab
  2 siblings, 1 reply; 77+ messages in thread
From: davide.rossetti @ 2003-11-11 12:00 UTC (permalink / raw)
  To: Jesse Pollard; +Cc: Ihar 'Philips' Filipau, Linux Kernel Mailing List

On Mon, 10 Nov 2003, Jesse Pollard wrote:

> On Monday 10 November 2003 06:08, Ihar 'Philips' Filipau wrote:
> >    sendfile(2) - ?
> I don't think that is what he was referring to.. The sample
> code is strictly user mode file->file copying.
> > Davide Rossetti wrote:
> > > it may be orribly RTFM... but writing a simple framework I realized
> > > there is no libc/POSIX/whoknows
> > > copy(const char* dest_file_name, const char* src_file_name)
> > >
> > > What is the technical reason???
> 
> It isn't an application for the kernel.

Maybe I was misunderstood... I'm asking why the libc/iso/ansi/posix 
engineer did not add the spec a user-mode API to do copy file to file ???

if there was such a standard _user_ API, we could talk about user/kernel 
implementation issues... but my question is more "primitive" somehow :)

> > > I understand that there may be little space for kernel side
> > > optimizations in this area but anyway I'm surprised I have to write
> > >
> > > < the bits to clone the metadata of src_file_name on opening
> > > dest_file_name >
> > > const int BUFSIZE = 1<<12;
> > > char buffer[BUFSIZE];
> > > int nrb;
> > > while((nrb = read(infd, buffer, BUFSIZE) != -1) {
> > >  ret = write(outfd, buffer, nrb);
> > >  if(ret != nrb) {...}
> > > }
> > >
> > > instead of something similar to:
> > > sys_fscopy(...)
> 
> It is too simple to implement in user mode.
> 
> There are some other issues too:
> 
> The security context of the output depends on the user process.
> If it is a privileged process (ie, may change the context of the
> result) then the user process has to setup that context before
> the file is copied.
> 
> There are also some issues with mandatory security controls. If it
> is copied in kernel mode, then the previous labels could be automatically
> carried over to the resulting file... But that may not be what you
> want (and frequently, it isn't).
> 
> Now back to the copy.. You don't have to use a read/write loop- mmap
> is faster. And this is the other reason for not doing it in Kernel mode.
> Buffer management of this type is much easier in user space since the
> copy procedure doesn't have to deal with memory limitations, cache flushes
> page faulting of processes unrelated to the copy, but is related to cache
> pressure.

ok... so I have to code a framework routine which auto-benchmarks (at
either runtime or configure time) and uses at least 2 implementations, one 
using read/write and another mmap(), as I know for sure that on
different Unices they perform differently... ah.. and the day we add
sys_sendfile(fd,fd) (if it is not there yet) I have to add yet another
implementation... and doing file copies of gigabyte sized files with 
mmap() on 32bit archs isn't so trivial, you have to do windowing I 
guess...

seems scary at least ;)

<joke>
it seems similar to saying that we do not need a rename() Posix/XOpen/etc 
API as we can do:

rename(to, from) {
 link(to, from); // make hardlink
 unlink(from); // remove original
}
</joke>

regards

-- 
______/ Rossetti Davide   INFN - Roma I - APE group \______________
 pho +390649914507/412   web: http://apegate.roma1.infn.it/~rossetti
 fax +390649914423     email: davide.rossetti@roma1.infn.it        


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  9:51           ` Ihar 'Philips' Filipau
@ 2003-11-11 10:41             ` jw schultz
  0 siblings, 0 replies; 77+ messages in thread
From: jw schultz @ 2003-11-11 10:41 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Tue, Nov 11, 2003 at 10:51:10AM +0100, Ihar 'Philips' Filipau wrote:
> Florian Weimer wrote:
> >Andreas Dilger wrote:
> >
> >
> >>>This is fast turning into a creeping horror of aggregation.  I defy 
> >>>anybody
> >>>to create an API to cover all the options mentioned so far and *not* 
> >>>have it
> >>>look like the process_clone horror we so roundly derided a few weeks ago.
> >>
> >>	int sys_copy(int fd_src, int fd_dst)
> >
> >
> >Doesn't work.  You have to set the security attributes while you open
> >fd_dst.
> 
>   int new_fd = sys_copy( int src_fd );  /* cloned copy, out of any fs */
>   fchmod( new_fd, XXX_WHAT_EVER );      /* do the job. */
>   ...
>   flink(new_fd, "/some/path/some/file/name"); /* commit to fs */

The associate open file descriptor with a new path system
call (flink here) has already been rejected for solid
security reasons.

>   close(new_fd);  /* bye-bye */
> 
>   I beleive this can be more useful. Not only in naive tries to replace 
> cp(1) with kernel ;-)

Eliminating the flink and using file descriptors you wind up
with something like:

	in_fd = open(oldpath, O_RDONLY);
	fstat(in_fd, statbuf);
	out_fd = open(newpath, O_WRONLY|flags, statbuf->st_mode);
	sendfile(out_fd, in_fd, 0, statbuf->st_size);
	close(out_fd);
	close(in_fd);

So if you can do it with open file descriptors why do you
need a new system call?

-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  8:58         ` Florian Weimer
@ 2003-11-11 10:27           ` jw schultz
  2003-11-11 20:08             ` Jan Harkes
  2003-11-12 15:36           ` Jesse Pollard
  1 sibling, 1 reply; 77+ messages in thread
From: jw schultz @ 2003-11-11 10:27 UTC (permalink / raw)
  To: linux-kernel mailing list

On Tue, Nov 11, 2003 at 09:58:06AM +0100, Florian Weimer wrote:
> Andreas Dilger wrote:
> 
> > > This is fast turning into a creeping horror of aggregation.  I defy anybody
> > > to create an API to cover all the options mentioned so far and *not* have it
> > > look like the process_clone horror we so roundly derided a few weeks ago.
> > 
> > 	int sys_copy(int fd_src, int fd_dst)

That sounds a lot like a sendfile with a file as the
destination.  Useful but still happening on the local system.
My understanding was that this was to be sent to a remote
system where the file descriptors might not be open.

> 
> Doesn't work.  You have to set the security attributes while you open
> fd_dst.

That would have been done with open().

To operate on paths so it could be sent to a fileserver it
would need the same arguments as open() with the addition of
the newpath.

	int sys_copy(const char *oldpath, const char *oldpath,
	    int flags, mode_t mode);

O_TRUNC		replace an existing file.
O_EXCL		prevent replacing an existing file.
O_APPEND	concatenate (useful feature creep).
O_NDELAY/O_NONBLOCK	return and ignore ENOSPACE condition, ick!
O_SYNC		if O_SYNC supported for open
O_NOFOLLOW	don't follow symlink (no need for a lcopy())

EXDEV (see link(2)) seems a better error code for cases
where the source and destination are on different servers.
Otherwise the error codes would conform to open(2).

I've long thought a file copy syscall was missing from unix
but until you start networking it isn't an issue.



-- 
________________________________________________________________
	J.W. Schultz            Pegasystems Technologies
	email address:		jw@pegasys.ws

		Remember Cernan and Schmitt

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
       [not found]         ` <QCHh.X6.3@gated-at.bofh.it>
@ 2003-11-11  9:51           ` Ihar 'Philips' Filipau
  2003-11-11 10:41             ` jw schultz
  0 siblings, 1 reply; 77+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-11-11  9:51 UTC (permalink / raw)
  To: Florian Weimer
  Cc: Linux Kernel Mailing List, Valdis.Kletnieks, Andreas Dilger

Florian Weimer wrote:
> Andreas Dilger wrote:
> 
> 
>>>This is fast turning into a creeping horror of aggregation.  I defy anybody
>>>to create an API to cover all the options mentioned so far and *not* have it
>>>look like the process_clone horror we so roundly derided a few weeks ago.
>>
>>	int sys_copy(int fd_src, int fd_dst)
> 
> 
> Doesn't work.  You have to set the security attributes while you open
> fd_dst.

   int new_fd = sys_copy( int src_fd );  /* cloned copy, out of any fs */
   fchmod( new_fd, XXX_WHAT_EVER );      /* do the job. */
   ...
   flink(new_fd, "/some/path/some/file/name"); /* commit to fs */
   close(new_fd);  /* bye-bye */

   I beleive this can be more useful. Not only in naive tries to replace 
cp(1) with kernel ;-)

-- 
Ihar 'Philips' Filipau  / with best regards from Saarbruecken.
--                                                           _ _ _
  "... and for $64000 question, could you get yourself       |_|*|_|
    vaguely familiar with the notion of on-topic posting?"   |_|_|*|
                                 -- Al Viro @ LKML           |*|*|*|


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  6:00       ` Andreas Dilger
@ 2003-11-11  8:58         ` Florian Weimer
  2003-11-11 10:27           ` jw schultz
  2003-11-12 15:36           ` Jesse Pollard
  0 siblings, 2 replies; 77+ messages in thread
From: Florian Weimer @ 2003-11-11  8:58 UTC (permalink / raw)
  To: Valdis.Kletnieks, Daniel Gryniewicz, linux-kernel mailing list

Andreas Dilger wrote:

> > This is fast turning into a creeping horror of aggregation.  I defy anybody
> > to create an API to cover all the options mentioned so far and *not* have it
> > look like the process_clone horror we so roundly derided a few weeks ago.
> 
> 	int sys_copy(int fd_src, int fd_dst)

Doesn't work.  You have to set the security attributes while you open
fd_dst.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  3:50 ` Andreas Dilger
  2003-11-11  4:03   ` Daniel Gryniewicz
@ 2003-11-11  8:52   ` Gábor Lénárt
  1 sibling, 0 replies; 77+ messages in thread
From: Gábor Lénárt @ 2003-11-11  8:52 UTC (permalink / raw)
  To: linux-kernel mailing list

On Mon, Nov 10, 2003 at 08:50:12PM -0700, Andreas Dilger wrote:
> On Nov 10, 2003  20:05 -0500, Albert Cahalan wrote:
> > > It is too simple to implement in user mode.
> > 
> > That works for a plain byte-stream on a
> > local UNIX-style filesystem. (though it
> > likely isn't the fastest)

It would be something similar than sendfile() ?


- Gábor (larta'H)

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  4:14     ` Valdis.Kletnieks
@ 2003-11-11  6:00       ` Andreas Dilger
  2003-11-11  8:58         ` Florian Weimer
  0 siblings, 1 reply; 77+ messages in thread
From: Andreas Dilger @ 2003-11-11  6:00 UTC (permalink / raw)
  To: Valdis.Kletnieks; +Cc: Daniel Gryniewicz, linux-kernel mailing list

On Nov 10, 2003  23:14 -0500, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 10 Nov 2003 23:03:26 EST, Daniel Gryniewicz said:
> > Plus a sys_copy() syscall could be used as a generic way for filesystems
> > to set up Copy-on-Write.  Right now, you'd need to have userspace call
> > sys-reiser4 or something like that.
> 
> This is fast turning into a creeping horror of aggregation.  I defy anybody
> to create an API to cover all the options mentioned so far and *not* have it
> look like the process_clone horror we so roundly derided a few weeks ago.

	int sys_copy(int fd_src, int fd_dst)

It is up to the filesystem to decide if both files are on the same device
and can be copied with a copy RPC (or whatever).  If the filesystem returns
-EOPNOTSUPP then the VFS goes into a simple readpages/writepages loop to do
the copy instead, maybe also copying ACLs or other things the VFS understands.

All of the "extra functionality" is being handled in the filesystem itself
and not the VFS or the API.  Copy-on-write is an fs-internal issue depending
on whether fs supports it, how it was mounted, etc.  Remote copy is also an
fs-internal issue depending on whether inodes are in same filesystem, support,
etc.  You might get into fun things like doing zero-copy.

Telling the filesystem we are doing a copy vs. a bunch of reads mixed
with a bunch of writes is just semantically something that the filesystem
should know about.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  4:03   ` Daniel Gryniewicz
@ 2003-11-11  4:14     ` Valdis.Kletnieks
  2003-11-11  6:00       ` Andreas Dilger
  0 siblings, 1 reply; 77+ messages in thread
From: Valdis.Kletnieks @ 2003-11-11  4:14 UTC (permalink / raw)
  To: Daniel Gryniewicz; +Cc: linux-kernel mailing list

[-- Attachment #1: Type: text/plain, Size: 473 bytes --]

On Mon, 10 Nov 2003 23:03:26 EST, Daniel Gryniewicz said:

> Plus a sys_copy() syscall could be used as a generic way for filesystems
> to set up Copy-on-Write.  Right now, you'd need to have userspace call
> sys-reiser4 or something like that.

This is fast turning into a creeping horror of aggregation.  I defy anybody
to create an API to cover all the options mentioned so far and *not* have it
look like the process_clone horror we so roundly derided a few weeks ago.

[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  3:50 ` Andreas Dilger
@ 2003-11-11  4:03   ` Daniel Gryniewicz
  2003-11-11  4:14     ` Valdis.Kletnieks
  2003-11-11  8:52   ` Gábor Lénárt
  1 sibling, 1 reply; 77+ messages in thread
From: Daniel Gryniewicz @ 2003-11-11  4:03 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: Albert Cahalan, linux-kernel mailing list, davide.rossetti,
	filia, jesse, dwmw2, moje, kakadu_croc

[-- Attachment #1: Type: text/plain, Size: 687 bytes --]

On Mon, 2003-11-10 at 22:50, Andreas Dilger wrote:
> Having a sys_copy() syscall would be incredibly useful for Lustre
> (distributed Linux fs).  We could start a copy from one storage node
> to another (or more likely many to many for a file striped over many
> storage nodes) at num_stripes * uni-directional bandwidth with no
> impact to the client node.  Instead, we have to copy files at best a
> single client's bi-directional network_bandwidth.

Plus a sys_copy() syscall could be used as a generic way for filesystems
to set up Copy-on-Write.  Right now, you'd need to have userspace call
sys-reiser4 or something like that.
-- 
Daniel Gryniewicz <dang@fprintf.net>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-11  1:05 Albert Cahalan
@ 2003-11-11  3:50 ` Andreas Dilger
  2003-11-11  4:03   ` Daniel Gryniewicz
  2003-11-11  8:52   ` Gábor Lénárt
  2003-11-11 13:38 ` Rogier Wolff
  2003-11-12 15:19 ` Jesse Pollard
  2 siblings, 2 replies; 77+ messages in thread
From: Andreas Dilger @ 2003-11-11  3:50 UTC (permalink / raw)
  To: Albert Cahalan
  Cc: linux-kernel mailing list, davide.rossetti, filia, jesse, dwmw2,
	moje, kakadu_croc

On Nov 10, 2003  20:05 -0500, Albert Cahalan wrote:
> > It is too simple to implement in user mode.
> 
> That works for a plain byte-stream on a
> local UNIX-style filesystem. (though it
> likely isn't the fastest)
> 
> It doesn't work for Macintosh files.
> It's too slow for CIFS over a modem.
> It doesn't work for Windows security data.
> It doesn't allow copy-on-write files.
> It eats CPU time on compressed filesystems.

Having a sys_copy() syscall would be incredibly useful for Lustre
(distributed Linux fs).  We could start a copy from one storage node
to another (or more likely many to many for a file striped over many
storage nodes) at num_stripes * uni-directional bandwidth with no
impact to the client node.  Instead, we have to copy files at best a
single client's bi-directional network_bandwidth.

Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
@ 2003-11-11  1:05 Albert Cahalan
  2003-11-11  3:50 ` Andreas Dilger
                   ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Albert Cahalan @ 2003-11-11  1:05 UTC (permalink / raw)
  To: linux-kernel mailing list
  Cc: davide.rossetti, filia, jesse, dwmw2, moje, kakadu_croc

> It is too simple to implement in user mode.

That works for a plain byte-stream on a
local UNIX-style filesystem. (though it
likely isn't the fastest)

It doesn't work for Macintosh files.
It's too slow for CIFS over a modem.
It doesn't work for Windows security data.
It doesn't allow copy-on-write files.
It eats CPU time on compressed filesystems.

> The security context of the output depends
> on the user process. If it is a privileged
> process (ie, may change the context of the
> result) then the user process has to setup
> that context before the file is copied.

So open the file, change context, and then:

long copy_fd_to_file(int fd, const char *name, ...)

(if you can no longer read from the OPEN fd,
either we override that or we just don't care
about such mostly-fictional cases)

> There are also some issues with mandatory
> security controls. If it is copied in kernel
> mode, then the previous labels could be
> automatically carried over to the resulting
> file... But that may not be what you want
> (and frequently, it isn't).

If it matters:

// security as if a new file were created
#define CF_REPLACE_SECURITY 0x00000001
// if unable to replicate, up or down?
#define CF_ROUND_SECURITY_UP 0x00000002
#define CF_ROUND_SECURITY_DOWN 0x00000004
// fail if security can't be replicated
#define CF_SECURITY_EXACT 0x00000008

> Now back to the copy.. You don't have to
> use a read/write loop- mmap is faster.

It's slower. (this is Linux, not SunOS)
Use a 4 kB or 8 kB read/write loop.

> And this is the other reason for not doing
> it in Kernel mode. Buffer management of
> this type is much easier in user space
> since the copy procedure doesn't have to
> deal with memory limitations, cache flushes
> page faulting of processes unrelated to the
> copy, but is related to cache pressure.

Buffer management is very much a kernel thing.

>> Is it? Please explain the simple steps which
>> cp(1) should take in order to observe that it
>> is being asked to duplicate a file on a file
>> system such as CIFS (or NFSv4?) which allows
>> the client to issue a 'copy file' command
>> over the network without actually transferring
>> the data twice, and to invoke such a command.
>
> Ah. That is an optimization question, not a
> question of kernel/user mode.

Note that /bin/cp isn't always going to have
the necessary passwords and such. You're headed
down a path toward setuid /bin/cp.

> Since the error checking for source and
> destination both include doing a stat and
> statfs, the device information (and FS info)
> can both be retrieved.
>
> And mmap doesn't require data transfer "twice"
> (local copy).

Huh? Over the network from server to client
counts as once. Then /bin/cp gets the data.
Then it goes back over the network from the
client to the server. That's "twice". That's
horribly painful for a multi-gigabyte file
and a DSL or cable-modem connection, never
mind a dial-up connection.

> Since that copy only pagefaults (though
> read/write may be faster for some files
> - I thought that was true for small files
> that fit in cache, and large files faster
> via mmap and depends on the page size;
> and the tradeoff would be system dependant).

Keep the read/write loop small for speed.

> And since both source and destination may
> be remote you do get to decide based on
> source and destination devices: if they
> are the same, and one on a remote node,
> then BOTH will be on the remote, then you
> get to use the CIFS/NFS file copy. (check
> the doc on "stat/statfs" for additional info).
>
> I don't believe it works when source and
> destination are on DIFFERENT remote nodes,
> though.
>
> Strictly up to the implementation of cp/mv.
>
> Though you will loose portability of cp/mv.
> (Of course, you also loose it with a syscall
> for file copy too; as well as the MUCH more
> complicated implementation/security checks).

Doing that in cp/mv is just insane. For one,
it bypasses any local security control over
access to the filesystem. There's not even a
way to be sure you're dealing with the server
you think you're dealing with.



^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-10 15:19     ` David Woodhouse
@ 2003-11-10 16:15       ` Jesse Pollard
  0 siblings, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-10 16:15 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Ihar 'Philips' Filipau, Davide Rossetti,
	Linux Kernel Mailing List

On Monday 10 November 2003 09:19, David Woodhouse wrote:
> On Mon, 2003-11-10 at 07:29 -0600, Jesse Pollard wrote:
> > > > sys_fscopy(...)
> >
> > It is too simple to implement in user mode.
>
> Is it? Please explain the simple steps which cp(1) should take in order
> to observe that it is being asked to duplicate a file on a file system
> such as CIFS (or NFSv4?) which allows the client to issue a 'copy file'
> command over the network without actually transferring the data twice,
> and to invoke such a command.

Ah. That is an optimization question, not a question of kernel/user mode.

Since the error checking for source and destination both include doing
a stat and statfs, the device information (and FS info) can both be retrieved.

And mmap doesn't require data transfer "twice" (local copy). Since that copy 
only pagefaults (though read/write may be faster for some files - I thought
that was true for small files that fit in cache, and large files faster via
mmap and depends on the page size; and the tradeoff would be system
dependant).

And since both source and destination may be remote you do get to decide
based on source and destination devices: if they are the same, and one on
a remote node, then BOTH will be on the remote, then you get to use the
CIFS/NFS file copy. (check the doc on "stat/statfs" for additional info).

I don't believe it works when source and destination are on DIFFERENT remote
nodes, though.

Strictly up to the implementation of cp/mv.

Though you will loose portability of cp/mv. (Of course, you also loose
it with a syscall for file copy too; as well as the MUCH more complicated
implementation/security checks).

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-10 13:29   ` Jesse Pollard
  2003-11-10 14:22     ` Daniel Jacobowitz
@ 2003-11-10 15:19     ` David Woodhouse
  2003-11-10 16:15       ` Jesse Pollard
  2003-11-11 12:00     ` davide.rossetti
  2 siblings, 1 reply; 77+ messages in thread
From: David Woodhouse @ 2003-11-10 15:19 UTC (permalink / raw)
  To: Jesse Pollard
  Cc: Ihar 'Philips' Filipau, Davide Rossetti,
	Linux Kernel Mailing List

On Mon, 2003-11-10 at 07:29 -0600, Jesse Pollard wrote:

> > > sys_fscopy(...)
> 
> It is too simple to implement in user mode.

Is it? Please explain the simple steps which cp(1) should take in order
to observe that it is being asked to duplicate a file on a file system
such as CIFS (or NFSv4?) which allows the client to issue a 'copy file'
command over the network without actually transferring the data twice,
and to invoke such a command.

-- 
dwmw2


^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-10 13:29   ` Jesse Pollard
@ 2003-11-10 14:22     ` Daniel Jacobowitz
  2003-11-11 20:57       ` Jakob Oestergaard
  2003-11-10 15:19     ` David Woodhouse
  2003-11-11 12:00     ` davide.rossetti
  2 siblings, 1 reply; 77+ messages in thread
From: Daniel Jacobowitz @ 2003-11-10 14:22 UTC (permalink / raw)
  To: Linux Kernel Mailing List

On Mon, Nov 10, 2003 at 07:29:15AM -0600, Jesse Pollard wrote:
> Now back to the copy.. You don't have to use a read/write loop- mmap
> is faster. And this is the other reason for not doing it in Kernel mode.

Actually, last I checked, read/write was actually faster.  Linus
explained why a month or two ago.

-- 
Daniel Jacobowitz
MontaVista Software                         Debian GNU/Linux Developer

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
  2003-11-10 12:08 ` Ihar 'Philips' Filipau
@ 2003-11-10 13:29   ` Jesse Pollard
  2003-11-10 14:22     ` Daniel Jacobowitz
                       ` (2 more replies)
  0 siblings, 3 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-10 13:29 UTC (permalink / raw)
  To: Ihar 'Philips' Filipau, Davide Rossetti; +Cc: Linux Kernel Mailing List

On Monday 10 November 2003 06:08, Ihar 'Philips' Filipau wrote:
>    sendfile(2) - ?
I don't think that is what he was referring to.. The sample
code is strictly user mode file->file copying.
> Davide Rossetti wrote:
> > it may be orribly RTFM... but writing a simple framework I realized
> > there is no libc/POSIX/whoknows
> > copy(const char* dest_file_name, const char* src_file_name)
> >
> > What is the technical reason???

It isn't an application for the kernel.

> > I understand that there may be little space for kernel side
> > optimizations in this area but anyway I'm surprised I have to write
> >
> > < the bits to clone the metadata of src_file_name on opening
> > dest_file_name >
> > const int BUFSIZE = 1<<12;
> > char buffer[BUFSIZE];
> > int nrb;
> > while((nrb = read(infd, buffer, BUFSIZE) != -1) {
> >  ret = write(outfd, buffer, nrb);
> >  if(ret != nrb) {...}
> > }
> >
> > instead of something similar to:
> > sys_fscopy(...)

It is too simple to implement in user mode.

There are some other issues too:

The security context of the output depends on the user process.
If it is a privileged process (ie, may change the context of the
result) then the user process has to setup that context before
the file is copied.

There are also some issues with mandatory security controls. If it
is copied in kernel mode, then the previous labels could be automatically
carried over to the resulting file... But that may not be what you
want (and frequently, it isn't).

Now back to the copy.. You don't have to use a read/write loop- mmap
is faster. And this is the other reason for not doing it in Kernel mode.
Buffer management of this type is much easier in user space since the
copy procedure doesn't have to deal with memory limitations, cache flushes
page faulting of processes unrelated to the copy, but is related to cache
pressure.

^ permalink raw reply	[flat|nested] 77+ messages in thread

* Re: OT: why no file copy() libc/syscall ??
       [not found] <QiyV.1k3.15@gated-at.bofh.it>
@ 2003-11-10 12:08 ` Ihar 'Philips' Filipau
  2003-11-10 13:29   ` Jesse Pollard
  0 siblings, 1 reply; 77+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-11-10 12:08 UTC (permalink / raw)
  To: Davide Rossetti; +Cc: Linux Kernel Mailing List


   sendfile(2) - ?

Davide Rossetti wrote:
> it may be orribly RTFM... but writing a simple framework I realized 
> there is no libc/POSIX/whoknows
> copy(const char* dest_file_name, const char* src_file_name)
> 
> What is the technical reason???
> 
> I understand that there may be little space for kernel side 
> optimizations in this area but anyway I'm surprised I have to write
> 
> < the bits to clone the metadata of src_file_name on opening 
> dest_file_name >
> const int BUFSIZE = 1<<12;
> char buffer[BUFSIZE];
> int nrb;
> while((nrb = read(infd, buffer, BUFSIZE) != -1) {
>  ret = write(outfd, buffer, nrb);
>  if(ret != nrb) {...}
> }
> 
> instead of something similar to:
> sys_fscopy(...)
> 
> regards
> 


-- 
Ihar 'Philips' Filipau  / with best regards from Saarbruecken.
--                                                           _ _ _
  "... and for $64000 question, could you get yourself       |_|*|_|
    vaguely familiar with the notion of on-topic posting?"   |_|_|*|
                                 -- Al Viro @ LKML           |*|*|*|


^ permalink raw reply	[flat|nested] 77+ messages in thread

* OT: why no file copy() libc/syscall ??
@ 2003-11-10 11:33 Davide Rossetti
  0 siblings, 0 replies; 77+ messages in thread
From: Davide Rossetti @ 2003-11-10 11:33 UTC (permalink / raw)
  To: linux-kernel

it may be orribly RTFM... but writing a simple framework I realized 
there is no libc/POSIX/whoknows
copy(const char* dest_file_name, const char* src_file_name)

What is the technical reason???

I understand that there may be little space for kernel side 
optimizations in this area but anyway I'm surprised I have to write

< the bits to clone the metadata of src_file_name on opening 
dest_file_name >
const int BUFSIZE = 1<<12;
char buffer[BUFSIZE];
int nrb;
while((nrb = read(infd, buffer, BUFSIZE) != -1) {
  ret = write(outfd, buffer, nrb);
  if(ret != nrb) {...}
}

instead of something similar to:
sys_fscopy(...)

regards




^ permalink raw reply	[flat|nested] 77+ messages in thread

end of thread, other threads:[~2003-12-01 16:36 UTC | newest]

Thread overview: 77+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-10 12:09 OT: why no file copy() libc/syscall ?? Bradley Chapman
2003-11-10 18:47 ` Tomas Konir
2003-11-10 22:44 ` Derek Foreman
     [not found] <1068512710.722.161.camel@cube.suse.lists.linux.kernel>
     [not found] ` <20031111133859.GA11115@bitwizard.nl.suse.lists.linux.kernel>
     [not found]   ` <20031111085323.M8854@devserv.devel.redhat.com.suse.lists.linux.kernel>
     [not found]     ` <bp0p5m$lke$1@cesium.transmeta.com.suse.lists.linux.kernel>
     [not found]       ` <20031113233915.GO1649@x30.random.suse.lists.linux.kernel>
     [not found]         ` <3FB4238A.40605@zytor.com.suse.lists.linux.kernel>
     [not found]           ` <20031114011009.GP1649@x30.random.suse.lists.linux.kernel>
     [not found]             ` <3FB42CC4.9030009@zytor.com.suse.lists.linux.kernel>
2003-11-14 15:26               ` Andi Kleen
2003-11-18 15:49                 ` Jamie Lokier
2003-11-18 16:05                   ` Andi Kleen
2003-11-18 16:25                     ` Trond Myklebust
2003-11-19 13:30                   ` Jesse Pollard
2003-11-18 16:58                 ` H. Peter Anvin
2003-11-19  2:12                 ` Linus Torvalds
2003-11-19  4:04                 ` Chris Adams
     [not found] <Qvw7.5Qf.9@gated-at.bofh.it>
     [not found] ` <QxRl.17Y.9@gated-at.bofh.it>
     [not found]   ` <Qy0W.1sk.9@gated-at.bofh.it>
     [not found]     ` <QyaB.1GK.17@gated-at.bofh.it>
     [not found]       ` <QzSZ.4x1.1@gated-at.bofh.it>
     [not found]         ` <QCHh.X6.3@gated-at.bofh.it>
2003-11-11  9:51           ` Ihar 'Philips' Filipau
2003-11-11 10:41             ` jw schultz
     [not found] ` <QH4e.eV.3@gated-at.bofh.it>
2003-11-11 14:11   ` Ihar 'Philips' Filipau
2003-11-11 15:02     ` Rogier Wolff
2003-11-11 15:31       ` Ihar 'Philips' Filipau
2003-11-11 20:22       ` Jan Harkes
2003-11-11 20:31         ` Valdis.Kletnieks
     [not found] <QDtX.2dq.15@gated-at.bofh.it>
     [not found] ` <QDtX.2dq.17@gated-at.bofh.it>
     [not found]   ` <QDtX.2dq.19@gated-at.bofh.it>
     [not found]     ` <QDtX.2dq.21@gated-at.bofh.it>
     [not found]       ` <QDtX.2dq.23@gated-at.bofh.it>
     [not found]         ` <QDtY.2dq.25@gated-at.bofh.it>
     [not found]           ` <QDtX.2dq.13@gated-at.bofh.it>
     [not found]             ` <QEg2.3zi.9@gated-at.bofh.it>
2003-11-11 12:43               ` Ihar 'Philips' Filipau
  -- strict thread matches above, loose matches on Subject: below --
2003-11-11  1:05 Albert Cahalan
2003-11-11  3:50 ` Andreas Dilger
2003-11-11  4:03   ` Daniel Gryniewicz
2003-11-11  4:14     ` Valdis.Kletnieks
2003-11-11  6:00       ` Andreas Dilger
2003-11-11  8:58         ` Florian Weimer
2003-11-11 10:27           ` jw schultz
2003-11-11 20:08             ` Jan Harkes
2003-11-12 15:36           ` Jesse Pollard
2003-11-20 17:21             ` Florian Weimer
2003-11-20 19:08               ` Jesse Pollard
2003-11-20 19:12                 ` Florian Weimer
2003-11-20 19:44                 ` Justin Cormack
2003-11-20 20:44                   ` Timothy Miller
2003-11-20 21:07                     ` Andreas Dilger
2003-11-20 21:30                       ` Timothy Miller
2003-11-20 21:49                         ` Maciej Zenczykowski
2003-11-20 21:52                           ` Timothy Miller
2003-11-20 21:58                         ` Hua Zhong
2003-11-22 14:50                         ` Pavel Machek
2003-11-22 19:50                           ` Jamie Lokier
2003-11-22 23:07                             ` Andreas Schwab
2003-11-21 16:24                   ` Jesse Pollard
2003-11-20 21:48                 ` Maciej Zenczykowski
2003-11-21 16:34                   ` Jesse Pollard
2003-11-20 22:31                 ` Xavier Bestel
2003-11-20 22:44                   ` Andreas Dilger
2003-11-27  2:40                 ` Robert White
2003-11-27  7:29                   ` Nick Piggin
2003-11-27  9:15                     ` David Lang
2003-11-27  8:56                       ` Nick Piggin
2003-11-27  9:50                         ` David Lang
2003-11-27 10:02                           ` Jörn Engel
2003-11-27 10:58                             ` David Lang
2003-12-01 16:20                               ` Jesse Pollard
2003-11-11  8:52   ` Gábor Lénárt
2003-11-11 13:38 ` Rogier Wolff
2003-11-11 13:53   ` Jakub Jelinek
2003-11-11 13:58     ` David Woodhouse
2003-11-13 20:22     ` H. Peter Anvin
2003-11-13 23:39       ` Andrea Arcangeli
2003-11-14  0:04         ` jw schultz
2003-11-14  0:36         ` H. Peter Anvin
2003-11-14  1:10           ` Andrea Arcangeli
2003-11-14  1:15             ` H. Peter Anvin
2003-11-11 14:11   ` Albert Cahalan
2003-11-12 15:19 ` Jesse Pollard
2003-11-14  3:42   ` Albert Cahalan
     [not found] <QiyV.1k3.15@gated-at.bofh.it>
2003-11-10 12:08 ` Ihar 'Philips' Filipau
2003-11-10 13:29   ` Jesse Pollard
2003-11-10 14:22     ` Daniel Jacobowitz
2003-11-11 20:57       ` Jakob Oestergaard
2003-11-10 15:19     ` David Woodhouse
2003-11-10 16:15       ` Jesse Pollard
2003-11-11 12:00     ` davide.rossetti
2003-11-11 12:08       ` Andreas Schwab
2003-11-11 12:23         ` davide.rossetti
2003-11-10 11:33 Davide Rossetti

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).