* Re: OT: why no file copy() libc/syscall ??
[not found] ` <3FB42CC4.9030009@zytor.com.suse.lists.linux.kernel>
@ 2003-11-14 15:26 ` Andi Kleen
2003-11-18 15:49 ` Jamie Lokier
` (3 more replies)
0 siblings, 4 replies; 77+ messages in thread
From: Andi Kleen @ 2003-11-14 15:26 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-kernel
"H. Peter Anvin" <hpa@zytor.com> writes:
> Andrea Arcangeli wrote:
> > On Thu, Nov 13, 2003 at 04:36:26PM -0800, H. Peter Anvin wrote:
> >
> >>... or we could put in checks into the kernel for signal pending, and
> >>return EINTR.
> >
> > that would be even better indeed.
> >
>
> s/EINTR/short count/, of course :)
That would be buggy because existing users of sendfile don't know
about this and would silently only copy part of the file when a signal
happens.
-Andi
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-14 15:26 ` OT: why no file copy() libc/syscall ?? Andi Kleen
@ 2003-11-18 15:49 ` Jamie Lokier
2003-11-18 16:05 ` Andi Kleen
2003-11-19 13:30 ` Jesse Pollard
2003-11-18 16:58 ` H. Peter Anvin
` (2 subsequent siblings)
3 siblings, 2 replies; 77+ messages in thread
From: Jamie Lokier @ 2003-11-18 15:49 UTC (permalink / raw)
To: Andi Kleen; +Cc: H. Peter Anvin, linux-kernel
Andi Kleen wrote:
> > s/EINTR/short count/, of course :)
>
> That would be buggy because existing users of sendfile don't know
> about this and would silently only copy part of the file when a signal
> happens.
That doesn't make sense. There aren't any existing users of sendfile
to copy files.
-- Jamie
* Re: OT: why no file copy() libc/syscall ??
2003-11-18 15:49 ` Jamie Lokier
@ 2003-11-18 16:05 ` Andi Kleen
2003-11-18 16:25 ` Trond Myklebust
2003-11-19 13:30 ` Jesse Pollard
1 sibling, 1 reply; 77+ messages in thread
From: Andi Kleen @ 2003-11-18 16:05 UTC (permalink / raw)
To: Jamie Lokier; +Cc: hpa, linux-kernel
On Tue, 18 Nov 2003 15:49:21 +0000
Jamie Lokier <jamie@shareable.org> wrote:
> Andi Kleen wrote:
> > > s/EINTR/short count/, of course :)
> >
> > That would be buggy because existing users of sendfile don't know
> > about this and would silently only copy part of the file when a signal
> > happens.
>
> That doesn't make sense. There aren't any existing users of sendfile
> to copy files.
[ignore the mail, it was a stuck mail queue]
But note that arbitrary changes in the signal handling would affect all users of sendfile, not just
those that attempt to copy files or do other things that should be done in user space.
-Andi
* Re: OT: why no file copy() libc/syscall ??
2003-11-18 16:05 ` Andi Kleen
@ 2003-11-18 16:25 ` Trond Myklebust
0 siblings, 0 replies; 77+ messages in thread
From: Trond Myklebust @ 2003-11-18 16:25 UTC (permalink / raw)
To: Andi Kleen; +Cc: Jamie Lokier, hpa, linux-kernel
>>>>> " " == Andi Kleen <ak@suse.de> writes:
>> > That would be buggy because existing users of sendfile don't
>> > know about this and would silently only copy part of the file
>> > when a signal happens.
>>
>> That doesn't make sense. There aren't any existing users of
>> sendfile to copy files.
> [ignore the mail, it was a stuck mail queue]
> But note that arbitrary changes in the signal handling would
> affect all users of sendfile, not just those that attempt to
> copy files or do other things that should be done in user
> space.
That 'change' is already in effect for people who mount their NFS
partitions with the "intr" or "soft" flags.
See the return value of generic_file_sendfile(): it already has the
read()/write()-like semantics of returning the number of bytes written if
non-zero, or the value of desc.error if not.
Cheers,
Trond
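As a rough illustration of the return convention described in this message, the sketch below models it in user-space C. The names `copy_desc` and `copy_result` are hypothetical and invented for this example; only the rule itself (the byte count if non-zero, otherwise the stored error) comes from the message. It is not the kernel's code.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical stand-in for the descriptor mentioned above. */
struct copy_desc {
    size_t written;  /* bytes transferred before the call stopped */
    int error;       /* 0, or a negative errno such as -EINTR */
};

/* read()/write()-like result: report progress if there was any,
 * otherwise report the error (or 0 if there was none). */
ssize_t copy_result(const struct copy_desc *desc)
{
    assert(desc != NULL);
    if (desc->written > 0)
        return (ssize_t)desc->written;
    return desc->error;
}
```

Under this convention a signal that arrives after some data has moved shows up as a short count, and the error is only visible when nothing was transferred.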
* Re: OT: why no file copy() libc/syscall ??
2003-11-18 15:49 ` Jamie Lokier
2003-11-18 16:05 ` Andi Kleen
@ 2003-11-19 13:30 ` Jesse Pollard
1 sibling, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-19 13:30 UTC (permalink / raw)
To: Jamie Lokier, Andi Kleen; +Cc: H. Peter Anvin, linux-kernel
On Tuesday 18 November 2003 09:49, Jamie Lokier wrote:
> Andi Kleen wrote:
> > > s/EINTR/short count/, of course :)
> >
> > That would be buggy because existing users of sendfile don't know
> > about this and would silently only copy part of the file when a signal
> > happens.
>
> That doesn't make sense. There aren't any existing users of sendfile
> to copy files.
True. It also doesn't address the issue of what to do when the file copy is
being done on a remote server and not by something local. Synchronizing
a remote interrupt could really be nasty.
* Re: OT: why no file copy() libc/syscall ??
2003-11-14 15:26 ` OT: why no file copy() libc/syscall ?? Andi Kleen
2003-11-18 15:49 ` Jamie Lokier
@ 2003-11-18 16:58 ` H. Peter Anvin
2003-11-19 2:12 ` Linus Torvalds
2003-11-19 4:04 ` Chris Adams
3 siblings, 0 replies; 77+ messages in thread
From: H. Peter Anvin @ 2003-11-18 16:58 UTC (permalink / raw)
To: Andi Kleen; +Cc: linux-kernel
Andi Kleen wrote:
>
> That would be buggy because existing users of sendfile don't know
> about this and would silently only copy part of the file when a signal
> happens.
>
It would be consistent with the documented semantics for other file
operations. Obviously, return zero only on EOF.
-hpa
* Re: OT: why no file copy() libc/syscall ??
2003-11-14 15:26 ` OT: why no file copy() libc/syscall ?? Andi Kleen
2003-11-18 15:49 ` Jamie Lokier
2003-11-18 16:58 ` H. Peter Anvin
@ 2003-11-19 2:12 ` Linus Torvalds
2003-11-19 4:04 ` Chris Adams
3 siblings, 0 replies; 77+ messages in thread
From: Linus Torvalds @ 2003-11-19 2:12 UTC (permalink / raw)
To: Andi Kleen; +Cc: H. Peter Anvin, linux-kernel
On 14 Nov 2003, Andi Kleen wrote:
>
> That would be buggy because existing users of sendfile don't know
> about this and would silently only copy part of the file when a signal
> happens.
Don't be silly.
Existing sendfile users had _better_ accept short writes.
They happen all the time. If the destination is the network, it _will_ be
interruptible.
Linus
* Re: OT: why no file copy() libc/syscall ??
2003-11-14 15:26 ` OT: why no file copy() libc/syscall ?? Andi Kleen
` (2 preceding siblings ...)
2003-11-19 2:12 ` Linus Torvalds
@ 2003-11-19 4:04 ` Chris Adams
3 siblings, 0 replies; 77+ messages in thread
From: Chris Adams @ 2003-11-19 4:04 UTC (permalink / raw)
To: linux-kernel
Once upon a time, Andi Kleen <ak@suse.de> wrote:
>"H. Peter Anvin" <hpa@zytor.com> writes:
>> s/EINTR/short count/, of course :)
>That would be buggy because existing users of sendfile don't know
>about this and would silently only copy part of the file when a signal
>happens.
Tru64 5.1B sendfile(2) page includes:
[EINTR]
A signal interrupted sendfile before any data was
transmitted. If some data was transmitted, the func-
tion returns the number of bytes sent before the
interrupt and does not set errno to [EINTR].
There are quite a few more documented return values under Tru64,
although TCP sockets are the only supported destination. See
http://h30097.www3.hp.com/docs/base_doc/DOCUMENTATION/V51B_HTML/MAN/MAN2/0024____.HTM
--
Chris Adams <cmadams@hiwaay.net>
Systems and Network Administrator - HiWAAY Internet Services
I don't speak for anybody but myself - that's enough trouble.
[parent not found: <Qvw7.5Qf.9@gated-at.bofh.it>]
[parent not found: <QDtX.2dq.15@gated-at.bofh.it>]
* Re: OT: why no file copy() libc/syscall ??
@ 2003-11-11 1:05 Albert Cahalan
2003-11-11 3:50 ` Andreas Dilger
` (2 more replies)
0 siblings, 3 replies; 77+ messages in thread
From: Albert Cahalan @ 2003-11-11 1:05 UTC (permalink / raw)
To: linux-kernel mailing list
Cc: davide.rossetti, filia, jesse, dwmw2, moje, kakadu_croc
> It is too simple to implement in user mode.
That works for a plain byte-stream on a
local UNIX-style filesystem. (though it
likely isn't the fastest)
It doesn't work for Macintosh files.
It's too slow for CIFS over a modem.
It doesn't work for Windows security data.
It doesn't allow copy-on-write files.
It eats CPU time on compressed filesystems.
> The security context of the output depends
> on the user process. If it is a privileged
> process (ie, may change the context of the
> result) then the user process has to setup
> that context before the file is copied.
So open the file, change context, and then:
long copy_fd_to_file(int fd, const char *name, ...)
(if you can no longer read from the OPEN fd,
either we override that or we just don't care
about such mostly-fictional cases)
> There are also some issues with mandatory
> security controls. If it is copied in kernel
> mode, then the previous labels could be
> automatically carried over to the resulting
> file... But that may not be what you want
> (and frequently, it isn't).
If it matters:
// security as if a new file were created
#define CF_REPLACE_SECURITY 0x00000001
// if unable to replicate, up or down?
#define CF_ROUND_SECURITY_UP 0x00000002
#define CF_ROUND_SECURITY_DOWN 0x00000004
// fail if security can't be replicated
#define CF_SECURITY_EXACT 0x00000008
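One way a kernel or library might sanity-check these flags is sketched below. The flag values are copied from the message; the rule that at most one of the four policies may be requested at a time is an assumption made for this example and is not stated anywhere in the thread. The helper name `cf_flags_valid` is likewise hypothetical.

```c
#include <stdbool.h>

/* Flag values as proposed in the message above. */
#define CF_REPLACE_SECURITY    0x00000001
#define CF_ROUND_SECURITY_UP   0x00000002
#define CF_ROUND_SECURITY_DOWN 0x00000004
#define CF_SECURITY_EXACT      0x00000008

#define CF_ALL_FLAGS (CF_REPLACE_SECURITY | CF_ROUND_SECURITY_UP | \
                      CF_ROUND_SECURITY_DOWN | CF_SECURITY_EXACT)

/* Assumed rule (not from the thread): unknown bits are rejected and at
 * most one security policy may be selected. Zero means "no preference". */
bool cf_flags_valid(unsigned int flags)
{
    if (flags & ~(unsigned int)CF_ALL_FLAGS)
        return false;
    /* (flags & (flags - 1)) is zero exactly when at most one bit is set */
    return (flags & (flags - 1)) == 0;
}
```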
> Now back to the copy.. You don't have to
> use a read/write loop- mmap is faster.
It's slower. (this is Linux, not SunOS)
Use a 4 kB or 8 kB read/write loop.
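A minimal sketch of such a loop, using the 8 kB figure from the reply above, might look like the following. The function name `copy_fd` and the constant `COPY_BUF_SIZE` are chosen for this example only. The sketch makes no claim about which buffer size is actually fastest on a given system.

```c
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

#define COPY_BUF_SIZE 8192  /* the 8 kB figure suggested above */

/* Copy everything from in_fd to out_fd.
 * Returns 0 on success, -1 on error with errno set. */
int copy_fd(int in_fd, int out_fd)
{
    char buf[COPY_BUF_SIZE];

    for (;;) {
        ssize_t nread = read(in_fd, buf, sizeof buf);
        if (nread == 0)
            return 0;               /* EOF */
        if (nread < 0) {
            if (errno == EINTR)
                continue;           /* interrupted, try the read again */
            return -1;
        }

        /* write() may be short as well, so drain the buffer in a loop. */
        for (ssize_t off = 0; off < nread; ) {
            ssize_t nwritten = write(out_fd, buf + off, (size_t)(nread - off));
            if (nwritten < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += nwritten;
        }
    }
}
```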
> And this is the other reason for not doing
> it in Kernel mode. Buffer management of
> this type is much easier in user space
> since the copy procedure doesn't have to
> deal with memory limitations, cache flushes
> page faulting of processes unrelated to the
> copy, but is related to cache pressure.
Buffer management is very much a kernel thing.
>> Is it? Please explain the simple steps which
>> cp(1) should take in order to observe that it
>> is being asked to duplicate a file on a file
>> system such as CIFS (or NFSv4?) which allows
>> the client to issue a 'copy file' command
>> over the network without actually transferring
>> the data twice, and to invoke such a command.
>
> Ah. That is an optimization question, not a
> question of kernel/user mode.
Note that /bin/cp isn't always going to have
the necessary passwords and such. You're headed
down a path toward setuid /bin/cp.
> Since the error checking for source and
> destination both include doing a stat and
> statfs, the device information (and FS info)
> can both be retrieved.
>
> And mmap doesn't require data transfer "twice"
> (local copy).
Huh? Over the network from server to client
counts as once. Then /bin/cp gets the data.
Then it goes back over the network from the
client to the server. That's "twice". That's
horribly painful for a multi-gigabyte file
and a DSL or cable-modem connection, never
mind a dial-up connection.
> Since that copy only pagefaults (though
> read/write may be faster for some files
> - I thought that was true for small files
> that fit in cache, and large files faster
> via mmap and depends on the page size;
> and the tradeoff would be system dependent).
Keep the read/write loop small for speed.
> And since both source and destination may
> be remote you do get to decide based on
> source and destination devices: if they
> are the same, and one on a remote node,
> then BOTH will be on the remote, then you
> get to use the CIFS/NFS file copy. (check
> the doc on "stat/statfs" for additional info).
>
> I don't believe it works when source and
> destination are on DIFFERENT remote nodes,
> though.
>
> Strictly up to the implementation of cp/mv.
>
> Though you will lose portability of cp/mv.
> (Of course, you also lose it with a syscall
> for file copy too; as well as the MUCH more
> complicated implementation/security checks).
Doing that in cp/mv is just insane. For one,
it bypasses any local security control over
access to the filesystem. There's not even a
way to be sure you're dealing with the server
you think you're dealing with.
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 1:05 Albert Cahalan
@ 2003-11-11 3:50 ` Andreas Dilger
2003-11-11 4:03 ` Daniel Gryniewicz
2003-11-11 8:52 ` Gábor Lénárt
2003-11-11 13:38 ` Rogier Wolff
2003-11-12 15:19 ` Jesse Pollard
2 siblings, 2 replies; 77+ messages in thread
From: Andreas Dilger @ 2003-11-11 3:50 UTC (permalink / raw)
To: Albert Cahalan
Cc: linux-kernel mailing list, davide.rossetti, filia, jesse, dwmw2,
moje, kakadu_croc
On Nov 10, 2003 20:05 -0500, Albert Cahalan wrote:
> > It is too simple to implement in user mode.
>
> That works for a plain byte-stream on a
> local UNIX-style filesystem. (though it
> likely isn't the fastest)
>
> It doesn't work for Macintosh files.
> It's too slow for CIFS over a modem.
> It doesn't work for Windows security data.
> It doesn't allow copy-on-write files.
> It eats CPU time on compressed filesystems.
Having a sys_copy() syscall would be incredibly useful for Lustre
(distributed Linux fs). We could start a copy from one storage node
to another (or more likely many to many for a file striped over many
storage nodes) at num_stripes * uni-directional bandwidth with no
impact to the client node. Instead, we have to copy files at best a
single client's bi-directional network_bandwidth.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 3:50 ` Andreas Dilger
@ 2003-11-11 4:03 ` Daniel Gryniewicz
2003-11-11 4:14 ` Valdis.Kletnieks
2003-11-11 8:52 ` Gábor Lénárt
1 sibling, 1 reply; 77+ messages in thread
From: Daniel Gryniewicz @ 2003-11-11 4:03 UTC (permalink / raw)
To: Andreas Dilger
Cc: Albert Cahalan, linux-kernel mailing list, davide.rossetti,
filia, jesse, dwmw2, moje, kakadu_croc
On Mon, 2003-11-10 at 22:50, Andreas Dilger wrote:
> Having a sys_copy() syscall would be incredibly useful for Lustre
> (distributed Linux fs). We could start a copy from one storage node
> to another (or more likely many to many for a file striped over many
> storage nodes) at num_stripes * uni-directional bandwidth with no
> impact to the client node. Instead, we have to copy files at best a
> single client's bi-directional network_bandwidth.
Plus a sys_copy() syscall could be used as a generic way for filesystems
to set up Copy-on-Write. Right now, you'd need to have userspace call
sys-reiser4 or something like that.
--
Daniel Gryniewicz <dang@fprintf.net>
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 4:03 ` Daniel Gryniewicz
@ 2003-11-11 4:14 ` Valdis.Kletnieks
2003-11-11 6:00 ` Andreas Dilger
0 siblings, 1 reply; 77+ messages in thread
From: Valdis.Kletnieks @ 2003-11-11 4:14 UTC (permalink / raw)
To: Daniel Gryniewicz; +Cc: linux-kernel mailing list
On Mon, 10 Nov 2003 23:03:26 EST, Daniel Gryniewicz said:
> Plus a sys_copy() syscall could be used as a generic way for filesystems
> to set up Copy-on-Write. Right now, you'd need to have userspace call
> sys-reiser4 or something like that.
This is fast turning into a creeping horror of aggregation. I defy anybody
to create an API to cover all the options mentioned so far and *not* have it
look like the process_clone horror we so roundly derided a few weeks ago.
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 4:14 ` Valdis.Kletnieks
@ 2003-11-11 6:00 ` Andreas Dilger
2003-11-11 8:58 ` Florian Weimer
0 siblings, 1 reply; 77+ messages in thread
From: Andreas Dilger @ 2003-11-11 6:00 UTC (permalink / raw)
To: Valdis.Kletnieks; +Cc: Daniel Gryniewicz, linux-kernel mailing list
On Nov 10, 2003 23:14 -0500, Valdis.Kletnieks@vt.edu wrote:
> On Mon, 10 Nov 2003 23:03:26 EST, Daniel Gryniewicz said:
> > Plus a sys_copy() syscall could be used as a generic way for filesystems
> > to set up Copy-on-Write. Right now, you'd need to have userspace call
> > sys-reiser4 or something like that.
>
> This is fast turning into a creeping horror of aggregation. I defy anybody
> to create an API to cover all the options mentioned so far and *not* have it
> look like the process_clone horror we so roundly derided a few weeks ago.
int sys_copy(int fd_src, int fd_dst)
It is up to the filesystem to decide if both files are on the same device
and can be copied with a copy RPC (or whatever). If the filesystem returns
-EOPNOTSUPP then the VFS goes into a simple readpages/writepages loop to do
the copy instead, maybe also copying ACLs or other things the VFS understands.
All of the "extra functionality" is being handled in the filesystem itself
and not the VFS or the API. Copy-on-write is an fs-internal issue depending
on whether fs supports it, how it was mounted, etc. Remote copy is also an
fs-internal issue depending on whether inodes are in same filesystem, support,
etc. You might get into fun things like doing zero-copy.
Telling the filesystem we are doing a copy vs. a bunch of reads mixed
with a bunch of writes is just semantically something that the filesystem
should know about.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
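The dispatch described in this message (ask the filesystem first, fall back to a generic loop on -EOPNOTSUPP) can be modelled in a few lines of user-space C. Everything named here is hypothetical: `fs_copy_fn`, `copy_dispatch` and the stub functions exist only for illustration and do not correspond to any real VFS interface.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical per-filesystem hook: 0 on success, -EOPNOTSUPP if the
 * filesystem has no fast path, or another negative errno on failure. */
typedef int (*fs_copy_fn)(int fd_src, int fd_dst);

/* Try the filesystem's own copy; use the generic copy only when the
 * filesystem says it cannot help. Real errors are passed through. */
int copy_dispatch(fs_copy_fn fs_copy, fs_copy_fn generic_copy,
                  int fd_src, int fd_dst)
{
    if (fs_copy != NULL) {
        int ret = fs_copy(fd_src, fd_dst);
        if (ret != -EOPNOTSUPP)
            return ret;
    }
    return generic_copy(fd_src, fd_dst);
}

/* Stubs for illustration only; they do not touch the descriptors. */
static int generic_calls;

int fs_with_fast_path(int s, int d)    { (void)s; (void)d; return 0; }
int fs_without_fast_path(int s, int d) { (void)s; (void)d; return -EOPNOTSUPP; }
int fs_failing(int s, int d)           { (void)s; (void)d; return -EIO; }
int generic_loop(int s, int d)         { (void)s; (void)d; generic_calls++; return 0; }
```

The point of the design is that the caller never learns which path was taken: copy-on-write, a remote copy RPC and a plain page loop all look the same from the API.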
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 6:00 ` Andreas Dilger
@ 2003-11-11 8:58 ` Florian Weimer
2003-11-11 10:27 ` jw schultz
2003-11-12 15:36 ` Jesse Pollard
0 siblings, 2 replies; 77+ messages in thread
From: Florian Weimer @ 2003-11-11 8:58 UTC (permalink / raw)
To: Valdis.Kletnieks, Daniel Gryniewicz, linux-kernel mailing list
Andreas Dilger wrote:
> > This is fast turning into a creeping horror of aggregation. I defy anybody
> > to create an API to cover all the options mentioned so far and *not* have it
> > look like the process_clone horror we so roundly derided a few weeks ago.
>
> int sys_copy(int fd_src, int fd_dst)
Doesn't work. You have to set the security attributes while you open
fd_dst.
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 8:58 ` Florian Weimer
@ 2003-11-11 10:27 ` jw schultz
2003-11-11 20:08 ` Jan Harkes
2003-11-12 15:36 ` Jesse Pollard
1 sibling, 1 reply; 77+ messages in thread
From: jw schultz @ 2003-11-11 10:27 UTC (permalink / raw)
To: linux-kernel mailing list
On Tue, Nov 11, 2003 at 09:58:06AM +0100, Florian Weimer wrote:
> Andreas Dilger wrote:
>
> > > This is fast turning into a creeping horror of aggregation. I defy anybody
> > > to create an API to cover all the options mentioned so far and *not* have it
> > > look like the process_clone horror we so roundly derided a few weeks ago.
> >
> > int sys_copy(int fd_src, int fd_dst)
That sounds a lot like a sendfile with a file as the
destination. Useful but still happening on the local system.
My understanding was that this was to be sent to a remote
system where the file descriptors might not be open.
>
> Doesn't work. You have to set the security attributes while you open
> fd_dst.
That would have been done with open().
To operate on paths so it could be sent to a fileserver it
would need the same arguments as open() with the addition of
the newpath.
int sys_copy(const char *oldpath, const char *newpath,
int flags, mode_t mode);
O_TRUNC replace an existing file.
O_EXCL prevent replacing an existing file.
O_APPEND concatenate (useful feature creep).
O_NDELAY/O_NONBLOCK return and ignore ENOSPC condition, ick!
O_SYNC if O_SYNC supported for open
O_NOFOLLOW don't follow symlink (no need for a lcopy())
EXDEV (see link(2)) seems a better error code for cases
where the source and destination are on different servers.
Otherwise the error codes would conform to open(2).
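To make the proposed flag semantics concrete, here is a user-space approximation that simply forwards the open(2)-style flags to the destination open() and then copies with a read/write loop. The name `copy_path` is invented for this sketch. It does not attempt the O_NDELAY/O_NONBLOCK case or the EXDEV case, and it performs the copy locally rather than on a file server, so it only illustrates the interface shape.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* for O_NOFOLLOW under strict C modes */
#endif
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Flags from the list above that this sketch passes through to open(2). */
#define COPY_OPEN_FLAGS (O_TRUNC | O_EXCL | O_APPEND | O_SYNC | O_NOFOLLOW)

/* Returns 0 on success, -1 with errno set as for open/read/write. */
int copy_path(const char *oldpath, const char *newpath, int flags, mode_t mode)
{
    char buf[8192];
    ssize_t n;
    int ret = -1, saved;

    int src = open(oldpath, O_RDONLY | (flags & O_NOFOLLOW));
    if (src < 0)
        return -1;

    int dst = open(newpath, O_WRONLY | O_CREAT | (flags & COPY_OPEN_FLAGS), mode);
    if (dst < 0) {
        saved = errno;
        close(src);
        errno = saved;
        return -1;
    }

    while ((n = read(src, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto out;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(dst, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                goto out;
            }
            off += w;
        }
    }
    ret = 0;
out:
    saved = errno;   /* keep the first error across the close() calls */
    close(src);
    close(dst);
    errno = saved;
    return ret;
}
```

With O_EXCL the call fails with EEXIST when the destination already exists, which matches the "prevent replacing an existing file" entry in the list.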
I've long thought a file copy syscall was missing from unix
but until you start networking it isn't an issue.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@pegasys.ws
Remember Cernan and Schmitt
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 10:27 ` jw schultz
@ 2003-11-11 20:08 ` Jan Harkes
0 siblings, 0 replies; 77+ messages in thread
From: Jan Harkes @ 2003-11-11 20:08 UTC (permalink / raw)
To: linux-kernel mailing list
On Tue, Nov 11, 2003 at 02:27:42AM -0800, jw schultz wrote:
> On Tue, Nov 11, 2003 at 09:58:06AM +0100, Florian Weimer wrote:
> > Andreas Dilger wrote:
> >
> > > > This is fast turning into a creeping horror of aggregation. I defy anybody
> > > > to create an API to cover all the options mentioned so far and *not* have it
> > > > look like the process_clone horror we so roundly derided a few weeks ago.
> > >
> > > int sys_copy(int fd_src, int fd_dst)
>
> That sounds a lot like a sendfile with a file as the
> destination. Useful but still happening on the local system.
> My understanding was that this was to be sent to a remote
> system where the file descriptors might not be open.
It probably should be sendfile, where the destination fd is a local file
instead of a socket. We really do not want to pass pathnames down into
the filesystem layer. As far as I know, no existing VFS operation does
that and it probably isn't a good idea to start doing it now.
Somehow the filesystem that 'hosts' the src_fd object should get a
chance to see/intercept the sendfile syscall, and it can then decide
based on the dst_fd object what to do. If the destination happens to be
in the same filesystem it could possibly use a special internal copyfile
rpc call or CoW implementation.
The userspace/libc code could provide a copyfile(char* src, char* dst,
int flags, int mode) wrapper, which can also handle falling back to a
simple read/write loop when sendfile fails.
So we clearly don't need a new system call, sendfile would do fine and
interestingly the manual page I'm reading now mentions that the source
has to be a mmap-able object, but lists no such restrictions on the
destination fd. Maybe sendfile already works and we just need to give the
filesystems a chance to override it.
Jan
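The fallback idea in this message can be sketched as follows: try sendfile(2) first and drop to a plain read/write loop if the kernel or filesystem refuses. The function names `copy_prefer_sendfile` and `copy_loop` are assumptions for this example, and treating EINVAL and ENOSYS as the "not supported here" signals is a choice made for the sketch, based on the sendfile(2) manual page. The 1 MiB chunk size is arbitrary.

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/types.h>
#include <unistd.h>

/* Plain read/write fallback. Returns 0 on success, -1 on error. */
static int copy_loop(int in_fd, int out_fd)
{
    char buf[8192];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
    }
    return 0;
}

/* Copy in_fd to out_fd, preferring sendfile. Because the offset argument
 * is NULL, sendfile advances in_fd's own offset, so the fallback resumes
 * from wherever sendfile stopped. Returns 0 on success, -1 on error. */
int copy_prefer_sendfile(int in_fd, int out_fd)
{
    for (;;) {
        ssize_t n = sendfile(out_fd, in_fd, NULL, (size_t)1 << 20);
        if (n > 0)
            continue;               /* made progress, ask for more */
        if (n == 0)
            return 0;               /* EOF */
        if (errno == EINTR)
            continue;
        if (errno == EINVAL || errno == ENOSYS)
            return copy_loop(in_fd, out_fd);
        return -1;
    }
}
```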
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 8:58 ` Florian Weimer
2003-11-11 10:27 ` jw schultz
@ 2003-11-12 15:36 ` Jesse Pollard
2003-11-20 17:21 ` Florian Weimer
1 sibling, 1 reply; 77+ messages in thread
From: Jesse Pollard @ 2003-11-12 15:36 UTC (permalink / raw)
To: Florian Weimer, Valdis.Kletnieks, Daniel Gryniewicz,
linux-kernel mailing list
On Tuesday 11 November 2003 02:58, Florian Weimer wrote:
> Andreas Dilger wrote:
> > > This is fast turning into a creeping horror of aggregation. I defy
> > > anybody to create an API to cover all the options mentioned so far and
> > > *not* have it look like the process_clone horror we so roundly derided
> > > a few weeks ago.
> >
> > int sys_copy(int fd_src, int fd_dst)
>
> Doesn't work. You have to set the security attributes while you open
> fd_dst.
Why? the open for fd_src should have the security attributes (both locally
and in the file server if networked). Opening fd_dst should SET the security
attributes desired (again, locally and in the target fileserver).
Then the sys_copy(fd_src,fd_dst) can take place in the FS code. And of course
it is necessary that fd_src and fd_dst reside on the same device. If they
don't, then the sys_copy should fail.
If the sys_copy is a remote filesystem then fd_src, and fd_dst must be
replaced by the remote file handles and this passed to the remote server.
Any additional checks may then be made from the evaluation of the file handles
locally on the file server, using the security credentials belonging to the
file handles.
* Re: OT: why no file copy() libc/syscall ??
2003-11-12 15:36 ` Jesse Pollard
@ 2003-11-20 17:21 ` Florian Weimer
2003-11-20 19:08 ` Jesse Pollard
0 siblings, 1 reply; 77+ messages in thread
From: Florian Weimer @ 2003-11-20 17:21 UTC (permalink / raw)
To: Jesse Pollard
Cc: Valdis.Kletnieks, Daniel Gryniewicz, linux-kernel mailing list
Jesse Pollard wrote:
> > > int sys_copy(int fd_src, int fd_dst)
> >
> > Doesn't work. You have to set the security attributes while you open
> > fd_dst.
>
> Why? the open for fd_src should have the security attributes (both locally
> and in the file server if networked). Opening fd_dst should SET the security
> attributes desired (again, locally and in the target fileserver).
The default attributes in the new location might be less strict than the
attributes of the source file.
If sys_copy() is just an API to introduce a new copy-on-write hard link,
these problems disappear. They are only relevant if sys_copy() is
intended to be a generic "copy that file" interface.
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 17:21 ` Florian Weimer
@ 2003-11-20 19:08 ` Jesse Pollard
2003-11-20 19:12 ` Florian Weimer
` (4 more replies)
0 siblings, 5 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-20 19:08 UTC (permalink / raw)
To: Florian Weimer
Cc: Valdis.Kletnieks, Daniel Gryniewicz, linux-kernel mailing list
On Thursday 20 November 2003 11:21, Florian Weimer wrote:
> Jesse Pollard wrote:
> > > > int sys_copy(int fd_src, int fd_dst)
> > >
> > > Doesn't work. You have to set the security attributes while you open
> > > fd_dst.
> >
> > Why? the open for fd_src should have the security attributes (both
> > locally and in the file server if networked). Opening fd_dst should SET
> > the security attributes desired (again, locally and in the target
> > fileserver).
>
> The default attributes in the new location might be less strict than the
> attributes of the source file.
So what. the user was authorized to open the input file. The user was
authorized to open the output file. A file copy should be possible remotely
since the equivalent implementation of a local read/write loop would
accomplish the same thing.
> If sys_copy() is just an API to introduce a new copy-on-write hard link,
> these problems disappear. They are only relevant if sys_copy() is
> intended to be a generic "copy that file" interface.
Now if you wanted the remote server to deny the network copy... could
be done - after all the credentials for both input and output files
are present on the server. If the server decides NOT to copy, then fine.
It would just cause the user to make the copy with a read/write loop.
I was only thinking of it as a way to gain access to any filesystem
support that may be available for copying files. If none is available,
then do it in user mode.
Personally, I'm not sure it is a good idea, partly because the semantics
of a file copy operation are not well defined (some of the following IS
known).
1. what happens if the copy is aborted?
2. what happens if the network drops while the remote server continues?
3. what about buffer synchronization?
4. what errors should be reported ?
5. what happens when the syscall is interrupted? Especially if the remote
copy may take a while (I've seen some require an hour or more - worst
case: days due to a media error (completed after the disk was replaced)).
6. what about a client opening the copy before it is finished copying?
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 19:08 ` Jesse Pollard
@ 2003-11-20 19:12 ` Florian Weimer
2003-11-20 19:44 ` Justin Cormack
` (3 subsequent siblings)
4 siblings, 0 replies; 77+ messages in thread
From: Florian Weimer @ 2003-11-20 19:12 UTC (permalink / raw)
To: Jesse Pollard
Cc: Valdis.Kletnieks, Daniel Gryniewicz, linux-kernel mailing list
Jesse Pollard wrote:
> > > > > int sys_copy(int fd_src, int fd_dst)
> > The default attributes in the new location might be less strict than the
> > attributes of the source file.
>
> So what. the user was authorized to open the input file. The user was
> authorized to open the output file. A file copy should be possible remotely
> since the equivalent implementation of a local read/write loop would
> accomplish the same thing.
The potential for race conditions worries me. However, the questions
you gave are more fundamental and may be enough to kill this idea (if it
wasn't already dead)...
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 19:08 ` Jesse Pollard
2003-11-20 19:12 ` Florian Weimer
@ 2003-11-20 19:44 ` Justin Cormack
2003-11-20 20:44 ` Timothy Miller
2003-11-21 16:24 ` Jesse Pollard
2003-11-20 21:48 ` Maciej Zenczykowski
` (2 subsequent siblings)
4 siblings, 2 replies; 77+ messages in thread
From: Justin Cormack @ 2003-11-20 19:44 UTC (permalink / raw)
To: Jesse Pollard; +Cc: linux-kernel mailing list
On Thu, 2003-11-20 at 19:08, Jesse Pollard wrote:
> Now if you wanted the remote server to deny the network copy... could
> be done - after all the credentials for both input and output files
> are present on the server. If the server decides NOT to copy, then fine.
> It would just cause the user to make the copy with a read/write loop.
>
> I was only thinking of it as a way to gain access to any filesystem
> support that may be available for copying files. If none is available,
> then do it in user mode.
>
> Personally, I'm not sure it is a good idea, partly because the semantics
> of a file copy operation are not well defined (some of the following IS
> known).
>
> 1. what happens if the copy is aborted?
> 2. what happens if the network drops while the remote server continues?
> 3. what about buffer synchronization?
> 4. what errors should be reported ?
> 5. what happens when the syscall is interrupted? Especially if the remote
> copy may take a while (I've seen some require an hour or more - worst
> case: days due to a media error (completed after the disk was replaced)).
> 6. what about a client opening the copy before it is finished copying?
If you really want a filesystem that supports efficient copying you
probably want it to have the equivalent of COW blocks, so that a copy
just sets up a few pointers, and the copy only happens when the original
or copied files are changed.
But basically you won't get a syscall until you have a filesystem with
semantics that only maps onto this sort of operation.
Justin
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 19:44 ` Justin Cormack
@ 2003-11-20 20:44 ` Timothy Miller
2003-11-20 21:07 ` Andreas Dilger
2003-11-21 16:24 ` Jesse Pollard
1 sibling, 1 reply; 77+ messages in thread
From: Timothy Miller @ 2003-11-20 20:44 UTC (permalink / raw)
To: Justin Cormack; +Cc: Jesse Pollard, linux-kernel mailing list
Justin Cormack wrote:
> On Thu, 2003-11-20 at 19:08, Jesse Pollard wrote:
> If you really want a filesystem that supports efficient copying you
> probably want it to have the equivalent of COW blocks, so that a copy
> just sets up a few pointers, and the copy only happens when the original
> or copied files are changed.
>
> But basically you won't get a syscall until you have a filesystem with
> semantics that only maps onto this sort of operation.
This could be a problem if COW causes you to run out of space when
writing to the file.
This could also be a benefit if, for whatever reason, you have lots of
copies of the same file that you never change. But that sounds somewhat
pointless to me.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 20:44 ` Timothy Miller
@ 2003-11-20 21:07 ` Andreas Dilger
2003-11-20 21:30 ` Timothy Miller
0 siblings, 1 reply; 77+ messages in thread
From: Andreas Dilger @ 2003-11-20 21:07 UTC (permalink / raw)
To: Timothy Miller; +Cc: Justin Cormack, Jesse Pollard, linux-kernel mailing list
On Nov 20, 2003 15:44 -0500, Timothy Miller wrote:
> This could be a problem if COW causes you to run out of space when
> writing to the file.
Not much different than running out of space copying a file.
> This could also be a benefit if, for whatever reason, you have lots of
> copies of the same file that you never change. But that sounds somewhat
> pointless to me.
Umm, snapshots-in-time of your /home, /usr/src, etc? Copies of the kernel?
Lots of reasons to have mostly-identical versions of files. Almost like
hard links, except you aren't at the mercy of your editor/patch to do the
right thing when modifying one of those copies.
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 21:07 ` Andreas Dilger
@ 2003-11-20 21:30 ` Timothy Miller
2003-11-20 21:49 ` Maciej Zenczykowski
` (2 more replies)
0 siblings, 3 replies; 77+ messages in thread
From: Timothy Miller @ 2003-11-20 21:30 UTC (permalink / raw)
To: Andreas Dilger; +Cc: Justin Cormack, Jesse Pollard, linux-kernel mailing list
Andreas Dilger wrote:
> On Nov 20, 2003 15:44 -0500, Timothy Miller wrote:
>
>>This could be a problem if COW causes you to run out of space when
>>writing to the file.
>
>
> Not much different than running out of space copying a file.
It is, though. If you run out of space copying a file, you know it when
you're copying. Applications don't usually expect to get out-of-space
errors while overwriting something in the middle of a file.
In effect, your free space and your used space add up to greater than
the capacity of the disk. An application that checks for free space
before doing something would be fooled into thinking there is more free
space than there really is. How can an application find out in advance
that a file that it's about to modify (without appending anything to the
end) is going to need more disk space?
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 21:30 ` Timothy Miller
@ 2003-11-20 21:49 ` Maciej Zenczykowski
2003-11-20 21:52 ` Timothy Miller
2003-11-20 21:58 ` Hua Zhong
2003-11-22 14:50 ` Pavel Machek
2 siblings, 1 reply; 77+ messages in thread
From: Maciej Zenczykowski @ 2003-11-20 21:49 UTC (permalink / raw)
To: Timothy Miller
Cc: Andreas Dilger, Justin Cormack, Jesse Pollard, linux-kernel mailing list
> It is, though. If you run out of space copying a file, you know it when
> you're copying. Applications don't usually expect to get out-of-space
> errors while overwriting something in the middle of a file.
What about sparse files?
> In effect, your free space and your used space add up to greater than
> the capacity of the disk. An application that checks for free space
> before doing something would be fooled into thinking there is more free
> space than there really is. How can an application find out in advance
> that a file that it's about to modify (without appending anything to the
> end) is going to need more disk space?
I don't think it can do that already now with sparse files, can it?
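The sparse-file case is easy to see from userspace. A minimal sketch in Python, assuming a filesystem that supports holes; the helper names (`allocation`, `make_sparse`, `write_in_middle`) are made up for illustration. The apparent size and the allocated size diverge, and it is the write into the middle of the file that has to allocate blocks, so that is also where ENOSPC could show up.

```python
import os
import tempfile

def allocation(path):
    """Return (apparent size, bytes actually allocated) for path."""
    st = os.stat(path)
    # st_blocks is counted in 512-byte units on Linux.
    return st.st_size, st.st_blocks * 512

def make_sparse(path, size):
    """Create a file of the given apparent size without writing any data."""
    with open(path, "wb") as f:
        f.truncate(size)  # extends the file with a hole where supported

def write_in_middle(path, offset, data):
    """Overwrite bytes inside the file; on a hole this must allocate blocks."""
    fd = os.open(path, os.O_WRONLY)
    try:
        os.pwrite(fd, data, offset)
        os.fsync(fd)
    finally:
        os.close(fd)

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        p = os.path.join(d, "sparse.bin")
        make_sparse(p, 16 * 1024 * 1024)
        print("before:", allocation(p))
        write_in_middle(p, 8 * 1024 * 1024, b"x" * 4096)
        print("after: ", allocation(p))
```

The allocated figure is whatever the filesystem reports, so the exact numbers will differ from one filesystem to another.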
Cheers,
MaZe.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 21:49 ` Maciej Zenczykowski
@ 2003-11-20 21:52 ` Timothy Miller
0 siblings, 0 replies; 77+ messages in thread
From: Timothy Miller @ 2003-11-20 21:52 UTC (permalink / raw)
To: Maciej Zenczykowski
Cc: Andreas Dilger, Justin Cormack, Jesse Pollard, linux-kernel mailing list
Maciej Zenczykowski wrote:
>>It is, though. If you run out of space copying a file, you know it when
>>you're copying. Applications don't usually expect to get out-of-space
>>errors while overwriting something in the middle of a file.
>
>
> What about sparse files?
Ah, good point. Never mind. :)
^ permalink raw reply [flat|nested] 77+ messages in thread
* RE: OT: why no file copy() libc/syscall ??
2003-11-20 21:30 ` Timothy Miller
2003-11-20 21:49 ` Maciej Zenczykowski
@ 2003-11-20 21:58 ` Hua Zhong
2003-11-22 14:50 ` Pavel Machek
2 siblings, 0 replies; 77+ messages in thread
From: Hua Zhong @ 2003-11-20 21:58 UTC (permalink / raw)
To: 'Timothy Miller', 'Andreas Dilger'
Cc: 'Justin Cormack', 'Jesse Pollard',
'linux-kernel mailing list'
> Andreas Dilger wrote:
> > On Nov 20, 2003 15:44 -0500, Timothy Miller wrote:
> >
> >>This could be a problem if COW causes you to run out of space when
> >>writing to the file.
> >
> >
> > Not much different than running out of space copying a file.
>
> It is, though. If you run out of space copying a file, you
> know it when you're copying. Applications don't usually expect to get
> out-of-space errors while overwriting something in the middle of a
> file.
It already could on a journaling filesystem.
It's not in any spec that writing to the middle of a file would not
cause ENOSPC, is it?
> In effect, your free space and your used space add up to greater than
> the capacity of the disk. An application that checks for free space
> before doing something would be fooled into thinking there is
> more free space than there really is. How can an application find out
> in advance that a file that it's about to modify (without appending
> anything to the end) is going to need more disk space?
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 21:30 ` Timothy Miller
2003-11-20 21:49 ` Maciej Zenczykowski
2003-11-20 21:58 ` Hua Zhong
@ 2003-11-22 14:50 ` Pavel Machek
2003-11-22 19:50 ` Jamie Lokier
2 siblings, 1 reply; 77+ messages in thread
From: Pavel Machek @ 2003-11-22 14:50 UTC (permalink / raw)
To: Timothy Miller
Cc: Andreas Dilger, Justin Cormack, Jesse Pollard, linux-kernel mailing list
Hi!
> >>This could be a problem if COW causes you to run out of space when
> >>writing to the file.
> >
> >
> >Not much different than running out of space copying a file.
>
> It is, though. If you run out of space copying a file, you know it when
> you're copying. Applications don't usually expect to get out-of-space
> errors while overwriting something in the middle of a file.
The same can happen on a compressed filesystem...
Pavel
--
When do you have a heart between your knees?
[Johanka's followup: and *two* hearts?]
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-22 14:50 ` Pavel Machek
@ 2003-11-22 19:50 ` Jamie Lokier
2003-11-22 23:07 ` Andreas Schwab
0 siblings, 1 reply; 77+ messages in thread
From: Jamie Lokier @ 2003-11-22 19:50 UTC (permalink / raw)
To: Pavel Machek
Cc: Timothy Miller, Andreas Dilger, Justin Cormack, Jesse Pollard,
linux-kernel mailing list
Pavel Machek wrote:
> > It is, though. If you run out of space copying a file, you know it when
> > you're copying. Applications don't usually expect to get out-of-space
> > errors while overwriting something in the middle of a file.
>
> Same can happen on compressed filesystem...
Or a filesystem with snapshots, e.g. using LVM.
-- Jamie
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-22 19:50 ` Jamie Lokier
@ 2003-11-22 23:07 ` Andreas Schwab
0 siblings, 0 replies; 77+ messages in thread
From: Andreas Schwab @ 2003-11-22 23:07 UTC (permalink / raw)
To: Jamie Lokier
Cc: Pavel Machek, Timothy Miller, Andreas Dilger, Justin Cormack,
Jesse Pollard, linux-kernel mailing list
Jamie Lokier <jamie@shareable.org> writes:
> Pavel Machek wrote:
>> > It is, though. If you run out of space copying a file, you know it when
>> > you're copying. Applications don't usually expect to get out-of-space
>> > errors while overwriting something in the middle of a file.
>>
>> Same can happen on compressed filesystem...
>
> Or a filesystem with snapshots, e.g. using LVM.
Or writing to a sparse file.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 19:44 ` Justin Cormack
2003-11-20 20:44 ` Timothy Miller
@ 2003-11-21 16:24 ` Jesse Pollard
1 sibling, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-21 16:24 UTC (permalink / raw)
To: Justin Cormack; +Cc: linux-kernel mailing list
On Thursday 20 November 2003 13:44, Justin Cormack wrote:
> On Thu, 2003-11-20 at 19:08, Jesse Pollard wrote:
[snip]
>
> If you really want a filesystem that supports efficient copying you
> probably want it to have the equivalent of COW blocks, so that a copy
> just sets up a few pointers, and the copy only happens when the original
> or copied files are changed.
Ummmm... I REALLY don't like COW on a disk. Much too big a chance that the
filesystem will deadlock, with no recovery method. (Oversubscribed, then
crash, corrupting the homeblock; repair (committing the journal?) requires
space... no space... therefore mostly dead. You'd have to be able to mount
without the journal or the homeblock, then delete something, then commit the
journal, dismount, and recover the rest, though this might be overboard; the
homeblock might not even be damaged.)
> But basically you wont get a syscall until you have a filesystem with
> semantics that only maps onto this sort of operation.
I believe NFSv3/4 has a file copy request included. And I understand that
the SAMBA server does too.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 19:08 ` Jesse Pollard
2003-11-20 19:12 ` Florian Weimer
2003-11-20 19:44 ` Justin Cormack
@ 2003-11-20 21:48 ` Maciej Zenczykowski
2003-11-21 16:34 ` Jesse Pollard
2003-11-20 22:31 ` Xavier Bestel
2003-11-27 2:40 ` Robert White
4 siblings, 1 reply; 77+ messages in thread
From: Maciej Zenczykowski @ 2003-11-20 21:48 UTC (permalink / raw)
To: Jesse Pollard
Cc: Florian Weimer, Valdis.Kletnieks, Daniel Gryniewicz,
linux-kernel mailing list
Assume 'fast'copy(int fd_in, int fd_out) where fd_in and fd_out reference
files. fd_in is opened for read and fd_out is opened for write. Ignore
filepos locations in both fd's. fd_out must reference an empty/truncated
file (if not then fail). Usually you'd call copy on fd_out straight out
of a creat call (and thus this would be a non-issue).
> 1. what happens if the copy is aborted?
I'd say the copy operation should be 'atomic': either it succeeds (full
copy) or fails (no changes to filesystems except for the truncate). An
abort would obviously usually result in a failure (thus a possible revert,
which is rather easy since it's likely just a truncate of whatever has
already been copied) or, if we've just finished, then a successful
result.
> 2. what happens if the network drops while the remote server continues?
If the remote server has enough data to perform the operation then it does
complete it; otherwise there ain't enough info anyway (after all, the
entire idea of this is to fit the entire copy into a single copy
instruction, thus a single packet/command/whatever; no extra data is
passed)...
> 3. what about buffer synchronization?
If this is happening remotely then I don't see what requires sync???
> 4. what errors should be reported ?
This is tougher:
Tests first performed locally (if they can be), then request forwarded to
remote end and tests performed remotely - return either error or
ACCEPTED, at which point local end tells it to go ahead, (at this
point the operation is effectively performed (unless an abort is
signalled) regardless of network connectivity). On completion remote end
will return info on completion or error code.
a) operation not supported by kernel :) - ENOSYS
b) fd_in/fd_out invalid file descriptor - EBADF
c) fd_in/fd_out is directory - EISDIR
d) can't read/write from/to fd_in/fd_out - EINVAL
e) an error if fd_out ain't empty - ENOTEMPTY
f) operation not supported by this combination of devices - EOPNOTSUPP
[so you need to do it via usual loop]
g) input file bigger than output file can be - EFBIG
[ie copy of 5GB file from remote filesystem which supports it to
another filesystem on the same server with 2GB max file size]
h) low-level IO error - EIO - serious problems (i.e. HDD read/write error)
i) out of disk space during copy - ENOSPC
j) out of memory during copy - ENOMEM (unlikely, needed?)
k) lost network connection - ENETRESET (unknown whether succeeded)
or ENOLINK ?
l) operation was aborted - EINTR [probably should be some other error
code, not sure]
m) success - either return 0 or the number of bytes copied
[probably best to return the # of bytes copied, even if (for now?) we
only accept full copies]
Did I miss anything? What about non-blocking call? Basically as above but
return INPROGRESS as soon as we tell remote end to go ahead... or perhaps
don't support non-blocking call?
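To make the caller's side of this error list concrete, here is a minimal sketch in Python. It is only an illustration under stated assumptions: `fast_copy` is a hypothetical callable standing in for the proposed syscall (no such call is assumed to exist), and unlike the proposal the fallback loop reads from the current file position. ENOSYS and EOPNOTSUPP mean "do it via the usual loop"; everything else is reported to the caller.

```python
import errno
import os

# Errors that mean "the fast path cannot do this; copy by hand instead".
FALLBACK_ERRNOS = {errno.ENOSYS, errno.EOPNOTSUPP}

def loop_copy(fd_in, fd_out, bufsize=64 * 1024):
    """Plain read/write loop; returns the number of bytes copied."""
    total = 0
    while True:
        buf = os.read(fd_in, bufsize)
        if not buf:
            return total
        view = memoryview(buf)
        while view:
            n = os.write(fd_out, view)  # may be a short write
            view = view[n:]
            total += n

def copy_fd(fd_in, fd_out, fast_copy=None):
    """Try the hypothetical fast_copy(fd_in, fd_out), else fall back to a loop."""
    if os.fstat(fd_out).st_size != 0:
        # The proposal requires an empty/truncated destination.
        raise OSError(errno.ENOTEMPTY, "destination is not empty")
    if fast_copy is not None:
        try:
            return fast_copy(fd_in, fd_out)
        except OSError as e:
            if e.errno not in FALLBACK_ERRNOS:
                raise  # EIO, ENOSPC, EFBIG, ... are real failures
    return loop_copy(fd_in, fd_out)
```

Returning the byte count from both paths matches item m) above, so a caller need not care which path was taken.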
> 5. what happens when the syscall is interupted? Especially if the remote
> copy may take a while (I've seen some require an hour or more - worst
> case: days due to a media error (completed after the disk was replaced)).
Well, if it's interrupted by a SIGINT or the like then return EINTR and
the copy was not performed (ie we backed the copy out, unless net failure
detected during abort then ENOLINK/ENETRESET).
If it's a more normal signal then it should behave like any normal kernel
restartable syscall (i.e. via ERESTARTNOHAND or something like that).
> 6. what about a client opening the copy before it is finished copying?
The file copy is atomic and thus the file doesn't per se exist until the
copy operation completes (or the file exists with zero size and is locked
and can't be opened).
Perhaps in the future we could support partial copies and restarting an
interrupted copy, but let's first agree (or not) on the above.
I think a copy syscall would be very useful. What I'd really like to see
is some sort of block-hashed-space-compression with copy-on-write
semantics file system for linux (for my 500 CD collection which probably
has a 10-12 data duplicity factor).
Comments?
Cheers,
MaZe.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 21:48 ` Maciej Zenczykowski
@ 2003-11-21 16:34 ` Jesse Pollard
0 siblings, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-21 16:34 UTC (permalink / raw)
To: Maciej Zenczykowski
Cc: Florian Weimer, Valdis.Kletnieks, Daniel Gryniewicz,
linux-kernel mailing list
On Thursday 20 November 2003 15:48, Maciej Zenczykowski wrote:
> Assume 'fast'copy(int fd_in, int fd_out) where fd_in and fd_out reference
> files. fd_in is opened for read and fd_out is opened for write. Ignore
> filepos locations in both fd's. fd_out must reference an empty/truncated
> file (if not then fail). Usually you'd call copy on fd_out straight out
> of a creat call (and thus this would be a non-issue).
>
> > 1. what happens if the copy is aborted?
>
> I'd say the copy operation should be 'atomic', either it succeeds (full
> copy) or fails (no changes to filesystems except for the truncate). An
> abort would obviously usually result in a failure (thus a possible revert,
> which is rather easy since it's likely just an truncate of whatever has
> already been copied) or if we've just finished and than a successful
> result.
Really? what happens if the abort is local to the system making the request?
what happens if the abort is on the remote server?
> > 2. what happens if the network drops while the remote server continues?
>
> If the remote server has enough data to perform the operation then it does
> complete it otherwise there ain't enough info anyway (afterall the
> entire idea of this is to fit the entire copy into a single copy
> instruction thus a single packet/command whatever, no extra data is
> passed)...
And back to aborts?
> > 3. what about buffer synchronization?
>
> If this is happening remotely then I don't see what requires sync???
Multiple hosts remote to the server that have a file open. Though this
already happens with NFS.
> > 4. what errors should be reported ?
>
> This is tougher:
>
> Tests first performed locally (if they can be) than request forwarded to
> remote end and tests performed remotely - return either error or
> ACCEPTED, at which point local end tells it to go ahead, (at this
> point the operation is effectively performed (unless an abort is
> signalled) regardless of network connectivity). On completion remote end
> will return info on completion or error code.
>
> a) operation not supported by kernel :) - ENOSYS
> b) fd_in/fd_out invalid file descriptor - EBADF
> c) fd_in/fd_out is directory - EISDIR
> d) can't read/write from/to fd_in/fd_out - EINVAL
> e) an error if fd_out ain't empty - ENOTEMPTY
> f) operation not supported by this combination of devices - EOPNOTSUPP
> [so you need to do it via usual loop]
> g) input file bigger then output file can be - EFBIG
> [ie copy of 5GB file from remote filesystem which supports it to
> another filesystem on the same server with 2GB max file size]
> h) low-level IO error - EIO - serious problems (i.e. HDD read/write error)
> i) out of disk space during copy - ENOSPC
> j) out of memory during copy - ENOMEM (unlikely, needed?)
> k) lost network connection - ENETRESET (unknown whether succeeded)
> or ENOLINK ?
> l) operation was aborted - EINTR [probably should be some other error
> code, not sure]
> m) success - either return 0 or the number of bytes copied
> [probably best to return the # of bytes copied, even if (for now?) we
> only accept full copies]
>
> Did I miss anything? What about non-blocking call? Basically as above but
> return INPROGRESS as soon as we tell remote end to go ahead... or perhaps
> don't support non-blocking call?
>
> > 5. what happens when the syscall is interupted? Especially if the remote
> > copy may take a while (I've seen some require an hour or more - worst
> > case: days due to a media error (completed after the disk was
> > replaced)).
>
> Well, if it's interrupted by a SIGINT or the like then return EINTR and
> the copy was not performed (ie we backed the copy out, unless net failure
> detected during abort then ENOLINK/ENETRESET).
Ooop - the copy is being done on the remote server.
> If it's a more normal signal than it should behave like any normal kernel
> restartable syscall (i.e. via ERESTARTNOHAND or something like that).
Again, the copy may be being made on the remote server.
> > 6. what about a client opening the copy before it is finished copying?
>
> The file copy is atomic and thus the file doesn't per se exist until the
> copy operation completes (or the file exists with zero size and is locked
> and can't be opened).
It does under all other methods of copying.
> Perhaps in the future we could support partial copies and restarting an
> interrupted copy, but let's first agree (or not) on the above.
>
> I think a copy syscall would be very useful. What I'd really like to see
> is some sort of block-hashed-space-compression with copy-on-write
> semantics file system for linux (for my 500 CD collection which probably
> has a 10-12 data duplicity factor).
It could be useful. What you describe now is a migrating filesystem on a
server. And note that your COW is going from two different filesystems (hmm
or maybe a custom union mount?)...
Which is where the migrating filesystem comes in. The served filesystem should
already know how to transfer a file from the archive.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 19:08 ` Jesse Pollard
` (2 preceding siblings ...)
2003-11-20 21:48 ` Maciej Zenczykowski
@ 2003-11-20 22:31 ` Xavier Bestel
2003-11-20 22:44 ` Andreas Dilger
2003-11-27 2:40 ` Robert White
4 siblings, 1 reply; 77+ messages in thread
From: Xavier Bestel @ 2003-11-20 22:31 UTC (permalink / raw)
To: Jesse Pollard
Cc: Florian Weimer, Valdis.Kletnieks, Daniel Gryniewicz,
Linux Kernel Mailing List
Le jeu 20/11/2003 à 20:08, Jesse Pollard a écrit :
> 1. what happens if the copy is aborted?
> 2. what happens if the network drops while the remote server continues?
> 3. what about buffer synchronization?
> 4. what errors should be reported ?
> 5. what happens when the syscall is interupted? Especially if the remote
> copy may take a while (I've seen some require an hour or more - worst
> case: days due to a media error (completed after the disk was replaced)).
> 6. what about a client opening the copy before it is finished copying?
7. How to report progress with your average file manager ?
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-20 22:31 ` Xavier Bestel
@ 2003-11-20 22:44 ` Andreas Dilger
0 siblings, 0 replies; 77+ messages in thread
From: Andreas Dilger @ 2003-11-20 22:44 UTC (permalink / raw)
To: Xavier Bestel
Cc: Jesse Pollard, Florian Weimer, Valdis.Kletnieks,
Daniel Gryniewicz, Linux Kernel Mailing List
On Nov 20, 2003 23:31 +0100, Xavier Bestel wrote:
> Le jeu 20/11/2003 à 20:08, Jesse Pollard a écrit :
> > 1. what happens if the copy is aborted?
Same as now with "cp" - partial copy.
> > 2. what happens if the network drops while the remote server continues?
Irrelevant, since you can't access the file at that point (i.e. if server
continues then great, but if it doesn't it's no different than the server
disconnecting/crashing in the middle of a regular copy).
> > 3. what about buffer synchronization?
Sync file locally before starting, and no buffers on client are created.
If you write to file while it is being copied, how is that different
than two writers for same file now (i.e. usually broken)? If the network
filesystem doesn't support locking, that's the filesystem's problem and
this API doesn't change it.
> > 4. what errors should be reported ?
Covered pretty well elsewhere. Of course EINTR should be reserved for
"interrupted, please continue if you want" as opposed to a hard error.
> > 5. what happens when the syscall is interupted? Especially if the remote
> > copy may take a while (I've seen some require an hour or more - worst
> > case: days due to a media error (completed after the disk was replaced)).
Partial copy, no different than now.
> > 6. what about a client opening the copy before it is finished copying?
Reads partial file, no different than now.
> 7. How to report progress with your average file manager ?
Support signals and restart the copy where it left off. Interrupting
once a second or whatever isn't onerous if needed and you can restart.
You could even support some sort of "SIGUSR1" like dd does to get status
back without actually killing things. Alternately, just stat the target
file as it is being copied to watch progress.
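The "restart where it left off" idea can be sketched in userspace. This is a hedged illustration in Python, not the proposed syscall: `copy_resumable` and its `progress` callback are hypothetical names, and the sketch assumes that whatever is already in the destination is a valid prefix of the source (it does not verify that).

```python
import os

def copy_resumable(src, dst, chunk=1 << 20, progress=None):
    """Copy src to dst in chunks, resuming from dst's current size."""
    total = os.stat(src).st_size
    # Bytes already present from an earlier, interrupted run (assumed valid).
    done = os.path.getsize(dst) if os.path.exists(dst) else 0
    with open(src, "rb") as fin, open(dst, "ab") as fout:
        fin.seek(done)
        while True:
            buf = fin.read(chunk)
            if not buf:
                break
            fout.write(buf)
            done += len(buf)
            if progress is not None:
                progress(done, total)  # e.g. update a file manager's bar
    return done
```

A file manager could equally ignore the callback and just stat the destination from another thread, as suggested above; the chunk size bounds how stale either view can be.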
Cheers, Andreas
--
Andreas Dilger
http://sourceforge.net/projects/ext2resize/
http://www-mddsp.enel.ucalgary.ca/People/adilger/
^ permalink raw reply [flat|nested] 77+ messages in thread
* RE: OT: why no file copy() libc/syscall ??
2003-11-20 19:08 ` Jesse Pollard
` (3 preceding siblings ...)
2003-11-20 22:31 ` Xavier Bestel
@ 2003-11-27 2:40 ` Robert White
2003-11-27 7:29 ` Nick Piggin
4 siblings, 1 reply; 77+ messages in thread
From: Robert White @ 2003-11-27 2:40 UTC (permalink / raw)
To: 'Jesse Pollard', 'Florian Weimer'
Cc: Valdis.Kletnieks, 'Daniel Gryniewicz',
'linux-kernel mailing list'
(Among the other N objections, add things like the lack of any sort of
control or option parameters)
...
N += 1: Sparse Copying (e.g. seeking past blocks of zeros)
N += 1: Unlink or overwrite or what?
N += 1: In-Kernel locking and resolution for pages that are mandatory
lock(ed)
N += 1: No fine-grained control for concurrency issues (multiple writers)
Start with doing a cp --help and move on from there for an unbounded list of
issues that sys_copy(int fd1, int fd2) does not even come close to
addressing.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-27 2:40 ` Robert White
@ 2003-11-27 7:29 ` Nick Piggin
2003-11-27 9:15 ` David Lang
0 siblings, 1 reply; 77+ messages in thread
From: Nick Piggin @ 2003-11-27 7:29 UTC (permalink / raw)
To: Robert White
Cc: 'Jesse Pollard', 'Florian Weimer',
Valdis.Kletnieks, 'Daniel Gryniewicz',
'linux-kernel mailing list'
Robert White wrote:
>(Among the other N objections, add things like the lack of any sort of
>control or option parameters)
>...
>N += 1: Sparse Copying (e.g. seeking past blocks of zeros)
>N += 1: Unlink or overwrite or what?
>N += 1: In-Kernel locking and resolution for pages that are mandatory
>lock(ed)
>N += 1: No fine-grained control for concurrency issues (multiple writers)
>
>Start with doing a cp --help and move on from there for an unbounded list of
>issues that sys_copy(int fd1, int fd2) does not even come close to
>addressing.
>
>
To be fair, sys_copy is never intended to replace cp or try to be
very smart. I don't think it is semantically supposed to do much more
than replace a read, write loop (of course, the syscall also has an
offset and count).
Sparse copying would be implementation dependent. If cp wanted to do
something special it would not use one big copy call. I think unlink
/ overwrite is irrelevant if it's semantically a read write loop.
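As an illustration of the "something special" a cp-like tool might do on its own, here is a minimal sketch in Python of sparse copying by seeking past blocks of zeros. The function name `sparse_copy` is hypothetical, and the sketch assumes fd_out is empty, positioned at offset 0, and not opened with O_APPEND.

```python
import os

def sparse_copy(fd_in, fd_out, blocksize=4096):
    """Copy fd_in to fd_out, seeking over all-zero blocks instead of writing them."""
    zeros = bytes(blocksize)
    length = 0
    while True:
        buf = os.read(fd_in, blocksize)
        if not buf:
            break
        if buf == zeros[:len(buf)]:
            os.lseek(fd_out, len(buf), os.SEEK_CUR)  # leave a hole
        else:
            view = memoryview(buf)
            while view:
                view = view[os.write(fd_out, view):]  # handle short writes
        length += len(buf)
    os.ftruncate(fd_out, length)  # fix up the size if the file ends in a hole
    return length
```

The result reads back byte-for-byte identical to the source; whether the skipped ranges really end up as unallocated holes is up to the filesystem.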
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-27 7:29 ` Nick Piggin
@ 2003-11-27 9:15 ` David Lang
2003-11-27 8:56 ` Nick Piggin
0 siblings, 1 reply; 77+ messages in thread
From: David Lang @ 2003-11-27 9:15 UTC (permalink / raw)
To: Nick Piggin
Cc: Robert White, 'Jesse Pollard', 'Florian Weimer',
Valdis.Kletnieks, 'Daniel Gryniewicz',
'linux-kernel mailing list'
On Thu, 27 Nov 2003, Nick Piggin wrote:
> Robert White wrote:
>
> >(Among the other N objections, add things like the lack of any sort of
> >control or option parameters)
> >...
> >N += 1: Sparse Copying (e.g. seeking past blocks of zeros)
> >N += 1: Unlink or overwrite or what?
> >N += 1: In-Kernel locking and resolution for pages that are mandatory
> >lock(ed)
> >N += 1: No fine-grained control for concurrency issues (multiple writers)
> >
> >Start with doing a cp --help and move on from there for an unbounded list of
> >issues that sys_copy(int fd1, int fd2) does not even come close to
> >addressing.
> >
> >
>
> To be fair, sys_copy is never intended to replace cp or try to be
> very smart. I don't think it is semantically supposed to do much more
> than replace a read, write loop (of course, the syscall also has an
> offset and count).
>
> sparse copying would be implementation dependant. If cp wanted to do
> something special it would not use one big copy call. I think unlink
> / overwrite is irrelevant if its semantically a read write loop.
>
actually if this syscall is allowed to do a COW at the filesystem level
(which I think is one of the better reasons for implementing this) then
sparse files would produce sparse copies.
if the destination exists it would need to be unlinked (overwrite doesn't
make sense in the COW context)
I don't understand the in-kernel page locking issues referred to above
the concurrency issues are a good question, but I would suggest that the
syscall fully setup the copy and then create the link to it. this would
make the final creation an atomic operation (or as close to it as a
particular filesystem allows) and if you have multiple writers doing a
copy to the same destination then the last one wins, the earlier copies
get unlinked and deleted
I definitely don't see it being worth it to make a syscall to just
implement the read/write loop, but a copy syscall designed from the outset
to do a COW copy that falls back to a read/write loop for filesystems that
don't do COW has some real benefits
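The "fully set up the copy, then create the link to it" scheme has a familiar userspace analogue. A minimal sketch in Python, assuming a POSIX filesystem where rename within one directory is atomic; `copy_then_link` is a hypothetical name and this is the plain read/write fallback, not a COW copy. With several writers racing on the same destination, the last rename wins and the earlier copies are simply replaced.

```python
import os
import shutil
import tempfile

def copy_then_link(src, dst):
    """Build the copy under a temporary name, then rename it over dst."""
    dst_dir = os.path.dirname(os.path.abspath(dst))
    # The temporary file must live on the same filesystem as dst for the
    # rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=dst_dir, prefix=".copy-")
    try:
        with os.fdopen(fd, "wb") as fout, open(src, "rb") as fin:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            os.fsync(fout.fileno())
        os.replace(tmp, dst)  # readers see the old file or the new one
    except BaseException:
        try:
            os.unlink(tmp)  # back out the partial copy
        except FileNotFoundError:
            pass
        raise
```

A reader that opens the destination during the copy sees either the old contents or the complete new ones, never a partial file.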
David Lang
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-27 9:15 ` David Lang
@ 2003-11-27 8:56 ` Nick Piggin
2003-11-27 9:50 ` David Lang
0 siblings, 1 reply; 77+ messages in thread
From: Nick Piggin @ 2003-11-27 8:56 UTC (permalink / raw)
To: David Lang
Cc: Robert White, 'Jesse Pollard', 'Florian Weimer',
Valdis.Kletnieks, 'Daniel Gryniewicz',
'linux-kernel mailing list'
David Lang wrote:
>On Thu, 27 Nov 2003, Nick Piggin wrote:
>
>
>>Robert White wrote:
>>
>>
>>>(Among the other N objections, add things like the lack of any sort of
>>>control or option parameters)
>>>...
>>>N += 1: Sparse Copying (e.g. seeking past blocks of zeros)
>>>N += 1: Unlink or overwrite or what?
>>>N += 1: In-Kernel locking and resolution for pages that are mandatory
>>>lock(ed)
>>>N += 1: No fine-grained control for concurrency issues (multiple writers)
>>>
>>>Start with doing a cp --help and move on from there for an unbounded list of
>>>issues that sys_copy(int fd1, int fd2) does not even come close to
>>>addressing.
>>>
>>>
>>>
>>To be fair, sys_copy is never intended to replace cp or try to be
>>very smart. I don't think it is semantically supposed to do much more
>>than replace a read, write loop (of course, the syscall also has an
>>offset and count).
>>
>>sparse copying would be implementation dependant. If cp wanted to do
>>something special it would not use one big copy call. I think unlink
>>/ overwrite is irrelevant if its semantically a read write loop.
>>
>>
>
>actually if this syscall is allowed to do a COW at the filesystem level
>(which I think is one of the better reasons for implementing this) then
>sparse files would produce sparse copies.
>
Sure, I just mean the semantics should be equivalent to a read write
loop. Another example is zero copy copy for a remote fs that supports
it.
>
>if the destination exists it would need to be unlinked (overwrite doesn't
>make sense in the COW context)
>
Well it would be implementation specific. Presumably it should keep
the semantics of an overwrite.
>
>I don't understand the in-kernel page locking issues refered to above
>
>the concurrancy issues are a good question, but I would suggest that the
>syscall fully setup the copy and then create the link to it. this would
>make the final creation an atomic operation (or as close to it as a
>particular filesystem allows) and if you have multiple writers doing a
>copy to the same destination then the last one wins, the earlier copies
>get unlinked and deleted
>
I don't think it should do any linking / unlinking; it should just work
with file descriptors. Concurrent writes to a file don't have many
guarantees. sys_copy shouldn't have to be any stronger (read weaker).
>
>I definantly don't see it being worth it to make a syscall to just
>implement the read/write loop, but a copy syscall designed from the outset
>to do a COW copy that falls back to a read/write loop for filesystems that
>don't do COW has some real benifits
>
No I just mean the semantics.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-27 8:56 ` Nick Piggin
@ 2003-11-27 9:50 ` David Lang
2003-11-27 10:02 ` Jörn Engel
0 siblings, 1 reply; 77+ messages in thread
From: David Lang @ 2003-11-27 9:50 UTC (permalink / raw)
To: Nick Piggin
Cc: Robert White, 'Jesse Pollard', 'Florian Weimer',
Valdis.Kletnieks, 'Daniel Gryniewicz',
'linux-kernel mailing list'
On Thu, 27 Nov 2003, Nick Piggin wrote:
> >
> >if the destination exists it would need to be unlinked (overwrite doesn't
> >make sense in the COW context)
> >
>
> Well it would be implementation specific. Presumably it should keep
> the semantics of an overwrite.
>
> >
> >I don't understand the in-kernel page locking issues referred to above
> >
> >the concurrency issues are a good question, but I would suggest that the
> >syscall fully set up the copy and then create the link to it. this would
> >make the final creation an atomic operation (or as close to it as a
> >particular filesystem allows) and if you have multiple writers doing a
> >copy to the same destination then the last one wins, the earlier copies
> >get unlinked and deleted
> >
>
> I don't think it should do any linking / unlinking it should just work
> with file descriptors. Concurrent writes to a file don't have many
> guarantees. sys_copy shouldn't have to be any stronger (read weaker).
I'm thinking that it may actually be easier to do this via file paths
instead of file descriptors. With file paths something like COW or
zero-copy copy can be done trivially (and the kernel knows the user
credentials of the program issuing the command and can pass them on to the
filesystem to see if it's allowed). I don't see how this can be done with
file descriptors (if all you have is a file descriptor you can truncate
and write a file, but you don't know all the links to that file so you
can't reposition that first inode for example).
> >
> >I definitely don't see it being worth it to make a syscall to just
> >implement the read/write loop, but a copy syscall designed from the outset
> >to do a COW copy that falls back to a read/write loop for filesystems that
> >don't do COW has some real benefits
> >
>
> No I just mean the semantics.
>
>
>
--
"Debugging is twice as hard as writing the code in the first place.
Therefore, if you write the code as cleverly as possible, you are,
by definition, not smart enough to debug it." - Brian W. Kernighan
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-27 9:50 ` David Lang
@ 2003-11-27 10:02 ` Jörn Engel
2003-11-27 10:58 ` David Lang
0 siblings, 1 reply; 77+ messages in thread
From: Jörn Engel @ 2003-11-27 10:02 UTC (permalink / raw)
To: David Lang
Cc: Nick Piggin, Robert White, 'Jesse Pollard',
'Florian Weimer',
Valdis.Kletnieks, 'Daniel Gryniewicz',
'linux-kernel mailing list'
On Thu, 27 November 2003 01:50:46 -0800, David Lang wrote:
> >
> > I don't think it should do any linking / unlinking it should just work
> > with file descriptors. Concurrent writes to a file don't have many
> > guarantees. sys_copy shouldn't have to be any stronger (read weaker).
>
> I'm thinking that it may actually be easier to do this via file paths
> instead of file descriptors. With file paths something like COW or
> zero-copy copy can be done trivially (and the kernel knows the user
> credentials of the program issuing the command and can pass them on to the
> filesystem to see if it's allowed). I don't see how this can be done with
> file descriptors (if all you have is a file descriptor you can truncate
> and write a file, but you don't know all the links to that file so you
> can't reposition that first inode for example).
And how is userspace supposed to protect itself from race conditions?
Just compare:
    fd1 = open(path1);
    if (stat(fd1) looks fishy)
        abort();
    fd2 = open(path2);
    if (stat(fd2) looks fishy)
        abort();
    copy(fd1, fd2);
and:
    fd1 = open(path1);
    if (stat(fd1) looks fishy)
        abort();
    fd2 = open(path2);
    if (stat(fd2) looks fishy)
        abort();
    copy(path1, path2);
Jörn
--
Don't worry about people stealing your ideas. If your ideas are any good,
you'll have to ram them down people's throats.
-- Howard Aiken quoted by Ken Iverson quoted by Jim Horning quoted by
Raph Levien, 1979
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-27 10:02 ` Jörn Engel
@ 2003-11-27 10:58 ` David Lang
2003-12-01 16:20 ` Jesse Pollard
0 siblings, 1 reply; 77+ messages in thread
From: David Lang @ 2003-11-27 10:58 UTC (permalink / raw)
To: Jörn Engel
Cc: Nick Piggin, Robert White, 'Jesse Pollard',
'Florian Weimer',
Valdis.Kletnieks, 'Daniel Gryniewicz',
'linux-kernel mailing list'
On Thu, 27 Nov 2003, Jörn Engel wrote:
> On Thu, 27 November 2003 01:50:46 -0800, David Lang wrote:
> > >
> > > I don't think it should do any linking / unlinking it should just work
> > > with file descriptors. Concurrent writes to a file don't have many
> > > guarantees. sys_copy shouldn't have to be any stronger (read weaker).
> >
> > I'm thinking that it may actually be easier to do this via file paths
> > instead of file descriptors. With file paths something like COW or
> > zero-copy copy can be done trivially (and the kernel knows the user
> > credentials of the program issuing the command and can pass them on to the
> > filesystem to see if it's allowed). I don't see how this can be done with
> > file descriptors (if all you have is a file descriptor you can truncate
> > and write a file, but you don't know all the links to that file so you
> > can't reposition that first inode for example).
>
> And how is userspace supposed to protect itself from race conditions?
> Just compare:
>
>     fd1 = open(path1);
>     if (stat(fd1) looks fishy)
>         abort();
>     fd2 = open(path2);
>     if (stat(fd2) looks fishy)
>         abort();
>     copy(fd1, fd2);
>
> and:
>
>     fd1 = open(path1);
>     if (stat(fd1) looks fishy)
>         abort();
>     fd2 = open(path2);
>     if (stat(fd2) looks fishy)
>         abort();
>     copy(path1, path2);
>
> Jörn
>
Ok, good point. My first reaction is to make copy refuse to function
if the target already exists (protect the output), but that doesn't
solve the problem of protecting the input or preventing someone else from
tampering with the output (unless you have copy return the FD to use to
access the output).
Actually, thinking about it a bit more, did I make a stupid mistake and
think that the FD points at the beginning of the file when it really
points at the inode? If it points at the inode then the problems I was
referring to don't exist.
David Lang
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-27 10:58 ` David Lang
@ 2003-12-01 16:20 ` Jesse Pollard
0 siblings, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-12-01 16:20 UTC (permalink / raw)
To: David Lang, Jörn Engel
Cc: Nick Piggin, Robert White <rwhite@casabyte.com>,
'Florian Weimer', Valdis.Kletnieks, 'Daniel Gryniewicz',
'linux-kernel mailing list'
On Thursday 27 November 2003 04:58, David Lang wrote:
[snip]
> actually thinking about it a bit more, did I make a stupid mistake and
> think that the FD points at the beginning of the file when it really
> points at the inode? if it points at the inode then the problems I was
> referring to don't exist.
Actually, it points to inode and offset in the file. The advantage this has
is in the case of appending to a file... open the destination file, seek to
the end, then copy. It also allows seeking some offset in the input file,
then copying the rest of the file.
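The append case can be sketched in user space as follows. This is only an illustration of the inode-plus-offset point; the helper name append_copy and the 8 kB buffer are assumptions, not an interface anyone in the thread proposed:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: copy from src's current offset to the end of
 * dst. Returns the number of bytes copied, or -1 on error. */
long append_copy(int src, int dst)
{
    char buf[8192];
    long total = 0;
    ssize_t n;

    assert(src >= 0 && dst >= 0);
    if (lseek(dst, 0, SEEK_END) < 0)       /* position dst at end of file */
        return -1;
    while ((n = read(src, buf, sizeof buf)) != 0) {
        if (n < 0) {
            if (errno == EINTR)
                continue;                  /* interrupted: retry the read */
            return -1;
        }
        for (ssize_t off = 0; off < n; ) { /* write() may be short */
            ssize_t w = write(dst, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

Seeking the source first (for example lseek(src, 4, SEEK_SET)) gives the "copy the rest of the file" behaviour without any extra parameter.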
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 3:50 ` Andreas Dilger
2003-11-11 4:03 ` Daniel Gryniewicz
@ 2003-11-11 8:52 ` Gábor Lénárt
1 sibling, 0 replies; 77+ messages in thread
From: Gábor Lénárt @ 2003-11-11 8:52 UTC (permalink / raw)
To: linux-kernel mailing list
On Mon, Nov 10, 2003 at 08:50:12PM -0700, Andreas Dilger wrote:
> On Nov 10, 2003 20:05 -0500, Albert Cahalan wrote:
> > > It is too simple to implement in user mode.
> >
> > That works for a plain byte-stream on a
> > local UNIX-style filesystem. (though it
> > likely isn't the fastest)
Would it be something similar to sendfile()?
- Gábor (larta'H)
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 1:05 Albert Cahalan
2003-11-11 3:50 ` Andreas Dilger
@ 2003-11-11 13:38 ` Rogier Wolff
2003-11-11 13:53 ` Jakub Jelinek
2003-11-11 14:11 ` Albert Cahalan
2003-11-12 15:19 ` Jesse Pollard
2 siblings, 2 replies; 77+ messages in thread
From: Rogier Wolff @ 2003-11-11 13:38 UTC (permalink / raw)
To: Albert Cahalan
Cc: linux-kernel mailing list, davide.rossetti, filia, jesse, dwmw2,
moje, kakadu_croc
On Mon, Nov 10, 2003 at 08:05:11PM -0500, Albert Cahalan wrote:
> So open the file, change context, and then:
>
> long copy_fd_to_file(int fd, const char *name, ...)
>
> (if you can no longer read from the OPEN fd,
> either we override that or we just don't care
> about such mostly-fictional cases)
Actually, I think we should have a:
long copy_fd_to_fd (int src, int dst, int len)
type of systemcall.
It should do something like:
    while ((nbytes = read (src, buf, BUFSIZE)) > 0) {
        if (write (dst, buf, nbytes) < 0)
            return totbytes;
        totbytes += nbytes;
    }
but it allows kernel-space to optimize this whenever possible. Kernel
then becomes responsible for detecting and handling the optimizable
cases.
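For reference, the default (unoptimized) path could look roughly like this in user space. It is a sketch only: the len handling, the EINTR retries and the short-write loop are additions that the outline above does not spell out, and BUFSIZE is an arbitrary choice:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#define BUFSIZE 8192

/* User-space sketch of the default path: copy up to len bytes from src
 * to dst. Returns the number of bytes copied, or -1 if an error
 * occurred before anything was copied. */
long copy_fd_to_fd(int src, int dst, long len)
{
    char buf[BUFSIZE];
    long totbytes = 0;

    assert(len >= 0);
    while (totbytes < len) {
        size_t want = (size_t)(len - totbytes);
        ssize_t nbytes, off = 0;

        if (want > sizeof buf)
            want = sizeof buf;
        nbytes = read(src, buf, want);
        if (nbytes == 0)
            break;                          /* end of input */
        if (nbytes < 0) {
            if (errno == EINTR)
                continue;
            return totbytes ? totbytes : -1;
        }
        while (off < nbytes) {              /* write() may be short */
            ssize_t w = write(dst, buf + off, (size_t)(nbytes - off));
            if (w < 0 && errno == EINTR)
                continue;
            if (w < 0)
                return totbytes ? totbytes : -1;
            off += w;
            totbytes += w;
        }
    }
    return totbytes;
}
```

Note that read() returning 0 has to end the loop; treating it as success would spin forever at end of file.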
The kernel implementation then becomes something like:
    if (islocalfile (src) && issocket (dst))
        /* Call the old sendfile */
        return sendfile (....);
    if (isCIFS (src) && isCIFS (dst))
        /* Tell remote host to copy the file. */
        return CIFS_copy_file (....);
    ...
and then the default implementation. This is nice and expandable, and
provides a default for the case that cannot be optimized.
And if you don't want the extra code, we could enclose the different
optimizations with ifdefs.
But alas, last time Linus didn't agree with me and decided we should
do something like "sendfile", which is IMHO just a special case of
this one.
If we implement this in kernel (at first just the copy_fd_fd and the
default implementation), then we can get "cp" to use this, and then
suddenly whenever we upgrade the kernel, cp can use the newly
optimized copying mechanism. (e.g. whenever we manage to specify a
socket as the destination, cp would suddenly start to use
"sendfile"!!)
(It might be better to include a "buffer" argument in the interface,
freeing the implementation of allocating a buffer when an optimization
is not possible).
Roger.
--
** R.E.Wolff@BitWizard.nl ** http://www.BitWizard.nl/ ** +31-15-2600998 **
*-- BitWizard writes Linux device drivers for any device you may have! --*
**** "Linux is like a wigwam - no windows, no gates, apache inside!" ****
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 13:38 ` Rogier Wolff
@ 2003-11-11 13:53 ` Jakub Jelinek
2003-11-11 13:58 ` David Woodhouse
2003-11-13 20:22 ` H. Peter Anvin
2003-11-11 14:11 ` Albert Cahalan
1 sibling, 2 replies; 77+ messages in thread
From: Jakub Jelinek @ 2003-11-11 13:53 UTC (permalink / raw)
To: Rogier Wolff
Cc: Albert Cahalan, linux-kernel mailing list, davide.rossetti,
filia, jesse, dwmw2, moje, kakadu_croc
On Tue, Nov 11, 2003 at 02:38:59PM +0100, Rogier Wolff wrote:
> On Mon, Nov 10, 2003 at 08:05:11PM -0500, Albert Cahalan wrote:
> > So open the file, change context, and then:
> >
> > long copy_fd_to_file(int fd, const char *name, ...)
> >
> > (if you can no longer read from the OPEN fd,
> > either we override that or we just don't care
> > about such mostly-fictional cases)
>
>
> Actually, I think we should have a:
>
> long copy_fd_to_fd (int src, int dst, int len)
>
> type of systemcall.
We have one, sendfile(2).
> It should do something like:
>
>     while ((nbytes = read (src, buf, BUFSIZE)) > 0) {
>         if (write (dst, buf, nbytes) < 0)
>             return totbytes;
>         totbytes += nbytes;
>     }
>
> but it allows kernel-space to optimize this whenever possible. Kernel
> then becomes responsible for detecting and handling the optimizable
> cases.
>
> The kernel then becomes something
>
>     if (islocalfile (src) && issocket (dst))
>         /* Call the old sendfile */
>         return sendfile (....);
>
>     if (isCIFS (src) && isCIFS (dst))
>         /* Tell remote host to copy the file. */
>         return CIFS_copy_file (....);
>
> ...
Can you explain why this cannot be in sys_sendfile?
It doesn't make much sense to provide any default in the kernel,
that's something the userland can handle equally well.
But e.g. the CIFS copy can be done as sendfile hook.
Jakub
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 13:53 ` Jakub Jelinek
@ 2003-11-11 13:58 ` David Woodhouse
2003-11-13 20:22 ` H. Peter Anvin
1 sibling, 0 replies; 77+ messages in thread
From: David Woodhouse @ 2003-11-11 13:58 UTC (permalink / raw)
To: Jakub Jelinek
Cc: Rogier Wolff, Albert Cahalan, linux-kernel mailing list,
davide.rossetti, filia, jesse, moje, kakadu_croc
On Tue, 2003-11-11 at 08:53 -0500, Jakub Jelinek wrote:
> But e.g. the CIFS copy can be done as sendfile hook.
Can it? I thought it took filenames.
--
dwmw2
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 13:53 ` Jakub Jelinek
2003-11-11 13:58 ` David Woodhouse
@ 2003-11-13 20:22 ` H. Peter Anvin
2003-11-13 23:39 ` Andrea Arcangeli
1 sibling, 1 reply; 77+ messages in thread
From: H. Peter Anvin @ 2003-11-13 20:22 UTC (permalink / raw)
To: linux-kernel
Followup to: <20031111085323.M8854@devserv.devel.redhat.com>
By author: Jakub Jelinek <jakub@redhat.com>
In newsgroup: linux.dev.kernel
> >
> > Actually, I think we should have a:
> >
> > long copy_fd_to_fd (int src, int dst, int len)
> >
> > type of systemcall.
>
> We have one, sendfile(2).
>
It would be very nice if we could (a) expand the uses of sendfile(2),
and (b) have the libc do the fallback to read/write/mmap as needed.
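Point (b) might look something like the following on the library side. This is a rough sketch, not glibc code: the wrapper name copy_with_fallback is made up, and treating EINVAL/ENOSYS as "sendfile cannot handle this pair of descriptors" is an assumption about how the fallback would be triggered:

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/sendfile.h>
#include <unistd.h>

/* Hypothetical libc-style wrapper: try sendfile(2) first and fall back
 * to a read/write loop if the kernel rejects this descriptor pair.
 * Returns the number of bytes copied, or -1 on error. */
long copy_with_fallback(int in, int out, size_t count)
{
    char buf[8192];
    long total = 0;
    ssize_t n;

    assert(in >= 0 && out >= 0);
    while ((size_t)total < count) {
        n = sendfile(out, in, NULL, count - (size_t)total);
        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;                   /* end of input */
        if (errno == EINTR)
            continue;
        if (errno != EINVAL && errno != ENOSYS)
            return -1;                      /* real I/O error */
        break;                              /* unsupported: fall back */
    }
    while ((size_t)total < count) {
        size_t want = count - (size_t)total;

        if (want > sizeof buf)
            want = sizeof buf;
        n = read(in, buf, want);
        if (n == 0)
            break;
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        for (ssize_t off = 0; off < n; ) {  /* write() may be short */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            off += w;
        }
        total += n;
    }
    return total;
}
```

The caller sees the same result either way, so the kernel is free to accept more descriptor combinations over time without any change to applications.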
-hpa
--
<hpa@transmeta.com> at work, <hpa@zytor.com> in private!
If you send me mail in HTML format I will assume it's spam.
"Unix gives you enough rope to shoot yourself in the foot."
Architectures needed: ia64 m68k mips64 ppc ppc64 s390 s390x sh v850 x86-64
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-13 20:22 ` H. Peter Anvin
@ 2003-11-13 23:39 ` Andrea Arcangeli
2003-11-14 0:04 ` jw schultz
2003-11-14 0:36 ` H. Peter Anvin
0 siblings, 2 replies; 77+ messages in thread
From: Andrea Arcangeli @ 2003-11-13 23:39 UTC (permalink / raw)
To: H. Peter Anvin; +Cc: linux-kernel
On Thu, Nov 13, 2003 at 12:22:14PM -0800, H. Peter Anvin wrote:
> Followup to: <20031111085323.M8854@devserv.devel.redhat.com>
> By author: Jakub Jelinek <jakub@redhat.com>
> In newsgroup: linux.dev.kernel
> > >
> > > Actually, I think we should have a:
> > >
> > > long copy_fd_to_fd (int src, int dst, int len)
> > >
> > > type of systemcall.
> >
> > We have one, sendfile(2).
> >
>
> It would be very nice if we could (a) expand the uses of sendfile(2),
> and (b) have the libc do the fallback to read/write/mmap as needed.
I actually hacked cp for a while and it improved cp some point percent
on normal machines.
See ftp://ftp.suse.com/pub/people/andrea/cp-sendfile/
The main downside, and the reason it wasn't applied IIRC, is the lack of
interruption of sendfile: for a huge file it would take a
while before ^C has any effect. The kernel isn't interrupting the
syscall. This is no different from a huge read or write syscall (but
read/write are never huge, or the buffer would need to be huge too, which
is not the case for sendfile since it works zerocopy), so in theory we could
work around it by entering/exiting the kernel multiple times just to allow
the signal to be handled, as in the read/write case.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-13 23:39 ` Andrea Arcangeli
@ 2003-11-14 0:04 ` jw schultz
2003-11-14 0:36 ` H. Peter Anvin
1 sibling, 0 replies; 77+ messages in thread
From: jw schultz @ 2003-11-14 0:04 UTC (permalink / raw)
To: linux-kernel
On Fri, Nov 14, 2003 at 12:39:15AM +0100, Andrea Arcangeli wrote:
> On Thu, Nov 13, 2003 at 12:22:14PM -0800, H. Peter Anvin wrote:
> > Followup to: <20031111085323.M8854@devserv.devel.redhat.com>
> > By author: Jakub Jelinek <jakub@redhat.com>
> > In newsgroup: linux.dev.kernel
> > > >
> > > > Actually, I think we should have a:
> > > >
> > > > long copy_fd_to_fd (int src, int dst, int len)
> > > >
> > > > type of systemcall.
> > >
> > > We have one, sendfile(2).
> > >
> >
> > It would be very nice if we could (a) expand the uses of sendfile(2),
> > and (b) have the libc do the fallback to read/write/mmap as needed.
>
> I actually hacked cp for a while and it improved cp some point percent
> on normal machines.
>
> See ftp://ftp.suse.com/pub/people/andrea/cp-sendfile/
>
> the main downside and the reason it wasn't applied IIRC is the lack of
> interruption of sendfile, basically for an huge file it would take a
> while before C^c has any effect. The kernel isn't interrupting the
> syscall. This is no different from a huge read or write syscall (but
> read/write are never huge or the buffer would need to be huge too, not
> the case for sendfile that works zerocopy), so in theory we could
> workaround it by entering/exiting kernel multiple times just to allow
> the signal to be handled like in the read/write case.
Until interrupt and restart handling (as has been discussed
here for other syscalls) is improved there could be
a sanity check with an E2BIG or something if the size is
insane. I dislike the thought of sendfile sitting in D
state on a multi-gigabyte file.
--
________________________________________________________________
J.W. Schultz Pegasystems Technologies
email address: jw@pegasys.ws
Remember Cernan and Schmitt
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-13 23:39 ` Andrea Arcangeli
2003-11-14 0:04 ` jw schultz
@ 2003-11-14 0:36 ` H. Peter Anvin
2003-11-14 1:10 ` Andrea Arcangeli
1 sibling, 1 reply; 77+ messages in thread
From: H. Peter Anvin @ 2003-11-14 0:36 UTC (permalink / raw)
To: Andrea Arcangeli; +Cc: linux-kernel
Andrea Arcangeli wrote:
>
> I actually hacked cp for a while and it improved cp some point percent
> on normal machines.
>
> See ftp://ftp.suse.com/pub/people/andrea/cp-sendfile/
>
> the main downside and the reason it wasn't applied IIRC is the lack of
> interruption of sendfile, basically for an huge file it would take a
> while before C^c has any effect. The kernel isn't interrupting the
> syscall. This is no different from a huge read or write syscall (but
> read/write are never huge or the buffer would need to be huge too, not
> the case for sendfile that works zerocopy), so in theory we could
> workaround it by entering/exiting kernel multiple times just to allow
> the signal to be handled like in the read/write case.
... or we could put in checks into the kernel for signal pending, and
return EINTR.
-hpa
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 13:38 ` Rogier Wolff
2003-11-11 13:53 ` Jakub Jelinek
@ 2003-11-11 14:11 ` Albert Cahalan
1 sibling, 0 replies; 77+ messages in thread
From: Albert Cahalan @ 2003-11-11 14:11 UTC (permalink / raw)
To: Rogier Wolff
Cc: linux-kernel mailing list, davide.rossetti, filia, jesse, dwmw2,
moje, kakadu_croc
On Tue, 2003-11-11 at 08:38, Rogier Wolff wrote:
> On Mon, Nov 10, 2003 at 08:05:11PM -0500, Albert Cahalan wrote:
> > So open the file, change context, and then:
> >
> > long copy_fd_to_file(int fd, const char *name, ...)
> >
> > (if you can no longer read from the OPEN fd,
> > either we override that or we just don't care
> > about such mostly-fictional cases)
>
>
> Actually, I think we should have a:
>
> long copy_fd_to_fd (int src, int dst, int len)
>
> type of systemcall.
I don't think that works. To have a destination
file descriptor, you have to already have created
the destination file. Having done so, it may now
be impossible to transfer the security data. This
is especially the case with network filesystems.
I can well imagine providing a file descriptor for
the destination directory and making the filename
optional. This helps pin things down if there's
worry about an attacker moving directories, and it
neatly allows for fully anonymous temporary files
if a file descriptor is returned.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 1:05 Albert Cahalan
2003-11-11 3:50 ` Andreas Dilger
2003-11-11 13:38 ` Rogier Wolff
@ 2003-11-12 15:19 ` Jesse Pollard
2003-11-14 3:42 ` Albert Cahalan
2 siblings, 1 reply; 77+ messages in thread
From: Jesse Pollard @ 2003-11-12 15:19 UTC (permalink / raw)
To: Albert Cahalan, linux-kernel mailing list
Cc: davide.rossetti, filia, jesse, dwmw2, moje, kakadu_croc
On Monday 10 November 2003 19:05, Albert Cahalan wrote:
> > It is too simple to implement in user mode.
>
> That works for a plain byte-stream on a
> local UNIX-style filesystem. (though it
> likely isn't the fastest)
Yes - this was the local copy
> It doesn't work for Macintosh files.
> It's too slow for CIFS over a modem.
> It doesn't work for Windows security data.
> It doesn't allow copy-on-write files.
> It eats CPU time on compressed filesystems.
>
> > The security context of the output depends
> > on the user process. If it is a privileged
> > process (ie, may change the context of the
> > result) then the user process has to setup
> > that context before the file is copied.
>
> So open the file, change context, and then:
>
> long copy_fd_to_file(int fd, const char *name, ...)
Easy to do in user mode.
>
> (if you can no longer read from the OPEN fd,
> either we override that or we just don't care
> about such mostly-fictional cases)
correct - If you can't read, fail.
> > There are also some issues with mandatory
> > security controls. If it is copied in kernel
> > mode, then the previous labels could be
> > automatically carried over to the resulting
> > file... But that may not be what you want
> > (and frequently, it isn't).
>
> If it matters:
>
> // security as if a new file were created
> #define CF_REPLACE_SECURITY 0x00000001
> // if unable to replicate, up or down?
> #define CF_ROUND_SECURITY_UP 0x00000002
> #define CF_ROUND_SECURITY_DOWN 0x00000004
> // fail if security can't be replicated
> #define CF_SECURITY_EXACT 0x00000008
>
> > Now back to the copy.. You don't have to
> > use a read/write loop- mmap is faster.
>
> It's slower. (this is Linux, not SunOS)
> Use a 4 kB or 8 kB read/write loop.
yup local.
> > And this is the other reason for not doing
> > it in Kernel mode. Buffer management of
> > this type is much easier in user space
> > since the copy procedure doesn't have to
> > deal with memory limitations, cache flushes
> > page faulting of processes unrelated to the
> > copy, but is related to cache pressure.
>
> Buffer management is very much a kernel thing.
Yes it is, but do you want to push process-dependent
buffer management into the page management? It's just
easier to do this in user mode, and allow the kernel
to handle global page management.
> >> Is it? Please explain the simple steps which
> >> cp(1) should take in order to observe that it
> >> is being asked to duplicate a file on a file
> >> system such as CIFS (or NFSv4?) which allows
> >> the client to issue a 'copy file' command
> >> over the network without actually transferring
> >> the data twice, and to invoke such a command.
> >
> > Ah. That is an optimization question, not a
> > question of kernel/user mode.
>
> Note that /bin/cp isn't always going to have
> the necessary passwords and such. You're headed
> down a path toward setuid /bin/cp.
If cp doesn't have access to the proper security credentials,
then the file should not be copied.
> > Since the error checking for source and
> > destination both include doing a stat and
> > statfs, the device information (and FS info)
> > can both be retrieved.
> >
> > And mmap doesn't require data transfer "twice"
> > (local copy).
>
> Huh? Over the network from server to client
> counts as once. Then /bin/cp gets the data.
> Then it goes back over the network from the
> client to the server. That's "twice". That's
> horribly painful for a multi-gigabyte file
> and a DSL or cable-modem connection, never
> mind a dial-up connection.
True for all networked file systems. I had meant
to say (local filesystem copy).
> > Since that copy only pagefaults (though
> > read/write may be faster for some files
> > - I thought that was true for small files
> > that fit in cache, and large files faster
> > via mmap and depends on the page size;
> > and the tradeoff would be system dependant).
>
> Keep the read/write loop small for speed.
yes.
> > And since both source and destination may
> > be remote you do get to decide based on
> > source and destination devices: if they
> > are the same, and one on a remote node,
> > then BOTH will be on the remote, then you
> > get to use the CIFS/NFS file copy. (check
> > the doc on "stat/statfs" for additional info).
> >
> > I don't believe it works when source and
> > destination are on DIFFERENT remote nodes,
> > though.
> >
> > Strictly up to the implementation of cp/mv.
> >
> > Though you will loose portability of cp/mv.
> > (Of course, you also loose it with a syscall
> > for file copy too; as well as the MUCH more
> > complicated implementation/security checks).
>
> Doing that in cp/mv is just insane. For one,
> it bypasses any local security control over
> access to the filesystem. There's not even a
> way to be sure you're dealing with the server
> you think you're dealing with.
It shouldn't matter - first the source file must be opened
for read AND the destination file opened for write.
This should give the proper local security evaluation and
context for the copy. Once this has been approved,
the remote copy request can be made (provided they are
on the same "networked" device). Just making
the request still doesn't mean that it will succeed -
after all, the final security decisions are made by
the remote server implementing the file copy.
Though if the copy is valid locally, then the use of
the filesystem supported copy should work. It is an
equivalent operation, it just all takes place on the server.
Identity of the server is irrelevant, as long as it is
the same server (or farm) for both source and destination.
If the remote file copy is defined, then it should work
even when the actual source and destination are different
physical machines - the remote filesystem CLAIMS it will
work (identical is determined from the "device" mounted,
one mount, one device as far as network filesystems go).
And if they are not identical then you fall back to using
a local copy.
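The "same device" test described here can be done on the two open descriptors with fstat(). A small sketch, assuming that equal st_dev values are what "one mount, one device" amounts to in practice (the helper name same_device is illustrative):

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns 1 if both descriptors refer to files on the same mounted
 * device, 0 if they do not, and -1 if fstat() fails. */
int same_device(int src, int dst)
{
    struct stat a, b;

    assert(src >= 0 && dst >= 0);
    if (fstat(src, &a) < 0 || fstat(dst, &b) < 0)
        return -1;
    return a.st_dev == b.st_dev;
}
```

A result of 0 or -1 is the cue to fall back to the ordinary local copy; a result of 1 only says a filesystem-level copy may be attempted, not that the server will accept it.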
All bets are off if the local pathnames are required by
the remote server. That is silly. How would a networked
client even know what the pathname would be? The parameters
should be the two file handles passed to the remote filesystem.
Personally, I don't think any changes should be made.
It's just that this level of transfer is what the original
poster was talking about. It just shouldn't be done in
kernel mode.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-12 15:19 ` Jesse Pollard
@ 2003-11-14 3:42 ` Albert Cahalan
0 siblings, 0 replies; 77+ messages in thread
From: Albert Cahalan @ 2003-11-14 3:42 UTC (permalink / raw)
To: Jesse Pollard
Cc: Albert Cahalan, linux-kernel mailing list, davide.rossetti,
filia, dwmw2, moje, kakadu_croc
On Wed, 2003-11-12 at 10:19, Jesse Pollard wrote:
> On Monday 10 November 2003 19:05, Albert Cahalan wrote:
> > > The security context of the output depends
> > > on the user process. If it is a privileged
> > > process (ie, may change the context of the
> > > result) then the user process has to setup
> > > that context before the file is copied.
> >
> > So open the file, change context, and then:
> >
> > long copy_fd_to_file(int fd, const char *name, ...)
>
> Easy to do in user mode.
It isn't, because the user-mode code would
need to have a full understanding of whatever
fancy (seLinux, RSBAC, lomac...) security
mechanism the kernel is using. It's not enough
to just know about switching to some named
context via a common API.
> > >> Is it? Please explain the simple steps which
> > >> cp(1) should take in order to observe that it
> > >> is being asked to duplicate a file on a file
> > >> system such as CIFS (or NFSv4?) which allows
> > >> the client to issue a 'copy file' command
> > >> over the network without actually transferring
> > >> the data twice, and to invoke such a command.
> > >
> > > Ah. That is an optimization question, not a
> > > question of kernel/user mode.
> >
> > Note that /bin/cp isn't always going to have
> > the necessary passwords and such. You're headed
> > down a path toward setuid /bin/cp.
>
> If cp doesn't have access to the proper security credentials,
> then the file should not be copied.
You have proper credentials for access through
the mounted filesystem. That filesystem was
mounted by root, using some secret key that is
specific to the local machine. You could try
to directly contact the server over the network,
but you won't have the keys.
You're allowed to indirectly use the keys by
going through the mounted filesystem. For example,
you can call rmdir() to remove a directory but
you can not cause the same effect by sending a
message over the network directly to the server.
You have no ability to bypass the local kernel.
So you can copy that file, but you have to use
the file-oriented system calls to do it. You'll
need kernel support to invoke a remote-copy
operation. (or a setuid-root /bin/cp that looks
up the keys, determines the correct server, makes
a network connection, etc.)
> > > And since both source and destination may
> > > be remote you do get to decide based on
> > > source and destination devices: if they
> > > are the same, and one on a remote node,
> > > then BOTH will be on the remote, then you
> > > get to use the CIFS/NFS file copy. (check
> > > the doc on "stat/statfs" for additional info).
> > >
> > > I don't believe it works when source and
> > > destination are on DIFFERENT remote nodes,
> > > though.
> > >
> > > Strictly up to the implementation of cp/mv.
> > >
> > > Though you will loose portability of cp/mv.
> > > (Of course, you also loose it with a syscall
> > > for file copy too; as well as the MUCH more
> > > complicated implementation/security checks).
> >
> > Doing that in cp/mv is just insane. For one,
> > it bypasses any local security control over
> > access to the filesystem. There's not even a
> > way to be sure you're dealing with the server
> > you think you're dealing with.
>
> It shouldn't matter - first the source file must be opened
> for read AND the destination file opened for write.
> This should give the proper local security evaluation and
> context for the copy. Once this has been approved,
> the remote copy request can be made (provided they are
> on the same "networked" device). Just making
> the request still doesn't mean that it will succeed -
> after all, the final security decisions are made by
> the remote server implementing the file copy.
>
> Though if the copy is valid locally, then the use of
> the filesystem supported copy should work. It is an
> equivalent operation, it just all takes place on the server.
>
> Identity of the server is irrelevent, as long as it is
> the same server (or farm) for both source and destination.
> If the remote file copy is defined, then it should work
> even when the actual source and destination are different
> physical machines - the remote filesystem CLAIMS it will
> work (identical is determined from the "device" mounted,
> one mount, one device as far as network filesystems go).
> And if they are not identical then you fall back to using
> a local copy.
>
> All bets are off if the local pathnames are required by
> the remote server. That is silly. How would a networked
> client even know what the pathname would be? The parameters
> should be the two file handles passed to the remote filesystem.
You may need a filename relative to the root
of the exported part of the tree.
Remote side:
J:\groups\rteng\John Smith\tests\a.out
(with rteng exported as \\RTENG)
Local side:
/home/john/tests/a.out
(the mount point is "/home/john")
Path needed:
\\RTENG\John Smith\tests\a.out
You have that, since the kernel knows that a
"\\RTENG\John Smith" directory was mounted
on /home/john and you're trying to deal with
a tests/a.out file.
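The mapping above can be sketched in a few lines of C. This is only an
illustration of the string handling, assuming the mount point and the
exported share name are already known; the helper name local_to_remote
is hypothetical and not part of any kernel or library API.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: translate a local path under a mount point into
 * the name the server would see, relative to the exported share.
 * Returns 0 on success, -1 if the path is not under the mount point or
 * the result does not fit in out. */
int local_to_remote(const char *mount_point, const char *share,
                    const char *local_path, char *out, size_t out_len)
{
    size_t mlen = strlen(mount_point);

    /* The local path must be the mount point followed by '/'. */
    if (strncmp(local_path, mount_point, mlen) != 0 ||
        local_path[mlen] != '/')
        return -1;

    int n = snprintf(out, out_len, "%s%s", share, local_path + mlen);
    if (n < 0 || (size_t)n >= out_len)
        return -1;

    /* Turn the Unix separators of the relative part into backslashes. */
    for (char *p = out + strlen(share); *p != '\0'; p++)
        if (*p == '/')
            *p = '\\';
    return 0;
}
```

With the mount point "/home/john" and the share "\\RTENG\John Smith",
the local name /home/john/tests/a.out comes out as
\\RTENG\John Smith\tests\a.out.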
> Personally, I don't think any changes should be made.
> It's just that this level of transfer is what the original
> poster was talking about. It just shouldn't be done in
> kernel mode.
Anywhere else would be buggy and most likely setuid.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
@ 2003-11-10 12:09 Bradley Chapman
2003-11-10 18:47 ` Tomas Konir
2003-11-10 22:44 ` Derek Foreman
0 siblings, 2 replies; 77+ messages in thread
From: Bradley Chapman @ 2003-11-10 12:09 UTC (permalink / raw)
To: davide.rossetti; +Cc: linux-kernel
Mr. Rossetti,
It is horribly RTFM.
man 2 sendfile is what you're after.
Brad
=====
Brad Chapman
Permanent e-mail: kakadu_croc@yahoo.com
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-10 12:09 Bradley Chapman
@ 2003-11-10 18:47 ` Tomas Konir
2003-11-10 22:44 ` Derek Foreman
1 sibling, 0 replies; 77+ messages in thread
From: Tomas Konir @ 2003-11-10 18:47 UTC (permalink / raw)
Cc: linux-kernel
On Mon, 10 Nov 2003, Bradley Chapman wrote:
> Mr. Rossetti,
>
> It is horribly RTFM.
>
> man 2 sendfile is what you're after.
mhm
Can sendfile() copy extended attributes and ACLs?
(I don't think copy is the right candidate for a syscall.)
MOJE
--
Konir Tomas
Czech Republic
Brno
ICQ 25849167
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-10 12:09 Bradley Chapman
2003-11-10 18:47 ` Tomas Konir
@ 2003-11-10 22:44 ` Derek Foreman
1 sibling, 0 replies; 77+ messages in thread
From: Derek Foreman @ 2003-11-10 22:44 UTC (permalink / raw)
To: Bradley Chapman; +Cc: davide.rossetti, linux-kernel
On Mon, 10 Nov 2003, Bradley Chapman wrote:
> Mr. Rossetti,
>
> It is horribly RTFM.
>
> man 2 sendfile is what you're after.
I'm afraid it's not horribly RTFM at all.
sendfile won't do what he needs in 2.6.x.
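Whether sendfile(2) accepts a regular file as the destination depends on
the kernel version, so a portable program cannot rely on it. A hedged
sketch of the "try it, otherwise fall back" approach; try_sendfile_copy
is a hypothetical name, and the return value 1 is this sketch's own
convention for "use a read/write loop instead":

```c
#include <errno.h>
#include <sys/sendfile.h>
#include <sys/stat.h>
#include <sys/types.h>

/* Returns 0 on success, 1 if sendfile() is unusable for this pair of
 * descriptors (the caller should fall back to read/write), -1 on any
 * other error. */
int try_sendfile_copy(int infd, int outfd)
{
    struct stat st;
    if (fstat(infd, &st) < 0)
        return -1;

    off_t offset = 0;
    while (offset < st.st_size) {
        off_t left = st.st_size - offset;
        /* Cap each request so the count always fits in size_t. */
        size_t chunk = left > ((off_t)1 << 30) ? (size_t)1 << 30 : (size_t)left;

        /* sendfile() advances offset itself and may return a short count. */
        ssize_t n = sendfile(outfd, infd, &offset, chunk);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if ((errno == EINVAL || errno == ENOSYS) && offset == 0)
                return 1;
            return -1;
        }
        if (n == 0)
            break;  /* source shrank while we were copying */
    }
    return 0;
}
```

The loop matters: a single call may transfer fewer bytes than requested,
so the caller has to keep going until the offset reaches the file size.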
^ permalink raw reply [flat|nested] 77+ messages in thread
[parent not found: <QiyV.1k3.15@gated-at.bofh.it>]
* Re: OT: why no file copy() libc/syscall ??
[not found] <QiyV.1k3.15@gated-at.bofh.it>
@ 2003-11-10 12:08 ` Ihar 'Philips' Filipau
2003-11-10 13:29 ` Jesse Pollard
0 siblings, 1 reply; 77+ messages in thread
From: Ihar 'Philips' Filipau @ 2003-11-10 12:08 UTC (permalink / raw)
To: Davide Rossetti; +Cc: Linux Kernel Mailing List
sendfile(2) - ?
Davide Rossetti wrote:
> it may be horribly RTFM... but writing a simple framework I realized
> there is no libc/POSIX/whoknows
> copy(const char* dest_file_name, const char* src_file_name)
>
> What is the technical reason???
>
> I understand that there may be little space for kernel side
> optimizations in this area but anyway I'm surprised I have to write
>
> < the bits to clone the metadata of src_file_name on opening
> dest_file_name >
> const int BUFSIZE = 1<<12;
> char buffer[BUFSIZE];
> ssize_t nrb, ret;
> while ((nrb = read(infd, buffer, BUFSIZE)) > 0) {
> ret = write(outfd, buffer, nrb);
> if(ret != nrb) {...}
> }
>
> instead of something similar to:
> sys_fscopy(...)
>
> regards
>
--
Ihar 'Philips' Filipau / with best regards from Saarbruecken.
-- _ _ _
"... and for $64000 question, could you get yourself |_|*|_|
vaguely familiar with the notion of on-topic posting?" |_|_|*|
-- Al Viro @ LKML |*|*|*|
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-10 12:08 ` Ihar 'Philips' Filipau
@ 2003-11-10 13:29 ` Jesse Pollard
2003-11-10 14:22 ` Daniel Jacobowitz
` (2 more replies)
0 siblings, 3 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-10 13:29 UTC (permalink / raw)
To: Ihar 'Philips' Filipau, Davide Rossetti; +Cc: Linux Kernel Mailing List
On Monday 10 November 2003 06:08, Ihar 'Philips' Filipau wrote:
> sendfile(2) - ?
I don't think that is what he was referring to.. The sample
code is strictly user mode file->file copying.
> Davide Rossetti wrote:
> > it may be horribly RTFM... but writing a simple framework I realized
> > there is no libc/POSIX/whoknows
> > copy(const char* dest_file_name, const char* src_file_name)
> >
> > What is the technical reason???
It isn't an application for the kernel.
> > I understand that there may be little space for kernel side
> > optimizations in this area but anyway I'm surprised I have to write
> >
> > < the bits to clone the metadata of src_file_name on opening
> > dest_file_name >
> > const int BUFSIZE = 1<<12;
> > char buffer[BUFSIZE];
> > ssize_t nrb, ret;
> > while ((nrb = read(infd, buffer, BUFSIZE)) > 0) {
> > ret = write(outfd, buffer, nrb);
> > if(ret != nrb) {...}
> > }
> >
> > instead of something similar to:
> > sys_fscopy(...)
It is too simple to implement in user mode.
There are some other issues too:
The security context of the output depends on the user process.
If it is a privileged process (ie, may change the context of the
result) then the user process has to setup that context before
the file is copied.
There are also some issues with mandatory security controls. If it
is copied in kernel mode, then the previous labels could be automatically
carried over to the resulting file... But that may not be what you
want (and frequently, it isn't).
Now back to the copy.. You don't have to use a read/write loop- mmap
is faster. And this is the other reason for not doing it in Kernel mode.
Buffer management of this type is much easier in user space, since the
copy procedure doesn't have to deal with memory limitations, cache flushes,
or page faulting of processes that are unrelated to the copy but affected
by cache pressure.
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-10 13:29 ` Jesse Pollard
@ 2003-11-10 14:22 ` Daniel Jacobowitz
2003-11-11 20:57 ` Jakob Oestergaard
2003-11-10 15:19 ` David Woodhouse
2003-11-11 12:00 ` davide.rossetti
2 siblings, 1 reply; 77+ messages in thread
From: Daniel Jacobowitz @ 2003-11-10 14:22 UTC (permalink / raw)
To: Linux Kernel Mailing List
On Mon, Nov 10, 2003 at 07:29:15AM -0600, Jesse Pollard wrote:
> Now back to the copy.. You don't have to use a read/write loop- mmap
> is faster. And this is the other reason for not doing it in Kernel mode.
Actually, last I checked, read/write was actually faster. Linus
explained why a month or two ago.
--
Daniel Jacobowitz
MontaVista Software Debian GNU/Linux Developer
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-10 14:22 ` Daniel Jacobowitz
@ 2003-11-11 20:57 ` Jakob Oestergaard
0 siblings, 0 replies; 77+ messages in thread
From: Jakob Oestergaard @ 2003-11-11 20:57 UTC (permalink / raw)
To: Linux Kernel Mailing List
On Mon, Nov 10, 2003 at 09:22:22AM -0500, Daniel Jacobowitz wrote:
> On Mon, Nov 10, 2003 at 07:29:15AM -0600, Jesse Pollard wrote:
> > Now back to the copy.. You don't have to use a read/write loop- mmap
> > is faster. And this is the other reason for not doing it in Kernel mode.
>
> Actually, last I checked, read/write was actually faster. Linus
> explained why a month or two ago.
It would also not break on large files...
--
................................................................
: jakob@unthought.net : And I see the elder races, :
:.........................: putrid forms of man :
: Jakob Østergaard : See him rise and claim the earth, :
: OZ9ABN : his downfall is at hand. :
:.........................:............{Konkhra}...............:
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-10 13:29 ` Jesse Pollard
2003-11-10 14:22 ` Daniel Jacobowitz
@ 2003-11-10 15:19 ` David Woodhouse
2003-11-10 16:15 ` Jesse Pollard
2003-11-11 12:00 ` davide.rossetti
2 siblings, 1 reply; 77+ messages in thread
From: David Woodhouse @ 2003-11-10 15:19 UTC (permalink / raw)
To: Jesse Pollard
Cc: Ihar 'Philips' Filipau, Davide Rossetti,
Linux Kernel Mailing List
On Mon, 2003-11-10 at 07:29 -0600, Jesse Pollard wrote:
> > > sys_fscopy(...)
>
> It is too simple to implement in user mode.
Is it? Please explain the simple steps which cp(1) should take in order
to observe that it is being asked to duplicate a file on a file system
such as CIFS (or NFSv4?) which allows the client to issue a 'copy file'
command over the network without actually transferring the data twice,
and to invoke such a command.
--
dwmw2
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-10 15:19 ` David Woodhouse
@ 2003-11-10 16:15 ` Jesse Pollard
0 siblings, 0 replies; 77+ messages in thread
From: Jesse Pollard @ 2003-11-10 16:15 UTC (permalink / raw)
To: David Woodhouse
Cc: Ihar 'Philips' Filipau, Davide Rossetti,
Linux Kernel Mailing List
On Monday 10 November 2003 09:19, David Woodhouse wrote:
> On Mon, 2003-11-10 at 07:29 -0600, Jesse Pollard wrote:
> > > > sys_fscopy(...)
> >
> > It is too simple to implement in user mode.
>
> Is it? Please explain the simple steps which cp(1) should take in order
> to observe that it is being asked to duplicate a file on a file system
> such as CIFS (or NFSv4?) which allows the client to issue a 'copy file'
> command over the network without actually transferring the data twice,
> and to invoke such a command.
Ah. That is an optimization question, not a question of kernel/user mode.
Since the error checking for source and destination both include doing
a stat and statfs, the device information (and FS info) can both be retrieved.
And mmap doesn't require data transfer "twice" (local copy), since that copy
only pagefaults. (Read/write may be faster for some files, though. I thought
that was true for small files that fit in cache, with large files faster via
mmap; it depends on the page size, and the tradeoff would be system
dependent.)
And since both source and destination may be remote you do get to decide
based on source and destination devices: if they are the same, and one on
a remote node, then BOTH will be on the remote, then you get to use the
CIFS/NFS file copy. (check the doc on "stat/statfs" for additional info).
I don't believe it works when source and destination are on DIFFERENT remote
nodes, though.
Strictly up to the implementation of cp/mv.
Though you will lose portability of cp/mv. (Of course, you also lose
it with a syscall for file copy too; as well as the MUCH more complicated
implementation/security checks).
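The stat/statfs check described above might look roughly like this. It is
a sketch under assumptions: same_device and on_nfs are hypothetical
helper names, only NFS is detected (via NFS_SUPER_MAGIC from
<linux/magic.h>), and actually issuing a server-side copy is left out
entirely.

```c
#include <linux/magic.h>   /* NFS_SUPER_MAGIC */
#include <sys/stat.h>
#include <sys/vfs.h>       /* statfs() */

/* Returns 1 if both paths live on the same mounted device, 0 if not,
 * -1 if either stat() fails. */
int same_device(const char *src, const char *dst_dir)
{
    struct stat a, b;
    if (stat(src, &a) < 0 || stat(dst_dir, &b) < 0)
        return -1;
    return a.st_dev == b.st_dev;
}

/* Returns 1 if path is on an NFS mount, 0 if not, -1 on error. */
int on_nfs(const char *path)
{
    struct statfs fs;
    if (statfs(path, &fs) < 0)
        return -1;
    return fs.f_type == NFS_SUPER_MAGIC;
}
```

A cp/mv built this way would attempt a remote copy only when
same_device() returns 1 and the filesystem type is a networked one, and
use a local copy in every other case.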
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-10 13:29 ` Jesse Pollard
2003-11-10 14:22 ` Daniel Jacobowitz
2003-11-10 15:19 ` David Woodhouse
@ 2003-11-11 12:00 ` davide.rossetti
2003-11-11 12:08 ` Andreas Schwab
2 siblings, 1 reply; 77+ messages in thread
From: davide.rossetti @ 2003-11-11 12:00 UTC (permalink / raw)
To: Jesse Pollard; +Cc: Ihar 'Philips' Filipau, Linux Kernel Mailing List
On Mon, 10 Nov 2003, Jesse Pollard wrote:
> On Monday 10 November 2003 06:08, Ihar 'Philips' Filipau wrote:
> > sendfile(2) - ?
> I don't think that is what he was referring to.. The sample
> code is strictly user mode file->file copying.
> > Davide Rossetti wrote:
> > > it may be horribly RTFM... but writing a simple framework I realized
> > > there is no libc/POSIX/whoknows
> > > copy(const char* dest_file_name, const char* src_file_name)
> > >
> > > What is the technical reason???
>
> It isn't an application for the kernel.
Maybe I was misunderstood... I'm asking why the libc/iso/ansi/posix
engineers did not add to the spec a user-mode API to copy file to file ???
if there was such a standard _user_ API, we could talk about user/kernel
implementation issues... but my question is more "primitive" somehow :)
> > > I understand that there may be little space for kernel side
> > > optimizations in this area but anyway I'm surprised I have to write
> > >
> > > < the bits to clone the metadata of src_file_name on opening
> > > dest_file_name >
> > > const int BUFSIZE = 1<<12;
> > > char buffer[BUFSIZE];
> > > ssize_t nrb, ret;
> > > while ((nrb = read(infd, buffer, BUFSIZE)) > 0) {
> > > ret = write(outfd, buffer, nrb);
> > > if(ret != nrb) {...}
> > > }
> > >
> > > instead of something similar to:
> > > sys_fscopy(...)
>
> It is too simple to implement in user mode.
>
> There are some other issues too:
>
> The security context of the output depends on the user process.
> If it is a privileged process (ie, may change the context of the
> result) then the user process has to setup that context before
> the file is copied.
>
> There are also some issues with mandatory security controls. If it
> is copied in kernel mode, then the previous labels could be automatically
> carried over to the resulting file... But that may not be what you
> want (and frequently, it isn't).
>
> Now back to the copy.. You don't have to use a read/write loop- mmap
> is faster. And this is the other reason for not doing it in Kernel mode.
> Buffer management of this type is much easier in user space since the
> copy procedure doesn't have to deal with memory limitations, cache flushes
> page faulting of processes unrelated to the copy, but is related to cache
> pressure.
ok... so I have to code a framework routine which auto-benchmarks (at
either runtime or configure time) and uses at least 2 implementations, one
using read/write and another mmap(), as I know for sure that on
different Unices they perform differently... ah.. and the day we add
sys_sendfile(fd,fd) (if it is not there yet) I have to add yet another
implementation... and doing file copies of gigabyte sized files with
mmap() on 32bit archs isn't so trivial, you have to do windowing I
guess...
seems scary at least ;)
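To give an idea of the windowing mentioned above, here is a rough sketch
of an mmap()-based copy that maps one bounded window at a time. The
function name mmap_copy and the 64 MiB window are arbitrary choices for
illustration; it assumes outfd was opened O_RDWR, and on 32-bit systems
it would need to be built with _FILE_OFFSET_BITS=64 to see large files.

```c
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* 64 MiB window: arbitrary, but a multiple of every common page size. */
#define COPY_WINDOW ((off_t)1 << 26)

/* Copy infd to outfd through paired mappings. Returns 0 or -1. */
int mmap_copy(int infd, int outfd)
{
    struct stat st;
    if (fstat(infd, &st) < 0)
        return -1;

    /* The destination must already have its final size before mapping. */
    if (ftruncate(outfd, st.st_size) < 0)
        return -1;

    for (off_t off = 0; off < st.st_size; off += COPY_WINDOW) {
        off_t left = st.st_size - off;
        size_t len = left < COPY_WINDOW ? (size_t)left : (size_t)COPY_WINDOW;

        void *src = mmap(NULL, len, PROT_READ, MAP_PRIVATE, infd, off);
        if (src == MAP_FAILED)
            return -1;

        void *dst = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                         outfd, off);
        if (dst == MAP_FAILED) {
            munmap(src, len);
            return -1;
        }

        memcpy(dst, src, len);   /* the page faults do the actual I/O */
        munmap(src, len);
        munmap(dst, len);
    }
    return 0;
}
```

Only one window of address space is in use at any time, which is what
keeps gigabyte-sized files workable on a 32-bit address space.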
<joke>
it seems similar to saying that we do not need a rename() Posix/XOpen/etc
API as we can do:
rename(to, from) {
link(from, to); // make hardlink
unlink(from); // remove original
}
</joke>
regards
--
______/ Rossetti Davide INFN - Roma I - APE group \______________
pho +390649914507/412 web: http://apegate.roma1.infn.it/~rossetti
fax +390649914423 email: davide.rossetti@roma1.infn.it
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 12:00 ` davide.rossetti
@ 2003-11-11 12:08 ` Andreas Schwab
2003-11-11 12:23 ` davide.rossetti
0 siblings, 1 reply; 77+ messages in thread
From: Andreas Schwab @ 2003-11-11 12:08 UTC (permalink / raw)
To: davide.rossetti
Cc: Jesse Pollard, Ihar 'Philips' Filipau, Linux Kernel Mailing List
"davide.rossetti" <rossetti@roma1.infn.it> writes:
> Maybe I was misunderstood... I'm asking why the libc/iso/ansi/posix
> engineers did not add to the spec a user-mode API to copy file to file ???
Because there was no prior art.
Andreas.
--
Andreas Schwab, SuSE Labs, schwab@suse.de
SuSE Linux AG, Deutschherrnstr. 15-19, D-90429 Nürnberg
Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5
"And now for something completely different."
^ permalink raw reply [flat|nested] 77+ messages in thread
* Re: OT: why no file copy() libc/syscall ??
2003-11-11 12:08 ` Andreas Schwab
@ 2003-11-11 12:23 ` davide.rossetti
0 siblings, 0 replies; 77+ messages in thread
From: davide.rossetti @ 2003-11-11 12:23 UTC (permalink / raw)
To: Andreas Schwab; +Cc: Linux Kernel Mailing List
On Tue, 11 Nov 2003, Andreas Schwab wrote:
> "davide.rossetti" <rossetti@roma1.infn.it> writes:
>
> > Maybe I was misunderstood... I'm asking why the libc/iso/ansi/posix
> > engineers did not add to the spec a user-mode API to copy file to file ???
>
> Because there was no prior art.
:) but late revisions of specs are really recent!!!
folks are talking about implementing all sort of stuff (web servers,
parallel filesystems, ...) (partly) in kernel mode and no one cares about
(maybe accelerated) fs copies ???
--
______/ Rossetti Davide INFN - Roma I - APE group \______________
pho +390649914507/412 web: http://apegate.roma1.infn.it/~rossetti
fax +390649914423 email: davide.rossetti@roma1.infn.it
^ permalink raw reply [flat|nested] 77+ messages in thread
* OT: why no file copy() libc/syscall ??
@ 2003-11-10 11:33 Davide Rossetti
0 siblings, 0 replies; 77+ messages in thread
From: Davide Rossetti @ 2003-11-10 11:33 UTC (permalink / raw)
To: linux-kernel
it may be horribly RTFM... but writing a simple framework I realized
there is no libc/POSIX/whoknows
copy(const char* dest_file_name, const char* src_file_name)
What is the technical reason???
I understand that there may be little space for kernel side
optimizations in this area but anyway I'm surprised I have to write
< the bits to clone the metadata of src_file_name on opening
dest_file_name >
const int BUFSIZE = 1<<12;
char buffer[BUFSIZE];
ssize_t nrb, ret;
while ((nrb = read(infd, buffer, BUFSIZE)) > 0) {
ret = write(outfd, buffer, nrb);
if(ret != nrb) {...}
}
instead of something similar to:
sys_fscopy(...)
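For reference, a somewhat more defensive user-space version of such a
loop might look like the sketch below. It is not a proposed libc API;
the name copy_fd is hypothetical. It retries on EINTR and keeps writing
until each buffer has been fully flushed, since write() may return a
short count.

```c
#include <errno.h>
#include <unistd.h>

/* Copy everything readable from infd to outfd. Returns 0 on success,
 * -1 on error (errno is left as set by read() or write()). */
int copy_fd(int infd, int outfd)
{
    char buffer[1 << 12];

    for (;;) {
        ssize_t nrb = read(infd, buffer, sizeof buffer);
        if (nrb == 0)
            return 0;               /* end of file */
        if (nrb < 0) {
            if (errno == EINTR)
                continue;           /* interrupted before any data: retry */
            return -1;
        }

        /* write() may accept only part of the buffer; loop until done. */
        for (ssize_t done = 0; done < nrb; ) {
            ssize_t ret = write(outfd, buffer + done, (size_t)(nrb - done));
            if (ret < 0) {
                if (errno == EINTR)
                    continue;
                return -1;
            }
            done += ret;
        }
    }
}
```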
regards
^ permalink raw reply [flat|nested] 77+ messages in thread
end of thread, other threads:[~2003-12-01 16:36 UTC | newest]
Thread overview: 77+ messages
[not found] <1068512710.722.161.camel@cube.suse.lists.linux.kernel>
[not found] ` <20031111133859.GA11115@bitwizard.nl.suse.lists.linux.kernel>
[not found] ` <20031111085323.M8854@devserv.devel.redhat.com.suse.lists.linux.kernel>
[not found] ` <bp0p5m$lke$1@cesium.transmeta.com.suse.lists.linux.kernel>
[not found] ` <20031113233915.GO1649@x30.random.suse.lists.linux.kernel>
[not found] ` <3FB4238A.40605@zytor.com.suse.lists.linux.kernel>
[not found] ` <20031114011009.GP1649@x30.random.suse.lists.linux.kernel>
[not found] ` <3FB42CC4.9030009@zytor.com.suse.lists.linux.kernel>
2003-11-14 15:26 ` OT: why no file copy() libc/syscall ?? Andi Kleen
2003-11-18 15:49 ` Jamie Lokier
2003-11-18 16:05 ` Andi Kleen
2003-11-18 16:25 ` Trond Myklebust
2003-11-19 13:30 ` Jesse Pollard
2003-11-18 16:58 ` H. Peter Anvin
2003-11-19 2:12 ` Linus Torvalds
2003-11-19 4:04 ` Chris Adams
[not found] <Qvw7.5Qf.9@gated-at.bofh.it>
[not found] ` <QxRl.17Y.9@gated-at.bofh.it>
[not found] ` <Qy0W.1sk.9@gated-at.bofh.it>
[not found] ` <QyaB.1GK.17@gated-at.bofh.it>
[not found] ` <QzSZ.4x1.1@gated-at.bofh.it>
[not found] ` <QCHh.X6.3@gated-at.bofh.it>
2003-11-11 9:51 ` Ihar 'Philips' Filipau
2003-11-11 10:41 ` jw schultz
[not found] ` <QH4e.eV.3@gated-at.bofh.it>
2003-11-11 14:11 ` Ihar 'Philips' Filipau
2003-11-11 15:02 ` Rogier Wolff
2003-11-11 15:31 ` Ihar 'Philips' Filipau
2003-11-11 20:22 ` Jan Harkes
2003-11-11 20:31 ` Valdis.Kletnieks
[not found] <QDtX.2dq.15@gated-at.bofh.it>
[not found] ` <QDtX.2dq.17@gated-at.bofh.it>
[not found] ` <QDtX.2dq.19@gated-at.bofh.it>
[not found] ` <QDtX.2dq.21@gated-at.bofh.it>
[not found] ` <QDtX.2dq.23@gated-at.bofh.it>
[not found] ` <QDtY.2dq.25@gated-at.bofh.it>
[not found] ` <QDtX.2dq.13@gated-at.bofh.it>
[not found] ` <QEg2.3zi.9@gated-at.bofh.it>
2003-11-11 12:43 ` Ihar 'Philips' Filipau
2003-11-11 1:05 Albert Cahalan
2003-11-11 3:50 ` Andreas Dilger
2003-11-11 4:03 ` Daniel Gryniewicz
2003-11-11 4:14 ` Valdis.Kletnieks
2003-11-11 6:00 ` Andreas Dilger
2003-11-11 8:58 ` Florian Weimer
2003-11-11 10:27 ` jw schultz
2003-11-11 20:08 ` Jan Harkes
2003-11-12 15:36 ` Jesse Pollard
2003-11-20 17:21 ` Florian Weimer
2003-11-20 19:08 ` Jesse Pollard
2003-11-20 19:12 ` Florian Weimer
2003-11-20 19:44 ` Justin Cormack
2003-11-20 20:44 ` Timothy Miller
2003-11-20 21:07 ` Andreas Dilger
2003-11-20 21:30 ` Timothy Miller
2003-11-20 21:49 ` Maciej Zenczykowski
2003-11-20 21:52 ` Timothy Miller
2003-11-20 21:58 ` Hua Zhong
2003-11-22 14:50 ` Pavel Machek
2003-11-22 19:50 ` Jamie Lokier
2003-11-22 23:07 ` Andreas Schwab
2003-11-21 16:24 ` Jesse Pollard
2003-11-20 21:48 ` Maciej Zenczykowski
2003-11-21 16:34 ` Jesse Pollard
2003-11-20 22:31 ` Xavier Bestel
2003-11-20 22:44 ` Andreas Dilger
2003-11-27 2:40 ` Robert White
2003-11-27 7:29 ` Nick Piggin
2003-11-27 9:15 ` David Lang
2003-11-27 8:56 ` Nick Piggin
2003-11-27 9:50 ` David Lang
2003-11-27 10:02 ` Jörn Engel
2003-11-27 10:58 ` David Lang
2003-12-01 16:20 ` Jesse Pollard
2003-11-11 8:52 ` Gábor Lénárt
2003-11-11 13:38 ` Rogier Wolff
2003-11-11 13:53 ` Jakub Jelinek
2003-11-11 13:58 ` David Woodhouse
2003-11-13 20:22 ` H. Peter Anvin
2003-11-13 23:39 ` Andrea Arcangeli
2003-11-14 0:04 ` jw schultz
2003-11-14 0:36 ` H. Peter Anvin
2003-11-14 1:10 ` Andrea Arcangeli
2003-11-14 1:15 ` H. Peter Anvin
2003-11-11 14:11 ` Albert Cahalan
2003-11-12 15:19 ` Jesse Pollard
2003-11-14 3:42 ` Albert Cahalan
-- strict thread matches above, loose matches on Subject: below --
2003-11-10 12:09 Bradley Chapman
2003-11-10 18:47 ` Tomas Konir
2003-11-10 22:44 ` Derek Foreman
[not found] <QiyV.1k3.15@gated-at.bofh.it>
2003-11-10 12:08 ` Ihar 'Philips' Filipau
2003-11-10 13:29 ` Jesse Pollard
2003-11-10 14:22 ` Daniel Jacobowitz
2003-11-11 20:57 ` Jakob Oestergaard
2003-11-10 15:19 ` David Woodhouse
2003-11-10 16:15 ` Jesse Pollard
2003-11-11 12:00 ` davide.rossetti
2003-11-11 12:08 ` Andreas Schwab
2003-11-11 12:23 ` davide.rossetti
2003-11-10 11:33 Davide Rossetti