linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jesse Pollard <jesse@cats-chateau.net>
To: Albert Cahalan <albert@users.sourceforge.net>,
	linux-kernel mailing list <linux-kernel@vger.kernel.org>
Cc: davide.rossetti@roma1.infn.it, filia@softhome.net,
	jesse@cats-chateau.net, dwmw2@infradead.org, moje@vabo.cz,
	kakadu_croc@yahoo.com
Subject: Re: OT: why no file copy() libc/syscall ??
Date: Wed, 12 Nov 2003 09:19:53 -0600	[thread overview]
Message-ID: <03111209195300.11900@tabby> (raw)
In-Reply-To: <1068512710.722.161.camel@cube>

On Monday 10 November 2003 19:05, Albert Cahalan wrote:
> > It is too simple to implement in user mode.
>
> That works for a plain byte-stream on a
> local UNIX-style filesystem. (though it
> likely isn't the fastest)

Yes - this was the local copy

> It doesn't work for Macintosh files.
> It's too slow for CIFS over a modem.
> It doesn't work for Windows security data.
> It doesn't allow copy-on-write files.
> It eats CPU time on compressed filesystems.
>
> > The security context of the output depends
> > on the user process. If it is a privileged
> > process (ie, may change the context of the
> > result) then the user process has to setup
> > that context before the file is copied.
>
> So open the file, change context, and then:
>
> long copy_fd_to_file(int fd, const char *name, ...)

Easy to do in user mode.

>
> (if you can no longer read from the OPEN fd,
> either we override that or we just don't care
> about such mostly-fictional cases)

correct - If you can't read, fail.

> > There are also some issues with mandatory
> > security controls. If it is copied in kernel
> > mode, then the previous labels could be
> > automatically carried over to the resulting
> > file... But that may not be what you want
> > (and frequently, it isn't).
>
> If it matters:
>
> // security as if a new file were created
> #define CF_REPLACE_SECURITY 0x00000001
> // if unable to replicate, up or down?
> #define CF_ROUND_SECURITY_UP 0x00000002
> #define CF_ROUND_SECURITY_DOWN 0x00000004
> // fail if security can't be replicated
> #define CF_SECURITY_EXACT 0x00000008
>
> > Now back to the copy.. You don't have to
> > use a read/write loop- mmap is faster.
>
> It's slower. (this is Linux, not SunOS)
> Use a 4 kB or 8 kB read/write loop.

yup local.

> > And this is the other reason for not doing
> > it in Kernel mode. Buffer management of
> > this type is much easier in user space
> > since the copy procedure doesn't have to
> > deal with memory limitations, cache flushes
> > page faulting of processes unrelated to the
> > copy, but is related to cache pressure.
>
> Buffer management is very much a kernel thing.

Yes it is, but do you want to push process dependant
buffer management into the page management? It's just
easier to do this in user mode, and allow the kernel
to handle global page managment.

> >> Is it? Please explain the simple steps which
> >> cp(1) should take in order to observe that it
> >> is being asked to duplicate a file on a file
> >> system such as CIFS (or NFSv4?) which allows
> >> the client to issue a 'copy file' command
> >> over the network without actually transferring
> >> the data twice, and to invoke such a command.
> >
> > Ah. That is an optimization question, not a
> > question of kernel/user mode.
>
> Note that /bin/cp isn't always going to have
> the necessary passwords and such. You're headed
> down a path toward setuid /bin/cp.

If cp doesn't have access to the proper security credentials,
then the file should not be copied.

> > Since the error checking for source and
> > destination both include doing a stat and
> > statfs, the device information (and FS info)
> > can both be retrieved.
> >
> > And mmap doesn't require data transfer "twice"
> > (local copy).
>
> Huh? Over the network from server to client
> counts as once. Then /bin/cp gets the data.
> Then it goes back over the network from the
> client to the server. That's "twice". That's
> horribly painful for a multi-gigabyte file
> and a DSL or cable-modem connection, never
> mind a dial-up connection.

True for all networked file systems. I had ment
to say (local filesystem copy).

> > Since that copy only pagefaults (though
> > read/write may be faster for some files
> > - I thought that was true for small files
> > that fit in cache, and large files faster
> > via mmap and depends on the page size;
> > and the tradeoff would be system dependant).
>
> Keep the read/write loop small for speed.

yes.

> > And since both source and destination may
> > be remote you do get to decide based on
> > source and destination devices: if they
> > are the same, and one on a remote node,
> > then BOTH will be on the remote, then you
> > get to use the CIFS/NFS file copy. (check
> > the doc on "stat/statfs" for additional info).
> >
> > I don't believe it works when source and
> > destination are on DIFFERENT remote nodes,
> > though.
> >
> > Strictly up to the implementation of cp/mv.
> >
> > Though you will loose portability of cp/mv.
> > (Of course, you also loose it with a syscall
> > for file copy too; as well as the MUCH more
> > complicated implementation/security checks).
>
> Doing that in cp/mv is just insane. For one,
> it bypasses any local security control over
> access to the filesystem. There's not even a
> way to be sure you're dealing with the server
> you think you're dealing with.

It shouldn't matter - first the source file must be opened
for read AND the destination file opened for write.
This should give the proper local security evaluation and
context for the copy. Once this has been approved,
the remote copy request can be made (provided they are
on the same "networked" device). Just making
the request still doesn't mean that it will succeed -
after all, the final security decisions are made by
the remote server implementing the file copy.

Though if the copy is valid locally, then the use of
the filesystem supported copy should work. It is an
equivalent operation, it just all takes place on the server.

Identity of the server is irrelevent, as long as it is
the same server (or farm) for both source and destination.
If the remote file copy is defined, then it should work
even when the actual source and destination are different
physical machines - the remote filesystem CLAIMS it will
work (identical is determined from the "device" mounted,
one mount, one device as far as network filesystems go).
And if they are not identical then you fall back to using
a local copy.

All bets are off if the local pathnames are required by
the remote server. That is silly. How would a networked
client even know what the pathname would be? The parameters
should be the two file handles passed to the remote filesystem.

Personally, I don't think any changes should be made.
It's just that this level of transfer is what the original
poster was talking about. It just shouldn't be done in
kernel mode.

  parent reply	other threads:[~2003-11-12 15:20 UTC|newest]

Thread overview: 77+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2003-11-11  1:05 OT: why no file copy() libc/syscall ?? Albert Cahalan
2003-11-11  3:50 ` Andreas Dilger
2003-11-11  4:03   ` Daniel Gryniewicz
2003-11-11  4:14     ` Valdis.Kletnieks
2003-11-11  6:00       ` Andreas Dilger
2003-11-11  8:58         ` Florian Weimer
2003-11-11 10:27           ` jw schultz
2003-11-11 20:08             ` Jan Harkes
2003-11-12 15:36           ` Jesse Pollard
2003-11-20 17:21             ` Florian Weimer
2003-11-20 19:08               ` Jesse Pollard
2003-11-20 19:12                 ` Florian Weimer
2003-11-20 19:44                 ` Justin Cormack
2003-11-20 20:44                   ` Timothy Miller
2003-11-20 21:07                     ` Andreas Dilger
2003-11-20 21:30                       ` Timothy Miller
2003-11-20 21:49                         ` Maciej Zenczykowski
2003-11-20 21:52                           ` Timothy Miller
2003-11-20 21:58                         ` Hua Zhong
2003-11-22 14:50                         ` Pavel Machek
2003-11-22 19:50                           ` Jamie Lokier
2003-11-22 23:07                             ` Andreas Schwab
2003-11-21 16:24                   ` Jesse Pollard
2003-11-20 21:48                 ` Maciej Zenczykowski
2003-11-21 16:34                   ` Jesse Pollard
2003-11-20 22:31                 ` Xavier Bestel
2003-11-20 22:44                   ` Andreas Dilger
2003-11-27  2:40                 ` Robert White
2003-11-27  7:29                   ` Nick Piggin
2003-11-27  9:15                     ` David Lang
2003-11-27  8:56                       ` Nick Piggin
2003-11-27  9:50                         ` David Lang
2003-11-27 10:02                           ` Jörn Engel
2003-11-27 10:58                             ` David Lang
2003-12-01 16:20                               ` Jesse Pollard
2003-11-11  8:52   ` Gábor Lénárt
2003-11-11 13:38 ` Rogier Wolff
2003-11-11 13:53   ` Jakub Jelinek
2003-11-11 13:58     ` David Woodhouse
2003-11-13 20:22     ` H. Peter Anvin
2003-11-13 23:39       ` Andrea Arcangeli
2003-11-14  0:04         ` jw schultz
2003-11-14  0:36         ` H. Peter Anvin
2003-11-14  1:10           ` Andrea Arcangeli
2003-11-14  1:15             ` H. Peter Anvin
2003-11-11 14:11   ` Albert Cahalan
2003-11-12 15:19 ` Jesse Pollard [this message]
2003-11-14  3:42   ` Albert Cahalan
     [not found] <1068512710.722.161.camel@cube.suse.lists.linux.kernel>
     [not found] ` <20031111133859.GA11115@bitwizard.nl.suse.lists.linux.kernel>
     [not found]   ` <20031111085323.M8854@devserv.devel.redhat.com.suse.lists.linux.kernel>
     [not found]     ` <bp0p5m$lke$1@cesium.transmeta.com.suse.lists.linux.kernel>
     [not found]       ` <20031113233915.GO1649@x30.random.suse.lists.linux.kernel>
     [not found]         ` <3FB4238A.40605@zytor.com.suse.lists.linux.kernel>
     [not found]           ` <20031114011009.GP1649@x30.random.suse.lists.linux.kernel>
     [not found]             ` <3FB42CC4.9030009@zytor.com.suse.lists.linux.kernel>
2003-11-14 15:26               ` Andi Kleen
2003-11-18 15:49                 ` Jamie Lokier
2003-11-18 16:05                   ` Andi Kleen
2003-11-18 16:25                     ` Trond Myklebust
2003-11-19 13:30                   ` Jesse Pollard
2003-11-18 16:58                 ` H. Peter Anvin
2003-11-19  2:12                 ` Linus Torvalds
2003-11-19  4:04                 ` Chris Adams
     [not found] <Qvw7.5Qf.9@gated-at.bofh.it>
     [not found] ` <QxRl.17Y.9@gated-at.bofh.it>
     [not found]   ` <Qy0W.1sk.9@gated-at.bofh.it>
     [not found]     ` <QyaB.1GK.17@gated-at.bofh.it>
     [not found]       ` <QzSZ.4x1.1@gated-at.bofh.it>
     [not found]         ` <QCHh.X6.3@gated-at.bofh.it>
2003-11-11  9:51           ` Ihar 'Philips' Filipau
2003-11-11 10:41             ` jw schultz
     [not found] ` <QH4e.eV.3@gated-at.bofh.it>
2003-11-11 14:11   ` Ihar 'Philips' Filipau
2003-11-11 15:02     ` Rogier Wolff
2003-11-11 15:31       ` Ihar 'Philips' Filipau
2003-11-11 20:22       ` Jan Harkes
2003-11-11 20:31         ` Valdis.Kletnieks
     [not found] <QDtX.2dq.15@gated-at.bofh.it>
     [not found] ` <QDtX.2dq.17@gated-at.bofh.it>
     [not found]   ` <QDtX.2dq.19@gated-at.bofh.it>
     [not found]     ` <QDtX.2dq.21@gated-at.bofh.it>
     [not found]       ` <QDtX.2dq.23@gated-at.bofh.it>
     [not found]         ` <QDtY.2dq.25@gated-at.bofh.it>
     [not found]           ` <QDtX.2dq.13@gated-at.bofh.it>
     [not found]             ` <QEg2.3zi.9@gated-at.bofh.it>
2003-11-11 12:43               ` Ihar 'Philips' Filipau
  -- strict thread matches above, loose matches on Subject: below --
2003-11-10 12:09 Bradley Chapman
2003-11-10 18:47 ` Tomas Konir
2003-11-10 22:44 ` Derek Foreman
     [not found] <QiyV.1k3.15@gated-at.bofh.it>
2003-11-10 12:08 ` Ihar 'Philips' Filipau
2003-11-10 13:29   ` Jesse Pollard
2003-11-10 14:22     ` Daniel Jacobowitz
2003-11-11 20:57       ` Jakob Oestergaard
2003-11-10 15:19     ` David Woodhouse
2003-11-10 16:15       ` Jesse Pollard
2003-11-11 12:00     ` davide.rossetti
2003-11-11 12:08       ` Andreas Schwab
2003-11-11 12:23         ` davide.rossetti
2003-11-10 11:33 Davide Rossetti

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=03111209195300.11900@tabby \
    --to=jesse@cats-chateau.net \
    --cc=albert@users.sourceforge.net \
    --cc=davide.rossetti@roma1.infn.it \
    --cc=dwmw2@infradead.org \
    --cc=filia@softhome.net \
    --cc=kakadu_croc@yahoo.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=moje@vabo.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).