linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Linus Torvalds <torvalds@osdl.org>
To: Larry McVoy <lm@bitmover.com>
Cc: William Lee Irwin III <wli@holomorphy.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Make pipe data structure be a circular list of pages, rather than
Date: Wed, 19 Jan 2005 09:14:33 -0800 (PST)	[thread overview]
Message-ID: <Pine.LNX.4.58.0501190843220.8178@ppc970.osdl.org> (raw)
In-Reply-To: <20050119162902.GA20656@work.bitmover.com>



On Wed, 19 Jan 2005, Larry McVoy wrote:
>
> I think you are going to regret making splice() a system call, it shouldn't
> be, you'll find cases where it won't work.  Make it a library call built
> out of pull() and push()

Actually, if you look at my current (ugly) test-patch, the one part of it
that isn't ugly is the actual "sys_splice()" + "do_splice()" part, and the
only thing they actually do is (a) do all the required "fd -> struct file"  
setup (including locking, testing readability vs writability etc), and
then (b) they check which side is a pipe, and split it up into "pull" vs 
"push" at that point.

Of course, the "pull" is actually called "do_splice_from()", and the 
"push" is called "do_splice_to()", but I can rename them if you want to ;)

So yes, internally it is actually a push/pull thing, with separate actions 
for both. I absolutely agree that you cannot have a "splice()" action, you 
need to have a _direction_. But that's actually exactly what the pipe 
gives you: since all data has to originate or end in a pipe, the direction 
is implicit by the file descriptors involved.

(The exception is a pipe->pipe transfer, but then the direction doesn't
matter, and you can choose whichever just ends up being easier. We happen
to consider it a "do_splice_from()" from the source pipe right now, but
that's purely a matter of "which way do you test first").

Could we show it as two system calls? Sure. It wouldn't give you anything,
though, since you need to do all the tests anyway - including the test
for whether the source or destination is a pipe which currently is the 
thing that splits up the "splice()" into two cases. So it's not even like 
you could optimize out the direction test - it would just become an 
error case test instead.

And having just one system call means that the user doesn't need to care
(other than making sure that one end _is_ a pipe), and also allows us to
share all the code that _is_ shared (namely the "struct file *" setups and
testing - not a whole lot, but it's cleaner that way).

I think you are perhaps confused about the fact that what makes this all
possible in the first place really is the _pipe_. You probably think of
"splice()" as going from one file descriptor to another. It's not. It goes
from one (generic) file descriptor into a _pipe_, or from one pipe into
another (generic) file descriptor. So the "push"/"pull" part really is 
there, but it's there in another form: if you want to copy from one file 
to another, you have to

	1) open a pipe - it's the splice equivalent of allocating a
	   temporary buffer.
   repeat:
	   2) "pull" from the fd into the pipe: splice(in, pipe[1], size)
	   3) "push" from the pipe into the fd: splice(pipe[0], out, size)

so it's all there. The above is 100% conceptually equivalent to

	 1) buf = malloc()
   repeat:
	   2) read(in, buf, size);
	   3) write(out, buf, size);

See? I think you'll find that the "push" and "pull" you look for really is 
there.

The advantage here is that if either the input or the output _already_ is
a pipe, you can optimize it to just do a loop of a single splice(), with
no new temporary buffer needed. So while the above three steps are the 
generic case, depending on how you do things the _common_ case may be just 
a simple

    repeat:
	  1) splice(in, out, maxsize);

which is why you do not want to _force_ the split. The determination of 
how to do this efficiently at run-time is also very easy:

	int copy_fd(int in, int out)
	{
		char *buf;
		int fds[2];

		for (;;) {
			int count = splice(in, out, ~0UL);
			if (count < 0) {
				if (errno == EAGAIN)
					continue;
				/* Old kernel without "splice()"? */
				if (errno == ENOSYS)
					goto read_write_case;
				/* No pipe involved? */
				if (errno == EINVAL)
					goto pipe_buffer_case;
				/* We found a pipe, but the other end cannot accept this splice */
				if (errno == ECANNOTSPLICE)
					goto read_write_case;
				return -1;
			}
			if (!count)
				return 0;
		}

	pipe_buffer_case:
		if (pipe(fds) < 0)
			goto read_write_case;
		for (;;) {
			int count = splice(in, fds[1], ~0UL);
			if (count < 0)
				if (errno == EAGAIN)
					continue;
				return -1;
			}
			if (!count)
				return 0;
			do {
				int n = splice(fds[0], out, count);
				if (n < 0) {
					if (errno == EAGAIN)
						continue;
					return -1;
				}
				if (!n) {
					errno = ENOSPC;
					return -1;
				}
			} while ((count -= n) > 0)
		}

	read_write_case:
		buf = malloc(BUFSIZE);
		for (;;) {
			int count = read(in, buf, BUFSIZE);
			...

See? THAT is the kind of library routine you want to have (btw, there's no 
"ECANNOTSPLICE" error code right now, my current example code incorrectly 
returns EINVAL for both the "no pipe" case and the "found pipe but not 
splicable" case).

			Linus

  reply	other threads:[~2005-01-19 17:17 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <200501070313.j073DCaQ009641@hera.kernel.org>
2005-01-07  3:41 ` Make pipe data structure be a circular list of pages, rather than William Lee Irwin III
2005-01-07  6:35   ` Linus Torvalds
2005-01-07  6:37     ` Linus Torvalds
2005-01-19 16:29       ` Larry McVoy
2005-01-19 17:14         ` Linus Torvalds [this message]
2005-01-19 19:01           ` Larry McVoy
2005-01-20  0:01             ` Linus Torvalds
2005-01-07 14:30 Oleg Nesterov
2005-01-07 15:45 ` Alan Cox
2005-01-07 17:23   ` Linus Torvalds
2005-01-08 18:25     ` Hugh Dickins
2005-01-08 18:54       ` Linus Torvalds
2005-01-07 16:17 ` Linus Torvalds
2005-01-07 16:06   ` Alan Cox
2005-01-07 17:33     ` Linus Torvalds
2005-01-07 17:48       ` Linus Torvalds
2005-01-07 20:59         ` Mike Waychison
2005-01-07 23:46           ` Chris Friesen
2005-01-08 21:38             ` Lee Revell
2005-01-08 21:51               ` Linus Torvalds
2005-01-08 22:02                 ` Lee Revell
2005-01-08 22:29                 ` Davide Libenzi
2005-01-09  4:07                 ` Linus Torvalds
2005-01-09 23:19                   ` Davide Libenzi
2005-01-14 10:15             ` Peter Chubb
2005-01-07 21:59         ` Linus Torvalds
2005-01-07 22:53           ` Diego Calleja
2005-01-07 23:15             ` Linus Torvalds
2005-01-10 23:23         ` Robert White
2005-01-07 17:45     ` Chris Friesen
2005-01-07 16:39   ` Davide Libenzi
2005-01-07 17:09     ` Linus Torvalds
2005-08-18  6:07   ` Coywolf Qi Hunt
     [not found] <Pine.LNX.4.44.0501091946020.3620-100000@localhost.localdomain>
     [not found] ` <Pine.LNX.4.58.0501091713300.2373@ppc970.osdl.org>
     [not found]   ` <Pine.LNX.4.58.0501091830120.2373@ppc970.osdl.org>
2005-01-12 19:50     ` Davide Libenzi
2005-01-12 20:10       ` Linus Torvalds
2005-01-16  2:59 Make pipe data structure be a circular list of pages, rather Linus Torvalds
2005-01-19 21:12 ` Make pipe data structure be a circular list of pages, rather than linux
2005-01-20  2:06   ` Robert White
2005-01-20  2:14 Robert White

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Pine.LNX.4.58.0501190843220.8178@ppc970.osdl.org \
    --to=torvalds@osdl.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lm@bitmover.com \
    --cc=wli@holomorphy.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).