linux-kernel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@osdl.org>
To: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Oleg Nesterov <oleg@tv-sign.ru>,
	William Lee Irwin III <wli@holomorphy.com>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: Make pipe data structure be a circular list of pages, rather than
Date: Fri, 7 Jan 2005 09:33:41 -0800 (PST)
Message-ID: <Pine.LNX.4.58.0501070923590.2272@ppc970.osdl.org>
In-Reply-To: <1105113998.24187.361.camel@localhost.localdomain>



On Fri, 7 Jan 2005, Alan Cox wrote:
>
> > The reason I don't want to coalesce is that I don't ever want to modify a
> > page that is on a pipe buffer (well, at least not through the pipe buffe
> 
> If I can't write 4096 bytes down it one at a time without blocking from
> an empty pipe then it's not a pipe in the eyes of the Unix world and the
> standards.

Absolutely. In fact, with the new implementation, you can often write
_several_ packets of 4096 bytes without blocking (but only writes of
PIPE_BUF bytes or less are guaranteed to be done all-or-nothing). I'm very
aware of the atomicity guarantees, I'm just saying that if you try to write
4096 bytes by doing it one byte at a time, that behaviour has changed,
because small writes are no longer coalesced into one page.
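
To make the one-byte-at-a-time case concrete, here is a minimal user-space
sketch. It is not code from the patch, and count_single_byte_writes() is a
made-up helper name. It writes single bytes into an empty non-blocking pipe
and reports how many went in before the next write would have blocked. The
number you get depends on the kernel: it is an assumption of this sketch
that a pipe which does not coalesce small writes runs out of buffers early,
while one that does accepts all of them.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: write up to `limit` single bytes into a fresh,
 * empty, non-blocking pipe. Returns how many one-byte writes succeeded
 * before the kernel said the next one would block, or -1 on error.
 */
int count_single_byte_writes(int limit)
{
    int fds[2];
    int flags;
    int written = 0;
    const char byte = 'x';

    assert(limit > 0);
    if (pipe(fds) != 0)
        return -1;

    /* Non-blocking write end: a full pipe gives EAGAIN instead of a hang. */
    flags = fcntl(fds[1], F_GETFL);
    if (flags == -1 || fcntl(fds[1], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    while (written < limit) {
        ssize_t n = write(fds[1], &byte, 1);

        if (n == 1) {
            written++;          /* this byte fit without blocking */
        } else if (n == -1 && errno == EINTR) {
            continue;           /* interrupted, just retry */
        } else if (n == -1 && errno == EAGAIN) {
            break;              /* pipe is full: a blocking writer would sleep */
        } else {
            written = -1;       /* unexpected error */
            break;
        }
    }

    close(fds[0]);
    close(fds[1]);
    return written;
}
```

Calling count_single_byte_writes(4096) and comparing the result against
4096 shows whether a given kernel lets you fill a page's worth of data one
byte at a time from an empty pipe.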

> > With this organization, a pipe ends up being able to act as a "conduit"  
> > for pretty much any data, including some high-bandwidth things like video
> > streams, where you really _really_ don't want to copy the data. So the 
> > next stage is:
> 
> The data copying impact isn't very high even if it is just done for the
> pipe() case for standards behaviour. You end up with one page that is
> written to and then sent and then freed rather than many.

I absolutely agree. A regular read()/write() still copies the data, and 
that's because I'm a firm believer that copying even a few kB of data is 
likely to be cheaper than trying to play MM games (not just the lookup of 
the physical address - all the locking, COW, etc crud that VM games 
require).

So while this shares some of the issues with the zero-copy pipes of yore,
it doesn't actually do any of that for regular pipe read/writes. And
never will, as far as I'm concerned. I just don't think user zero-copy is
interesting at that level: if the user wants to access something without
copying, he uses "mmap()".

It's only when the data is _not_ in user VM space that increasing a
reference count is cheaper than copying. Pretty much by definition, you
already have a "struct page *" at that point, along with which part of the
page contains the data.
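
The "reference count instead of copy" idea can be modelled in user space
roughly like this. It is a toy sketch only: struct toy_page, struct
toy_pipe_buf and the toy_*() helpers are invented for illustration and are
not the kernel's structures or the layout used by the patch. The point is
just that sharing a buffer means bumping a counter and copying a small
descriptor (page, offset, length), never the payload.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define TOY_PAGE_SIZE 4096

/* Toy stand-in for "struct page": payload plus a reference count. */
struct toy_page {
    int refcount;
    unsigned char data[TOY_PAGE_SIZE];
};

/* One pipe slot: which page, and which part of it holds the data. */
struct toy_pipe_buf {
    struct toy_page *page;
    size_t offset;
    size_t len;
};

/* Allocate a page holding `len` bytes from `src`, with one reference. */
struct toy_page *toy_page_alloc(const void *src, size_t len)
{
    struct toy_page *p;

    if (len > TOY_PAGE_SIZE)
        return NULL;
    p = calloc(1, sizeof(*p));
    if (p == NULL)
        return NULL;
    memcpy(p->data, src, len);
    p->refcount = 1;
    return p;
}

/* Share the page with a second buffer: bump the count, copy no payload. */
struct toy_pipe_buf toy_buf_share(const struct toy_pipe_buf *src)
{
    src->page->refcount++;
    return *src;                /* copies only page pointer, offset, len */
}

/* Drop one reference; free the page when the last holder lets go. */
void toy_buf_release(struct toy_pipe_buf *buf)
{
    if (buf->page != NULL && --buf->page->refcount == 0)
        free(buf->page);
    buf->page = NULL;
    buf->len = 0;
}
```

After toy_buf_share() both descriptors point at the same page, so the cost
is independent of how much data the page holds, which is the whole reason
this only pays off when the data already lives in a page.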

So the "standard behaviour" (aka just plain read/write on the pipe) is all
the same copies that it used to be. The "just move pages around" issue
only happens when you want to duplicate the stream, or if you splice
around stuff that is already in kernel buffers (or needs a kernel buffer
anyway).
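
For the "duplicate the stream" case, a sketch using tee(2) may help. Note
the assumptions: tee(2) is a Linux-specific call that was merged after this
mail was written, and treating it as the interface for what is described
here is my reading, not something this message states. duplicate_stream()
is a made-up helper name. The sketch puts a short message into one pipe,
duplicates it into a second pipe without consuming it, and reads both
copies back.

```c
#define _GNU_SOURCE             /* needed for tee() on glibc */
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write `msg` into pipe a, tee() it into pipe b,
 * then read it back from both. `msg` must be short (well under PIPE_BUF)
 * so none of the calls block. Returns 0 if both pipes deliver the same
 * bytes, -1 on any failure.
 */
int duplicate_stream(const char *msg, char *out_a, char *out_b, size_t cap)
{
    int a[2], b[2];
    size_t len;
    int rc = -1;

    assert(msg != NULL && out_a != NULL && out_b != NULL);
    len = strlen(msg);
    if (len == 0 || len > cap)
        return -1;
    if (pipe(a) != 0)
        return -1;
    if (pipe(b) != 0) {
        close(a[0]);
        close(a[1]);
        return -1;
    }

    if (write(a[1], msg, len) != (ssize_t)len)
        goto out;
    /* tee() leaves the data in pipe a and makes it readable from b too. */
    if (tee(a[0], b[1], len, 0) != (ssize_t)len)
        goto out;
    if (read(a[0], out_a, len) != (ssize_t)len)
        goto out;
    if (read(b[0], out_b, len) != (ssize_t)len)
        goto out;
    rc = (memcmp(out_a, out_b, len) == 0) ? 0 : -1;
out:
    close(a[0]);
    close(a[1]);
    close(b[0]);
    close(b[1]);
    return rc;
}
```

The plain write() and read() calls here still copy between user space and
the kernel, same as always. Only the tee() step in the middle moves data
between two kernel-side pipes without going through a user buffer.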

		Linus
