Date: Fri, 7 Jan 2005 09:48:23 -0800 (PST)
From: Linus Torvalds
To: Alan Cox
cc: Oleg Nesterov, William Lee Irwin III, Linux Kernel Mailing List
Subject: Re: Make pipe data structure be a circular list of pages, rather than

On Fri, 7 Jan 2005, Linus Torvalds wrote:
>
> So the "standard behaviour" (aka just plain read/write on the pipe) is
> all the same copies that it used to be.

I want to highlight this again. The increase in throughput did _not_ come
from avoiding a copy. It came purely from the fact that we have multiple
buffers, and thus a throughput-intensive load gets to do several bigger
chunks before it needs to wait for the recipient.

So the increase in throughput comes from fewer synchronization points
(scheduling and locking), not from anything else.

Another way of saying the same thing: pipes used to have clearly _lower_
bandwidth than UNIX domain sockets, even though pipes are simpler and
should thus be able to be faster. The reason? UNIX domain sockets allow
multiple packets in flight, while pipes only had a single buffer. With the
new setup, pipes get roughly comparable performance to UNIX domain sockets
for me.

Sockets can still outperform pipes, truth be told - there has been more
work on socket locking than on pipe locking. For example, the new pipe
code should conceptually allow one CPU to empty one pipe buffer while
another CPU fills another pipe buffer, but because I kept the locking
structure the same as it used to be, right now read and write serialize
over the whole pipe rather than over individual buffers.

This is the main reason why I want to avoid coalescing if possible: if you
never coalesce, then each "pipe_buffer" is complete in itself, and that
simplifies locking enormously.

(The other reason to potentially avoid coalescing is that I think it might
be useful to allow "sendmsg()/recvmsg()" interfaces that honour packet
boundaries. The new pipe code _is_ internally "packetized", after all.)

		Linus
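
To make the buffering argument above concrete, here is a minimal
userspace sketch of a pipe as a circular ring of fixed-size buffers. It
is an illustration of the idea only, not the actual kernel code: the
names (ring_pipe, slot, ring_write, ring_read) and the sizes are
assumptions made up for the example.

	/*
	 * Sketch: a pipe as a circular ring of fixed-size buffers.
	 * Illustrative names and sizes, not the real kernel structures.
	 */
	#include <stdio.h>
	#include <string.h>

	#define RING_SLOTS 16u           /* buffers that can be in flight */
	#define SLOT_SIZE  4096u         /* one "page" per buffer */

	struct slot {
		char   data[SLOT_SIZE];
		size_t offset;           /* first unconsumed byte */
		size_t len;              /* bytes still unread */
	};

	struct ring_pipe {
		struct slot slots[RING_SLOTS];
		unsigned    head;        /* next slot the writer fills */
		unsigned    tail;        /* oldest slot the reader drains */
		/*
		 * A single pipe-wide lock would sit here.  Because slots
		 * are never coalesced, each slot is independent, so
		 * per-slot locking (reader and writer on different CPUs
		 * touching different slots) is possible in principle.
		 */
	};

	/*
	 * Write: always take a fresh slot, never append to a partially
	 * filled one -- the "no coalescing" rule.  Returns bytes queued,
	 * or 0 when the ring is full (a real pipe would block).
	 */
	static size_t ring_write(struct ring_pipe *p, const char *buf, size_t n)
	{
		struct slot *s;

		if (p->head - p->tail == RING_SLOTS)
			return 0;
		s = &p->slots[p->head % RING_SLOTS];
		if (n > SLOT_SIZE)
			n = SLOT_SIZE;
		memcpy(s->data, buf, n);
		s->offset = 0;
		s->len = n;
		p->head++;
		return n;
	}

	/* Read: drain the oldest slot, and retire it once it is empty. */
	static size_t ring_read(struct ring_pipe *p, char *buf, size_t n)
	{
		struct slot *s;

		if (p->head == p->tail)
			return 0;        /* empty: a real pipe would block */
		s = &p->slots[p->tail % RING_SLOTS];
		if (n > s->len)
			n = s->len;
		memcpy(buf, s->data + s->offset, n);
		s->offset += n;
		s->len -= n;
		if (s->len == 0)
			p->tail++;       /* slot fully consumed */
		return n;
	}

	int main(void)
	{
		struct ring_pipe p = { .head = 0, .tail = 0 };
		char out[SLOT_SIZE];
		size_t n;

		/* The writer can queue several chunks before the reader
		 * runs at all; this batching, not copy avoidance, is
		 * where the throughput gain comes from. */
		ring_write(&p, "first", 5);
		ring_write(&p, "second", 6);
		n = ring_read(&p, out, sizeof(out));
		printf("read %zu bytes: %.*s\n", n, (int)n, out);
		return 0;
	}

The design choice the sketch tries to surface: with a single buffer, every
write has to rendezvous with a read, while with a ring the two ends only
meet when the ring is completely full or completely empty.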
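
On the packetized point: since buffers are never merged, each pipe_buffer
already marks a message boundary. Continuing the hypothetical ring_pipe
sketch above, a recvmsg()-style read that honours packet boundaries only
has to refuse to cross into the next slot. This helper is an assumption
for illustration, not an existing pipe interface:

	/*
	 * Hypothetical packet-respecting read on the ring_pipe sketch
	 * above; not an existing pipe interface.  Each slot is one
	 * "packet": the read never crosses into the next slot, and like
	 * a datagram read it truncates the packet if the caller's
	 * buffer is too small.
	 */
	static size_t ring_read_packet(struct ring_pipe *p, char *buf, size_t n)
	{
		struct slot *s;

		if (p->head == p->tail)
			return 0;        /* no packet queued */
		s = &p->slots[p->tail % RING_SLOTS];
		if (n > s->len)
			n = s->len;      /* short read: one packet at most */
		memcpy(buf, s->data + s->offset, n);
		p->tail++;               /* discard any truncated remainder */
		return n;
	}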