linux-kernel.vger.kernel.org archive mirror
From: Linus Torvalds <torvalds@transmeta.com>
To: Ben LaHaise <bcrl@redhat.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	"Stephen C. Tweedie" <sct@redhat.com>,
	Alan Cox <alan@lxorguk.ukuu.org.uk>,
	Manfred Spraul <manfred@colorfullife.com>,
	Steve Lord <lord@sgi.com>,
	Linux Kernel List <linux-kernel@vger.kernel.org>,
	kiobuf-io-devel@lists.sourceforge.net,
	Ingo Molnar <mingo@redhat.com>
Subject: Re: [Kiobuf-io-devel] RFC: Kernel mechanism: Compound event wait
Date: Tue, 6 Feb 2001 11:20:57 -0800 (PST)
Message-ID: <Pine.LNX.4.10.10102061059100.1474-100000@penguin.transmeta.com>
In-Reply-To: <Pine.LNX.4.30.0102061338380.15204-100000@today.toronto.redhat.com>



On Tue, 6 Feb 2001, Ben LaHaise wrote:
> On Tue, 6 Feb 2001, Ingo Molnar wrote:
> 
> > If you are merging based on (device, offset) values, then that's lowlevel
> > - and this is what we have been doing for years.
> >
> > If you are merging based on (inode, offset), then it has flaws like not
> > being able to merge through a loopback or stacked filesystem.
> 
> I disagree.  Loopback filesystems typically have their data contiguously
> on disk and won't split up incoming requests any further.

Face it.

You NEED to merge and sort late. You _cannot_ do a good job early. Early
on, you don't have any concept of what the final IO pattern will be: you
will only have that once you've seen which requests are still pending etc,
something that the higher level layers CANNOT do.

Do you really want the higher levels to know about per-controller request
locking etc? I don't think so. 

Trust me. You HAVE to do the final decisions late in the game. You
absolutely _cannot_ get the best performance except for trivial and
uninteresting cases (ie one process that wants to read gigabytes of data
in one single stream) otherwise.

(It should be pointed out, btw, that SGI etc were often interested exactly
in the trivial and uninteresting cases. When you have the DoD asking you
to stream satellite pictures over the net as fast as you can, money being
no object, you get a rather twisted picture of what is important and what
is not)

And I will turn your own argument against you: if you do merging at a low
level anyway, there's little point in trying to do it at a higher level. 

Higher levels should do high-level sequencing. They can (and should) do
some amount of sorting - the lower levels will still do their own sort as
part of the merging anyway, and the lower level sorting may actually end
up being _different_ from a high-level sort because the lower levels know
about the topology of the device, but higher levels giving data with
"patterns" to it only make it easier for the lower levels to do a good
job. So high-level sorting is not _necessary_, but it's probably a good
idea.

High-level merging is almost certainly not even a good idea - higher
levels should try to _batch_ the requests, but that's a different issue,
and is again all about giving lower levels "patterns". It can also be about
simple issues like cache locality - batching things tends to make for
better icache (and possibly dcache) behaviour.

So you should separate out the issue of batching and merging. And you
absolutely should realize that you should NOT ignore Ingo's arguments
about loopback etc just because they don't fit the model you WANT them to
fit. The fact is that higher levels should NOT know about things like RAID
striping etc, yet that has a HUGE impact on the issue of merging (you do
_not_ want to merge requests to separate disks - you'll just have to split
them up again).
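
To make the striping point concrete, here is a toy model (Python, not
kernel code). It assumes a hypothetical RAID-0 style layout where
fixed-size chunks rotate round-robin over the disks; the function name,
the two-disk default and the 4-block chunk size are all made up for
illustration. It shows how one big request that was merged above the
striping layer has to be cut back into per-disk pieces.

```python
def split_for_stripes(start, nblocks, ndisks=2, chunk=4):
    """Split the logical request [start, start + nblocks) into per-disk
    pieces, returned as (disk, block_on_disk, length) tuples."""
    pieces = []
    block = start
    end = start + nblocks
    while block < end:
        stripe = block // chunk      # index of the chunk in the logical volume
        disk = stripe % ndisks       # chunks rotate round-robin over the disks
        offset = block % chunk       # position inside that chunk
        length = min(chunk - offset, end - block)
        disk_block = (stripe // ndisks) * chunk + offset
        pieces.append((disk, disk_block, length))
        block += length
    return pieces


print(split_for_stripes(0, 8))   # [(0, 0, 4), (1, 0, 4)]
```

With two disks and 4-block chunks, an 8-block request starting at block 0
comes out as one 4-block piece per disk, so any merging done above the
striping layer is simply undone again.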

> Here are the points I'm trying to address:
> 
> 	- reduce the overhead in submitting block ios, especially for
> 	  large ios. Look at the %CPU usages differences between 512 byte
> 	  blocks and 4KB blocks, this can be better.

This is often a filesystem layer issue. Design your filesystem well, and
you get a lot of batching for free.

You can also batch the requests - this is basically what "readahead" is.
That helps a lot. But that is NOT the same thing as merging. Not at all.
The "batched" read-ahead requests may actually be split up among many
different disks - and they will each then get separately merged with
_other_ requests to those disks. See?

And trust me, THAT is how you get good performance. Not by merging early.
By merging late, and letting the disk layers do their own thing.
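
A rough sketch of that flow, again as a toy Python model rather than
kernel code. The DiskQueue class and the submit_batch helper are
hypothetical names, and only exactly contiguous block ranges are merged.
The batch is handed down as separate requests, and each disk's queue
merges them with whatever else happens to be pending for that disk.

```python
class DiskQueue:
    """Pending requests for ONE disk, kept as sorted [start, end) ranges."""

    def __init__(self):
        self.pending = []

    def add(self, start, nblocks):
        # Insert the new request, then coalesce neighbours that touch.
        self.pending.append([start, start + nblocks])
        self.pending.sort()
        merged = [self.pending[0]]
        for req in self.pending[1:]:
            last = merged[-1]
            if req[0] == last[1]:    # exactly contiguous: merge into previous
                last[1] = req[1]
            else:                    # gap (or overlap): keep it separate
                merged.append(req)
        self.pending = merged


def submit_batch(queues, batch):
    """Hand a batch of (disk, start, nblocks) requests to the per-disk queues."""
    for disk, start, nblocks in batch:
        queues[disk].add(start, nblocks)


queues = {0: DiskQueue(), 1: DiskQueue()}
queues[0].add(0, 2)                  # already pending from somebody else
queues[1].add(4, 4)                  # ditto, on the other disk
submit_batch(queues, [(0, 2, 2), (1, 0, 4), (0, 4, 2)])   # a read-ahead batch
print(queues[0].pending, queues[1].pending)   # [[0, 6]] [[0, 8]]
```

The batch itself was never merged by the submitter. Each piece ended up
merged with unrelated requests that were already sitting in the queue of
the disk it belongs to.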

> 	- make asynchronous io possible in the block layer.  This is
> 	  impossible with the current ll_rw_block scheme and io request
> 	  plugging.

I'm surprised you say that. It's not only possible, but we do it all the
time. What do you think the swapout and writing is? How do you think that
read-ahead is actually _implemented_? Right. Read-ahead is NOT done as a
"merge" operation. It's done as several asynchronous IO operations that
the low-level stuff can choose (or not) to merge.

What do you think happens if you do a "submit_bh()"? It's a _purely_
asynchronous operation. It turns synchronous when you wait for the bh, not
before.
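
The same shape can be shown with a small user-space analogy (Python
threads, not the kernel's buffer-head code; device_read and submit_reads
are hypothetical names). Submitting only queues the work and returns; the
caller blocks at the point where it asks for the result, not before.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def device_read(block, device_ready):
    """Stand-in for the device: it finishes only once device_ready is set."""
    device_ready.wait()
    return "data for block %d" % block


def submit_reads(pool, blocks, device_ready):
    """Queue every read and return immediately with handles to wait on."""
    return [pool.submit(device_read, b, device_ready) for b in blocks]


device_ready = threading.Event()
with ThreadPoolExecutor(max_workers=4) as pool:
    handles = submit_reads(pool, [0, 1, 2], device_ready)  # returns at once
    device_ready.set()                     # the "device" completes the IO
    data = [h.result() for h in handles]   # the waiting happens here
print(data)
```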

Your argument is nonsense.

> 	- provide a generic mechanism for reordering io requests for
> 	  devices which will benefit from this.  Make it a library for
> 	  drivers to call into.  IDE for example will probably make use of
> 	  it, but some high end devices do this on the controller.  This
> 	  is the important point: Make it OPTIONAL.

Ehh. You've just described exactly what we have.

This is what the whole elevator thing _is_. It's a library of routines.
You don't have to use them, and in fact many things DO NOT use them. The
loopback driver, for example, doesn't bother with sorting or merging at
all, because it knows that it's only supposed to pass the request on to
somebody else - who will do a hell of a lot better job of it.
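
As a sketch of what "optional library" means here (a toy Python model;
Driver, elevator_insert and passthrough_insert are hypothetical names, and
plain sorting by sector is a simplification of what a real elevator does):
the sorting helper is just a function that a driver may or may not plug in.

```python
import bisect


def elevator_insert(queue, sector):
    """Library helper: keep the queue sorted by sector number."""
    bisect.insort(queue, sector)


def passthrough_insert(queue, sector):
    """What a loopback-style driver might do: no sorting, just hand it on."""
    queue.append(sector)


class Driver:
    """A driver picks its own insert policy; the library is not mandatory."""

    def __init__(self, insert=passthrough_insert):
        self.queue = []
        self.insert = insert

    def submit(self, sector):
        self.insert(self.queue, sector)


disk = Driver(insert=elevator_insert)
loop = Driver()                      # passes requests through untouched
for sector in (30, 10, 20):
    disk.submit(sector)
    loop.submit(sector)
print(disk.queue, loop.queue)        # [10, 20, 30] [30, 10, 20]
```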

Some high-end drivers have their own merging stuff, exactly because they
don't need the overhead - you're better off just feeding the request to
the controller as soon as you can, as the controller itself will do all
the merging and sorting anyway.

> You mentioned non-spindle base io devices in your last message.  Take
> something like a big RAM disk.  Now compare kiobuf base io to buffer head
> based io.  Tell me which one is going to perform better.

Buffer heads? 

Go and read the code.

Sure, it has some historical baggage still, but the fact is that it works
a hell of a lot better than kiobufs and it _does_ know about merging
multiple requests and handling errors in the middle of one request etc.
You can get the full advantage of streaming megabytes of data in one
request, AND still get proper error handling if it turns out that one
sector in the middle was bad.

		Linus

